



Better quality score compression through sequence-based quality smoothing.

Yoshihiro Shibuya1,2, Matteo Comin3

  • 1Department of Information Engineering, University of Padova, via Gradenigo 6/A, Padova, Italy.

BMC Bioinformatics | November 24, 2019
Summary
This summary is machine-generated.

Genomic data compression is essential because falling next-generation sequencing (NGS) costs are driving rapid growth in the volume of sequencing data. YALFF (Yet Another Lossy Fastq Filter) compresses quality scores by smoothing them, improving FASTQ file compressibility and genotyping accuracy with modest memory requirements.

Keywords:
BWT, FASTQ compression, FM-Index


Area of Science:

  • Bioinformatics
  • Computational Biology
  • Genomics

Background:

  • Next-generation sequencing (NGS) generates vast amounts of data, necessitating efficient storage solutions.
  • Quality scores in NGS data contribute significantly to file entropy and present a target for compression.
  • Existing tools attempt to smooth quality scores to enhance compressibility without compromising downstream analysis accuracy.
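To make the entropy argument concrete, here is a minimal sketch that computes the zero-order Shannon entropy (bits per symbol) of a FASTQ quality string. The function name and the two example strings are illustrative assumptions, not part of YALFF; the point is only that a string with fewer distinct quality values has lower entropy and therefore compresses better.

```python
import math
from collections import Counter


def quality_entropy(qualities: str) -> float:
    """Zero-order Shannon entropy, in bits per symbol, of a quality string."""
    if not qualities:
        return 0.0
    total = len(qualities)
    entropy = 0.0
    for count in Counter(qualities).values():
        p = count / total
        # p * log2(1/p) is this symbol's contribution to the entropy.
        entropy += p * math.log2(1 / p)
    return entropy


# Hypothetical quality strings: one raw, one after smoothing to a single value.
raw = "IIIHGF@@II"
smoothed = "IIIIIIIIII"

if __name__ == "__main__":
    print(f"raw:      {quality_entropy(raw):.3f} bits/symbol")
    print(f"smoothed: {quality_entropy(smoothed):.3f} bits/symbol")
```

A general-purpose compressor applied after smoothing benefits from exactly this reduction, since long runs of one symbol are cheap to encode.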

Purpose of the Study:

  • To develop a novel tool, YALFF (Yet Another Lossy Fastq Filter), for compressing quality scores in FASTQ files.
  • To improve the compressibility of genomic data by smoothing quality values.
  • To maintain high precision in SNP calling and genotyping accuracy.
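The idea of sequence-based smoothing can be sketched as follows. This is a deliberately simplified illustration and not YALFF's actual algorithm: it assumes a dictionary of trusted k-mers is already available as a Python set, and it replaces the quality of every base covered by at least one trusted k-mer with a fixed high value. The function name, the `trusted_kmers` parameter, and the choice of "I" as the replacement symbol are all assumptions made for this example.

```python
def smooth_qualities(read: str, qualities: str, trusted_kmers: set,
                     k: int, high_quality: str = "I") -> str:
    """Replace the quality of bases covered by a trusted k-mer (simplified)."""
    if len(read) != len(qualities):
        raise ValueError("read and quality string must have the same length")

    covered = [False] * len(read)
    # Slide a window of length k over the read; if the k-mer is trusted,
    # mark every position it spans as safe to smooth.
    for start in range(len(read) - k + 1):
        if read[start:start + k] in trusted_kmers:
            for pos in range(start, start + k):
                covered[pos] = True

    # Covered bases get the fixed high value; the rest keep their scores.
    return "".join(
        high_quality if is_covered else original
        for is_covered, original in zip(covered, qualities)
    )


if __name__ == "__main__":
    # Hypothetical read, qualities, and dictionary.
    print(smooth_qualities("ACGTAC", "!#%&()", {"ACG"}, k=3))
```

Bases not supported by the dictionary keep their original scores, which is what lets a variant caller still see low confidence where the sequence itself is unusual.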

Main Methods:

  • Utilizing the FM-Index, a compressed suffix array, for efficient k-mer dictionary storage.
  • Implementing an effective smoothing algorithm to reduce quality score entropy.
  • Developing YALFF to process FASTQ files on consumer-grade hardware with limited RAM.
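As a rough illustration of how an FM-Index can answer k-mer membership queries, the sketch below builds a toy index over a single string and counts pattern occurrences with backward search. It is not YALFF's implementation: the suffix array is built by naive sorting, the occurrence table is stored uncompressed, and the class and method names are assumptions for this example. A production index would use a linear-time construction and sampled, compressed rank structures.

```python
from collections import Counter


class FMIndex:
    """Toy FM-Index supporting occurrence counting via backward search."""

    def __init__(self, text: str):
        if "$" in text:
            raise ValueError("text must not contain the sentinel '$'")
        text += "$"  # sentinel, lexicographically smaller than A, C, G, T
        # Naive suffix array: fine for a demo, far too slow for real genomes.
        suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
        # Burrows-Wheeler transform: the character preceding each suffix.
        self.bwt = "".join(text[i - 1] for i in suffix_array)

        counts = Counter(self.bwt)
        # first[c] = number of characters in the text strictly smaller than c.
        self.first = {}
        total = 0
        for ch in sorted(counts):
            self.first[ch] = total
            total += counts[ch]

        # occ[c][i] = number of occurrences of c in bwt[:i].
        self.occ = {ch: [0] * (len(self.bwt) + 1) for ch in counts}
        for i, symbol in enumerate(self.bwt):
            for ch in counts:
                self.occ[ch][i + 1] = self.occ[ch][i] + (symbol == ch)

    def count(self, pattern: str) -> int:
        """Number of occurrences of a non-empty pattern in the indexed text."""
        if not pattern:
            raise ValueError("pattern must be non-empty")
        lo, hi = 0, len(self.bwt)
        # Process the pattern right to left, narrowing the suffix-array range.
        for ch in reversed(pattern):
            if ch not in self.first:
                return 0
            lo = self.first[ch] + self.occ[ch][lo]
            hi = self.first[ch] + self.occ[ch][hi]
            if lo >= hi:
                return 0
        return hi - lo

    def contains(self, kmer: str) -> bool:
        return self.count(kmer) > 0


if __name__ == "__main__":
    index = FMIndex("ACGTACGA")  # hypothetical reference string
    print(index.count("ACG"), index.contains("GGG"))
```

Each query costs time proportional to the k-mer length and does not depend on the size of the indexed text, which is why this kind of structure suits a large k-mer dictionary under a tight memory budget.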

Main Results:

  • YALFF significantly improves FASTQ file compressibility through quality score smoothing.
  • The tool has modest memory requirements, running effectively on systems with 5.7 GB of free RAM.
  • The smoothing algorithm enhances genotyping accuracy in SNP calling pipelines.

Conclusions:

  • YALFF offers an efficient method for compressing NGS data by targeting quality scores.
  • The tool provides a practical solution for managing large genomic datasets on accessible hardware.
  • YALFF demonstrates the potential to improve both data compression and analytical accuracy in genomics.