PgRC: pseudogenome-based read compressor.

Tomasz M Kowalski¹, Szymon Grabowski¹

  • ¹Institute of Applied Computer Science, Lodz University of Technology, Lodz 90-924, Poland.

Bioinformatics (Oxford, England) | January 2, 2020
Summary
This summary is machine-generated.

Pseudogenome-based Read Compressor (PgRC) is an in-memory algorithm for compressing the DNA stream of sequencing data. It achieves higher compression ratios than SPRING and Minicom while decompressing at comparable speed.

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Genomics

Background:

  • Data from high-throughput sequencing grow at a pace exceeding the one predicted by Moore's law.
  • Efficient storage and transmission of this genomic data are critical challenges.
  • Existing FASTQ compressors have limitations in compression ratio and decompression resources.

Purpose of the Study:

  • To introduce a novel algorithm for compressing DNA sequencing data.
  • To address the limitations of current data compression methods in genomics.

Main Methods:

  • Development of Pseudogenome-based Read Compressor (PgRC), an in-memory algorithm.
  • Building an approximation of the shortest common superstring over high-quality reads.
  • Comparative performance analysis against SPRING and Minicom.
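
The superstring idea in the methods above can be illustrated with a minimal sketch. It is not PgRC's actual implementation: it assumes a classic greedy merge of the most-overlapping pair of reads, error-free reads, and no reverse complements. All function names (`overlap`, `greedy_superstring`, `encode_reads`, `decode_reads`) are hypothetical, and the quadratic search is suitable only for toy inputs.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(reads: list[str]) -> str:
    """Greedy approximation of the shortest common superstring (toy scale)."""
    # Drop exact duplicates, then reads fully contained in another read.
    uniq = list(dict.fromkeys(reads))
    strings = [r for r in uniq if not any(r != s and r in s for s in uniq)]
    while len(strings) > 1:
        best_k, best_i, best_j = -1, 0, 1
        # Find the ordered pair with the largest suffix/prefix overlap.
        for i, a in enumerate(strings):
            for j, b in enumerate(strings):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        merged = strings[best_i] + strings[best_j][best_k:]
        strings = [s for idx, s in enumerate(strings)
                   if idx not in (best_i, best_j)] + [merged]
    return strings[0] if strings else ""


def encode_reads(reads: list[str], pseudogenome: str) -> list[tuple[int, int]]:
    """Represent each read as (position, length) within the superstring."""
    return [(pseudogenome.find(r), len(r)) for r in reads]


def decode_reads(encoded: list[tuple[int, int]], pseudogenome: str) -> list[str]:
    """Recover the reads from their (position, length) pairs."""
    return [pseudogenome[pos:pos + n] for pos, n in encoded]


if __name__ == "__main__":
    reads = ["ACGTAC", "GTACGG", "ACGGTT", "GTAC"]
    pg = greedy_superstring(reads)
    encoded = encode_reads(reads, pg)
    print(pg, encoded)
    assert decode_reads(encoded, pg) == reads
```

Because overlapping reads share characters in the superstring, storing one superstring plus a position per read can take less space than storing every read verbatim, which is the intuition behind a pseudogenome-based representation.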

Main Results:

  • PgRC achieves higher compression ratios than its main competitors, with average gains of up to 15% over SPRING and up to 20% over Minicom.
  • Decompression speed is comparable to that of existing methods.
  • Successful implementation as an in-memory algorithm for DNA stream compression.
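
To make a percentage gain in compression ratio concrete, here is a small worked sketch. It assumes the gain is measured as the relative increase in compression ratio (original size divided by compressed size). The summary does not state how the reported figures were computed, so this definition and the file sizes below are illustrative assumptions, not results from the paper.

```python
def compression_ratio(original_bytes: float, compressed_bytes: float) -> float:
    """Original size divided by compressed size (higher is better)."""
    return original_bytes / compressed_bytes


def relative_gain(ratio_new: float, ratio_baseline: float) -> float:
    """Relative increase of one compression ratio over a baseline ratio."""
    return (ratio_new - ratio_baseline) / ratio_baseline


if __name__ == "__main__":
    # Hypothetical sizes in MB, chosen only for illustration.
    original = 1000.0
    baseline = compression_ratio(original, 100.0)  # hypothetical competitor
    candidate = compression_ratio(original, 87.0)  # hypothetical new tool
    print(f"gain: {relative_gain(candidate, baseline):.1%}")
```

Under this definition, shrinking a compressed file from 100 MB to 87 MB corresponds to a gain of roughly 15% in compression ratio.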

Conclusions:

  • PgRC offers a significant advancement in compressing large-scale sequencing data.
  • The method provides a more efficient solution for managing the growing volume of genomic information.
  • PgRC presents a viable alternative for researchers dealing with substantial biological datasets.