



Sparse Project VCF: efficient encoding of population genotype matrices.

Michael F Lin¹, Xiaodong Bai², William J Salerno²

  • ¹ mlin.net LLC, San Jose, CA 95113, USA.

Bioinformatics (Oxford, England), December 10, 2020
Summary
This summary is machine-generated.

Sparse Project VCF (spVCF) reduces Variant Call Format (VCF) file sizes by over 10× for large cohort sequencing. This efficient compression method minimizes data loss and maintains interoperability with existing VCF tools.


Area of Science:

  • Genomics
  • Bioinformatics
  • Computational Biology

Background:

  • Variant Call Format (VCF) is the standard format for representing germline genotypes.
  • VCF file sizes increase rapidly with larger cohorts and more discovered rare variants.

Purpose of the Study:

  • To introduce Sparse Project VCF (spVCF) as an efficient evolution of VCF.
  • To achieve significant size reduction for VCF files with minimal information loss.

Main Methods:

  • Developed spVCF using entropy reduction and run-length encoding.
  • Ensured spVCF interoperates efficiently with standard VCF, including tabix-based random access.
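
The two techniques named above can be illustrated with a toy example. This is only a generic sketch, not the spVCF encoding itself, which this summary does not describe. It assumes, hypothetically, that each genotype cell is a (genotype, read depth) pair, that "entropy reduction" means rounding depth down to a power of two so that neighbouring cells become identical, and that run-length encoding then collapses each run of identical cells. The function names `quantize_depth`, `reduce_entropy`, `rle_encode`, and `rle_decode` are invented for illustration.

```python
from itertools import groupby


def quantize_depth(depth):
    """Round a read depth down to the nearest power of two (0 stays 0).

    Hypothetical entropy-reduction step: it is lossy, but it makes
    nearby cells far more likely to be identical.
    """
    return 0 if depth <= 0 else 1 << (depth.bit_length() - 1)


def reduce_entropy(cells):
    """Apply depth quantization to a row of (genotype, depth) cells."""
    return [(genotype, quantize_depth(depth)) for genotype, depth in cells]


def rle_encode(cells):
    """Collapse consecutive identical cells into (cell, run_length) pairs."""
    return [(cell, sum(1 for _ in run)) for cell, run in groupby(cells)]


def rle_decode(pairs):
    """Expand (cell, run_length) pairs back into the full row."""
    row = []
    for cell, count in pairs:
        row.extend([cell] * count)
    return row


# One illustrative row of a genotype matrix: mostly reference-homozygous
# calls with slightly different depths, plus one heterozygous call.
row = [("0/0", 17), ("0/0", 20), ("0/0", 31),
       ("0/1", 33), ("0/0", 18), ("0/0", 16)]

reduced = reduce_entropy(row)
encoded = rle_encode(reduced)

print(len(row), "cells ->", len(encoded), "runs")
print(encoded)
```

In this sketch the run-length step is lossless with respect to the quantized row (`rle_decode(encoded) == reduced`), while the quantization step is where information is deliberately discarded. Without quantization, the six cells above would all differ and run-length encoding would save nothing.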

Main Results:

  • Achieved >10× size reduction in VCF files.
  • Demonstrated effectiveness on large datasets like DiscovEHR and UK Biobank whole-exome sequencing cohorts.
  • Incurred minimal information loss in practice.

Conclusions:

  • spVCF offers a highly effective solution for managing large genomic datasets.
  • The developed method addresses the growing challenge of VCF file size in population sequencing studies.