Analysis-ready VCF at Biobank scale using Zarr.

Eric Czech1,2, Will Tyler3, Tom White4

  • 1Open Athena AI Foundation, 1245 Broadway, 16th Floor, New York, NY 10001, USA.

GigaScience | June 1, 2025
Summary
This summary is machine-generated.

The Variant Call Format (VCF) Zarr specification offers a scalable solution for genetic variation data storage. This new format significantly improves efficiency and reduces costs for large-scale biobank datasets.

Keywords: Variant Call Format, Zarr, analysis-ready data

Area of Science:

  • Genomics
  • Bioinformatics
  • Computational Biology

Background:

  • Variant Call Format (VCF) is the standard for genetic variation data but is inefficient for large-scale biobanks.
  • Row-wise encoding of VCF is unsuitable for hundreds of terabytes of genomic data.
  • A more scalable approach is needed to handle increasing dataset sizes.
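
To make the row-wise bottleneck concrete, here is a minimal Python sketch, not taken from the paper. It stores the same three made-up biallelic variants once as VCF-like text rows and once as per-field columns. The names `ROWS`, `to_columns`, `alt_counts_row_wise`, and `alt_counts_column_wise` are hypothetical, and the data is invented for illustration.

```python
# Toy data: CHROM, POS, REF, ALT, then one genotype call per sample.
ROWS = [
    "chr1\t100\tA\tG\t0/0\t0/1\t1/1",
    "chr1\t250\tC\tT\t0/1\t0/1\t0/0",
    "chr1\t900\tG\tA\t1/1\t1/1\t0/1",
]


def alt_counts_row_wise(rows):
    """Count ALT alleles per variant by parsing every full text line."""
    counts = []
    for line in rows:
        fields = line.split("\t")   # every field is tokenized...
        genotypes = fields[4:]      # ...although only the calls are needed
        counts.append(sum(gt.count("1") for gt in genotypes))
    return counts


def to_columns(rows):
    """Convert text rows into per-field columns (a one-off conversion step)."""
    cols = {"contig": [], "position": [], "genotype": []}
    for line in rows:
        fields = line.split("\t")
        cols["contig"].append(fields[0])
        cols["position"].append(int(fields[1]))
        cols["genotype"].append(
            [[int(allele) for allele in gt.split("/")] for gt in fields[4:]]
        )
    return cols


def alt_counts_column_wise(cols):
    """Count ALT alleles per variant, touching only the genotype column.

    Summing allele codes works here because the toy data is biallelic (0 or 1).
    """
    return [sum(a for call in variant for a in call) for variant in cols["genotype"]]


if __name__ == "__main__":
    print(alt_counts_row_wise(ROWS))
    print(alt_counts_column_wise(to_columns(ROWS)))
```

In the row-wise version every query re-parses whole lines, including fields it never uses. In the column-wise version the parsing cost is paid once, and a later query reads only the field it needs.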

Purpose of the Study:

  • To introduce the VCF Zarr specification for efficient storage and processing of genetic variation data.
  • To provide software infrastructure for large-scale conversion to the VCF Zarr format.
  • To demonstrate the performance benefits of VCF Zarr over traditional VCF approaches.

Main Methods:

  • Encoding the VCF data model using the Zarr format for multidimensional data storage.
  • Developing software for efficient and reliable large-scale data conversion.
  • Benchmarking VCF Zarr against standard VCF and specialized methods using large human and non-human genomic datasets.
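
The idea of storing a variants-by-samples array as independently compressed chunks can be sketched with the Python standard library alone. This is a toy imitation of chunked storage, not the Zarr API and not the layout defined by the VCF Zarr specification. The function names, the `"row.col"` chunk keys, and the choice of zlib are all assumptions made for illustration.

```python
import zlib


def write_chunked(matrix, chunk_rows, chunk_cols):
    """Split a 2D list of small ints (0..255) into zlib-compressed chunks.

    Returns a dict mapping hypothetical "row.col" chunk keys to bytes.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    store = {}
    for r0 in range(0, n_rows, chunk_rows):
        for c0 in range(0, n_cols, chunk_cols):
            # Serialize the block in row-major order, one byte per value.
            block = bytes(
                matrix[r][c]
                for r in range(r0, min(r0 + chunk_rows, n_rows))
                for c in range(c0, min(c0 + chunk_cols, n_cols))
            )
            key = f"{r0 // chunk_rows}.{c0 // chunk_cols}"
            store[key] = zlib.compress(block)
    return store


def read_column(store, shape, chunk_rows, chunk_cols, col):
    """Read one sample's values for all variants.

    Only the chunks that contain the requested column are decompressed.
    Returns the values and the number of chunks touched.
    """
    n_rows, n_cols = shape
    cj = col // chunk_cols                  # chunk column holding `col`
    c0 = cj * chunk_cols                    # first matrix column in that chunk
    width = min(chunk_cols, n_cols - c0)    # edge chunks may be narrower
    values, touched = [], 0
    n_chunk_rows = (n_rows + chunk_rows - 1) // chunk_rows
    for ci in range(n_chunk_rows):
        block = zlib.decompress(store[f"{ci}.{cj}"])
        touched += 1
        height = len(block) // width
        values.extend(block[r * width + (col - c0)] for r in range(height))
    return values, touched


if __name__ == "__main__":
    # Made-up genotype dosages (0, 1, or 2) for 5 variants and 4 samples.
    dosages = [[(r + c) % 3 for c in range(4)] for r in range(5)]
    store = write_chunked(dosages, chunk_rows=2, chunk_cols=3)
    print(read_column(store, (5, 4), 2, 3, col=3))
```

Because each chunk is compressed on its own, a query for one sample decompresses only the chunks in that sample's chunk column, and different chunks can be read in parallel.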

Main Results:

  • VCF Zarr demonstrates significantly higher efficiency compared to standard VCF.
  • Compression ratios and single-threaded performance are competitive with specialized genotype storage methods.
  • Case studies on large human datasets (Genomics England, Our Future Health, All of Us) and whole-genome datasets (Norway Spruce, SARS-CoV-2) show promising results.
  • Illustrative examples highlight the potential for cloud computing and GPU acceleration with VCF Zarr.

Conclusions:

  • Large row-encoded VCF files present a significant bottleneck and cost in current research.
  • The VCF Zarr specification, based on open-source technologies, can substantially reduce storage and processing costs.
  • VCF Zarr has the potential to foster a new ecosystem of cloud-native tools for genetic variation analysis while maintaining compatibility with existing workflows.