IGD: A simple, efficient genotype data format.

Drew DeHaas¹, Xinzhu Wei¹

  • ¹Department of Computational Biology, Cornell University, Ithaca, NY.

bioRxiv: the preprint server for biology | February 20, 2025
Summary
This summary is machine-generated.

We introduce the Indexable Genotype Data (IGD) file format, a simple uncompressed binary format for reference-sequence-aligned genotype data. On biobank-scale whole-genome sequence data, IGD is over 100 times faster and 3.5 times smaller than .vcf.gz.

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Genomics

Background:

  • Existing file formats for reference-sequence-aligned genotype data are often complex and inefficient.
  • Limited programming language support hinders the use of current genotype data formats.
  • There is a need for a simple, fast, and small file format for highly scalable bioinformatics research.

Purpose of the Study:

  • To present a novel, simple, and efficient file format for storing reference-sequence-aligned genotype data.
  • To demonstrate the performance benefits of the new format for large-scale genomic datasets.
  • To provide accessible implementations for reading and writing the new format.

Main Methods:

  • Development of the Indexable Genotype Data (IGD) file format, a simple uncompressed binary format.
  • Implementation of Python and C++ libraries for reading and writing IGD files.
  • Creation of tools for converting existing .vcf.gz files to the IGD format.
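
To illustrate why an uncompressed, indexable binary layout can be read quickly, here is a minimal Python sketch of a toy genotype file. It is not the IGD specification: the magic value `TOYG`, the header fields, the index layout, and the function names `write_toy` and `read_variant` are all hypothetical and chosen only for illustration. The idea it shows is that a fixed-size index lets a reader seek directly to one variant without decompressing or parsing the variants before it.

```python
import io
import struct

# Hypothetical toy layout (NOT the real IGD format):
#   header : magic, number of variants, number of samples
#   index  : one fixed-size entry per variant (position, byte offset)
#   data   : per variant, a count followed by the sample indexes
#            that carry the alternate allele
MAGIC = b"TOYG"
HEADER = struct.Struct("<4sQQ")
INDEX_ENTRY = struct.Struct("<QQ")


def write_toy(stream, num_samples, variants):
    """Write variants given as (position, list of carrier sample indexes)."""
    stream.write(HEADER.pack(MAGIC, len(variants), num_samples))
    # The data section starts right after the header and the full index.
    offset = HEADER.size + INDEX_ENTRY.size * len(variants)
    for position, carriers in variants:
        stream.write(INDEX_ENTRY.pack(position, offset))
        offset += 4 + 4 * len(carriers)
    for _, carriers in variants:
        stream.write(struct.pack(f"<I{len(carriers)}I", len(carriers), *carriers))


def read_variant(stream, i):
    """Return (position, carriers) for variant i using two seeks."""
    stream.seek(0)
    magic, num_variants, _ = HEADER.unpack(stream.read(HEADER.size))
    if magic != MAGIC:
        raise ValueError("not a toy genotype file")
    if not 0 <= i < num_variants:
        raise IndexError(f"variant index {i} out of range")
    # Index entries have a fixed size, so entry i is at a computable offset.
    stream.seek(HEADER.size + INDEX_ENTRY.size * i)
    position, offset = INDEX_ENTRY.unpack(stream.read(INDEX_ENTRY.size))
    stream.seek(offset)
    (count,) = struct.unpack("<I", stream.read(4))
    carriers = list(struct.unpack(f"<{count}I", stream.read(4 * count)))
    return position, carriers


# Usage: write three variants to an in-memory buffer, then read the last one.
buffer = io.BytesIO()
write_toy(buffer, num_samples=6, variants=[(100, [0, 3]), (250, []), (975, [1, 2, 5])])
print(read_variant(buffer, 2))
```

A text format such as VCF generally has to be decompressed and parsed line by line to reach a given record, whereas a layout like this one reaches any variant with a constant number of seeks. The actual IGD libraries should be consulted for the real file layout and API.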

Main Results:

  • The IGD format is over 100 times faster and 3.5 times smaller than .vcf.gz on biobank-scale whole-genome sequence data.
  • The Python implementation for IGD is concise, requiring under 350 lines of code.
  • Open-source C++ and Python libraries, along with conversion tools, are available.

Conclusions:

  • The Indexable Genotype Data (IGD) format offers a significant improvement in efficiency and simplicity for storing large-scale genotype data.
  • IGD facilitates faster and more scalable bioinformatics research by overcoming limitations of existing formats.
  • The availability of user-friendly libraries promotes the adoption and utility of the IGD format in the scientific community.