IGD: A simple, efficient genotype data format.

Drew DeHaas¹, Xinzhu Wei¹

¹Department of Computational Biology, Cornell University, Ithaca, NY.

bioRxiv: the preprint server for biology

February 20, 2025

Summary

This summary is machine-generated.

We introduce the Indexable Genotype Data (IGD) file format, a simple binary format for genotype data. IGD offers significant speed and size improvements over existing formats for large-scale bioinformatics research.

Area of Science:

Bioinformatics
Computational Biology
Genomics

Background:

Existing file formats for reference-sequence-aligned genotype data are often complex and inefficient.
Limited programming language support hinders the use of current genotype data formats.
There is a need for a simple, fast, and small file format for highly scalable bioinformatics research.

Purpose of the Study:

To present a novel, simple, and efficient file format for storing reference-sequence-aligned genotype data.
To demonstrate the performance benefits of the new format for large-scale genomic datasets.
To provide accessible implementations for reading and writing the new format.

Main Methods:

Development of the Indexable Genotype Data (IGD) file format, a simple uncompressed binary format.
Implementation of Python and C++ libraries for reading and writing IGD files.
Creation of tools for converting existing .vcf.gz files to the IGD format.
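
To illustrate why an uncompressed, indexed binary layout allows fast random access, here is a minimal Python sketch of a toy format. It is not the IGD specification and does not use the authors' libraries: the layout, the magic string `TOY1`, and the functions `write_toy` and `read_variant` are hypothetical and exist only for this example. The idea shown is that a fixed-size index row per variant lets a reader jump straight to any variant without decompressing or scanning the records before it.

```python
import io
import struct

# Toy layout (hypothetical, NOT the real IGD specification):
#   header : magic (4 bytes) + variant count (uint64)
#   index  : one fixed-size row per variant -> (position, data offset), both uint64
#   data   : per variant, a uint32 count followed by that many uint32 sample indices
MAGIC = b"TOY1"
HEADER = struct.Struct("<4sQ")
INDEX_ROW = struct.Struct("<QQ")


def write_toy(stream, variants):
    """Write (position, [indices of samples carrying the alt allele]) pairs."""
    data_start = HEADER.size + INDEX_ROW.size * len(variants)
    index, blobs, offset = [], [], data_start
    for position, samples in variants:
        blob = struct.pack(f"<I{len(samples)}I", len(samples), *samples)
        index.append(INDEX_ROW.pack(position, offset))
        blobs.append(blob)
        offset += len(blob)
    stream.write(HEADER.pack(MAGIC, len(variants)))
    stream.write(b"".join(index))
    stream.write(b"".join(blobs))


def read_variant(stream, i):
    """Fetch variant i by seeking to its index row, then to its data."""
    stream.seek(0)
    magic, count = HEADER.unpack(stream.read(HEADER.size))
    if magic != MAGIC or not 0 <= i < count:
        raise ValueError("bad file or variant index out of range")
    # Index rows have a fixed size, so row i sits at a computable offset.
    stream.seek(HEADER.size + i * INDEX_ROW.size)
    position, offset = INDEX_ROW.unpack(stream.read(INDEX_ROW.size))
    stream.seek(offset)
    (n,) = struct.unpack("<I", stream.read(4))
    samples = list(struct.unpack(f"<{n}I", stream.read(4 * n)))
    return position, samples


buf = io.BytesIO()
write_toy(buf, [(100, [0, 3]), (250, []), (900, [1, 2, 5])])
print(read_variant(buf, 2))  # (900, [1, 2, 5])
```

In this sketch each variant stores only the indices of samples that carry the alternate allele, which keeps rare variants small. Whether IGD makes the same choice is not stated in this summary, so treat that detail as an assumption of the example.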

Main Results:

The IGD format is over 100 times faster and 3.5 times smaller than .vcf.gz for Biobank-scale whole-genome sequence data.
The Python implementation for IGD is concise, requiring under 350 lines of code.
Open-source C++ and Python libraries, along with conversion tools, are available.

Conclusions:

The Indexable Genotype Data (IGD) format offers a significant improvement in efficiency and simplicity for storing large-scale genotype data.
IGD facilitates faster and more scalable bioinformatics research by overcoming limitations of existing formats.
The availability of user-friendly libraries promotes the adoption and utility of the IGD format in the scientific community.