Data biases in genomics.

Lusine Nazaretyan1, Martin Kircher2

  • 1Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin 10117, Germany.

Trends in Genetics : TIG | April 7, 2026
Summary
This summary is machine-generated.

Data biases in genomic research can compromise machine learning (ML) model performance. This review examines genomic data biases and their impact on ML algorithms, using examples from common databases.

Keywords: data biases, genomic data, machine learning issues

Area of Science:

  • Genomics
  • Bioinformatics
  • Machine Learning

Background:

  • Genomic research generates vast datasets, necessitating machine learning (ML) approaches.
  • ML algorithms require high-quality, representative data, which are often difficult to obtain in genomics.
  • Data biases, including systematic errors and incomplete information, can significantly impact genomic data quality.

Purpose of the Study:

  • To review and categorize data biases prevalent in genomic research.
  • To frame these biases within the context of general machine learning principles.
  • To illustrate the impact of data biases on ML model performance in genomic studies.

Main Methods:

  • Literature review of data biases in genomics.
  • Categorization of biases relevant to machine learning frameworks.
  • Analysis of biases in widely used genomic databases (e.g., NCBI ClinVar, gnomAD).

Main Results:

  • Identification and classification of various data bias types in genomic datasets.
  • Demonstration of how biases in databases like ClinVar and gnomAD can affect ML models.
  • Examples illustrating the influence of data biases on the performance of ML models in genomic research.
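
One way a database bias can affect an ML model is a shift in class prevalence between the curated data and the setting where the model is applied. The sketch below is not taken from the review: it is a minimal illustration using Bayes' rule, and the sensitivity, specificity, and prevalence values are hypothetical numbers chosen only to make the effect visible.

```python
def precision(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value of a binary classifier via Bayes' rule."""
    true_pos = sensitivity * prevalence                  # P(flagged and truly positive)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # P(flagged but truly negative)
    return true_pos / (true_pos + false_pos)


# Hypothetical classifier quality, held fixed in both settings.
SENSITIVITY = 0.90
SPECIFICITY = 0.90

# Assumption: a curated benchmark with balanced labels (50% positives)
# versus an application setting where positives are rare (1%).
curated_precision = precision(SENSITIVITY, SPECIFICITY, prevalence=0.50)
population_precision = precision(SENSITIVITY, SPECIFICITY, prevalence=0.01)

print(f"precision on curated set:   {curated_precision:.3f}")
print(f"precision in target setting: {population_precision:.3f}")
```

With these assumed values, precision falls from 0.90 on the balanced curated set to roughly 0.08 when positives make up 1% of cases, even though the classifier itself is unchanged. Performance estimated on an unrepresentative database may therefore overstate what the model delivers in practice.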

Conclusions:

  • Data biases are a critical challenge in applying machine learning to genomic data.
  • Understanding and mitigating these biases is essential for reliable genomic data analysis.
  • Awareness of biases in common databases is crucial for researchers using ML in genomics.