
MutBERT: probabilistic genome representation improves genomics foundation models.

Weicai Long¹, Houcheng Su¹, Jiaqi Xiong¹

  • ¹Data Science and Analytics Thrust, Hong Kong University of Science and Technology (Guangzhou), Guangzhou, 511453, China.

Bioinformatics (Oxford, England) | July 15, 2025
Summary
This summary is machine-generated.

MutBERT, a novel genomic foundation model, efficiently captures human genetic variations such as single nucleotide polymorphisms (SNPs). This approach improves the analysis of large-scale genomic data for understanding human diversity and disease.

Area of Science:

  • Genomics
  • Bioinformatics
  • Computational Biology

Background:

  • Understanding human genetic diversity and disease necessitates models that capture sequence variations, such as single nucleotide polymorphisms (SNPs).
  • Existing genomic foundation models struggle with the sparsity and redundancy of human population data, leading to inefficiencies in learning rare variations.
  • Because SNPs are rare relative to the invariant positions in a genome, current masked language models (MLMs) trained on whole-genome sequences may learn SNP variation inefficiently.
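
The sparsity and redundancy described above can be illustrated with a toy calculation. This is a minimal sketch, not taken from the paper: the sequences, the `variant_fraction` helper, and the numbers are hypothetical, chosen only to show how few positions differ when individual genomes are stacked as separate training sequences.

```python
def variant_fraction(haplotypes):
    """Fraction of positions at which the sequences are not all identical."""
    length = len(haplotypes[0])
    variable = sum(
        1 for pos in range(length)
        if len({hap[pos] for hap in haplotypes}) > 1
    )
    return variable / length


# Hypothetical 20-base reference and three toy individuals;
# only the third carries a single substitution (index 5, C -> T).
reference = "ACGTACGTACGTACGTACGT"
haplotypes = [reference, reference, reference[:5] + "T" + reference[6:]]

print(variant_fraction(haplotypes))  # 1 variable site out of 20 positions
```

In this toy set, two of the three sequences are exact copies of the reference, so a model trained on them sees the same invariant bases repeatedly and the one informative site only once.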

Purpose of the Study:

  • To develop a probabilistic genome-based masked language model, MutBERT, that efficiently utilizes single nucleotide polymorphism (SNP) information from population-scale genomic data.
  • To improve the computational efficiency and performance of genomic foundation models by focusing on informative genetic variations.
  • To enable better utilization of biobank-scale genomic data for building pretrained genomic foundation models.
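
One way a masked language model could use SNP information is to score its predictions at masked positions against allele-frequency targets instead of one-hot labels. The following is a minimal sketch under that assumption; the summary does not state MutBERT's actual training objective, and `soft_target_cross_entropy`, the toy logits, and the targets are all hypothetical.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def soft_target_cross_entropy(logits, target_probs, mask):
    """Mean cross-entropy against probability targets, over masked positions only."""
    log_probs = np.log(softmax(logits))
    per_position = -(target_probs * log_probs).sum(axis=-1)
    return per_position[mask].mean()


# Three positions over the assumed alphabet A, C, G, T.
# Position 1 is a hypothetical SNP: G in 75% of the population, T in 25%.
target = np.array([
    [1.0, 0.0, 0.00, 0.00],
    [0.0, 0.0, 0.75, 0.25],
    [0.0, 0.0, 0.00, 1.00],
])
logits = np.zeros((3, 4))              # an untrained model: uniform predictions
mask = np.array([False, True, False])  # only position 1 is masked

loss = soft_target_cross_entropy(logits, target, mask)
print(loss)  # equals log(4) for uniform predictions
```

With a probability target, a prediction that splits its mass between G and T at the SNP site is penalized less than one that commits entirely to the reference base.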

Main Methods:

  • Developed MutBERT, a probabilistic genome-based masked language model.
  • Represented the entire genome as a probability distribution derived from observed allele frequencies, so that training focuses on informative variations.
  • Evaluated MutBERT against DNABERT-2, Nucleotide Transformer, and modified MutBERT versions on downstream prediction tasks.
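
The representation described in the list above can be sketched as a per-position allele-frequency matrix that weights nucleotide embeddings. This is an illustrative sketch only, not the authors' implementation: the four-letter alphabet, the toy haplotypes, the embedding size, and the helper names `allele_frequency_matrix` and `soft_embed` are assumptions.

```python
import numpy as np

ALPHABET = "ACGT"  # assumed alphabet; indels and missing calls are ignored here


def allele_frequency_matrix(haplotypes):
    """Return a (length, 4) matrix of allele frequencies per position."""
    length = len(haplotypes[0])
    counts = np.zeros((length, len(ALPHABET)))
    for hap in haplotypes:
        for pos, base in enumerate(hap):
            counts[pos, ALPHABET.index(base)] += 1
    return counts / counts.sum(axis=1, keepdims=True)


def soft_embed(freqs, embedding_table):
    """Mix nucleotide embeddings using allele frequencies as weights."""
    return freqs @ embedding_table


# Hypothetical population of four haplotypes; position 2 is a SNP (G/T).
haplotypes = ["ACGT", "ACGT", "ACTT", "ACGT"]
freqs = allele_frequency_matrix(haplotypes)

rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(ALPHABET), 8))  # hypothetical 8-dim embeddings
embedded = soft_embed(freqs, embedding_table)

print(freqs[2])        # frequencies of A, C, G, T at the SNP position
print(embedded.shape)  # one embedding vector per genomic position
```

Under this scheme the whole population is summarized in a single sequence of distributions: invariant positions reduce to ordinary one-hot inputs, while variant positions carry their frequencies directly, rather than being spread across many nearly identical individual sequences.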

Main Results:

  • MutBERT demonstrated efficient utilization of SNP information from population-scale genomic data.
  • The novel representation strategy allowed MutBERT to focus on informative genomic variations while maintaining computational efficiency.
  • MutBERT consistently ranked among the top-performing models in downstream prediction tasks when compared with DNABERT-2, Nucleotide Transformer, and modified MutBERT versions.

Conclusions:

  • MutBERT's novel representation strategy effectively utilizes SNP information for enhanced performance in genomic foundation models.
  • This approach enables more efficient and effective analysis of large-scale genomic datasets, including biobank data.
  • MutBERT represents a significant advancement in building pretrained genomic foundation models for understanding human diversity and disease.