
Related Concept Videos

Evolutionary Relationships through Genome Comparisons
02:54

5.7K
Genome comparison is one of the most effective ways to infer evolutionary relationships between organisms. Its basic principle is that if two species share a common feature, that feature is likely encoded by a DNA sequence conserved in both species. The advent of genome sequencing technologies in the late 20th century enabled scientists to study how domains are conserved between species and helped them deduce evolutionary relationships across diverse...
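Genome comparison often starts from a simple measure of conservation: the percent identity between two aligned sequences. The Python sketch below assumes the sequences have already been aligned (equal length, '-' marking gaps) and, as one of several common conventions, leaves gap columns out of the denominator. The function name and the example fragments are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned, gap-free columns where both sequences have the same base.

    Assumes seq_a and seq_b are already aligned (same length, '-' for gaps).
    Columns with a gap in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        matches += a == b
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned fragments of a conserved region in two species.
species_1 = "ATGGCC-TTAGC"
species_2 = "ATGACCATTAGC"
print(f"{percent_identity(species_1, species_2):.1f}%")  # 90.9%
```

Other definitions divide by the full alignment length or by the length of the shorter sequence, so identities reported by different tools are not always directly comparable.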
Next-generation Sequencing
03:00

88.5K
The first human genome sequencing project cost $2.7 billion and was declared complete in 2003, after 13 years of international collaboration among several research teams and funding agencies. Today, with next-generation sequencing technologies, the cost and time of sequencing a human genome have dropped more than 100-fold.
Next-Generation Sequencing Methods
Although next-generation methods use different technologies, they all share a set of common features....
Maxam-Gilbert Sequencing
01:05

11.1K
In the same year that the Sanger sequencing method was introduced, another group of scientists, Allan Maxam and Walter Gilbert, demonstrated a chemical-cleavage method for DNA sequencing. The Maxam-Gilbert method uses chemicals that cleave DNA at specific bases, separates the resulting fragments by size using electrophoresis, and reads the DNA sequence from the resulting gel bands.
Challenges of the Maxam-Gilbert Method
The...
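To make the band-reading step concrete, here is a minimal Python sketch of an idealized gel. It assumes the four classic Maxam-Gilbert reactions (G, G+A, C+T, and C), which the excerpt above does not name, and represents each lane as the set of base positions, counted from the labeled end, at which cleavage produced a band. `read_maxam_gilbert` is a hypothetical helper; real gels add noise and band compressions that it ignores.

```python
def read_maxam_gilbert(lanes: dict) -> str:
    """Infer a sequence from band positions in the four Maxam-Gilbert lanes.

    `lanes` maps "G", "G+A", "C+T" and "C" to sets of base positions
    (counted from the labeled end) where a band appears. Reading positions
    from smallest to largest corresponds to reading the gel bottom-up.
    """
    g, ga, ct, c = (lanes[k] for k in ("G", "G+A", "C+T", "C"))
    sequence = []
    for pos in sorted(ga | ct):
        if pos in g:
            sequence.append("G")  # band in both G and G+A lanes
        elif pos in ga:
            sequence.append("A")  # band in G+A lane only
        elif pos in c:
            sequence.append("C")  # band in both C and C+T lanes
        else:
            sequence.append("T")  # band in C+T lane only
    return "".join(sequence)


# Idealized gel for the hypothetical fragment GATTACA (positions 1-7).
gel = {
    "G": {1},
    "G+A": {1, 2, 5, 7},
    "C+T": {3, 4, 6},
    "C": {6},
}
print(read_maxam_gilbert(gel))  # GATTACA
```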
DNA as a Genetic Template
02:05

21.8K
Two structural features of the DNA molecule provide a basis for the mechanisms of heredity: its four nucleotide bases and its double-stranded nature. The Watson-Crick model of the DNA double helix, proposed in 1953, drew heavily on the X-ray crystallography work of Rosalind Franklin and Maurice Wilkins. Watson, Crick, and Wilkins jointly received the Nobel Prize in Physiology or Medicine for this work in 1962. Franklin was, controversially, excluded from the prize for...
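Because A pairs only with T and G only with C, and the two strands run antiparallel, either strand fully specifies the other. A minimal Python sketch of that templating rule (the function name and sequence are illustrative only):

```python
# Watson-Crick pairing: A pairs with T, G pairs with C (case preserved).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(strand: str) -> str:
    """Return the partner strand of `strand`, written 5'->3'.

    Complementing each base gives the partner strand read 3'->5';
    reversing it restores the conventional 5'->3' orientation.
    """
    return strand.translate(COMPLEMENT)[::-1]


template = "ATGCGTTA"  # hypothetical strand, written 5'->3'
partner = reverse_complement(template)
print(partner)                                  # TAACGCAT
print(reverse_complement(partner) == template)  # True
```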
Genomics
02:02

36.2K
Genomics is the science of genomes: the study of all the genetic material of an organism. In humans, the genome consists of the information carried in 23 pairs of chromosomes in the nucleus, as well as mitochondrial DNA. In genomics, both coding and non-coding DNA are sequenced and analyzed. Genomics gives a better understanding of living things, their evolution, and their diversity. It has myriad uses: for example, building phylogenetic trees, improving productivity and...
Genome Annotation and Assembly
03:36

18.8K
The genome refers to all of the genetic material in an organism. It can range from a few million base pairs in microbial cells to several billion base pairs in many eukaryotic organisms. Genome assembly is the process of taking DNA sequencing data and putting it back together in the correct order to create a close representation of the original genome. It is followed by the identification of functional elements in the newly assembled genome, a process called genome annotation.
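The assembly step described above can be illustrated with a toy greedy overlap assembler in Python: it repeatedly merges the two sequences with the longest suffix-prefix overlap. This is a teaching sketch under strong assumptions (error-free reads, no repeats, a hypothetical minimum overlap of 3 bases), and the function names are illustrative; production assemblers rely on overlap graphs or de Bruijn graphs and handle sequencing errors and repeats.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if < min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list, min_len: int = 3) -> list:
    """Repeatedly merge the pair of sequences with the longest overlap.

    Returns the remaining contigs once no pair overlaps by at least min_len.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    # Drop reads fully contained in another read; they add no new sequence.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


reads = ["ATGCGT", "CGTACG", "ACGTTA", "TTAG"]  # hypothetical error-free reads
print(greedy_assemble(reads))  # ['ATGCGTACGTTAG']
```

Greedy merging commits to the locally best overlap at each step, which is why it can misassemble repetitive regions and why graph-based methods are preferred in practice.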

You might also read

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

DynamiCare: A Dynamic Multi-Agent Framework for Interactive and Open-Ended Medical Decision-Making.

AMIA Joint Summits on Translational Science Proceedings·2026
Same author

Implementing intervention mapping to tailor the evidence-based PREDIMED intervention for medically underserved prostate cancer patients.

Translational behavioral medicine·2026
Same author

Cellular stemness identifies high-risk ductal carcinoma in situ and offers a therapeutic interception opportunity.

bioRxiv : the preprint server for biology·2026
Same author

A high-penetrance intergenic variant at 9p21 confers melanoma susceptibility.

Research square·2026
Same author

Germline variants in cancer susceptibility genes among patients with mucosal melanoma.

NPJ genomic medicine·2026
Same author

Multi-organ imaging and genetics show the impact of sleep patterns on the human brain and body.

Communications medicine·2026
Same journal

A human-specific genetic modifier reconfigures large-scale cortical network dynamics underlying behavioral performance.

bioRxiv : the preprint server for biology·2026
Same journal

Staphylococcus aureus uses a eukaryotic-like uridyltransferase to make UDP-GlcNAc for cell wall synthesis.

bioRxiv : the preprint server for biology·2026
Same journal

Dynamic redistribution of eIF4F controls cap-dependent translation initiation.

bioRxiv : the preprint server for biology·2026
Same journal

When does additional information improve accuracy of RNA secondary structure prediction?

bioRxiv : the preprint server for biology·2026
Same journal

Normative brain-state trajectories reveal deviation from healthy aging in Alzheimer's disease.

bioRxiv : the preprint server for biology·2026
Same journal

Noradrenergic infraslow rhythm during sleep is the critical link between heart-rate dynamics and memory consolidation.

bioRxiv : the preprint server for biology·2026

Related Experiment Video

Updated: Jun 15, 2025

Genomic MRI - a Public Resource for Studying Sequence Patterns within Genomic DNA
12:36

Published on: May 9, 2011

10.2K

Benchmarking DNA Foundation Models for Genomic Sequence Classification.

Haonan Feng¹, Lang Wu², Bingxin Zhao³

  • ¹ Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA.

bioRxiv: the Preprint Server for Biology | August 26, 2024
PubMed
Summary
This summary is machine-generated.

Benchmarking DNA foundation language models shows that using mean token embeddings improves classification performance and narrows the performance gaps between models across genomic tasks. This approach offers a more reliable way to evaluate these models.

More Related Videos

A Novel Bayesian Change-point Algorithm for Genome-wide Analysis of Diverse ChIPseq Data Types
12:39

Published on: December 10, 2012

11.3K
Chromatin Immunoprecipitation of Murine Brown Adipose Tissue
07:50

Published on: November 21, 2018

8.1K


Area of Science:

  • Genomics
  • Computational Biology
  • Bioinformatics

Background:

  • DNA foundation language models (FLMs) are advancing genomics by learning representations of patterns in DNA sequence.
  • Current evaluations rely on fine-tuning and limited data, which can introduce bias and make it hard to assess the models' full potential.

Purpose of the Study:

  • To benchmark three recent DNA FLMs (DNABERT-2, NT-v2, HyenaDNA) using zero-shot embeddings.
  • To evaluate model performance across diverse genomic tasks and species using 57 real datasets.
  • To compare the effectiveness of mean token embedding versus sentence-level summary token embedding.

Main Methods:

  • Zero-shot embedding analysis of DNABERT-2, Nucleotide Transformer version 2 (NT-v2), and HyenaDNA.
  • Evaluation across 57 diverse genomic datasets and multiple species.
  • Comparative analysis of mean token embedding versus sentence-level summary token embedding strategies.
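The two embedding strategies compared in the study differ only in how per-token model outputs are collapsed into one vector per sequence. The NumPy sketch below assumes the per-token embeddings and padding mask have already been extracted from a model. It places the summary token at position 0 as a [CLS]-style assumption (the actual summary position depends on each model), and the function names are hypothetical, not taken from the paper's code.

```python
import numpy as np


def summary_token_embedding(token_embeddings: np.ndarray) -> np.ndarray:
    """Sequence embedding taken from a single summary position.

    Assumes the summary token sits at position 0 (as with a [CLS]-style
    token); the actual position depends on the model.
    """
    return token_embeddings[:, 0, :]


def mean_token_embedding(token_embeddings: np.ndarray,
                         attention_mask: np.ndarray) -> np.ndarray:
    """Average the embeddings of real tokens, ignoring padded positions.

    token_embeddings: (batch, seq_len, hidden) array of per-token outputs.
    attention_mask:   (batch, seq_len) array, 1 for real tokens, 0 for padding.
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)  # (batch, hidden)
    counts = np.clip(mask.sum(axis=1), 1.0, None)   # avoid division by zero
    return summed / counts


# Toy batch: one sequence with two real tokens and one padding token.
embeddings = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])
print(summary_token_embedding(embeddings))     # [[1. 2.]]
print(mean_token_embedding(embeddings, mask))  # [[2. 3.]]
```

Mean pooling lets every non-padding token contribute to the sequence representation, whereas the summary-token approach relies on a single position to carry information about the whole sequence.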

Main Results:

  • DNABERT-2 showed consistent performance on human genome tasks; NT-v2 excelled in epigenetic modification detection; HyenaDNA demonstrated scalability for long sequences.
  • Compared with the default sentence-level summary token embedding, mean token embedding improved the performance of all three models, with average Area Under the Curve (AUC) gains ranging from 4.3% to 9.7% depending on the model.
  • Mean token embedding reduced performance disparities between the evaluated DNA foundation models.
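The results are reported as AUC (area under the ROC curve). As a reference point, the Python sketch below computes AUC as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counting as half (the normalized Mann-Whitney U statistic). The labels and scores are invented for illustration and are not from the study. The summary does not say whether the 4.3% to 9.7% figures are absolute or relative changes, so the sketch prints both.

```python
def roc_auc(labels, scores):
    """ROC AUC: probability that a random positive outscores a random negative.

    Ties count as half a win. O(n_pos * n_neg), fine for small examples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


# Toy labels and classifier scores (hypothetical, not from the study).
labels = [1, 1, 0, 0, 1, 0]
scores_summary = [0.60, 0.40, 0.50, 0.30, 0.70, 0.45]  # summary-token features
scores_mean = [0.80, 0.45, 0.50, 0.30, 0.70, 0.40]     # mean-token features

auc_summary = roc_auc(labels, scores_summary)
auc_mean = roc_auc(labels, scores_mean)
print(f"AUC (summary token): {auc_summary:.3f}")
print(f"AUC (mean token):    {auc_mean:.3f}")
print(f"absolute gain: {100 * (auc_mean - auc_summary):.1f} points")
print(f"relative gain: {100 * (auc_mean - auc_summary) / auc_summary:.1f}%")
```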

Conclusions:

  • Mean token embedding is a more effective strategy than sentence-level summary token embedding for evaluating DNA FLMs, improving both performance and consistency across models.
  • The study provides a framework for selecting and optimizing DNA FLMs for genomic research.
  • The findings can help researchers apply DNA language models effectively in genomics.