
Related Concept Videos

Evolutionary Relationships through Genome Comparisons (02:54)

Genome comparison is one of the most effective ways to interpret the evolutionary relationships between organisms. Its basic principle is that if two species share a common feature, that feature is likely encoded by DNA sequence conserved between both species. The advent of genome sequencing technologies in the late 20th century enabled scientists to study how domains are conserved between species and helped them deduce evolutionary relationships across diverse...

Genome Annotation and Assembly (03:36)

The genome refers to all of the genetic material in an organism. It can range from a few million base pairs in microbial cells to several billion base pairs in many eukaryotic organisms. Genome assembly is the process of taking DNA sequencing data and putting it back together in the correct order to create a close representation of the original genome. Assembly is followed by genome annotation: the identification of functional elements on the newly assembled genome.

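As a toy illustration of putting reads back together in order, the sketch below uses a greedy overlap approach: it repeatedly merges the pair of reads with the longest suffix-prefix overlap. This technique is an assumption chosen for brevity, not the method presented in the video; practical assemblers rely on overlap graphs or de Bruijn graphs and must cope with sequencing errors and repeats. The function names and the `min_len` threshold are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that matches a prefix of `b`.

    Returns 0 if no such overlap of at least `min_len` bases exists.
    """
    start = 0
    while True:
        # Find the next place the first `min_len` bases of `b` occur in `a`.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Accept it only if everything from there to the end of `a` begins `b`.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge the pair with the longest overlap until no overlaps remain.

    Returns the resulting contigs (a single string if the reads assemble fully).
    """
    contigs = list(reads)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: return the separate contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

Because each merge is chosen locally, a greedy assembler can collapse or misorder repeated regions, which is one reason real assemblers use graph-based methods.
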
DNA as a Genetic Template (02:05)

Two structural features of the DNA molecule provide a basis for the mechanisms of heredity: its four nucleotide bases and its double-stranded nature. The Watson-Crick model of double-helical DNA structure, proposed in 1953, drew heavily on the X-ray crystallography work of researchers Rosalind Franklin and Maurice Wilkins. Watson, Crick, and Wilkins jointly received the 1962 Nobel Prize in Physiology or Medicine for this work. Franklin was, controversially, excluded from the prize for...

Phylogenetic Trees (03:21)

Phylogenetic trees come in many forms. The order in which organisms branch off from the bottom (root) to the top of the tree matters, but the branches can rotate at their nodes without altering the information. The lines connecting individual nodes can be straight, angled, or even curved.

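The point that rotating branches does not change a tree's information can be made concrete with a small sketch. Here a rooted tree is written as nested tuples (an assumed representation, with branch lengths omitted), and two trees are compared after sorting the children at every node, so trees that differ only by rotation compare equal while trees with a different branching order do not. The function names are hypothetical.

```python
from typing import Union

# A rooted tree: a leaf is a taxon name, an internal node is a tuple of subtrees.
# Branch lengths are deliberately ignored in this sketch.
Tree = Union[str, tuple]


def canonical(tree: Tree) -> Tree:
    """Return a form of `tree` that ignores the left-right order of children."""
    if isinstance(tree, str):
        return tree
    children = [canonical(child) for child in tree]
    # Sorting the already-canonical children makes every rotation map to one form.
    return tuple(sorted(children, key=repr))


def same_topology(a: Tree, b: Tree) -> bool:
    """True if the two trees differ only by rotating branches at their nodes."""
    return canonical(a) == canonical(b)


t1 = ((("human", "chimp"), "gorilla"), "orangutan")
t2 = ("orangutan", ("gorilla", ("chimp", "human")))  # same tree, branches rotated
t3 = ((("human", "gorilla"), "chimp"), "orangutan")  # different branching order
print(same_topology(t1, t2))  # True
print(same_topology(t1, t3))  # False
```
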
Multi-species Conserved Sequences (02:51)

Next-generation sequencing technologies have created large genomic databases for a variety of animals and plants. Since the Human Genome Project was completed, scientists have studied the genomes of primates, other mammals, and more phylogenetically distant organisms. Such large-scale studies have provided new insights into the evolutionary relationships between organisms.
Although genomes vary greatly from species to species, a few sequences are highly conserved. Such conserved...

Next-generation Sequencing (03:00)

The first human genome sequencing project cost $2.7 billion and was declared complete in 2003, after 13 years of international collaboration among several research teams and funding agencies. Today, with the advent of next-generation sequencing technologies, the cost and time of sequencing a human genome have dropped more than 100-fold.
Next-Generation Sequencing Methods
Although next-generation methods use different technologies, they share a set of common features...


Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

Quantitative Assessment of Brain Glucose Metabolism Using Dynamic Glucose-Enhanced Magnetic Resonance Fingerprinting (DGE-MRF).

Chemical & biomedical imaging·2026
Same author

Weak solutions to the Bloch equations with distant dipolar field.

The Journal of chemical physics·2026
Same author

Multi-tissue transcriptomic aging atlas reveals predictive aging biomarkers in the killifish.

Nature aging·2026
Same author

Admissibility of solitary wave modes in long-runout debris flows.

Physical review. E·2025
Same author

Toward a formulation of a CISS theory with the inclusion of two-particle relativistic effects, electron-phonon coupling, and electron-electron correlation. An application to NMR-based chiral discrimination.

The Journal of chemical physics·2025
Same author

Multivariate metal-organic frameworks enable chemical shift-encoded MRI with femtomolar sensitivity for biological systems.

Nature communications·2025
Same journal

3DICE: Interpretable 3D Cross-Modal Learning for Drug-Target Interaction Prediction and Large-Scale Drug Discovery.

Bioinformatics (Oxford, England)·2026
Same journal

KASSPer: Kinase Active Site Structure Prediction using Protein and Ligand Language Models and Its Application to Virtual Screening.

Bioinformatics (Oxford, England)·2026
Same journal

IDR searcher: a search engine solution for public image resources.

Bioinformatics (Oxford, England)·2026
Same journal

KCFtools: Rapid alignment-free method for introgression screening and GWAS using k-mer profiles.

Bioinformatics (Oxford, England)·2026
Same journal

Meta2DB: Curated shotgun metagenomic feature sets and metadata for health state prediction.

Bioinformatics (Oxford, England)·2026
Same journal

conMItion: an R package adjusting confounding factors for associations in multi-omics.

Bioinformatics (Oxford, England)·2026

Related Experiment Video

Updated: May 29, 2025

A Practical Guide to Phylogenetics for Nonexperts
12:00

Published on: February 5, 2014

Embed-Search-Align: DNA sequence alignment using Transformer models.

Pavan Holur¹, K C Enevoldsen²,³, Shreyas Rajesh¹

  • ¹Department of Electrical and Computer Engineering, UCLA, Los Angeles, California, 90024, United States.

Bioinformatics (Oxford, England), February 6, 2025
Summary
This summary is machine-generated.

We developed a novel DNA sequence alignment framework, Embed-Search-Align (ESA), built on a Reference-Free DNA Embedding (RDE) Transformer. ESA accurately aligns DNA reads to reference genomes, rivaling conventional alignment tools and outperforming other DNA-Transformer models.
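To make the embed, search, and align steps concrete, here is a minimal sketch under heavy assumptions. The ESA framework embeds sequences with the trained RDE Transformer; because that model is not reproduced here, `embed` uses a normalized 4-mer count vector as a hypothetical stand-in, a plain Python list stands in for the vector store, and the final alignment step is a simple mismatch count. All function names, the fragment length, the stride, and `top_k` are illustrative choices, not values from the paper.

```python
import math
import random
from collections import Counter
from itertools import product

K = 4  # k-mer size of the toy embedding (hypothetical choice)
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def embed(seq: str) -> list[float]:
    """Hypothetical stand-in for the RDE Transformer: a unit-length k-mer count vector."""
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    vec = [float(counts[kmer]) for kmer in KMERS]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_store(reference: str, frag_len: int, stride: int) -> list[tuple[int, str, list[float]]]:
    """Embed overlapping reference fragments; this list plays the role of a vector store."""
    store = []
    for start in range(0, len(reference) - frag_len + 1, stride):
        frag = reference[start:start + frag_len]
        store.append((start, frag, embed(frag)))
    return store


def search(store: list[tuple[int, str, list[float]]], read: str, top_k: int = 5) -> list[tuple[float, int, str]]:
    """Rank fragments by cosine similarity to the read (all vectors are unit-length)."""
    q = embed(read)
    scored = [(sum(x * y for x, y in zip(q, vec)), start, frag) for start, frag, vec in store]
    scored.sort(reverse=True)
    return scored[:top_k]


def align(read: str, candidates: list[tuple[float, int, str]]) -> int:
    """Place the read within the candidates at the offset with the fewest mismatches."""
    best = (len(read) + 1, -1)  # (mismatches, reference position)
    for _, start, frag in candidates:
        for off in range(len(frag) - len(read) + 1):
            mismatches = sum(x != y for x, y in zip(read, frag[off:off + len(read)]))
            best = min(best, (mismatches, start + off))
    return best[1]


rng = random.Random(0)  # fixed seed so the example is reproducible
reference = "".join(rng.choice("ACGT") for _ in range(1000))
store = build_store(reference, frag_len=60, stride=20)
read = reference[437:467]  # a 30-base read taken from a known position
hits = search(store, read, top_k=5)
print("candidate fragment starts:", [start for _, start, _ in hits])
print("aligned position:", align(read, hits))
```

Fragments overlap (a stride of 20 with a length of 60), so any 30-base read lies entirely within at least one fragment; the search step only narrows the candidates, and the alignment step then places the read exactly.
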

More Related Videos

Using Phylogenetic Analysis to Investigate Eukaryotic Gene Origin
08:57

Published on: August 14, 2018

A Protocol for Computer-Based Protein Structure and Function Prediction
16:41

Published on: November 3, 2011

Area of Science:

  • Genomics
  • Bioinformatics
  • Machine Learning

Background:

  • DNA sequence alignment is central to genomics and traditionally involves indexing a reference genome and then searching that index for each read.
  • Large Language Models (LLMs) show promise in encoding DNA sequences, but strong performance on classification tasks does not guarantee accurate genome-wide alignment.
  • Existing methods struggle to search extensive reference genomes efficiently for short DNA reads.
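The conventional index-then-search strategy mentioned in the background can be sketched with a simple hash-based k-mer index: every k-mer of the reference is recorded with its positions, and each k-mer of a read votes for the read start it implies. This is a swapped-in, simplified technique for illustration only; tools such as Bowtie and BWA-MEM use a Burrows-Wheeler (FM) index rather than a hash table. The function names, k = 11, and the random reference are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Optional


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Record every position at which each k-mer of the reference starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def locate_read(read: str, index: dict[str, list[int]], k: int) -> Optional[int]:
    """Seed-and-vote search: each exact k-mer hit votes for the read start it implies."""
    votes = Counter()
    for offset in range(len(read) - k + 1):
        # .get avoids inserting empty entries into the defaultdict.
        for hit in index.get(read[offset:offset + k], []):
            votes[hit - offset] += 1
    if not votes:
        return None  # no seed matched anywhere in the reference
    position, _count = votes.most_common(1)[0]
    return position


rng = random.Random(1)  # fixed seed so the example is reproducible
reference = "".join(rng.choice("ACGT") for _ in range(500))
index = build_kmer_index(reference, k=11)
read = reference[123:153]
# Introduce one substitution: seeds that avoid it still vote for the true start.
mutated = read[:15] + ("A" if read[15] != "A" else "C") + read[16:]
print(locate_read(read, index, k=11), locate_read(mutated, index, k=11))
```

Building the index once and reusing it for every read is what makes this family of methods fast; the cost is that the whole reference must be indexed before any read can be searched.
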

Purpose of the Study:

  • To bridge the gap between DNA sequence classification and genome-wide alignment using LLMs.
  • To develop a novel framework for accurate and efficient DNA sequence alignment.
  • To introduce a reference-free DNA embedding model capable of genome-scale search.

Main Methods:

  • Developed the "Embed-Search-Align" (ESA) framework incorporating a Reference-Free DNA Embedding (RDE) Transformer model.
  • Used a contrastive loss for self-supervised training to produce rich, reference-free DNA sequence embeddings.
  • Implemented a DNA vector store for efficient, genome-wide search across reference genome fragments.
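The abstract states only that a contrastive loss drives self-supervised training. One common form is InfoNCE with in-batch negatives, sketched below on plain Python lists as an assumption about the general idea, not the paper's exact objective: each anchor embedding (for example, a read) should be closer to its paired positive (for example, the reference fragment it came from) than to the other positives in the batch. The pairing scheme, the temperature of 0.1, and the function names are hypothetical.

```python
import math


def l2_normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce_loss(anchors: list[list[float]], positives: list[list[float]],
                  temperature: float = 0.1) -> float:
    """Mean InfoNCE loss: anchors[i] should be most similar to positives[i].

    Every other positive in the batch serves as a negative for anchor i.
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity to every positive, scaled by the temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        m = max(logits)  # subtract the max before exponentiating for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # equals -log softmax(logits)[i]
    return total / len(a)


anchors = [[1.0, 0.0], [0.0, 1.0]]
matched = [[1.0, 0.0], [0.0, 1.0]]  # each anchor paired with its own view
swapped = [[0.0, 1.0], [1.0, 0.0]]  # pairs deliberately mismatched
print(info_nce_loss(anchors, matched))  # small (about 4.5e-05)
print(info_nce_loss(anchors, swapped))  # large (about 10.0)
```

Minimizing this loss pulls matched pairs together and pushes other sequences in the batch apart, which is the property a vector-store search relies on; a lower temperature sharpens the penalty on near-miss negatives.
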

Main Results:

  • The RDE model achieved 99% accuracy in aligning 250-base-pair reads to a 3-gigabase human reference genome.
  • The ESA framework's performance rivals that of established alignment tools such as Bowtie and BWA-MEM.
  • RDE significantly outperformed recent DNA-Transformer baselines (e.g., Nucleotide Transformer, HyenaDNA) and demonstrated cross-species and cross-chromosome transferability.

Conclusions:

  • The ESA framework and RDE model offer a powerful new approach to DNA sequence alignment.
  • This method provides a viable, high-accuracy alternative to conventional algorithmic alignment techniques.
  • The reference-free embedding strategy shows potential for advancing genomic analysis and large-scale sequence comparison.