



Updated: Jun 26, 2026


seqLens: optimizing language models for genomic predictions.

Mahdi Baghbanzadeh¹, Brendan Mann¹, Keith A Crandall¹

  • ¹Department of Biostatistics and Bioinformatics, Computational Biology Institute, The George Washington University, Washington, DC 20052, USA.

Molecular Biology and Evolution | June 24, 2026
Summary
This summary is machine-generated.

Genomic language models (gLMs) can discern evolutionary relationships in DNA sequences. Optimizing tokenization and choosing relevant pretraining data significantly improve gLM performance in identifying and annotating genomic features.


Area of Science:

  • Computational Biology
  • Genomics
  • Machine Learning
  • Evolutionary Biology

Background:

  • Language modeling offers a novel approach to understanding genomic sequence variation and evolutionary patterns.
  • Existing language models face computational challenges in tokenization and architecture for diverse genomic data across evolutionary scales.

Purpose of the Study:

  • To investigate key elements of genomic language modeling (gLM), including tokenization, pretraining datasets, and model architecture.
  • To apply and evaluate gLMs for identifying evolutionary genomic features and improving genome annotations.

Main Methods:

  • Gathered two distinct pretraining datasets comprising prokaryotic and eukaryotic reference genomes.
  • Trained five byte-pair encoding tokenizers and pretrained 52 gLMs, comparing various architectures and hyperparameters.
  • Introduced seqLens, a novel model architecture based on disentangled attention with relative positional encoding.
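
Byte-pair encoding, as used for the tokenizers above, can be illustrated with a minimal sketch. This is a toy pure-Python version for intuition only, not the authors' tokenizer or training setup; the function names (`train_bpe`, `merge_pair`, `tokenize`), the tie-breaking rule, and the example sequences are all assumptions made for illustration.

```python
from collections import Counter


def merge_pair(tokens, pair):
    """Replace each non-overlapping occurrence of `pair` with one merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def train_bpe(sequences, num_merges):
    """Learn up to `num_merges` merge rules from DNA strings (hypothetical helper)."""
    # Start from single-nucleotide tokens.
    corpus = [list(seq) for seq in sequences]
    merges = []
    for _ in range(num_merges):
        # Count every adjacent token pair across the whole corpus.
        pairs = Counter()
        for tokens in corpus:
            pairs.update(zip(tokens, tokens[1:]))
        if not pairs:
            break
        # Most frequent pair wins; ties are broken lexicographically (an assumption)
        # so that the result is deterministic.
        best = min(pairs, key=lambda p: (-pairs[p], p))
        merges.append(best)
        corpus = [merge_pair(tokens, best) for tokens in corpus]
    return merges


def tokenize(seq, merges):
    """Apply the learned merges, in training order, to a new sequence."""
    tokens = list(seq)
    for pair in merges:
        tokens = merge_pair(tokens, pair)
    return tokens


merges = train_bpe(["ACGTACGT", "ACGACG"], num_merges=2)
tokens = tokenize("TTACG", merges)
```

The number of merges sets the vocabulary size: each merge adds one longer token, so a larger vocabulary yields shorter token sequences but more, rarer tokens.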

Main Results:

  • The seqLens model family outperformed similar-sized models on 13 of 19 phenotypic prediction benchmarks.
  • Relevant pretraining data significantly enhanced gLM performance, while larger tokenizer vocabularies negatively impacted generalization.
  • Alternative pooling techniques improved classification accuracy, and gLMs showed capability in discerning evolutionary relationships.
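
The summary does not say which pooling techniques were compared. As a sketch under that caveat, the code below shows three common ways to reduce per-token hidden states to a single sequence vector for classification: first-token (CLS-style), masked mean, and masked max pooling. The function names and the toy hidden states are hypothetical, and plain Python lists stand in for model outputs.

```python
def _kept(hidden, mask):
    """Return the hidden vectors whose mask entry is truthy (non-padding tokens)."""
    kept = [vec for vec, m in zip(hidden, mask) if m]
    if not kept:
        raise ValueError("mask excludes every token")
    return kept


def cls_pool(hidden, mask):
    """Use the first token's vector as the sequence representation."""
    return list(hidden[0])


def mean_pool(hidden, mask):
    """Average each dimension over non-padding tokens."""
    kept = _kept(hidden, mask)
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(len(kept[0]))]


def max_pool(hidden, mask):
    """Take the per-dimension maximum over non-padding tokens."""
    kept = _kept(hidden, mask)
    return [max(vec[d] for vec in kept) for d in range(len(kept[0]))]


# Toy example: three token vectors of size 2; the last token is padding.
hidden = [[1.0, 0.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]

cls_vec = cls_pool(hidden, mask)
mean_vec = mean_pool(hidden, mask)
max_vec = max_pool(hidden, mask)
```

Mean and max pooling draw on every unmasked token, whereas first-token pooling relies on a single position, which is one reason the choice can affect downstream classification.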

Conclusions:

  • Optimizing tokenization strategies and pretraining datasets is crucial for effective genomic language modeling.
  • The developed gLMs can successfully identify diverse evolutionary genomic features and aid in genome annotation.
  • Findings provide a foundation for advancing language models in evolutionary genomics research.