Evaluating DNA Function Understanding in Genomic Language Models Using Evolutionarily Implausible Sequences.

Shiyu Jiang¹, Xuyin Liu¹, Zitong Jerry Wang¹

  • ¹Center for Interdisciplinary Studies, School of Science, Westlake University, Hangzhou 310030, China.

ACS Synthetic Biology, June 9, 2026
Summary
This summary is machine-generated.

Genomic language models (gLMs) consistently fail to detect strong loss-of-function (LOF) mutations in synthetic expression cassettes that have no evolutionary history. This suggests that they rely on patterns learned from their training data rather than on a mechanistic understanding of gene expression.

Keywords:
gene expression, genomic language model, mechanistic understanding, mutation prediction, regulatory genomics, synthetic biology

Area of Science:

  • Synthetic Biology
  • Computational Biology
  • Genomics

Background:

  • Genomic language models (gLMs) show potential for designing functional DNA sequences.
  • A key challenge is distinguishing genuine functional understanding from pattern memorization in gLMs.
  • Evaluating gLM generalization to novel, engineered sequences is crucial.

Purpose of the Study:

  • Introduce Nullsettes, a novel framework to evaluate gLM prediction of loss-of-function (LOF) mutations.
  • Assess whether gLMs understand sequence function or rely on training data patterns.
  • Test gLM performance on synthetic expression cassettes without evolutionary history.

Main Methods:

  • Developed the Nullsettes evaluation framework.
  • Tested state-of-the-art gLMs on predicting LOF mutations in synthetic DNA.
  • Analyzed model performance based on sequence likelihood and mutation impact.
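
The likelihood-based analysis in the last bullet can be illustrated with a minimal sketch. It is not the Nullsettes implementation, which this summary does not describe. It assumes that a model's judgment of a mutation is read off as the difference in sequence log-likelihood between a non-mutant construct and its mutant. A toy order-2 Markov model stands in for a real gLM, and the names `train_markov`, `sequence_log_likelihood`, and `lof_score`, as well as the sequences, are hypothetical.

```python
import math
from collections import Counter


def train_markov(sequences, k=2, alphabet="ACGT"):
    """Fit a toy order-k Markov model (a stand-in for a real gLM)."""
    counts = Counter()
    context_counts = Counter()
    for seq in sequences:
        for i in range(k, len(seq)):
            ctx, nxt = seq[i - k:i], seq[i]
            counts[(ctx, nxt)] += 1
            context_counts[ctx] += 1

    def log_prob(ctx, nxt):
        # Add-one smoothing so unseen contexts still get a finite score.
        return math.log((counts[(ctx, nxt)] + 1) / (context_counts[ctx] + len(alphabet)))

    return log_prob


def sequence_log_likelihood(seq, log_prob, k=2):
    """Sum of per-base log-probabilities under the toy model."""
    return sum(log_prob(seq[i - k:i], seq[i]) for i in range(k, len(seq)))


def lof_score(reference, mutant, log_prob, k=2):
    """Log-likelihood ratio; positive means the mutant looks less likely."""
    return (sequence_log_likelihood(reference, log_prob, k)
            - sequence_log_likelihood(mutant, log_prob, k))


# Hypothetical toy data, not taken from the study.
toy_corpus = ["TATAAATATAAA"]
log_prob = train_markov(toy_corpus)
reference = "TATAAATATAAA"   # matches the pattern the model was trained on
mutant = "TGCAAATATAAA"      # same length, two substitutions near the start
print(lof_score(reference, mutant, log_prob))
```

Under this convention, a model that recognizes a disruptive mutation would assign it a clearly positive score. The sketch also shows why such a score can reflect familiarity with training patterns rather than function: the toy model penalizes the mutant only because its k-mers are unseen.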

Main Results:

  • gLMs consistently failed to detect strong LOF mutations in synthetic constructs.
  • Predictive accuracy decreased significantly when non-mutant sequences had lower model likelihood.
  • Results suggest gLMs primarily match evolutionary patterns, not mechanistic gene expression.
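
The second result, that accuracy drops when the non-mutant sequence itself has low model likelihood, can be checked with a simple stratified tally. The sketch below is one assumed way to do this, not the authors' analysis: it splits constructs at the median non-mutant log-likelihood and reports, for each half, the fraction of mutants scored below their non-mutant counterpart. The function name `detection_rate_by_likelihood` and the example numbers are hypothetical and are not results from the study.

```python
from statistics import median


def detection_rate_by_likelihood(pairs):
    """Detection rate for low- and high-likelihood non-mutant sequences.

    pairs: list of (reference_log_likelihood, mutant_log_likelihood).
    A mutation counts as detected when the mutant scores lower.
    """
    cutoff = median(ref_ll for ref_ll, _ in pairs)
    bins = {"low": [], "high": []}
    for ref_ll, mut_ll in pairs:
        detected = mut_ll < ref_ll
        bins["low" if ref_ll < cutoff else "high"].append(detected)
    return {
        name: (sum(hits) / len(hits) if hits else float("nan"))
        for name, hits in bins.items()
    }


# Made-up placeholder scores, for illustration only.
example_pairs = [(-10.0, -12.0), (-11.0, -15.0), (-30.0, -29.0), (-32.0, -33.0)]
print(detection_rate_by_likelihood(example_pairs))
```

A median split is only one possible choice; finer bins or a correlation between non-mutant likelihood and detection would serve the same purpose.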

Conclusions:

  • Current gLMs exhibit significant limitations in generalizing to engineered genetic sequences.
  • There is a critical need for evaluation methods that test for functional understanding.
  • Future gLM development must prioritize mechanistic insights over pattern matching for reliable synthetic biology applications.