Sequence modeling and design from molecular to genome scale with Evo

Affiliations
  • 1. Arc Institute, Palo Alto, CA, USA.
  • 2. Department of Bioengineering, Stanford University, Stanford, CA, USA.
  • 3. Department of Computer Science, Stanford University, Stanford, CA, USA.
  • 4. TogetherAI, San Francisco, CA, USA.
  • 5. Stanford Data Science, Stanford University, Stanford, CA, USA.
  • 6. Department of Genetics, Stanford University, Stanford, CA, USA.
  • 7. Stanford Center for Biomedical Informatics Research, Stanford, CA, USA.
  • 8. Department of Biomedical Data Science, Stanford University, Stanford, CA, USA.
  • 9. CZ Biohub, San Francisco, CA, USA.
  • 10. Department of Neurobiology, Stanford University, Stanford, CA, USA.
  • 11. Department of Bioengineering and Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA.
  • 12. Department of Chemical Engineering, Stanford University, Stanford, CA, USA.

Abstract

The genome is a sequence that encodes the DNA, RNA, and proteins that orchestrate an organism’s function. We present Evo, a long-context genomic foundation model with a frontier architecture trained on millions of prokaryotic and phage genomes, and report scaling laws on DNA to complement observations in language and vision. Evo generalizes across DNA, RNA, and proteins, enabling zero-shot function prediction competitive with domain-specific language models and the generation of functional CRISPR-Cas and transposon systems, representing the first examples of protein-RNA and protein-DNA codesign with a language model. Evo also learns how small mutations affect whole-organism fitness and generates megabase-scale sequences with plausible genomic architecture. These prediction and generation capabilities span molecular to genomic scales of complexity, advancing our understanding and control of biology.
