Deciphering genomic codes using advanced natural language processing techniques: a scoping review.

Shuyan Cheng¹, Yishu Wei¹, Yiliang Zhou¹

  • ¹Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, United States.

Journal of the American Medical Informatics Association : JAMIA | February 25, 2025 | PubMed
Summary
This summary is machine-generated.

Natural language processing (NLP) and large language models (LLMs) are increasingly applied to genomic data analysis. These techniques can improve the interpretation of genomic sequences and the prediction of regulatory elements, with potential applications in personalized medicine.

Keywords:
genomic sequencing data; large language models; natural language processing; regulatory annotations

Area of Science:

  • Genomics
  • Bioinformatics
  • Computational Biology

Background:

  • Human genomic sequencing generates vast, complex datasets that challenge traditional analysis methods.
  • Natural Language Processing (NLP) offers novel methods for interpreting biological sequences.
  • Large Language Models (LLMs) and transformer architectures show promise in this domain.

Purpose of the Study:

  • To review the application of NLP, LLMs, and transformers in genomic data analysis.
  • To focus on tokenization, transformer models, and regulatory annotation prediction.
  • To assess data and model accessibility in recent genomic NLP literature.
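
Tokenization here means splitting a nucleotide sequence into units that a language model can index. The Python below is a minimal sketch that assumes overlapping k-mer tokenization. This is one common scheme for DNA language models, and other schemes such as byte-pair encoding are also used. The sketch splits a sequence into k-mers and maps them to integer IDs. The function names (kmer_tokenize, build_vocab, encode), the [UNK] token, and the example sequence are hypothetical illustrations, not code from the reviewed studies.

```python
from typing import Dict, List


def kmer_tokenize(sequence: str, k: int = 3, stride: int = 1) -> List[str]:
    """Split a DNA sequence into k-mer tokens (hypothetical helper).

    stride=1 gives overlapping k-mers; stride=k gives non-overlapping ones.
    """
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive")
    seq = sequence.upper()
    # Accept only the four bases plus N (unknown base); reject anything else.
    invalid = set(seq) - set("ACGTN")
    if invalid:
        raise ValueError(f"unexpected characters: {sorted(invalid)}")
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(tokens: List[str]) -> Dict[str, int]:
    """Assign each distinct token an integer ID, reserving ID 0 for unknown tokens."""
    vocab: Dict[str, int] = {"[UNK]": 0}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def encode(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Convert tokens to IDs; tokens missing from the vocabulary map to [UNK]."""
    return [vocab.get(tok, vocab["[UNK]"]) for tok in tokens]


if __name__ == "__main__":
    seq = "ATGCGTA"  # hypothetical example sequence
    overlapping = kmer_tokenize(seq, k=3)
    print(overlapping)  # ['ATG', 'TGC', 'GCG', 'CGT', 'GTA']
    print(kmer_tokenize(seq, k=3, stride=3))  # ['ATG', 'CGT']
    vocab = build_vocab(overlapping)
    print(encode(overlapping, vocab))  # [1, 2, 3, 4, 5]
```

With stride=1, neighbouring tokens overlap by k-1 bases, so several tokens cover every interior position. With stride=k, the token sequence is shorter and does not overlap, but any trailing fragment shorter than k is dropped. In the example, this is the final "A".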

Main Methods:

  • Scoping review conducted following PRISMA guidelines.
  • Searches across major scientific databases (PubMed, Medline, Scopus, Web of Science, Embase, ACM Digital Library).
  • Inclusion of studies on NLP methodologies for genomic sequencing data analysis, irrespective of publication date or type.

Main Results:

  • Twenty-six studies, published between 2021 and April 2024, were selected.
  • Tokenization and transformer models significantly enhance genomic data processing and understanding.
  • Applications include predicting regulatory annotations like transcription-factor binding sites and chromatin accessibility.
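
Transformer models process token sequences like these with attention. The sketch below implements scaled dot-product attention, softmax(QK^T / sqrt(d_k))V, in dependency-free Python. It is a deliberate simplification. Real transformers derive Q, K and V from learned linear projections, use multiple attention heads, and add positional information, and none of this is shown here. The helper names and the toy embedding values are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the row maximum before exponentiating."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Tuple[Matrix, Matrix]:
    """Return (output, weights) for softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))  # one row of similarity scores per query
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    weights = [softmax(row) for row in scaled]  # each row sums to 1
    return matmul(weights, v), weights


if __name__ == "__main__":
    # Hypothetical 2-dimensional embeddings for three k-mer tokens.
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    # Self-attention without learned projections: Q = K = V = x.
    output, weights = scaled_dot_product_attention(x, x, x)
    for row in weights:
        print([round(w, 3) for w in row])
```

Each row of the weight matrix sums to 1, so every output vector is a weighted average of the value vectors. Dividing by sqrt(d_k) keeps the scores from growing with the embedding dimension, which would otherwise push the softmax toward near one-hot weights.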

Conclusions:

  • NLP and LLMs show significant potential for streamlining large-scale genomic data interpretation.
  • These technologies can advance personalized medicine through efficient genomic analysis.
  • Further research is needed to address limitations in model accessibility, interpretability, and transparency.