Enhancing nucleotide sequence representations in genomic analysis with contrastive optimization.

Mohammadsaleh Refahi¹, Bahrad A Sokhansanj¹, Joshua C Mell²

  • ¹ Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA.

Communications Biology | March 29, 2025
Summary
This summary is machine-generated.

Scorpio is a new framework for DNA sequence analysis that uses contrastive learning to improve sequence representations for prediction. It performs well at classifying genomic data and identifying genes, including for novel sequences.

Area of Science:

  • Genomics
  • Bioinformatics
  • Machine Learning

Background:

  • Genomic and metagenomic sequence analysis is challenging because of sequence divergence, variable codon usage, and poorly understood selective constraints.
  • Existing methods often generalize poorly to novel DNA sequences and taxa.

Purpose of the Study:

  • To introduce Scorpio (Sequence Contrastive Optimization for Representation and Predictive Inference on DNA), a novel framework for nucleotide sequence analysis.
  • To enhance representation learning for DNA sequences using contrastive learning.

Main Methods:

  • Scorpio leverages pre-trained genomic language models and k-mer frequency embeddings.
  • The framework employs contrastive learning to improve sequence embeddings.
  • The framework was evaluated on diverse datasets spanning a range of DNA sequence lengths.
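
The two ingredients named in the methods, k-mer frequency embeddings and contrastive learning, can be illustrated with a minimal sketch. It assumes plain length-normalized k-mer counts over A, C, G and T, and uses a triplet margin loss as the contrastive objective. The source does not state Scorpio's k-mer size, normalization, or loss function, so these choices and the function names (`kmer_frequency_vector`, `triplet_margin_loss`) are hypothetical.

```python
from itertools import product
import math


def kmer_frequency_vector(seq: str, k: int = 3) -> list[float]:
    """Hypothetical helper: normalized counts of all 4**k k-mers over A/C/G/T.

    Assumption: k-mers containing any other character (e.g. N) are skipped,
    and the vector is ordered lexicographically (AA..A first, TT..T last).
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0.0] * len(kmers)
    seq = seq.upper()
    # Slide a window of width k along the sequence and count each k-mer.
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    # Divide by the number of counted k-mers so sequences of different lengths are comparable.
    return [c / total for c in counts] if total else counts


def euclidean(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor, positive, negative, margin: float = 1.0) -> float:
    """Assumed contrastive objective (triplet margin loss).

    The loss is zero once the anchor is closer to the positive (same label)
    than to the negative (different label) by at least `margin`.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)
```

In a contrastive setup of this kind, the anchor and positive would be sequences sharing a label (for example, the same gene or taxon) and the negative a sequence with a different label, and the loss would be applied to the outputs of a trainable encoder rather than to the raw k-mer vectors directly.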

Main Results:

  • Scorpio demonstrates competitive performance in taxonomic and gene classification.
  • Achieved high accuracy in antimicrobial resistance (AMR) gene identification and promoter detection.
  • Generalized well to novel DNA sequences and taxa, outperforming alignment-based methods.
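
One simple way to turn sequence embeddings into taxonomic or gene labels is nearest-neighbor retrieval: a query receives the label of its most similar reference embedding. The source does not describe how Scorpio produces its classifications, so the sketch below is an assumption, with cosine similarity as the metric and hypothetical function names (`cosine_similarity`, `nearest_label`).

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; returns 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def nearest_label(query: list[float], references: list[tuple[list[float], str]]):
    """Assumed retrieval-based classifier (not confirmed as Scorpio's method).

    `references` is a list of (embedding, label) pairs. Returns the label and
    similarity of the most similar reference, or (None, -inf) if the list is empty.
    """
    best_label, best_sim = None, -math.inf
    for emb, label in references:
        sim = cosine_similarity(query, emb)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label, best_sim
```

Because classification only compares embeddings, new labeled reference sequences can be added to `references` without retraining the encoder.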

Conclusions:

  • Scorpio provides a versatile and robust framework for diverse genomic applications.
  • The model's ability to generalize addresses a key limitation in current sequence analysis.
  • Analysis reveals underlying biological information in Scorpio's representations, including correlations with gene expression and taxonomy.