Updated: May 26, 2026

LOCALE: Local-Alignment Embeddings for Noise-Robust DNA Search at SRA Scale.

Ryan P Synk¹, Prashant Pandey², S Cenk Sahinalp³

  • ¹University of Maryland, College Park.

bioRxiv: the preprint server for biology | May 25, 2026
Summary
This summary is machine-generated.

We developed LOCALE, a new method for searching large sequencing datasets. LOCALE uses vector embeddings to find related DNA sequences accurately, even when the sequences contain sequencing errors or mutations.
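
The idea of searching by vector embeddings can be illustrated with a small sketch. This is not LOCALE's encoder: the `embed` function below is a toy stand-in that counts 3-mers, and the read names, sequences, and `search` helper are hypothetical. It only shows the retrieval pattern of embedding every sequence as a vector and ranking by cosine similarity.

```python
import math
from itertools import product

K = 3  # k-mer length for the toy embedding (an assumption, not from the source)
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def embed(seq):
    """Toy stand-in for a learned encoder: an L2-normalised k-mer count vector."""
    vec = [0.0] * len(KMERS)
    for i in range(len(seq) - K + 1):
        j = INDEX.get(seq[i:i + K])
        if j is not None:
            vec[j] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(u, v):
    """Cosine similarity of two vectors that are already unit length."""
    return sum(a * b for a, b in zip(u, v))


def search(query, database, top_k=2):
    """Rank database sequences by similarity to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(seq)), name) for name, seq in database.items()]
    scored.sort(reverse=True)
    return scored[:top_k]


# Hypothetical reads; the query is read_a with one substitution.
database = {
    "read_a": "ACGTACGTTAGCCGATAGGCTA",
    "read_b": "TTTTGGGGCCCCAAAATTTTGG",
    "read_c": "GATTACAGATTACAGATTACAG",
}
query = "ACGTACGTTAGCAGATAGGCTA"
hits = search(query, database)
print(hits[0][1])  # name of the best-scoring read
```

A real system would replace `embed` with a trained model and the linear scan with an approximate nearest-neighbour index; both are outside the scope of this sketch.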

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Genomics

Background:

  • Searching large-scale raw sequencing data repositories like the NIH Sequence Read Archive (SRA) is crucial for biological discovery.
  • Because they rely on exact k-mer matching, current search methods scale poorly and are sensitive to sequencing errors and biological variation.
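
The sensitivity of exact k-mer matching can be seen in a short sketch. The sequences and the helper names `kmers` and `shared_kmer_fraction` are hypothetical. Three substitutions in a 32-base sequence (roughly 9%) are spaced so that every 11-base window contains at least one of them, so long k-mers rarely survive intact, while short k-mers still match in the unmutated stretches.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_fraction(query, target, k):
    """Fraction of the query's distinct k-mers that occur exactly in the target."""
    q = kmers(query, k)
    return len(q & kmers(target, k)) / len(q)


# Hypothetical reference and a copy with three substitutions.
reference = "ACGTTGCAAGCTTAGGCATCGATCGGATCCTA"
bases = list(reference)
for pos in (5, 15, 25):
    bases[pos] = "A" if bases[pos] != "A" else "C"
mutated = "".join(bases)

for k in (4, 11):
    print(k, round(shared_kmer_fraction(mutated, reference, k), 2))
```

Shorter k-mers tolerate more noise but are less specific, which is the trade-off that motivates matching in an embedding space instead.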

Purpose of the Study:

  • To develop a scalable and robust sequence search method for petabase-scale repositories.
  • To improve the accuracy of sequence retrieval in the presence of sequencing errors and biological divergence.

Main Methods:

  • Recasting sequence search as a dense retrieval problem using vector embeddings.
  • Training a DNABERT-2 encoder with an InfoNCE objective on biologically informed data augmentations (corrupted sequence crops).
  • Evaluating the LOCALE method on SRA benchmarks with varying dataset sizes and mutation rates.
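
The training recipe in the list above can be sketched in miniature. The summary says only that the augmentations are "corrupted sequence crops", so the substitution-only corruption, the function names, and the hand-written two-dimensional embeddings below are assumptions for illustration; no DNABERT-2 model is involved. The loss is the standard InfoNCE form with in-batch negatives.

```python
import math
import random


def corrupted_crop(seq, crop_len, sub_rate, rng):
    """Random crop of seq with random substitutions (a simple stand-in augmentation)."""
    start = rng.randrange(len(seq) - crop_len + 1)
    crop = list(seq[start:start + crop_len])
    for i, base in enumerate(crop):
        if rng.random() < sub_rate:
            crop[i] = rng.choice([b for b in "ACGT" if b != base])
    return "".join(crop)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch of embedding vectors.

    anchors[i] and positives[i] are a positive pair; every other row of
    positives acts as an in-batch negative for anchors[i].
    """
    losses = []
    for i, a in enumerate(anchors):
        logits = [sum(x * y for x, y in zip(a, p)) / temperature for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


rng = random.Random(0)
view = corrupted_crop("ACGT" * 10, crop_len=16, sub_rate=0.1, rng=rng)

# Hand-written unit vectors standing in for encoder outputs.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # each anchor matches its own positive
swapped = [[0.0, 1.0], [1.0, 0.0]]   # each anchor matches the wrong row
print(info_nce(anchors, aligned) < info_nce(anchors, swapped))
```

Minimising this loss pulls the embeddings of two corrupted views of the same sequence together and pushes views of different sequences apart, which is what lets a noisy query land near its source at search time.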

Main Results:

  • LOCALE achieved 62.4% average Recall@Rq at a 10% mutation rate on a 50-accession SRA benchmark, outperforming baselines in noisy-query settings.
  • On a larger 500-accession, 15-Gbp benchmark, LOCALE achieved an AUPRC of 0.508 at a 10% mutation rate, compared with 0.129 for MetaGraph (about 3.9 times higher).
  • The method demonstrates effective retrieval by ranking locally aligned sequences higher than unaligned ones.
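
The ranking metrics above can be made concrete with a small sketch. The source does not give the exact evaluation code, so two points are assumptions: average precision is used here as a common estimate of AUPRC, and Recall@Rq is read as recall within the top R results, where R is the number of relevant items for the query. The label lists are invented examples in which 1 marks a locally aligned (relevant) sequence.

```python
def average_precision(ranked_labels):
    """Average precision of a ranked list of 0/1 relevance labels."""
    hits, total = 0, 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at each relevant position
    return total / hits if hits else 0.0


def recall_at_r(ranked_labels):
    """Fraction of relevant items found in the top R results, R = number of relevant items."""
    r = sum(ranked_labels)
    return sum(ranked_labels[:r]) / r if r else 0.0


good_ranking = [1, 1, 0, 1, 0, 0]  # aligned sequences mostly ranked first
poor_ranking = [0, 0, 1, 0, 1, 1]  # aligned sequences mostly ranked last

for name, labels in (("good", good_ranking), ("poor", poor_ranking)):
    print(name, round(average_precision(labels), 3), round(recall_at_r(labels), 3))
```

Both scores rise as relevant sequences move toward the top of the list, which is the behaviour the last result describes.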

Conclusions:

  • LOCALE offers a scalable and accurate solution for searching large sequencing data repositories.
  • The dense retrieval approach handles sequencing errors and biological divergence, outperforming the evaluated baselines in noisy-query settings.
  • This method could support biological discovery by enabling efficient exploration of large sequencing repositories such as the SRA.