Hopper: a mathematically optimal algorithm for sketching biological data.

Benjamin DeMeo1,2, Bonnie Berger2,3

  • 1Department of Bioinformatics, Harvard University, Cambridge, MA 02138, USA.

Bioinformatics (Oxford, England)
|July 14, 2020
PubMed
Summary
This summary is machine-generated.

Hopper is a new toolkit that speeds up single-cell RNA sequencing analysis and identifies rare cell populations. It uses intelligent subsampling to represent large datasets efficiently, making advanced analysis accessible on a laptop.

Area of Science:

  • Genomics
  • Computational Biology
  • Bioinformatics

Background:

  • Single-cell RNA sequencing (scRNA-seq) generates massive datasets, posing significant computational and analytical challenges.
  • Current analysis methods struggle with large-scale data, requiring extensive resources and often missing rare cell populations.
  • Existing methods can be biased towards common cell types, overlooking biologically significant small cell groups.

Purpose of the Study:

  • To introduce Hopper, a novel toolkit designed to accelerate scRNA-seq data analysis.
  • To enhance the detection of transcriptional diversity and rare cell populations within large scRNA-seq datasets.
  • To enable efficient and targeted multi-resolution analyses through intelligent subsampling.

Main Methods:

  • Hopper employs intelligent subsampling, or sketching, to create representative downsampled datasets.
  • It chooses sketch points to approximately minimize the Hausdorff distance between the sketch and the full dataset, so that every cell in the full dataset lies close to some cell in the sketch.
  • Hopper iteratively adds points and allows targeted sampling, with Treehopper further accelerating analysis via spatial partitioning.
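
The iterative point-adding step described above can be illustrated with farthest-first traversal, a standard greedy method that yields a 2-approximation to the smallest achievable Hausdorff distance for a sketch of a given size. The following is a minimal sketch under that assumption, not Hopper's actual code: the function name `farthest_first_sketch`, its parameters, and the toy data are hypothetical, and it uses plain Euclidean distance on small in-memory point lists.

```python
import math
import random


def farthest_first_sketch(points, k, seed=0):
    """Pick k indices from `points` by farthest-first traversal.

    Hypothetical illustration only; not the Hopper implementation.
    Returns (chosen_indices, radius), where radius is the largest distance
    from any point to its nearest chosen point, i.e. the Hausdorff distance
    between the sketch and the full point set.
    """
    rng = random.Random(seed)
    first = rng.randrange(len(points))
    chosen = [first]
    # dist[i] = distance from point i to its nearest chosen point so far
    dist = [math.dist(p, points[first]) for p in points]
    for _ in range(k - 1):
        # Add the point that is currently worst represented by the sketch.
        nxt = max(range(len(points)), key=dist.__getitem__)
        chosen.append(nxt)
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    return chosen, max(dist)


# Toy data: 1000 "common" points in the unit square plus 3 "rare" points far away.
rng = random.Random(1)
points = [(rng.random(), rng.random()) for _ in range(1000)]
points += [(50.0, 50.0), (50.1, 50.0), (50.0, 50.1)]

chosen, radius = farthest_first_sketch(points, k=5)
has_rare = any(i >= 1000 for i in chosen)
print("sketch indices:", chosen)
print("rare group represented:", has_rare)
```

Because each step adds the point farthest from the current sketch, an isolated rare group is picked up early even though it makes up a tiny fraction of the data. Each iteration scans every point, so the cost grows with dataset size times sketch size; this is the kind of cost that a spatial-partitioning variant such as Treehopper is described as reducing.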

Main Results:

  • Hopper successfully identified a cluster of 64 macrophages (0.004% of data) from a 5000-cell sketch of 1.3 million mouse brain cells.
  • The toolkit revealed other small, biologically relevant immune cell populations missed by analyzing the full dataset.
  • Hopper demonstrated even representation of cell types in small sketches, outperforming prior methods on a 2-million-cell dataset.
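
To put the macrophage result in context, the short calculation below estimates how a plain uniform random sample of the same size would treat a population that rare. It is a back-of-the-envelope illustration, not an analysis from the paper: it assumes the rounded figures quoted above (1.3 million cells, 64 rare cells, 5000 sampled), and the function name `uniform_sample_stats` is hypothetical.

```python
def uniform_sample_stats(total_cells, rare_cells, sample_size):
    """Expected rare-cell count and chance of drawing none, for a uniform
    random sample without replacement (hypergeometric model)."""
    expected = sample_size * rare_cells / total_cells
    # P(no rare cell sampled) = C(N - K, n) / C(N, n), expanded as a product
    p_none = 1.0
    for i in range(rare_cells):
        p_none *= (total_cells - sample_size - i) / (total_cells - i)
    return expected, p_none


# Rounded figures from the summary above (assumed exact for illustration).
expected, p_none = uniform_sample_stats(1_300_000, 64, 5_000)
print(f"expected rare cells in a uniform sample: {expected:.2f}")
print(f"probability of sampling none: {p_none:.2f}")
```

Under these assumptions a uniform sample would be expected to contain well under one of the 64 cells, and would most likely contain none, which is why a sampling scheme that favors poorly represented regions matters for rare populations.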

Conclusions:

  • Hopper and Treehopper significantly condense transcriptional information, making large-scale scRNA-seq analysis feasible for individual users on standard hardware.
  • These tools democratize access to high-performance computational biology, enabling detailed study of transcriptional diversity.
  • The methods facilitate the discovery of rare cell types and biological insights previously inaccessible due to data scale.