



Fractional hitting sets for efficient multiset sketching.

Timothé Rouzé (1,2,3), Igor Martayan (4), Camille Marchet (4)

  • (1) G5 - SeqBio, Institut Pasteur, Université Paris Cité, 75724, Paris, France. trouze@pasteur.fr

Algorithms for Molecular Biology : AMB | February 8, 2025
PubMed
Summary
This summary is machine-generated.

We developed supersampler, a tool for sketching genomic data. It builds its sketches with Fractional Hitting Sets and uses about an order of magnitude less space and memory than sourmash while producing comparable results.

Keywords: k-mer, Containment, Jaccard, Metagenomics, Sketching, Subsampling


Area of Science:

  • Genomics
  • Bioinformatics
  • Computational Biology

Background:

  • The rapid growth of sequencing data demands efficient analysis methods.
  • Locality-sensitive hashing creates data sketches but struggles with divergent datasets.
  • Existing scalable methods like sourmash lack resource-efficient indexing.

Purpose of the Study:

  • To develop lighter genomic data sketches with comparable results.
  • To enhance the efficiency of genomic data processing and analysis.
  • To introduce a novel sketching scheme for large-scale genomic datasets.

Main Methods:

  • Introduced Fractional Hitting Sets to cover a fraction of k-mer space.
  • Encoded covered k-mers as super-k-mers for space-efficient representation.
  • Developed the supersampler tool to implement this novel sketching scheme.
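The two ideas above can be illustrated with a small Python sketch. It assumes one plausible way of selecting a fraction of k-mer space, namely keeping a k-mer when the hash of its minimizer (its m-mer with the smallest hash) falls below a threshold, and it merges consecutive kept k-mers that share a minimizer into a single super-k-mer. This selection rule, the function names, and the parameters `m` and `fraction` are assumptions for illustration; the summary does not describe how supersampler implements the scheme.

```python
import hashlib

def hash64(s):
    """Deterministic 64-bit hash of a string (blake2b is an arbitrary choice)."""
    return int.from_bytes(hashlib.blake2b(s.encode(), digest_size=8).digest(), "big")

def minimizer(kmer, m):
    """Return the m-mer of kmer with the smallest hash (leftmost on ties)."""
    return min((kmer[i:i + m] for i in range(len(kmer) - m + 1)), key=hash64)

def sampled_super_kmers(seq, k, m, fraction):
    """Return super-k-mers covering the k-mers whose minimizer hash is below
    fraction * 2**64. Assumed selection rule, for illustration only."""
    threshold = int(fraction * 2**64)
    supers = []
    start = None    # index of the first k-mer in the current run
    current = None  # minimizer shared by the current run
    for i in range(len(seq) - k + 1):
        mini = minimizer(seq[i:i + k], m)
        keep = hash64(mini) < threshold
        if keep and mini == current:
            continue  # same minimizer: the current super-k-mer grows by one base
        if current is not None:
            # Close the run; its last k-mer started at position i - 1.
            supers.append(seq[start:i - 1 + k])
        if keep:
            start, current = i, mini
        else:
            start, current = None, None
    if current is not None:
        supers.append(seq[start:])  # the final run reaches the end of seq
    return supers

if __name__ == "__main__":
    for s in sampled_super_kmers("ACGTTGCATGCAAGTCCGTA", k=7, m=3, fraction=0.5):
        print(s)
```

A super-k-mer of length L stores L - k + 1 overlapping k-mers in L characters instead of k characters each, which is where the space saving of this encoding comes from.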

Main Results:

  • supersampler achieves results comparable to those of sourmash.
  • supersampler uses an order of magnitude less space and memory than sourmash.
  • supersampler runs several times faster than sourmash.

Conclusions:

  • Fractional Hitting Sets provide a feasible and efficient sketching scheme.
  • supersampler offers a resource-efficient solution for genomic data analysis.
  • The approach addresses challenges posed by the expanding genomic data landscape.