Set-Min Sketch: A Probabilistic Map for Power-Law Distributions with Application to k-Mer Annotation.

Yoshihiro Shibuya (1), Djamal Belazzougui (2), Gregory Kucherov (3, 4)

  • (1) LIGM, Modèles et Algorithmes Group, Université Gustave Eiffel, Marne-la-Vallée, France.

Journal of Computational Biology : a Journal of Computational Molecular Cell Biology | January 20, 2022
Summary
This summary is machine-generated.

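Set-Min sketch offers a memory-efficient way to store k-mer counts without explicitly storing the k-mers themselves. It is provably more accurate than Count-Min and Max-Min sketches at the cost of only a moderate increase in memory, and for large k and assembled genomes it can require up to an order of magnitude less space than MPHF-based solutions.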

Keywords:
k-mer counting; k-mer spectrum; max-min sketch; power-law distribution; set-min sketch; sketching

Area of Science:

  • Bioinformatics
  • Data Structures
  • Computational Biology

Background:

  • K-mer counts are crucial features in bioinformatics pipelines.
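  • Existing k-mer counting tools optimize for either time or memory, and output large count tables that store every k-mer explicitly alongside its count.
  • When the set of k-mers is known, the k-mers themselves need not be stored explicitly, and the representation can focus on the counters alone.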
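To make the cost of an explicit table concrete, here is a minimal Python sketch of the kind of count table described above. It assumes a 2-bit base encoding (A=0, C=1, G=2, T=3), uppercase A/C/G/T input only, and no merging of reverse complements; the helper names (`encode_kmer`, `count_kmers`, `explicit_table_bits`) are hypothetical illustrations, not part of any published tool.

```python
from collections import Counter

# Hypothetical 2-bit encoding; assumes uppercase A/C/G/T only (no N, no reverse complements).
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_kmer(kmer: str) -> int:
    """Pack a k-mer into an integer, 2 bits per base."""
    value = 0
    for base in kmer:
        value = (value << 2) | ENCODE[base]
    return value


def count_kmers(seq: str, k: int) -> Counter:
    """Explicit count table: every distinct k-mer is stored as a key next to its count."""
    return Counter(encode_kmer(seq[i:i + k]) for i in range(len(seq) - k + 1))


def explicit_table_bits(table: Counter, k: int, counter_bits: int = 32) -> int:
    """Rough size estimate: 2k key bits plus one fixed-width counter per distinct k-mer
    (hash-table overhead ignored)."""
    return len(table) * (2 * k + counter_bits)


table = count_kmers("ACGTACGTAC", k=4)
print(len(table), explicit_table_bits(table, k=4))  # prints: 4 160
```

The 2k bits per key are exactly what can be dropped when the k-mer set is known in advance; only the counter part then needs a representation.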

Purpose of the Study:

  • Introduce Set-Min sketch, a novel technique for representing associative maps.
  • Apply Set-Min sketch to the problem of representing k-mer count tables.
  • Compare Set-Min sketch's accuracy and memory efficiency against Count-Min and Max-Min sketches.
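For reference, the Count-Min baseline in that comparison is, as usually defined (Cormode and Muthukrishnan), a grid of `depth` rows by `width` counters with one hash function per row. The sketch below is a minimal Python version; the per-row hash (SHA-256 of `"row:key"`) is an illustrative assumption, not the hashing used in the paper.

```python
import hashlib


class CountMinSketch:
    """Count-Min sketch: `depth` rows of `width` counters, one hash function per row."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _cell(self, key: str, row: int) -> int:
        # Hypothetical per-row hash: SHA-256 of "row:key", reduced modulo the width.
        digest = hashlib.sha256(f"{row}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "little") % self.width

    def add(self, key: str, count: int = 1) -> None:
        # Add the count to one counter in every row.
        for row in range(self.depth):
            self.rows[row][self._cell(key, row)] += count

    def query(self, key: str) -> int:
        # Colliding keys can only inflate a counter, so each row gives an upper bound
        # on the true (nonnegative) count; the minimum is the tightest of the d bounds.
        return min(self.rows[row][self._cell(key, row)] for row in range(self.depth))


counts = {"ACGT": 5, "CCCC": 1, "GGGA": 3}
sketch = CountMinSketch(width=4, depth=3)
for kmer, count in counts.items():
    sketch.add(kmer, count)
for kmer, count in counts.items():
    print(kmer, count, sketch.query(kmer))
```

Because counters only accumulate, Count-Min never underestimates; its error is entirely one-sided overestimation caused by collisions.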

Main Methods:

  • Developed Set-Min sketch, inspired by the Count-Min sketch.
  • Defined Max-Min sketch, an improved variant of Count-Min for static datasets.
  • Evaluated Set-Min sketch's performance on k-mer count tables, particularly for genomic datasets.
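The two static-map variants can be sketched side by side under stated assumptions. Here Max-Min is read as "each cell keeps the largest value hashed to it, and a query takes the minimum over rows", and the Set-Min-style structure keeps the set of distinct values per cell and intersects the d sets at query time. The tie-break used when several candidates survive (return the smallest) is our assumption, not necessarily the rule in the paper; the hash function and all names are hypothetical.

```python
from __future__ import annotations

import hashlib


def cell(key: str, row: int, width: int) -> int:
    """Hypothetical per-row hash: SHA-256 of "row:key", reduced modulo the width."""
    digest = hashlib.sha256(f"{row}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "little") % width


def build_max_min(table: dict[str, int], width: int, depth: int) -> list[list[int]]:
    """Max-Min sketch of a static map with nonnegative values: each cell keeps the
    largest value hashed to it (0 for empty cells)."""
    grid = [[0] * width for _ in range(depth)]
    for key, value in table.items():
        for row in range(depth):
            j = cell(key, row, width)
            grid[row][j] = max(grid[row][j], value)
    return grid


def query_max_min(grid: list[list[int]], key: str) -> int:
    """Each cell of an inserted key is >= its value, so the row minimum never underestimates."""
    width = len(grid[0])
    return min(grid[row][cell(key, row, width)] for row in range(len(grid)))


def build_set_min(table: dict[str, int], width: int, depth: int) -> list[list[set[int]]]:
    """Set-Min-style sketch: each cell keeps the set of distinct values hashed to it."""
    grid = [[set() for _ in range(width)] for _ in range(depth)]
    for key, value in table.items():
        for row in range(depth):
            grid[row][cell(key, row, width)].add(value)
    return grid


def candidates(grid: list[list[set[int]]], key: str) -> set[int]:
    """Intersect the key's cells; an inserted key's true value is present in every row."""
    width = len(grid[0])
    return set.intersection(*(grid[row][cell(key, row, width)] for row in range(len(grid))))


def query_set_min(grid: list[list[set[int]]], key: str) -> int | None:
    """Return the single surviving value, or (assumed tie-break) the smallest candidate;
    None only for keys that were never inserted and whose cells share no value."""
    common = candidates(grid, key)
    return min(common) if common else None


counts = {"ACGT": 5, "CCCC": 1, "GGGA": 3, "TTTT": 1}
mm = build_max_min(counts, width=3, depth=2)
sm = build_set_min(counts, width=3, depth=2)
for kmer, value in counts.items():
    print(kmer, value, query_max_min(mm, kmer), query_set_min(sm, kmer))
```

With this tie-break the Set-Min-style query can only err downward, whereas Max-Min (like Count-Min) can only err upward; whenever the intersection is a singleton, the answer is exact.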

Main Results:

  • Set-Min sketch is provably more accurate than Count-Min and Max-Min sketches.
  • The technique achieves a very low error rate, in terms of both the probability and the size of errors, with only a moderate increase in memory.
  • Set-Min sketches require up to an order of magnitude less space than Minimal Perfect Hash Function (MPHF)-based solutions for large k and assembled genomes.
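The accuracy ordering starts from an easy observation, shown here only for Max-Min versus Count-Min. This is a short derivation under the assumption that both sketches are built over the same static map with nonnegative values and the same hash functions, with Max-Min storing per-cell maxima; the paper's guarantees for Set-Min are not reproduced here. The key itself lies in each of its cells (first inequality), and a maximum of nonnegative terms never exceeds their sum (second inequality); taking the minimum over rows preserves both.

```latex
\begin{aligned}
&\text{Static map } v : K \to \mathbb{N}, \quad h_1,\dots,h_d : K \to \{1,\dots,w\},\\
&C_i[j] = \sum_{y \in K:\, h_i(y) = j} v(y), \qquad M_i[j] = \max_{y \in K:\, h_i(y) = j} v(y),\\
&\text{for } x \in K:\quad v(x) \le M_i[h_i(x)] \le C_i[h_i(x)] \qquad (i = 1,\dots,d),\\
&\text{hence}\quad v(x) \le \min_{i} M_i[h_i(x)] \le \min_{i} C_i[h_i(x)].
\end{aligned}
```

So, with shared hash functions, a Max-Min estimate is never farther from the true value than the corresponding Count-Min estimate.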

Conclusions:

  • Set-Min sketch is a highly accurate and memory-efficient method for representing k-mer count tables.
  • Its space efficiency is particularly advantageous for large genomic datasets due to the power-law distribution of k-mer counts.
  • For representing k-mer count tables, Set-Min sketch is an alternative to Count-Min sketches that is more accurate, and an alternative to MPHF-based solutions that is more compact for large k and assembled genomes.
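The power-law claim concerns the k-mer spectrum, that is, how many distinct k-mers occur exactly c times for each count c. A minimal Python sketch of computing a spectrum follows; the function names are hypothetical, and the toy sequence only demonstrates the computation, not a power-law shape, which appears in real genomic data.

```python
from collections import Counter


def kmer_spectrum(seq: str, k: int) -> dict:
    """Map each count c to the number of distinct k-mers that occur exactly c times."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return dict(sorted(Counter(counts.values()).items()))


def singleton_fraction(spectrum: dict) -> float:
    """Share of distinct k-mers seen exactly once (the head of a skewed spectrum)."""
    distinct = sum(spectrum.values())
    return spectrum.get(1, 0) / distinct if distinct else 0.0


spectrum = kmer_spectrum("ACGTACGTAC", k=4)
print(spectrum, singleton_fraction(spectrum))  # prints: {1: 1, 2: 3} 0.25
```

When most k-mers share a few small count values, as a power-law spectrum implies, per-cell value sets stay small, which is the situation the conclusions above associate with Set-Min's space efficiency.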