Estimating similarity and distance using FracMinHash.

Mahmudur Rahman Hera (1), David Koslicki (2, 3, 4)

  • (1) School of Electrical Engineering and Computer Science, Pennsylvania State University, University Park, USA. mbr5797@psu.edu.

Algorithms for Molecular Biology: AMB | May 15, 2025
Summary
This summary is machine-generated.

This study introduces a theoretical framework for using FracMinHash sketches to estimate a range of similarity and distance metrics on genomic data. A new tool, frac-kmc, generates these sketches quickly and in parallel, supporting accurate similarity analysis.
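For orientation, a FracMinHash sketch keeps every distinct k-mer whose hash value falls below a fixed fraction s of the hash space, so it retains roughly a fraction s of all distinct k-mers. The minimal Python sketch below illustrates this idea under stated assumptions: k-mers are taken as-is (not canonicalized), BLAKE2b stands in for the hash function (production tools typically use a faster non-cryptographic hash), and `kmer_hash` and `frac_minhash` are hypothetical names, not the frac-kmc interface.

```python
import hashlib

H = 2**64  # size of the 64-bit hash space

def kmer_hash(kmer: str) -> int:
    """Hash a k-mer to a 64-bit integer (BLAKE2b used as a stand-in hash)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def frac_minhash(seq: str, k: int, s: float) -> set[int]:
    """Return the FracMinHash sketch: all distinct k-mer hashes below H * s."""
    threshold = int(H * s)
    sketch = set()
    for i in range(len(seq) - k + 1):
        h = kmer_hash(seq[i:i + k])
        if h < threshold:  # keeps roughly a fraction s of distinct k-mers
            sketch.add(h)
    return sketch

sketch = frac_minhash("ACGT" * 50, k=5, s=0.5)
```

With s = 1.0 the sketch is simply the set of all distinct k-mer hashes; lowering s shrinks the sketch while keeping the same selection rule, so a sketch at a smaller s is always a subset of one at a larger s.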

Keywords:
FracMinHash, Hashing, Min-Hash, Similarity, Sketching, Theory, k-mer

Area of Science:

  • Computational Biology
  • Bioinformatics
  • Genomics

Background:

  • Genomic and metagenomic data analysis requires scalable computational models.
  • Sketching techniques, particularly FracMinHash, are valuable for large-scale biological data analysis.
  • FracMinHash-based estimation is well established for the Jaccard and containment indices, but theoretical guarantees for other metrics are lacking.
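Because each k-mer is kept or discarded based on its own hash value alone, the sketch of A ∩ B is exactly the intersection of the sketches of A and B, and likewise for unions. This is what makes simple ratio estimators for the Jaccard and containment indices possible. The sketch below assumes sketches are Python sets of retained hash values; the function names are hypothetical, and no correction for very small or empty sketches is attempted.

```python
def jaccard_estimate(sa: set[int], sb: set[int]) -> float:
    """Estimate J(A, B) = |A ∩ B| / |A ∪ B| from the sketches of A and B."""
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def containment_estimate(sa: set[int], sb: set[int]) -> float:
    """Estimate C(A, B) = |A ∩ B| / |A| from the sketches of A and B."""
    return len(sa & sb) / len(sa) if sa else 0.0  # empty sketch of A: return 0
```

For example, sketches {1, 2, 3, 4} and {3, 4, 5, 6} share two of six distinct hashes, giving a Jaccard estimate of 1/3 and a containment estimate of 2/4 = 0.5.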

Purpose of the Study:

  • To develop a theoretical framework for estimating similarity and distance metrics from FracMinHash sketches.
  • To establish the conditions under which such estimates are sound and to recommend parameter settings for accurate estimation.
  • To introduce a novel, efficient FracMinHash sketch generator.
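Why the scale factor matters can be seen from a standard back-of-the-envelope model, which is an assumption here and not the paper's stated conditions: if hash values behave like independent uniform draws, each distinct k-mer of a set A survives sketching with probability s. Writing FRAC_s(A) for the sketch of A (notation assumed for this illustration):

```latex
% Idealized model: each distinct k-mer x in A has an independent uniform hash in [0, H).
\Pr\bigl[h(x) < sH\bigr] = s
\quad\Longrightarrow\quad
|\mathrm{FRAC}_s(A)| \sim \mathrm{Binomial}(|A|,\, s),
\qquad \mathbb{E}\,|\mathrm{FRAC}_s(A)| = s|A|,
\qquad \mathrm{Var}\,|\mathrm{FRAC}_s(A)| = s(1-s)|A|.

\frac{\sqrt{\mathrm{Var}\,|\mathrm{FRAC}_s(A)|}}{\mathbb{E}\,|\mathrm{FRAC}_s(A)|}
  = \sqrt{\frac{1-s}{s|A|}},
\qquad
\Pr\bigl[\mathrm{FRAC}_s(A) = \varnothing\bigr] = (1-s)^{|A|}.

% Example: |A| = 10^6, s = 10^{-3}  =>  E|FRAC_s(A)| = 1000,
% relative standard deviation = \sqrt{0.999 / 1000} \approx 0.032 (about 3%).
```

A smaller s gives smaller sketches but larger relative fluctuations, and for small sets it raises the chance of an empty sketch; this is the kind of trade-off a recommended scale factor has to balance.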

Main Methods:

  • Developed a theoretical framework for FracMinHash-based metric estimation.
  • Identified conditions and scale factors for accurate estimation.
  • Implemented frac-kmc, a parallel FracMinHash sketch generation tool.
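Parallel sketch generation is natural for FracMinHash because sketches merge by set union: the sketch of a whole sequence equals the union of the sketches of overlapping chunks. The sketch below demonstrates this principle with Python threads. It is a hypothetical illustration, not a description of how frac-kmc is implemented; the function names and the BLAKE2b stand-in hash are assumptions.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

H = 2**64  # size of the 64-bit hash space

def kmer_hash(kmer: str) -> int:
    """Hash a k-mer to a 64-bit integer (BLAKE2b used as a stand-in hash)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def sketch_chunk(chunk: str, k: int, threshold: int) -> set[int]:
    """Serial FracMinHash of one chunk: keep k-mer hashes below the threshold."""
    hashes = (kmer_hash(chunk[i:i + k]) for i in range(len(chunk) - k + 1))
    return {h for h in hashes if h < threshold}

def parallel_frac_minhash(seq: str, k: int, s: float, n_chunks: int = 4) -> set[int]:
    """Sketch chunks concurrently, then merge the partial sketches by set union."""
    threshold = int(H * s)
    step = max(1, -(-len(seq) // n_chunks))  # ceiling division
    # Each chunk owns the k-mers starting in [i, i + step); extending it by
    # k - 1 characters ensures k-mers that straddle a boundary are not lost.
    chunks = [seq[i:i + step + k - 1] for i in range(0, len(seq), step)]
    with ThreadPoolExecutor() as pool:
        partial_sketches = list(pool.map(lambda c: sketch_chunk(c, k, threshold), chunks))
    return set().union(*partial_sketches)
```

In CPython, threads will not speed up this pure-Python hashing because of the global interpreter lock; a real tool would use native threads or separate processes, but the chunk-overlap and union-merge steps stay the same.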

Main Results:

  • Validated theoretical findings with experimental evidence.
  • frac-kmc was shown to be the fastest FracMinHash sketch generator.
  • Achieved accurate and precise cosine similarity estimation on real genomic data using frac-kmc.
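Cosine similarity is defined on k-mer abundance vectors rather than sets, so the sketch must also record how often each retained hash occurs. A minimal plug-in estimator follows, assuming abundances are tracked per retained hash. Since each k-mer is retained independently with probability s, the sketched dot product and squared norms are each expected to equal s times their full-data values, and the factors of s cancel in the ratio. This is an illustrative sketch with hypothetical function names, not necessarily the estimator analyzed in the paper; a ratio of unbiased quantities is also not exactly unbiased itself.

```python
import hashlib
import math
from collections import Counter

H = 2**64  # size of the 64-bit hash space

def kmer_hash(kmer: str) -> int:
    """Hash a k-mer to a 64-bit integer (BLAKE2b used as a stand-in hash)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def abundance_sketch(seq: str, k: int, s: float) -> Counter:
    """FracMinHash sketch that also counts occurrences of each retained hash."""
    threshold = int(H * s)
    counts = Counter()
    for i in range(len(seq) - k + 1):
        h = kmer_hash(seq[i:i + k])
        if h < threshold:
            counts[h] += 1
    return counts

def cosine_estimate(sa: Counter, sb: Counter) -> float:
    """Cosine similarity of the two sketched abundance vectors."""
    dot = sum(count * sb[h] for h, count in sa.items())  # missing hashes count as 0
    norm_a = math.sqrt(sum(c * c for c in sa.values()))
    norm_b = math.sqrt(sum(c * c for c in sb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sketch carries no information
    return dot / (norm_a * norm_b)
```

As a small check with s = 1.0 and k = 2, "AAAA" has abundance vector (AA: 3) and "AAAC" has (AA: 2, AC: 1), so the cosine is 6 / (3 · √5) = 2/√5.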

Conclusions:

  • The theoretical framework enables sound estimation of various metrics from FracMinHash sketches.
  • frac-kmc substantially speeds up sketch generation and supports parallel execution.
  • This work enhances the utility of FracMinHash for large-scale genomic data analysis.