Updated: May 17, 2025

Estimating similarity and distance using FracMinHash.

Mahmudur Rahman Hera¹, David Koslicki²,³,⁴

¹School of Electrical Engineering and Computer Science, Pennsylvania State University, University Park, USA. mbr5797@psu.edu.

Algorithms for Molecular Biology : AMB

May 15, 2025

Summary

This summary is machine-generated.

This study introduces a theoretical framework for FracMinHash sketches to estimate various similarity metrics in genomic data. A new tool, frac-kmc, provides fast, parallelized sketch generation for accurate similarity analysis.

Keywords:

FracMinHash, Hashing, Min-Hash, Similarity, Sketching, Theory, k-mer

Area of Science:

Computational Biology
Bioinformatics
Genomics

Background:

Genomic and metagenomic data analysis requires scalable computational models.
Sketching techniques, particularly FracMinHash, are valuable for large-scale biological data analysis.
While FracMinHash is established for Jaccard and containment indices, theoretical gaps exist for other metrics.
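
For readers unfamiliar with the technique, the following is a minimal sketch of how a FracMinHash sketch can be built and how the Jaccard and containment indices can be read off two sketches. It is an illustration only, not the paper's implementation: the function names, the BLAKE2b hash, and the default values of `k` and `scale` are assumptions made for the example, and no bias correction is applied.

```python
import hashlib

HASH_SPACE = 2 ** 64  # size of the 64-bit hash range (assumed for this example)


def kmers(seq, k):
    """Yield every length-k substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer; the hash function is an assumption."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def frac_minhash(seq, k=21, scale=0.1):
    """Keep the hashes of distinct k-mers that fall below scale * HASH_SPACE."""
    threshold = int(scale * HASH_SPACE)
    return {h for h in map(hash_kmer, kmers(seq, k)) if h < threshold}


def jaccard(sketch_a, sketch_b):
    """Jaccard index computed on the sketches: |A ∩ B| / |A ∪ B|."""
    union = sketch_a | sketch_b
    return len(sketch_a & sketch_b) / len(union) if union else 0.0


def containment(sketch_a, sketch_b):
    """Containment of A in B computed on the sketches: |A ∩ B| / |A|."""
    return len(sketch_a & sketch_b) / len(sketch_a) if sketch_a else 0.0
```

Because every sequence is filtered against the same threshold, a sketch grows with the number of distinct k-mers in the input, and sketches built with the same `k` and `scale` can be compared directly.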

Purpose of the Study:

To develop a theoretical framework for estimating similarity/distance metrics using FracMinHash sketches.
To establish conditions for sound estimation and recommend parameters for accuracy.
To introduce a novel, efficient FracMinHash sketch generator.

Main Methods:

Developed a theoretical framework for FracMinHash-based metric estimation.
Identified conditions and scale factors for accurate estimation.
Implemented frac-kmc, a parallel FracMinHash sketch generation tool.
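
As a rough illustration of estimating a metric other than Jaccard or containment, the sketch below computes a set-based cosine similarity, |A ∩ B| / sqrt(|A| · |B|), from two FracMinHash sketches. This summary does not state which form of cosine similarity the paper or frac-kmc uses, so the presence/absence form, the BLAKE2b hash, and all function names here are assumptions; an abundance-weighted version would differ.

```python
import hashlib
import math

HASH_SPACE = 2 ** 64  # size of the 64-bit hash range (assumed for this example)


def frac_sketch(kmer_set, scale):
    """Return the hashes of the given k-mers that fall below scale * HASH_SPACE."""
    threshold = int(scale * HASH_SPACE)
    sketch = set()
    for kmer in kmer_set:
        digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        if h < threshold:
            sketch.add(h)
    return sketch


def estimate_cardinality(sketch, scale):
    """Scale the sketch size back up to estimate the number of distinct k-mers."""
    return len(sketch) / scale


def cosine_from_sketches(sketch_a, sketch_b):
    """Set-based cosine evaluated on sketches built with the same scale."""
    if not sketch_a or not sketch_b:
        return 0.0
    shared = len(sketch_a & sketch_b)
    return shared / math.sqrt(len(sketch_a) * len(sketch_b))
```

When both sketches use the same `scale`, the factor 1/scale appears once in the numerator and once under the square root, so it cancels and the ratio can be computed on raw sketch sizes. The estimate becomes unreliable when either sketch is very small, which is one reason the choice of scale factor matters.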

Main Results:

Validated theoretical findings with experimental evidence.
frac-kmc was demonstrated to be the fastest FracMinHash sketch generator.
Achieved accurate and precise cosine similarity estimation on real genomic data using frac-kmc.

Conclusions:

The theoretical framework enables sound estimation of various metrics from FracMinHash sketches.
frac-kmc offers a significant speedup and parallelization for sketch generation.
This work enhances the utility of FracMinHash for large-scale genomic data analysis.