HyperMinHash: MinHash in LogLog space.

Yun William Yu¹, Griffin M Weber²

¹Department of Mathematics, University of Toronto, Toronto, Ontario, Canada M5S 2E4.

IEEE Transactions on Knowledge and Data Engineering

January 30, 2024

Summary

This summary is machine-generated.

We introduce HyperMinHash, a lossy compression of the MinHash sketch for Jaccard index estimation. It keeps MinHash's streaming updates, unions, and cardinality estimation while using sub-logarithmic space, so it can estimate Jaccard indices for larger cardinalities than MinHash within the same memory.

Keywords:

compression hyperloglog min-wise hashing sketching streaming

Area of Science:

Computer Science
Data Structures
Algorithms

Background:

MinHash is a widely used algorithm for Jaccard index estimation.
Existing compressed MinHash variants often sacrifice essential features like streaming updates or unions.
Sub-logarithmic space algorithms face challenges in maintaining accuracy and functionality.
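As background, the sketch below shows one common way plain MinHash estimates a Jaccard index and supports unions. It is a minimal illustration, not the paper's code: the helper names (`hash64`, `minhash_sketch`, `minhash_union`, `minhash_jaccard`), the choice of BLAKE2b as the hash, and the use of k independent seeded hash functions are all assumptions made here for clarity.

```python
import hashlib


def hash64(item, seed):
    """Deterministic 64-bit hash of `item` under hash function number `seed`."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_sketch(items, k=128):
    """Keep the minimum hash value under each of k hash functions.

    `items` must be non-empty. Each slot stores a full 64-bit value.
    """
    items = list(items)
    return [min(hash64(x, seed) for x in items) for seed in range(k)]


def minhash_union(a, b):
    """Sketch of the union of two sets: the slot-wise minimum."""
    return [min(x, y) for x, y in zip(a, b)]


def minhash_jaccard(a, b):
    """Fraction of slots where the two sketches agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    left = minhash_sketch(range(0, 600))
    right = minhash_sketch(range(300, 900))
    # True Jaccard index is 300 / 900; the estimate is noisy for small k.
    print(minhash_jaccard(left, right))
```

Each slot here holds a full hash value, which is the per-bucket cost that compressed variants try to reduce.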

Purpose of the Study:

To develop a novel lossy compression technique for MinHash sketches.
To introduce HyperMinHash, a compressed sketch compatible with HyperLogLog.
To retain MinHash's core functionalities while reducing memory footprint.

Main Methods:

Lossy compression of MinHash using floating-point notation.
Building the HyperMinHash sketch on a HyperLogLog scaffold.
Analyzing the space-time complexity and accuracy trade-offs.
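The floating-point idea in the methods above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the parameters `P`, `Q`, `R`, the function names, and the BLAKE2b hash are hypothetical choices, and the Jaccard estimator here omits any correction for accidental register collisions.

```python
import hashlib

P = 8    # 2**P buckets, chosen by the first P hash bits (assumed value)
Q = 6    # exponent bits: leading-zero count, capped at 2**Q - 1 (assumed)
R = 10   # mantissa bits kept after the leading one bit (assumed)
HASH_BITS = 128
REST_BITS = HASH_BITS - P
CAP = (1 << Q) - 1


def hmh_register(item):
    """Map an item to (bucket, (exponent, mantissa)) using Q + R bits."""
    digest = hashlib.blake2b(str(item).encode("utf-8"), digest_size=16).digest()
    h = int.from_bytes(digest, "big")
    bucket = h >> REST_BITS
    rest = h & ((1 << REST_BITS) - 1)
    leading_zeros = REST_BITS - rest.bit_length()
    if leading_zeros < CAP:
        exponent = leading_zeros
        skip = leading_zeros + 1  # the zeros plus the leading one bit
    else:
        exponent = CAP
        skip = CAP
    mantissa = (rest >> (REST_BITS - skip - R)) & ((1 << R) - 1)
    return bucket, (exponent, mantissa)


def _smaller(a, b):
    """True if register a encodes a smaller hash than register b."""
    return a[0] > b[0] or (a[0] == b[0] and a[1] < b[1])


def hmh_sketch(items):
    """Streaming build: each bucket keeps the register of its smallest hash."""
    regs = [None] * (1 << P)
    for item in items:
        bucket, reg = hmh_register(item)
        if regs[bucket] is None or _smaller(reg, regs[bucket]):
            regs[bucket] = reg
    return regs


def hmh_union(a, b):
    """Bucket-wise merge; equals the sketch of the union of both inputs."""
    out = []
    for x, y in zip(a, b):
        if x is None or (y is not None and _smaller(y, x)):
            out.append(y)
        else:
            out.append(x)
    return out


def hmh_jaccard(a, b):
    """Matching non-empty buckets over buckets filled in the union."""
    filled = sum(1 for x, y in zip(a, b) if x is not None or y is not None)
    matches = sum(1 for x, y in zip(a, b) if x is not None and x == y)
    return matches / filled if filled else 0.0
```

With these assumed parameters each bucket needs Q + R = 16 bits rather than a full hash value. The exponent plays the role of a HyperLogLog register, and the mantissa keeps enough extra bits to tell most distinct minima apart.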

Main Results:

For an additive approximation error ε on a Jaccard index t, and given a random oracle, HyperMinHash needs O(ε⁻²(log log n + log(1/(tε)))) space.
Estimates Jaccard indices of 0.01 for set cardinalities on the order of 10¹⁹ with ~10% relative error using 2 MiB of memory.
Outperforms MinHash in estimating Jaccard indices for larger cardinalities within the same memory constraints.

Conclusions:

HyperMinHash provides a space-efficient alternative to MinHash for Jaccard index estimation.
It preserves essential features like streaming updates and unions, crucial for big data applications.
Enables accurate Jaccard index estimation for larger datasets with limited memory resources.