HyperMinHash: MinHash in LogLog space.

Yun William Yu1, Griffin M Weber2

  • 1Department of Mathematics, University of Toronto, Toronto, Ontario, Canada M5S 2E4.

IEEE Transactions on Knowledge and Data Engineering | January 30, 2024
Summary
This summary is machine-generated.

We introduce HyperMinHash, a compressed MinHash sketch for efficient Jaccard index estimation. This method offers improved cardinality estimation in sub-logarithmic space, outperforming traditional MinHash.

Keywords:
compression, HyperLogLog, min-wise hashing, sketching, streaming

Area of Science:

  • Computer Science
  • Data Structures
  • Algorithms

Background:

  • MinHash is a widely used algorithm for Jaccard index estimation.
  • Existing compressed MinHash variants often sacrifice essential features like streaming updates or unions.
  • Sub-logarithmic space algorithms face challenges in maintaining accuracy and functionality.
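
As a point of reference for the background above, here is a minimal sketch of classic MinHash Jaccard estimation. It is an illustration under stated assumptions, not the paper's implementation: the helper `_hash64`, the use of salted BLAKE2b as a stand-in for independent hash functions, and the default of 128 hash functions are all choices made for this example.

```python
import hashlib


def _hash64(item: str, seed: int) -> int:
    """Hypothetical helper: a seeded 64-bit hash derived from salted BLAKE2b."""
    digest = hashlib.blake2b(
        item.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(items, num_hashes=128):
    """For each of num_hashes seeded hash functions, keep the minimum hash over the set."""
    items = list(items)
    return [min(_hash64(x, seed) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """The fraction of positions where the two minima agree estimates the Jaccard index."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard index |A intersect B| / |A union B|, for comparison."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    set_a = {f"item{i}" for i in range(300)}
    set_b = {f"item{i}" for i in range(150, 450)}
    print("exact   :", exact_jaccard(set_a, set_b))
    print("estimate:", estimate_jaccard(minhash_signature(set_a), minhash_signature(set_b)))
```

Each signature entry here is a full 64-bit hash, which is the per-entry storage cost that compressed variants aim to reduce.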

Purpose of the Study:

  • To develop a novel lossy compression technique for MinHash sketches.
  • To introduce HyperMinHash, a compressed sketch compatible with HyperLogLog.
  • To retain MinHash's core functionalities while reducing memory footprint.

Main Methods:

  • Lossy compression of MinHash using floating-point notation.
  • Building the HyperMinHash sketch on a HyperLogLog scaffold.
  • Analyzing the space-time complexity and accuracy trade-offs.
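
The floating-point idea in the methods above can be sketched as follows. This is a hedged illustration, not the authors' code: the function names, the exact bit layout, and the parameters `q` (exponent bits) and `r` (mantissa bits) are assumptions for the example. A hash word is reduced to an exponent (its count of leading zero bits, capped) and a mantissa (the next `r` bits), in the same spirit as HyperLogLog's leading-zero counts.

```python
def compress_hash(h, hash_bits=64, q=6, r=10):
    """Lossy floating-point style encoding of a hash_bits-wide hash value.

    Returns (exponent, mantissa), where exponent is the number of leading
    zero bits (capped at 2**q - 1) and mantissa holds the r bits that follow.
    When the exponent is not capped, the first 1 bit is implicit and skipped.
    Assumed layout for illustration only.
    """
    max_exp = (1 << q) - 1
    assert max_exp <= hash_bits and 0 <= h < (1 << hash_bits)
    leading_zeros = hash_bits - h.bit_length()
    if leading_zeros < max_exp:
        exponent, skip = leading_zeros, leading_zeros + 1  # zeros plus implicit 1
    else:
        exponent, skip = max_exp, max_exp                  # capped: no implicit 1
    # Shift left by r so that r bits are always available after the skipped prefix.
    mantissa = ((h << r) >> (hash_bits - skip)) & ((1 << r) - 1)
    return exponent, mantissa


def order_key(code):
    """Sort key that orders codes like the hashes they came from.

    A smaller hash has more leading zeros, so the exponent is negated.
    """
    exponent, mantissa = code
    return (-exponent, mantissa)


if __name__ == "__main__":
    # 16-bit example: 0b0010_1101_0000_0000 has 2 leading zeros,
    # and the 4 bits after the first 1 bit are 0110.
    print(compress_hash(0x2D00, hash_bits=16, q=3, r=4))
```

With these assumed defaults a code occupies `q + r = 16` bits in place of a 64-bit hash. Because `order_key` never reverses the order of two hashes, taking a minimum over codes agrees with taking the minimum over the original hashes and then compressing, which is what lets a compressed bucket still be updated one item at a time.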

Main Results:

  • HyperMinHash achieves an additive approximation error on the Jaccard index in sub-logarithmic space.
  • Estimates Jaccard indices of 0.01 for large set cardinalities with about 10% relative error using 2 MiB of memory.
  • Outperforms MinHash in estimating Jaccard indices for larger cardinalities within the same memory constraints.

Conclusions:

  • HyperMinHash provides a space-efficient alternative to MinHash for Jaccard index estimation.
  • It preserves essential features like streaming updates and unions, crucial for big data applications.
  • Enables accurate Jaccard index estimation for larger datasets with limited memory resources.
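
To make the streaming-update and union properties concrete, the following toy sketch keeps one compressed minimum per bucket on a HyperLogLog-like layout. Everything named here (`Sketch`, `hash64`, `compress`, and the parameters `P`, `Q`, `R`) is hypothetical and chosen for the example. The Jaccard estimate is a plain match count over non-empty buckets and ignores accidental matches introduced by the lossy compression, which a careful estimator would need to account for.

```python
import hashlib

P, Q, R = 8, 5, 10            # assumed: 2**P buckets, Q exponent bits, R mantissa bits
HASH_BITS = 64
REST_BITS = HASH_BITS - P     # bits left after the bucket index is taken
MAX_EXP = (1 << Q) - 1


def hash64(item: str) -> int:
    """Hypothetical 64-bit hash built from BLAKE2b."""
    return int.from_bytes(hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest(), "big")


def compress(rest: int):
    """Encode the remaining bits as (-exponent, mantissa) so tuple order follows hash order."""
    leading_zeros = REST_BITS - rest.bit_length()
    exponent = min(leading_zeros, MAX_EXP)
    skip = leading_zeros + 1 if leading_zeros < MAX_EXP else MAX_EXP
    mantissa = ((rest << R) >> (REST_BITS - skip)) & ((1 << R) - 1)
    return (-exponent, mantissa)


class Sketch:
    def __init__(self):
        self.buckets = [None] * (1 << P)   # None marks an empty bucket

    def add(self, item: str) -> None:
        """Streaming update: one hash per item, keep the smallest code in its bucket."""
        h = hash64(item)
        index = h >> REST_BITS
        code = compress(h & ((1 << REST_BITS) - 1))
        current = self.buckets[index]
        if current is None or code < current:
            self.buckets[index] = code

    def union(self, other: "Sketch") -> "Sketch":
        """Sketch of the union of the two underlying sets: bucket-wise minimum."""
        merged = Sketch()
        merged.buckets = [
            b if a is None else a if b is None else min(a, b)
            for a, b in zip(self.buckets, other.buckets)
        ]
        return merged

    def jaccard(self, other: "Sketch") -> float:
        """Naive estimate: matching buckets divided by buckets filled in either sketch."""
        filled = sum(a is not None or b is not None for a, b in zip(self.buckets, other.buckets))
        matches = sum(a is not None and a == b for a, b in zip(self.buckets, other.buckets))
        return matches / filled if filled else 0.0


if __name__ == "__main__":
    a, b = Sketch(), Sketch()
    for i in range(3000):
        a.add(f"x{i}")
    for i in range(1500, 4500):
        b.add(f"x{i}")
    print("estimated Jaccard:", a.jaccard(b))   # the true value for these sets is 1/3
```

Merging two sketches gives the same buckets as sketching the combined stream directly, so sketches built on separate machines or at separate times can be combined without revisiting the data.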