BitBIRCH Clustering Refinement Strategies.

Kenneth López Pérez¹, Kate Huddleston¹, Vicky Jung¹

¹Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, Florida 32603, United States.

Journal of Chemical Information and Modeling

May 27, 2025

Summary

This summary is machine-generated.

This study introduces an enhanced BitBIRCH algorithm for efficient clustering of massive chemical libraries. The improved method offers greater control over data partitioning without sacrificing speed, aiding chemical data analysis.

Area of Science:

Computational Chemistry
Chemoinformatics
Data Science

Background:

Chemical libraries are rapidly growing, necessitating advanced computational algorithms for efficient data processing.
Existing methods struggle to keep pace with the scale of modern chemical datasets.
The instant similarity (iSIM) framework and BitBIRCH algorithm were developed to address these challenges.
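This page does not define iSIM, so the following is only a minimal sketch of the general idea, under the assumption that the Tanimoto variant is computed from per-bit column sums rather than from every pair of molecules. The function name `isim_tanimoto` and the toy fingerprints are hypothetical and are not taken from the authors' package.

```python
from math import comb


def isim_tanimoto(fingerprints):
    """Estimate the average pairwise Tanimoto similarity of a set of
    equal-length binary fingerprints using only per-bit column sums.

    Assumed formula (not stated on this page):
        sum_j C(k_j, 2) / sum_j [C(k_j, 2) + k_j * (n - k_j)]
    where k_j is the number of fingerprints with bit j set.
    """
    n = len(fingerprints)
    if n < 2:
        raise ValueError("need at least two fingerprints")
    # k_j for every bit position j; one pass over the data.
    col_sums = [sum(bits) for bits in zip(*fingerprints)]
    # Pairs of molecules that both have a given bit on.
    both_on = sum(comb(k, 2) for k in col_sums)
    # Pairs of molecules where exactly one has the bit on.
    one_on = sum(k * (n - k) for k in col_sums)
    denom = both_on + one_on
    return both_on / denom if denom else 0.0


# Toy data: three 4-bit fingerprints (hypothetical).
fps = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 0],
]
print(isim_tanimoto(fps))
```

The cost grows linearly with the number of fingerprints because only the column sums are needed, which is the property that makes this kind of index attractive for very large libraries. In general the value approximates, rather than exactly reproduces, the mean of all pairwise Tanimoto similarities.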

Purpose of the Study:

To present a new software package that expands upon the BitBIRCH algorithm for clustering large chemical datasets.
To provide users with enhanced control over the clustering tree structure and improve partition quality.
To maintain computational efficiency while increasing the accuracy of molecular clustering.

Main Methods:

Development of a dedicated software package for the BitBIRCH algorithm.
Implementation of user-adjustable parameters for controlling the tree structure.
Integration of new postprocessing tools for analyzing clustering results.
Use of n-ary similarity for rapid processing of large datasets.
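To make the role of a user-adjustable parameter concrete, here is a deliberately simplified, flat sketch of threshold-based incremental clustering over compact cluster summaries. It is not the BitBIRCH implementation: there is no tree and no branching factor, and the names `ClusterSummary`, `cluster_fingerprints`, and `threshold` are hypothetical. The similarity score assumes a column-sum Tanimoto index of the kind used in n-ary similarity.

```python
from math import comb


class ClusterSummary:
    """Hypothetical compact summary: member count plus per-bit column sums."""

    def __init__(self, n_bits):
        self.n = 0
        self.col_sums = [0] * n_bits
        self.members = []

    def similarity_if_added(self, fp):
        """Column-sum Tanimoto index of the cluster if fp were merged in."""
        n = self.n + 1
        sums = [k + b for k, b in zip(self.col_sums, fp)]
        both_on = sum(comb(k, 2) for k in sums)
        one_on = sum(k * (n - k) for k in sums)
        denom = both_on + one_on
        return both_on / denom if denom else 0.0

    def add(self, index, fp):
        """Merge fp into the summary without storing the fingerprint itself."""
        self.n += 1
        self.col_sums = [k + b for k, b in zip(self.col_sums, fp)]
        self.members.append(index)


def cluster_fingerprints(fingerprints, threshold):
    """Assign each fingerprint to the first cluster that stays at or above
    the similarity threshold after the merge; otherwise open a new cluster."""
    clusters = []
    for i, fp in enumerate(fingerprints):
        for c in clusters:
            if c.similarity_if_added(fp) >= threshold:
                c.add(i, fp)
                break
        else:
            c = ClusterSummary(len(fp))
            c.add(i, fp)
            clusters.append(c)
    return [c.members for c in clusters]


# Toy data: two visibly distinct groups of 6-bit fingerprints (hypothetical).
fps = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1],
]
print(cluster_fingerprints(fps, threshold=0.5))
```

Raising `threshold` yields more, tighter clusters, while lowering it yields fewer, looser ones. A BIRCH-style tree adds further parameters on top of this, such as how many children a node may hold before it is split.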

Main Results:

The enhanced BitBIRCH package offers improved control over clustering parameters and enhances the quality of final partitions.
Computational efficiency is preserved, allowing for the clustering of billions of molecules.
New postprocessing tools facilitate detailed analysis of clustering outcomes.
The package demonstrates improved performance and user control compared to previous methods.

Conclusions:

The enhanced BitBIRCH package provides a powerful and efficient solution for clustering massive chemical libraries.
Users gain significant control over the clustering process, leading to higher quality data partitions.
This advancement supports the effective management and analysis of rapidly growing chemical data.