



BitBIRCH Clustering Refinement Strategies.

Kenneth López Pérez¹, Kate Huddleston¹, Vicky Jung¹

  • ¹Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, Florida 32603, United States.

Journal of Chemical Information and Modeling | May 27, 2025
Summary
This summary is machine-generated.

This study introduces an enhanced BitBIRCH algorithm for efficient clustering of massive chemical libraries. The improved method offers greater control over data partitioning without sacrificing speed, aiding chemical data analysis.


Area of Science:

  • Computational Chemistry
  • Chemoinformatics
  • Data Science

Background:

  • Chemical libraries are rapidly growing, necessitating advanced computational algorithms for efficient data processing.
  • Existing methods struggle to keep pace with the scale of modern chemical datasets.
  • The instant similarity (iSIM) framework and BitBIRCH algorithm were developed to address these challenges.

Purpose of the Study:

  • To present a new software package that expands upon the BitBIRCH algorithm for clustering large chemical datasets.
  • To provide users with enhanced control over the clustering tree structure and improve partition quality.
  • To maintain computational efficiency while increasing the accuracy of molecular clustering.

Main Methods:

  • Development of a dedicated software package for the BitBIRCH algorithm.
  • Implementation of user-adjustable parameters for controlling the tree structure.
  • Integration of new postprocessing tools for analyzing clustering results.
  • Use of n-ary similarity for rapid processing of large datasets.
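
To make the idea of n-ary similarity concrete, here is a minimal sketch of how a Tanimoto-style index for a whole set of binary fingerprints can be obtained from per-bit column sums, so the cost grows linearly with the number of molecules instead of quadratically. It is an illustration only: the function name `nary_tanimoto` and the toy fingerprints are hypothetical, and this is not the BitBIRCH package's API or necessarily the exact formula it uses.

```python
from typing import Sequence


def nary_tanimoto(fingerprints: Sequence[Sequence[int]]) -> float:
    """Set-level Tanimoto-style index computed from column sums.

    Assumption: fingerprints are equal-length sequences of 0/1 values.
    The result is the ratio (summed over all pairs) of shared on-bits
    to shared on-bits plus mismatched bits, without enumerating pairs.
    """
    n = len(fingerprints)
    if n < 2:
        raise ValueError("need at least two fingerprints")

    # Column sums: how many molecules have each bit switched on.
    col_sums = [sum(col) for col in zip(*fingerprints)]

    # For a bit that is on in c molecules, c*(c-1)/2 pairs share it
    # and c*(n-c) pairs disagree on it.
    shared_on = sum(c * (c - 1) // 2 for c in col_sums)
    mismatched = sum(c * (n - c) for c in col_sums)

    total = shared_on + mismatched
    return shared_on / total if total else 0.0


# Toy example with three 3-bit fingerprints (made-up data).
toy = [[1, 1, 0], [1, 0, 0], [1, 1, 1]]
print(nary_tanimoto(toy))
```

Because only the column sums and the molecule count are needed, a cluster can be summarized by those quantities and updated as molecules are added, which is the kind of property that makes tree-based clustering of very large libraries practical.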

Main Results:

  • The enhanced BitBIRCH package offers improved control over clustering parameters and enhances the quality of final partitions.
  • Computational efficiency is preserved, allowing for the clustering of billions of molecules.
  • New postprocessing tools facilitate detailed analysis of clustering outcomes.
  • The package demonstrates improved performance and user control compared to previous methods.

Conclusions:

  • The enhanced BitBIRCH package provides a powerful and efficient solution for clustering massive chemical libraries.
  • Users gain significant control over the clustering process, leading to higher quality data partitions.
  • This advance supports effective management and analysis of rapidly growing chemical data.