Counting clusters using R-NN curves.

Rajarshi Guha¹, Debojyoti Dutta, David J Wild

  • ¹School of Informatics, Indiana University, Bloomington, Indiana 47406, USA. rguha@indiana.edu

Journal of Chemical Information and Modeling | July 3, 2007
Summary
This summary is machine-generated.

This study introduces an algorithm based on R-NN curves for choosing the number of clusters (k) before running k-means clustering in cheminformatics. Its estimates of k generally agree with those indicated by the average silhouette width, an established cluster quality measure.

Area of Science:

  • Cheminformatics
  • Computational Chemistry
  • Data Mining

Background:

  • Nonhierarchical clustering methods such as k-means require the number of clusters (k) to be specified in advance.
  • The traditional approach clusters the data repeatedly with different values of k and keeps the best result.
  • Estimating k a priori avoids this repeated clustering.
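
The traditional approach described above can be sketched as a loop over candidate k values, scoring each clustering with the average silhouette width. This is a minimal illustration only: the summary does not say which k-means implementation, initialization, or data the authors used, so the helper names (`farthest_first`, `kmeans`, `average_silhouette_width`), the farthest-first initialization, and the toy 2-D data are all assumptions.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_first(points, k):
    """Deterministic initialization (an assumption, not from the paper):
    start at the first point, then repeatedly add the point that is
    farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def average_silhouette_width(points, labels):
    """Mean of (b - a) / max(a, b) over all points, where a is the mean
    distance to the point's own cluster and b the smallest mean distance
    to any other cluster. Singleton clusters contribute 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Toy data (hypothetical): three tight 3x3 grids of points, far apart.
offsets = (-0.5, 0.0, 0.5)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [(cx + dx, cy + dy) for cx, cy in blob_centers for dx in offsets for dy in offsets]

# Cluster once per candidate k and keep the k with the best score.
scores = {k: average_silhouette_width(points, kmeans(points, k)) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

Every candidate k costs a full clustering run plus a pairwise-distance pass for the silhouette, which is the repeated work that an a priori estimate of k is meant to avoid.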

Purpose of the Study:

  • To introduce and evaluate the R-NN curve algorithm for a priori selection of k in clustering.
  • To assess the algorithm's ability to estimate the natural number of clusters.
  • To compare the R-NN curve algorithm's results with established cluster quality measures.

Main Methods:

  • Utilized the R-NN curve algorithm, based on nearest-neighbor analysis, to characterize compound spatial distributions.

  • Generated and analyzed R-NN curves to estimate the natural number of clusters.
  • Performed k-means clustering using the predicted k and compared results with average silhouette width.
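
As a rough illustration of the nearest-neighbor analysis mentioned above, the sketch below computes one common reading of an R-NN curve: for each point, the number of neighbors lying within radius R, as R grows from 1% to 100% of the largest pairwise distance. This definition, the function name `rnn_curves`, and the toy data are assumptions. The summary does not describe how the paper turns these curves into an estimate of k, so that step is not reproduced here.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rnn_curves(points, n_radii=100):
    """Return (radii, curves), where curves[i][s] is the number of other
    points within radii[s] of points[i]. Radii run from 1/n_radii up to
    the full maximum pairwise distance."""
    n = len(points)
    d = [[dist(p, q) for q in points] for p in points]
    d_max = max(max(row) for row in d)
    radii = [d_max * (step / n_radii) for step in range(1, n_radii + 1)]
    curves = [
        [sum(1 for j in range(n) if j != i and d[i][j] <= r) for r in radii]
        for i in range(n)
    ]
    return radii, curves


# Toy data (hypothetical): three tight 3x3 grids plus one isolated point.
offsets = (-0.5, 0.0, 0.5)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [(cx + dx, cy + dy) for cx, cy in blob_centers for dx in offsets for dy in offsets]
points.append((30.0, 30.0))

radii, curves = rnn_curves(points)

# Neighbor counts at 10% of the maximum distance: a point inside a dense
# group already sees its whole group, the isolated point sees nothing.
print(curves[0][9], curves[-1][9])
```

The shape of each curve reflects where the point sits: curves for points in dense regions rise early, while curves for isolated points stay flat until R is large.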

Main Results:

  • The R-NN curve algorithm successfully determined the natural number of clusters for various datasets.
  • Results showed general agreement between the R-NN curve algorithm and average silhouette width in identifying optimal k.
  • The algorithm demonstrated effectiveness on both simulated and real chemical data.

Conclusions:

  • The R-NN curve algorithm provides a reliable method for a priori determination of k in clustering.
  • This approach simplifies and enhances the efficiency of clustering in cheminformatics.
  • The R-NN curve algorithm is a valuable tool for selecting optimal cluster numbers, complementing existing quality metrics.