Assessing model fit by cross-validation.

Douglas M Hawkins¹, Subhash C Basak, Denise Mills

  • ¹School of Statistics, University of Minnesota, Minneapolis, Minnesota 55455, USA. doug@stat.umn.edu

Journal of Chemical Information and Computer Sciences | March 26, 2003
Summary
This summary is machine-generated.

For small datasets in quantitative structure-activity relationship (QSAR) modeling, cross-validation makes more efficient use of the available compounds than a hold-out test set. Implemented properly, it gives a reliable assessment of model fit when data are limited.

Area of Science:

  • Computational chemistry
  • Cheminformatics
  • Drug discovery

Background:

  • Quantitative structure-activity relationship (QSAR) models require rigorous validation to ensure predictive accuracy on new data.
  • Traditional validation methods include hold-out test sets and leave-one-out cross-validation (LOOCV).
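
As an illustration of the leave-one-out idea, one statistic commonly reported in QSAR work is the cross-validated q² built from the prediction sum of squares (PRESS). This summary does not state which statistic the study uses, so the notation below is an assumption: y_i is the measured activity of compound i, ŷ_(−i) is the prediction for compound i from a model fitted without it, and ȳ is the mean activity over all n compounds.

```latex
\[
\mathrm{PRESS} = \sum_{i=1}^{n} \bigl(y_i - \hat{y}_{(-i)}\bigr)^2,
\qquad
q^2 = 1 - \frac{\mathrm{PRESS}}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
\]
```

A hold-out assessment uses the same kind of ratio but sums only over the compounds set aside for testing, with predictions from a model fitted to the remaining compounds.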

Purpose of the Study:

  • To evaluate the efficiency and effectiveness of different QSAR model validation strategies for varying dataset sizes.
  • To provide guidance on optimal validation techniques for small to medium-sized QSAR datasets.

Main Methods:

  • Theoretical argument comparing hold-out testing with cross-validation.
  • Empirical study using a large QSAR dataset to compare hold-out testing versus cross-validation.
  • Focus on the impact of sample size on validation efficacy.

Main Results:

  • Holding out a test set from a small QSAR dataset (dozens to scores of compounds, rather than hundreds) is wasteful of data: fewer compounds remain for fitting, and only a small subset is used to assess the model.
  • Leave-one-out cross-validation, when implemented correctly, is a more statistically sound and data-efficient validation approach for small samples, although it requires more computation than a single hold-out split.
  • Proper cross-validation ensures that the entire dataset contributes to both model fitting and validity assessment.
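
The results above can be sketched in code. The following is a minimal illustration, not the authors' procedure: it assumes that "proper" cross-validation means repeating every data-dependent step (here, a hypothetical correlation-based descriptor selection) inside each leave-one-out fold, so that the left-out compound never influences the model that predicts it. The function names `fit_predict` and `loocv_q2`, the selection rule, and the synthetic data are all assumptions made for the example.

```python
import numpy as np


def fit_predict(X_train, y_train, X_test, n_keep):
    """Pick descriptors and fit least squares on training rows only, then predict."""
    # Data-dependent step: rank descriptors by |correlation| with the activity,
    # computed from the training rows alone.
    Xc = X_train - X_train.mean(axis=0)
    yc = y_train - y_train.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12
    keep = np.argsort(np.abs(Xc.T @ yc) / denom)[-n_keep:]
    # Ordinary least squares with an intercept on the selected descriptors.
    A = np.column_stack([np.ones(len(y_train)), X_train[:, keep]])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    B = np.column_stack([np.ones(len(X_test)), X_test[:, keep]])
    return B @ coef


def loocv_q2(X, y, n_keep):
    """Leave-one-out q2: each compound is predicted by a model that never saw it."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i  # every compound except i
        preds[i] = fit_predict(X[train], y[train], X[i:i + 1], n_keep)[0]
    press = np.sum((y - preds) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


# Synthetic stand-in for a small QSAR dataset (not data from the study).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 50))                     # 30 compounds, 50 descriptors
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=30)
print("leave-one-out q2:", loocv_q2(X, y, n_keep=2))
```

All 30 compounds are used both to fit models and to assess predictions, which is the point made in the last bullet. Choosing descriptors once from the full dataset and cross-validating only the final fit would let each left-out compound influence its own prediction, which tends to make q² look better than it should.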

Conclusions:

  • For QSAR modeling with limited data, cross-validation is superior to hold-out validation.
  • Researchers should prioritize proper cross-validation techniques to maximize the reliability of QSAR models derived from small datasets.