Are We Underestimating Overfitting?

David A Winkler1,2,3

  • 1Department of Biochemistry and Chemistry, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, Victoria 3086, Australia.

Journal of Chemical Information and Modeling
May 28, 2026
Summary
This summary is machine-generated.

Overparameterized machine learning models, long assumed to overfit and therefore predict poorly, can in some cases predict new data accurately. The additional parameters may carry information that improves accuracy on unseen data.

Area of Science:

  • Computational chemistry
  • Machine learning
  • Quantitative structure-activity relationships (QSAR)

Background:

  • Traditional QSAR modeling emphasizes parsimonious models to avoid overfitting.
  • Overfitting, where models perform well on training data but poorly on new data, is a key concern in model development.
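
The definition of overfitting above can be made concrete by comparing training and test error for the same model. The following is a minimal sketch on synthetic data, not data or code from the paper; the function name `train_test_rmse`, the sine target, the noise level, and the sample sizes are all assumptions chosen for illustration.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def train_test_rmse(degree, n_train=12, n_test=200, noise=0.2, seed=0):
    """Fit a polynomial of the given degree to noisy synthetic data.

    Returns (training RMSE, test RMSE). All settings are illustrative
    assumptions, not values taken from the paper.
    """
    rng = np.random.default_rng(seed)

    def target(x):
        # Hypothetical "true" relationship used to generate the data.
        return np.sin(3 * x)

    x_train = rng.uniform(-1, 1, n_train)
    x_test = rng.uniform(-1, 1, n_test)
    y_train = target(x_train) + rng.normal(0, noise, n_train)
    y_test = target(x_test) + rng.normal(0, noise, n_test)

    # Ordinary least-squares polynomial fit on the training set only.
    coeffs = P.polyfit(x_train, y_train, degree)

    def rmse(y, y_hat):
        return float(np.sqrt(np.mean((y - y_hat) ** 2)))

    return (rmse(y_train, P.polyval(x_train, coeffs)),
            rmse(y_test, P.polyval(x_test, coeffs)))


if __name__ == "__main__":
    for degree in (1, 3, 11):
        train_err, test_err = train_test_rmse(degree)
        print(f"degree={degree:2d}  train RMSE={train_err:.3g}  test RMSE={test_err:.3g}")
```

With 12 training points, a degree-11 polynomial has as many coefficients as data points, so it can reproduce the training set almost exactly. A large gap between its training and test errors is the classical signature of overfitting.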

Purpose of the Study:

  • To challenge the dogma that parsimonious models are always superior in QSAR and Quantitative Structure-Property Relationship (QSPR) modeling.
  • To explore the potential of overparameterized machine learning models to accurately predict external data.

Main Methods:

  • Review of recent publications on overfitting and overparameterization in machine learning.
  • Analysis of information theoretic arguments supporting the predictive power of overparameterized models.
  • Modeling of synthetic and real data to evaluate the performance of potentially overfitted models on test sets.
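
The kind of synthetic-data experiment listed above can be sketched as follows. This is not the paper's procedure: it is a generic illustration, assuming a linear model with Gaussian "descriptors" that is fitted by minimum-norm least squares, so that the number of parameters can exceed the number of training samples. The function name `min_norm_fit` and every numeric setting are hypothetical.

```python
import numpy as np


def min_norm_fit(n_descriptors, n_train=40, n_test=500, n_pool=400,
                 noise=0.5, seed=0):
    """Fit a linear model on the first `n_descriptors` columns of a
    synthetic descriptor pool and return (training RMSE, test RMSE).

    When n_descriptors > n_train the model is overparameterized, and
    np.linalg.lstsq returns the minimum-norm solution that fits the
    training data. All settings are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical ground truth: every descriptor in the pool contributes a little.
    beta_true = rng.normal(0, 1, n_pool) / np.sqrt(n_pool)

    x_train = rng.normal(size=(n_train, n_pool))
    x_test = rng.normal(size=(n_test, n_pool))
    y_train = x_train @ beta_true + rng.normal(0, noise, n_train)
    y_test = x_test @ beta_true + rng.normal(0, noise, n_test)

    # Restrict the model to a subset of the available descriptors.
    a_train = x_train[:, :n_descriptors]
    a_test = x_test[:, :n_descriptors]

    # Least squares; minimum-norm solution in the underdetermined case.
    beta_hat, *_ = np.linalg.lstsq(a_train, y_train, rcond=None)

    def rmse(y, y_hat):
        return float(np.sqrt(np.mean((y - y_hat) ** 2)))

    return rmse(y_train, a_train @ beta_hat), rmse(y_test, a_test @ beta_hat)


if __name__ == "__main__":
    for p in (5, 20, 40, 80, 400):
        train_err, test_err = min_norm_fit(p)
        print(f"descriptors={p:3d}  train RMSE={train_err:.3g}  test RMSE={test_err:.3g}")
```

Sweeping `n_descriptors` from well below to well above `n_train` shows how test error behaves once the model fits the training data exactly. Whether the overparameterized fits predict better than the parsimonious ones depends on the data-generating assumptions, so results from this sketch should not be read as a reproduction of the paper's findings.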

Main Results:

  • Formally overparameterized machine learning models can exhibit strong predictive accuracy on external test data.
  • Supernumerary model parameters may contain valuable information that improves predictions for unseen data.
  • Illustrative examples demonstrate the effectiveness of overfitted models in QSAR and QSPR.

Conclusions:

  • The understanding of overfitting and overparameterization in machine learning has significant implications for QSAR and QSPR modeling.
  • Overparameterized models offer a counterintuitive but effective approach to enhance predictive accuracy in cheminformatics.
  • Embracing potentially overfitted models can lead to more robust and accurate structure-activity and structure-property relationship predictions.