Be aware of overfitting by hyperparameter optimization!

Igor V Tetko (1, 2), Ruud van Deursen (3), Guillaume Godin (4)

  • (1) Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Munich - Deutsches Forschungszentrum Für Gesundheit Und Umwelt (GmbH), 86764, Neuherberg, Germany. igor.tetko@helmholtz-munich.de.

Journal of Cheminformatics | December 9, 2024 | PubMed
Summary
This summary is machine-generated.

Hyperparameter optimization in machine learning can cause overfitting. In this study, pre-set hyperparameters gave results similar to optimized ones while greatly reducing computation time, and Transformer CNN further improved accuracy.

Area of Science:

  • Computational Chemistry
  • Machine Learning
  • Drug Discovery

Background:

  • Hyperparameter optimization is common in machine learning for tasks like solubility prediction.
  • Previous studies utilized graph-based methods on diverse solubility datasets.
  • Concerns exist regarding potential overfitting during extensive hyperparameter tuning.
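
The overfitting concern can be illustrated with a small simulation. The sketch below is not the study's protocol: it assumes a toy setting in which every hyperparameter "configuration" predicts pure noise, so none is genuinely better than another. The function names (`simulate`, `average_gap`) and all sample sizes are hypothetical. Selecting the configuration with the lowest validation error still yields a validation score that looks better than the score on an independent test set, and the gap tends to grow with the number of configurations tried.

```python
import random
import statistics


def rmse(pred, true):
    """Root-mean-square error between two equal-length sequences."""
    return (sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)) ** 0.5


def simulate(n_configs, n_val=50, n_test=50, seed=0):
    """Select the best of n_configs equally useless models on a validation set.

    Assumption for illustration: each configuration outputs Gaussian noise,
    so any apparent winner is a product of chance alone.
    Returns (validation RMSE of the winner, test RMSE of the winner).
    """
    rng = random.Random(seed)
    y_val = [rng.gauss(0, 1) for _ in range(n_val)]
    y_test = [rng.gauss(0, 1) for _ in range(n_test)]
    best_val, best_test = float("inf"), None
    for _ in range(n_configs):
        pred_val = [rng.gauss(0, 1) for _ in range(n_val)]
        pred_test = [rng.gauss(0, 1) for _ in range(n_test)]
        val_err = rmse(pred_val, y_val)
        if val_err < best_val:
            # Keep the winner's test error, but never use it for selection.
            best_val, best_test = val_err, rmse(pred_test, y_test)
    return best_val, best_test


def average_gap(n_configs, repeats=200):
    """Mean of (test RMSE - validation RMSE) for the selected configuration."""
    gaps = []
    for seed in range(repeats):
        val_err, test_err = simulate(n_configs, seed=seed)
        gaps.append(test_err - val_err)
    return statistics.mean(gaps)
```

Comparing `average_gap(1)` with `average_gap(100)` shows the effect: with a single configuration the validation and test errors agree on average, while picking the best of many produces an optimistic validation score. This is why a score used for selection should not also be reported as the final performance estimate.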

Purpose of the Study:

  • To investigate the impact of hyperparameter optimization on model performance in solubility prediction.
  • To compare the efficiency and accuracy of pre-set hyperparameters versus optimized ones.
  • To evaluate a novel Natural Language Processing-based representation learning method, Transformer CNN.

Main Methods:

  • Analysis of seven thermodynamic and kinetic solubility datasets.
  • Comparison of state-of-the-art graph-based methods with hyperparameter optimization and pre-set hyperparameters.
  • Implementation and evaluation of Transformer CNN, a Natural Language Processing approach using SMILES strings.
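
To make the idea of treating SMILES strings as text more concrete, here is a minimal tokenization sketch. It is an assumption for illustration only: this summary does not describe how Transformer CNN tokenizes its input, and the regular expression, the `tokenize`, `build_vocab`, and `encode` helpers, and the padding scheme are all hypothetical.

```python
import re

# Multi-character tokens are listed first so they win over the single-character
# fallback: bracket atoms such as [NH3+], the halogens Br and Cl, and two-digit
# ring closures such as %12. Everything else is one character per token.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    return SMILES_TOKEN.findall(smiles)


def build_vocab(smiles_list):
    """Map each distinct token to a positive integer; 0 is reserved for padding."""
    tokens = sorted({tok for s in smiles_list for tok in tokenize(s)})
    return {tok: i + 1 for i, tok in enumerate(tokens)}


def encode(smiles, vocab, max_len):
    """Convert a SMILES string to a fixed-length list of token ids.

    Sequences longer than max_len are truncated, shorter ones are padded
    with 0. Tokens missing from vocab raise KeyError in this sketch.
    """
    ids = [vocab[tok] for tok in tokenize(smiles)][:max_len]
    return ids + [0] * (max_len - len(ids))
```

For example, `tokenize("ClC(Br)[NH3+]")` keeps `Cl`, `Br`, and `[NH3+]` as single tokens instead of splitting them into characters. The resulting integer sequences are the kind of input a sequence model can consume.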

Main Results:

  • Hyperparameter optimization did not consistently improve model performance and could lead to overfitting.
  • Models with pre-set hyperparameters achieved results comparable to those of optimized models while reducing computational effort by a factor of roughly 10,000.
  • Transformer CNN outperformed graph-based methods in 26 out of 28 comparisons, demonstrating superior accuracy and efficiency.
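
The "26 out of 28" figure can be put in perspective with a short calculation. The sketch below computes the win rate and a one-sided sign-test probability, that is, the chance of at least 26 wins in 28 comparisons if each method were equally likely to win. It assumes the comparisons are independent and have no ties, which this summary does not establish, and it is not a test the authors are stated to have used. The function name `sign_test_p_value` is hypothetical.

```python
from math import comb


def sign_test_p_value(wins, total):
    """One-sided probability of at least `wins` successes in `total` fair coin flips."""
    favourable = sum(comb(total, k) for k in range(wins, total + 1))
    return favourable / 2 ** total


# Figures taken from the result above: 26 wins out of 28 comparisons.
win_rate = 26 / 28
p_value = sign_test_p_value(26, 28)
```

Under these assumptions the win rate is about 93%, and the tail probability equals 407 / 2^28, well below common significance thresholds. If the 28 comparisons share datasets or models, the independence assumption is violated and the probability should be read only as a rough indication.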

Conclusions:

  • Extensive hyperparameter optimization can negatively impact model generalization due to overfitting.
  • Utilizing pre-set hyperparameters is a computationally efficient strategy yielding comparable predictive performance.
  • Transformer CNN improved on graph-based methods in both solubility prediction accuracy and computation time.