Updated: Jun 5, 2025

Beware of overfitting by hyperparameter optimization!

Igor V Tetko¹,², Ruud van Deursen³, Guillaume Godin⁴

¹Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Munich - Deutsches Forschungszentrum Für Gesundheit Und Umwelt (GmbH), 86764, Neuherberg, Germany. igor.tetko@helmholtz-munich.de.

Journal of Cheminformatics | December 9, 2024

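Summary

This summary is machine-generated.

Hyperparameter optimization in machine learning can lead to overfitting. Using preset hyperparameters produced similar results, greatly reduced computation time, and improved model accuracy.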

Scientific fields:

Computational chemistry
Machine learning
Drug discovery

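Background:

Hyperparameter optimization is common in machine learning and is used for tasks such as solubility prediction.
Previous studies applied graph-based methods to a range of solubility datasets.
There is concern that extensive hyperparameter tuning can itself cause overfitting.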
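The concern about tuning-induced overfitting can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the procedure used in the study: every candidate configuration is given the same true error, the validation and test estimates are that error plus independent Gaussian noise, and the function name `selection_gap` and all numeric settings are hypothetical. Picking the configuration with the lowest validation error then yields a validation score that looks better than the test score, and the optimism grows with the number of configurations tried.

```python
import random
import statistics


def selection_gap(n_configs, noise_sd, n_trials, seed=0):
    """Mean (test error - validation error) of the configuration
    chosen by lowest validation error, averaged over n_trials."""
    rng = random.Random(seed)
    true_error = 1.0  # assumption: all configurations are equally good
    gaps = []
    for _ in range(n_trials):
        # Independent noisy estimates of the same true error.
        val = [true_error + rng.gauss(0.0, noise_sd) for _ in range(n_configs)]
        test = [true_error + rng.gauss(0.0, noise_sd) for _ in range(n_configs)]
        # "Hyperparameter optimization": keep the best validation score.
        best = min(range(n_configs), key=val.__getitem__)
        gaps.append(test[best] - val[best])
    return statistics.mean(gaps)


gap_few = selection_gap(n_configs=2, noise_sd=0.05, n_trials=1000)
gap_many = selection_gap(n_configs=200, noise_sd=0.05, n_trials=1000)
print(f"mean optimism with 2 configurations:   {gap_few:.3f}")
print(f"mean optimism with 200 configurations: {gap_many:.3f}")
```

Because no configuration is truly better than another in this toy setup, the entire gap comes from selecting on noise. That is why a score used for tuning should not also be reported as the final performance estimate.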

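Study objectives:

To investigate how hyperparameter optimization affects model performance in solubility prediction.
To compare the efficiency and accuracy of preset hyperparameters against optimized hyperparameters.
To evaluate Transformer CNN, a new representation learning method based on natural language processing (NLP).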

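Main methods:

Analysis of seven thermodynamic and kinetic solubility datasets.
Comparison of state-of-the-art graph-based methods run with hyperparameter optimization and with preset hyperparameters.
Implementation and evaluation of Transformer CNN, a natural language processing method that takes SMILES strings as input.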
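To make the idea of treating SMILES strings as text more concrete, here is a minimal sketch of how a SMILES string might be split into tokens and mapped to integer ids before being fed to a sequence model. The regular expression is a simplified pattern of the kind commonly used for SMILES, and the helper names `tokenize_smiles`, `build_vocab`, and `encode` are hypothetical; none of this is taken from the Transformer CNN implementation.

```python
import re

# Assumed token pattern: bracket atoms, two-letter halogens,
# two-digit ring closures (%NN), then any other single character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize_smiles(smiles):
    """Split a SMILES string into a list of tokens."""
    return SMILES_TOKEN.findall(smiles)


def build_vocab(corpus):
    """Map each distinct token to an integer id; 0 is reserved for padding."""
    tokens = sorted({tok for s in corpus for tok in tokenize_smiles(s)})
    return {tok: i + 1 for i, tok in enumerate(tokens)}


def encode(smiles, vocab):
    """Convert a SMILES string into a list of integer token ids."""
    return [vocab[tok] for tok in tokenize_smiles(smiles)]


corpus = ["CC(=O)Oc1ccccc1C(=O)O", "ClCCBr", "[Na+].[Cl-]"]
vocab = build_vocab(corpus)
print(tokenize_smiles("ClCCBr"))
print(encode("ClCCBr", vocab))
```

Keeping `Br` and `Cl` ahead of the single-character fallback prevents chlorine from being read as carbon followed by a stray `l`, which is the main pitfall of naive character-level splitting.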

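Main results:

Hyperparameter optimization did not consistently improve model performance and could lead to overfitting.
Models with preset hyperparameters achieved results comparable to those of optimized models while reducing computational cost by a factor of about 10,000.
Transformer CNN outperformed the graph-based methods in 26 of 28 comparisons, demonstrating superior accuracy and efficiency.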
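As a rough way to gauge how lopsided 26 wins out of 28 is, the sketch below computes the win rate and a one-sided sign-test p-value under the null hypothesis that each comparison is a fair coin flip. This calculation is illustrative only and is not reported in the summary. It assumes the 28 comparisons are independent and have no ties, which is unlikely to hold exactly when several methods are compared on the same datasets, so the p-value should be read as an order-of-magnitude indication.

```python
from math import comb


def sign_test_p_value(wins, n):
    """One-sided P(X >= wins) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n


wins, comparisons = 26, 28  # figures quoted in the results above
win_rate = wins / comparisons
p_value = sign_test_p_value(wins, comparisons)
print(f"win rate: {win_rate:.1%}")
print(f"one-sided sign-test p-value: {p_value:.2e}")
```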

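Conclusions:

Pre-optimized hyperparameters may harm model generalization because of overfitting.
Using preset hyperparameters is a computationally efficient strategy that yields comparable predictive performance.
Transformer CNN represents a significant advance in the accuracy and speed of solubility prediction.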