Robustly interrogating machine learning-based scoring functions: what are they learning?

Guy Durant¹, Fergus Boyles¹, Kristian Birchall²

¹Department of Statistics, University of Oxford, St Giles', Oxford OX1 3LB, United Kingdom.

Bioinformatics (Oxford, England)

January 28, 2025

Summary

This summary is machine-generated.

Machine learning scoring functions often learn dataset biases, not physics. Our study shows simple models match complex ones, highlighting the bias issue and offering a tool to test performance.

Area of Science:

Computational chemistry
Drug discovery
Machine learning

Background:

Machine learning-based scoring functions (MLBSFs) are crucial in drug discovery but often show inconsistent performance.
A key limitation is their tendency to learn dataset biases rather than generalizable physical principles.

Purpose of the Study:

To rigorously evaluate the performance of popular MLBSFs.
To investigate the extent to which MLBSFs learn dataset biases versus physical properties.
To provide a platform for robust performance interrogation of MLBSFs.

Main Methods:

Comparison of diverse MLBSFs (RFScore, SIGN, OnionNet-2, Pafnucy, PointVS) against proposed baseline models.
Evaluation across a range of benchmarks to assess predictive accuracy.
Development and utilization of the ToolBoxSF platform for performance analysis.

Main Results:

Baseline models, designed to only learn dataset biases, achieved competitive accuracy against popular MLBSFs on most benchmarks.
This suggests that many current MLBSFs primarily capture dataset-specific artifacts.
The study provides evidence for the significant impact of dataset bias on MLBSF performance.
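The comparison of a bias-only baseline against a full scoring function can be sketched roughly as follows. This is a minimal illustration, not the study's method or the ToolBoxSF code: the choice of ligand size as the only feature, the linear fit, the Pearson correlation metric, and all numbers are assumptions made up for the example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def predict(model, xs):
    """Apply a fitted (slope, intercept) pair to new inputs."""
    slope, intercept = model
    return [slope * x + intercept for x in xs]


# Hypothetical toy data: ligand heavy-atom counts and binding affinities (pK).
# The baseline sees only the ligand size, so it cannot use any
# protein-ligand interaction information.
train_size = [18, 22, 25, 30, 34, 40]
train_affinity = [4.9, 5.6, 5.8, 6.7, 7.1, 8.0]
test_size = [20, 27, 33, 38]
test_affinity = [5.1, 6.4, 6.6, 7.9]

baseline = fit_line(train_size, train_affinity)
baseline_r = pearson_r(predict(baseline, test_size), test_affinity)

# Hypothetical predictions standing in for a full scoring function.
model_predictions = [5.3, 6.1, 6.9, 7.6]
model_r = pearson_r(model_predictions, test_affinity)

print(f"baseline r = {baseline_r:.2f}, model r = {model_r:.2f}")
```

If the two correlations are close on a given benchmark, the full model may be gaining little beyond what dataset bias alone provides, which is the kind of comparison the results above describe.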

Conclusions:

The generalizability of current MLBSFs is questionable due to their susceptibility to dataset bias.
Researchers need to critically assess MLBSF performance and the influence of training data.
The ToolBoxSF platform offers a valuable resource for evaluating and improving MLBSFs.