Robustly interrogating machine learning-based scoring functions: what are they learning?

Guy Durant¹, Fergus Boyles¹, Kristian Birchall²

  • ¹Department of Statistics, University of Oxford, St Giles', Oxford OX1 3LB, United Kingdom.

Bioinformatics (Oxford, England) | January 28, 2025
Summary
This summary is machine-generated.

Machine learning-based scoring functions often learn dataset biases rather than physics. Our study shows that simple baseline models match complex ones, highlighting the bias problem, and offers a tool for testing performance.

Area of Science:

  • Computational chemistry
  • Drug discovery
  • Machine learning

Background:

  • Machine learning-based scoring functions (MLBSFs) are crucial in drug discovery but often show inconsistent performance.
  • A key limitation is their tendency to learn dataset biases rather than generalizable physical principles.

Purpose of the Study:

  • To rigorously evaluate the performance of popular MLBSFs.
  • To investigate the extent to which MLBSFs learn dataset biases versus physical properties.
  • To provide a platform for robust performance interrogation of MLBSFs.

Main Methods:

  • Comparison of diverse MLBSFs (RFScore, SIGN, OnionNet-2, Pafnucy, PointVS) against proposed baseline models.
  • Evaluation across a range of benchmarks to assess predictive accuracy.
  • Development and use of the ToolBoxSF platform for performance analysis.
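The idea of a baseline that can only learn dataset bias can be illustrated with a minimal sketch. Assumptions: the baseline below is a hypothetical ligand-only k-nearest-neighbour regressor that never sees the protein or the binding pose. It is one common way to build such a control, not necessarily the baseline used in the study. Ligands are represented by hypothetical numeric descriptor vectors, and accuracy is scored with Pearson's r, a metric commonly reported for binding-affinity benchmarks. The names `ligand_only_knn_predict` and `pearson_r` are illustrative and are not part of ToolBoxSF.

```python
import math
from typing import List, Sequence

# Hypothetical representation: each ligand is a fixed-length vector of
# numeric descriptors. No protein or pose information is used anywhere.
Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two descriptor vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("descriptor vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ligand_only_knn_predict(
    train_ligands: Sequence[Vector],
    train_affinities: Sequence[float],
    test_ligands: Sequence[Vector],
    k: int = 3,
) -> List[float]:
    """Predict each test affinity as the mean affinity of its k nearest
    training ligands (ties broken by training order)."""
    if len(train_ligands) != len(train_affinities):
        raise ValueError("need one affinity per training ligand")
    if not 1 <= k <= len(train_ligands):
        raise ValueError("k must be between 1 and the number of training ligands")
    predictions = []
    for query in test_ligands:
        # Sort training indices by distance to the query; sorted() is stable.
        nearest = sorted(
            range(len(train_ligands)),
            key=lambda i: euclidean(query, train_ligands[i]),
        )[:k]
        predictions.append(sum(train_affinities[i] for i in nearest) / k)
    return predictions


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between predictions and measurements."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy 1-D descriptors and affinities, invented purely for illustration.
    preds = ligand_only_knn_predict(
        [[0.0], [1.0], [2.0], [10.0]], [5.0, 6.0, 7.0, 12.0], [[0.1], [9.5]], k=2
    )
    print(preds)  # [5.5, 9.5]
```

Because this predictor never consults the protein, a high r on a benchmark would indicate that the benchmark can be solved largely from ligand-level regularities in the data rather than from protein-ligand interactions.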

Main Results:

  • Baseline models, designed to learn only dataset biases, achieved competitive accuracy against popular MLBSFs on most benchmarks.
  • This suggests that many current MLBSFs primarily capture dataset-specific artifacts.
  • The study provides evidence for the significant impact of dataset bias on MLBSF performance.
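The "competitive on most benchmarks" comparison can be sketched as a per-benchmark check. Assumptions: experimental affinities and predictions are supplied per benchmark as plain lists, Pearson's r is the metric, and "competitive" means the baseline's r is no more than a hypothetical tolerance of 0.05 below the MLBSF's r. The study's actual metrics and criteria may differ, and `competitive_benchmarks` is an illustrative name, not a ToolBoxSF function.

```python
import math
from typing import Dict, List, Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between predictions and measurements."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def competitive_benchmarks(
    targets: Dict[str, Sequence[float]],
    model_preds: Dict[str, Sequence[float]],
    baseline_preds: Dict[str, Sequence[float]],
    tolerance: float = 0.05,  # hypothetical threshold, not taken from the study
) -> List[str]:
    """Return the sorted names of benchmarks on which the baseline's Pearson r
    is at least the MLBSF's r minus `tolerance`."""
    competitive = []
    for name, measured in targets.items():
        r_model = pearson_r(model_preds[name], measured)
        r_baseline = pearson_r(baseline_preds[name], measured)
        if r_baseline >= r_model - tolerance:
            competitive.append(name)
    return sorted(competitive)


if __name__ == "__main__":
    # Toy numbers invented for illustration only; they are not study results.
    measured = {"A": [1.0, 2.0, 3.0, 4.0], "B": [1.0, 2.0, 3.0, 4.0]}
    mlbsf = {"A": [1.0, 2.0, 3.0, 4.0], "B": [1.0, 2.0, 3.0, 4.0]}
    baseline = {"A": [1.0, 2.0, 3.0, 5.0], "B": [4.0, 3.0, 2.0, 1.0]}
    hits = competitive_benchmarks(measured, mlbsf, baseline)
    print(f"baseline competitive on {len(hits)}/{len(measured)}: {hits}")
```

Reporting the count of competitive benchmarks, rather than a single pooled score, keeps visible the cases where a bias-only baseline falls short, which is the kind of per-benchmark picture the result above describes.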

Conclusions:

  • The generalizability of current MLBSFs is questionable due to their susceptibility to dataset bias.
  • Researchers need to critically assess MLBSF performance and the influence of training data.
  • The ToolBoxSF platform offers a valuable resource for evaluating and improving MLBSFs.