

You might also read

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

Structure-Based Discovery of Hsp90/HDAC6 Dual Inhibitors Targeting Aggressive Prostate Cancer.

Journal of medicinal chemistry·2025
Same author

Exploring Biological Targets of Magnolol and Honokiol and their Nature-Inspired Synthetic Derivatives: In Silico Identification and Experimental Validation of Estrogen Receptors.

Journal of natural products·2024
Same author

Searching for Novel HDAC6/Hsp90 Dual Inhibitors with Anti-Prostate Cancer Activity: In Silico Screening and In Vitro Evaluation.

Pharmaceuticals (Basel, Switzerland)·2024
Same author

Quantitative live cell imaging of a tauopathy model enables the identification of a polypharmacological drug candidate that restores physiological microtubule interaction.

Nature communications·2024
Same author

Early Diagnosis of Neurodegenerative Diseases: What Has Been Undertaken to Promote the Transition from PET to Fluorescence Tracers.

Molecules (Basel, Switzerland)·2024
Same author

Discovery of a Potent Dual Inhibitor of Aromatase and Aldosterone Synthase.

ACS pharmacology & translational science·2023
Same journal

PACEff Builder: An Efficient Platform for Constructing PACE Hybrid-Resolution Models for Molecular Dynamics Simulations of Aqueous Protein, Peptide Assembly, and Membrane Protein Systems.

Journal of chemical information and modeling·2026
Same journal

TransKla: A Local-Global Cross-Attention Based Transformer Approach for Prediction of Lysine Lactylation Sites.

Journal of chemical information and modeling·2026
Same journal

CondenSimAdapter: A Versatile Builder for Multiscale Simulations of Protein Condensates with Broad Force-Field Compatibility and Robust Dense-Phase Relaxation.

Journal of chemical information and modeling·2026
Same journal

Simulation Guided Design of a Potentially Hyperactive Ice Nucleating Protein.

Journal of chemical information and modeling·2026
Same journal

Setting the Bases of the Photogenotoxicity of p-Aminobenzoic Acid.

Journal of chemical information and modeling·2026
Same journal

Probing Charge-Controlled Inter-Domain Flexibility: Integrating Experimental and Coarse-Grained Approaches.

Journal of chemical information and modeling·2026

Related Experiment Video

Updated: Jan 14, 2026


Author Spotlight: IntelliSleepScorer — A High-Accuracy, Accessible GUI Software for Automated Sleep Stage Scoring in Mice and its Application in Psychiatric Research

Published on: November 8, 2024

969

Improving Machine Learning Classification Predictions through SHAP and Features Analysis Interpretation.

Leonardo Bernal1,2, Giulio Rastelli1, Luca Pinzi1

  • 1Department of Life Sciences, University of Modena and Reggio Emilia, Via Giuseppe Campi 103, 41125 Modena, Italy.

Journal of Chemical Information and Modeling | October 20, 2025
Summary
This summary is machine-generated.

This study introduces a method that combines SHapley Additive Explanations (SHAP) with feature analysis to make machine learning predictions in drug discovery more reliable. The approach flags compounds that are likely to be misclassified (up to 63% of them for the LNCaP cell line with the 'RAW OR SHAP' rule), so that unreliable predictions can be excluded during virtual screening.

Area of Science:

  • Computational chemistry and cheminformatics
  • Machine learning in drug discovery
  • Cancer research

Background:

  • Tree-based machine learning (ML) algorithms such as Extra Trees (ET), Random Forest (RF), Gradient Boosting Machine (GBM), and XGBoost (XGB) are important tools in early drug discovery.
  • These models often face challenges with misclassification and limited interpretability, hindering practical application.
  • SHapley Additive Explanations (SHAP) offers a way to understand feature importance and potentially improve model predictions.
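
For reference, the standard game-theoretic definition of the Shapley value that SHAP builds on can be written as below. The notation is the conventional one and is an assumption here, not taken from the study: N is the set of features, v(S) is the model output when only the features in S are known, and φ_i is the contribution assigned to feature i. The second line states the additivity property, by which the contributions sum to the prediction f(x) relative to a baseline φ_0.

```latex
\phi_i \;=\; \sum_{S \subseteq N \setminus \{i\}}
  \frac{|S|!\,\bigl(|N| - |S| - 1\bigr)!}{|N|!}
  \Bigl[\, v\bigl(S \cup \{i\}\bigr) - v(S) \,\Bigr]

f(x) \;=\; \phi_0 + \sum_{i \in N} \phi_i
```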

Purpose of the Study:

  • To develop and validate a novel approach integrating SHAP values and feature analysis to reduce misclassification errors in ML models.
  • To benchmark the performance of ET, RF, GBM, and XGB algorithms using prostate cancer cell line data.
  • To create a misclassification-detection framework to improve the reliability of virtual screening predictions.

Main Methods:

  • Benchmarking of ET, RF, GBM, and XGB classifiers using RDKit and ECFP4 molecular descriptors.
  • Application of SHAP value analysis to understand prediction drivers and identify misclassified compounds.
  • Development and testing of four misclassification-detection filtering rules: RAW, SHAP, RAW OR SHAP, and RAW AND SHAP.
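
The four rule names above suggest two per-compound flags combined with logical OR and AND. The following is a minimal sketch under that assumption. This summary does not specify how the RAW and SHAP flags are computed, so the sketch takes them as given boolean inputs. The function name `apply_rules` and the example values are hypothetical.

```python
from typing import Dict, List


def apply_rules(raw_flags: List[bool], shap_flags: List[bool]) -> Dict[str, List[bool]]:
    """Combine two per-compound flags into the four named rules.

    raw_flags[i]  : True if compound i is flagged from its raw feature values
    shap_flags[i] : True if compound i is flagged from its SHAP values
    Both inputs are assumed to be computed elsewhere (not shown here).
    """
    if len(raw_flags) != len(shap_flags):
        raise ValueError("flag lists must have the same length")
    pairs = list(zip(raw_flags, shap_flags))
    return {
        "RAW": [r for r, _ in pairs],
        "SHAP": [s for _, s in pairs],
        "RAW OR SHAP": [r or s for r, s in pairs],    # flagged by either criterion
        "RAW AND SHAP": [r and s for r, s in pairs],  # flagged by both criteria
    }


# Illustrative flags for four hypothetical compounds.
raw = [True, False, True, False]
shap = [True, True, False, False]
rules = apply_rules(raw, shap)
for name, flags in rules.items():
    print(f"{name:13s} flags {sum(flags)} of {len(flags)} compounds")
```

By construction, the OR rule flags at least as many compounds as either single rule, and the AND rule flags at most as many, which is the usual trade-off between catching more errors and discarding fewer predictions.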

Main Results:

  • GBM and XGB models achieved high performance (MCC > 0.58, F1-score > 0.8) on antiproliferative activity data for PC3, LNCaP, and DU-145 cell lines.
  • SHAP analysis revealed that misclassified compounds often had feature values typical of the opposite class.
  • The 'RAW OR SHAP' rule flagged a substantial share of the misclassified compounds, up to 63% for the LNCaP cell line.
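
The two metrics quoted above, the Matthews correlation coefficient (MCC) and the F1-score, can both be computed from the confusion-matrix counts of a binary classifier. The sketch below uses their standard definitions. The counts in the example are illustrative only and are not results from the study.

```python
import math


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 if undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 if the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Illustrative confusion matrix (not data from the paper).
tp, tn, fp, fn = 40, 40, 10, 10
f1_value = f1_score(tp, fp, fn)
mcc_value = mcc(tp, tn, fp, fn)
print(f"F1 = {f1_value:.2f}, MCC = {mcc_value:.2f}")  # F1 = 0.80, MCC = 0.60
```

MCC uses all four cells of the confusion matrix, so it is less easily inflated by class imbalance than F1, which ignores true negatives.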

Conclusions:

  • The proposed integration of SHAP and feature analysis provides an effective strategy to detect and mitigate misclassifications in ML models.
  • The developed filtering rules enhance classifier performance by enabling the exclusion of likely erroneous predictions.
  • This approach offers a valuable tool for improving the accuracy and reliability of virtual screening in drug discovery.
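
The idea of excluding likely erroneous predictions can be sketched as follows, assuming a boolean flag per compound is already available from one of the filtering rules. The function names and the toy labels are hypothetical. The sketch reports two quantities: the share of misclassified compounds that the rule flagged, and the accuracy on the predictions that remain after flagged compounds are removed.

```python
from typing import List


def detection_rate(y_true: List[int], y_pred: List[int], flagged: List[bool]) -> float:
    """Fraction of misclassified compounds that were flagged by the rule."""
    wrong = [t != p for t, p in zip(y_true, y_pred)]
    n_wrong = sum(wrong)
    if n_wrong == 0:
        return 0.0
    caught = sum(1 for w, f in zip(wrong, flagged) if w and f)
    return caught / n_wrong


def accuracy_after_exclusion(y_true: List[int], y_pred: List[int], flagged: List[bool]) -> float:
    """Accuracy computed only on compounds that were not flagged."""
    kept = [(t, p) for t, p, f in zip(y_true, y_pred, flagged) if not f]
    if not kept:
        return float("nan")
    return sum(t == p for t, p in kept) / len(kept)


# Toy example: six hypothetical compounds, two of them misclassified.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
flagged = [False, True, False, False, True, False]

rate = detection_rate(y_true, y_pred, flagged)
acc_kept = accuracy_after_exclusion(y_true, y_pred, flagged)
print(f"misclassified compounds flagged: {rate:.0%}")       # 50%
print(f"accuracy on retained predictions: {acc_kept:.2f}")  # 0.75
```

In the toy data the rule also flags one correctly classified compound, which illustrates the cost of filtering: some valid predictions are discarded along with the erroneous ones.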