Comparing massively-multitask regression algorithms for drug discovery.

Eric J Martin¹, Xiang-Wei Zhu², Patrick Riley³

¹Novartis Biomedical Research, Emeryville, CA, 94608, USA. eric.martin@novartis.com.

Journal of Computer-Aided Molecular Design

February 4, 2026

Summary

This summary is machine-generated.

Massively-multitask regression models (MMRMs) significantly improve activity prediction in drug discovery compared to single-task models. However, large test sets (such as a 75/25 training/test split) lead to underestimates of their accuracy, highlighting the importance of realistic data splits for reliable evaluation.

Keywords:

Algorithm comparison Drug discovery Imputation Multitask regression QSAR Virtual screening

Area of Science:

Computational Chemistry
Cheminformatics
Machine Learning in Drug Discovery

Background:

Massively-multitask regression models (MMRMs) have emerged as powerful tools for predicting compound bioactivity in drug discovery.
These models, trained on extensive datasets, offer accuracy comparable to experimental measurements.

Purpose of the Study:

To compare the performance of six leading MMRMs (pQSAR, Alchemite, MT-DNN, MetaNN, Macau, IMC) for bioactivity profile imputation.
To evaluate the impact of different training/test set splits on MMRM performance and accuracy estimation.
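
To make "bioactivity profile imputation" concrete: the data can be viewed as a compounds × assays matrix with unmeasured entries, and imputation means predicting those entries from the measured ones. The sketch below is a generic low-rank completion on toy data, not an implementation of any of the six methods compared. The function name `impute_low_rank`, the chosen rank, and the toy matrix are all assumptions made for illustration.

```python
import numpy as np

def impute_low_rank(Y, rank=1, n_iter=100):
    """Fill NaN entries of Y by repeated rank-`rank` SVD reconstruction.

    Hypothetical helper for illustration only; Y is compounds x assays.
    """
    missing = np.isnan(Y)
    # Start from the per-assay (column) mean of the observed values.
    col_means = np.nanmean(Y, axis=0)
    X = np.where(missing, col_means, Y)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Keep measured values; overwrite only the missing ones.
        X = np.where(missing, low_rank, Y)
    return X

# Toy rank-1 "activity" matrix (6 compounds x 5 assays), made up for the example.
u = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
v = np.array([1.0, 0.5, 2.0, 1.5, 3.0])
truth = np.outer(u, v)

Y = truth.copy()
Y[0, 1] = Y[2, 3] = Y[4, 0] = np.nan  # pretend these were never measured

filled = impute_low_rank(Y)
print(np.abs(filled - truth).max())  # largest imputation error on the toy data
```

The toy matrix is exactly rank 1, so the missing entries are recoverable from the rest of the profile. Real assay data are noisy and far sparser, which is why the study compares purpose-built methods rather than a baseline like this one.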

Main Methods:

Six MMRMs were trained by experts on identical datasets comprising 159 kinase assays and 4276 ChEMBL assays.
Models were evaluated using both 75/25 and >99%/<1% training/test set splits to assess performance under varying data availability scenarios.
The comparison was both qualitative and statistically rigorous, with single-task random forest regression (ST-RFR) as the baseline.
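
The two split regimes can be sketched on a sparse compounds × assays matrix as follows. This is a minimal illustration that assumes measured entries are assigned to the test set uniformly at random; the summary does not say how the study actually constructed its splits. The helper `split_observed`, the matrix size, and the 0.5% test fraction standing in for "<1%" are all hypothetical.

```python
import numpy as np

def split_observed(Y, test_fraction, rng):
    """Randomly assign measured (non-NaN) entries of Y to train/test masks.

    Hypothetical helper; random assignment is an assumption of this sketch.
    """
    observed = np.argwhere(~np.isnan(Y))
    n_test = max(1, int(round(test_fraction * len(observed))))
    test_idx = rng.choice(len(observed), size=n_test, replace=False)
    test_mask = np.zeros(Y.shape, dtype=bool)
    rows, cols = observed[test_idx].T
    test_mask[rows, cols] = True
    # Everything measured that is not held out is available for training.
    train_mask = ~np.isnan(Y) & ~test_mask
    return train_mask, test_mask

rng = np.random.default_rng(0)
Y = rng.normal(size=(200, 10))        # toy activities, not real data
Y[rng.random(Y.shape) < 0.5] = np.nan  # roughly half the entries unmeasured

train_25, test_25 = split_observed(Y, 0.25, rng)   # 75/25 split
train_1, test_1 = split_observed(Y, 0.005, rng)    # >99%/<1% split

for name, tr, te in [("75/25", train_25, test_25), (">99/<1", train_1, test_1)]:
    print(name, "train entries:", tr.sum(), "test entries:", te.sum())
```

Under the 75/25 split the model sees a quarter fewer measurements per compound than it would in practice, which is one plausible reading of why that split makes the models look less accurate.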

Main Results:

MMRMs significantly outperformed the ST-RFR model in bioactivity profile imputation.
Performance varied considerably based on the training/test split; 75/25 splits led to a substantial underestimation of model accuracy compared to >99%/<1% splits.
While MMRMs excel at imputing profiles within the training data distribution, their advantage diminishes for compounds dissimilar to the training set.

Conclusions:

MMRMs are highly effective for tasks like hit-finding, off-target prediction, and drug repurposing within the chemical space of the training data.
The choice of data splitting strategy critically impacts the perceived accuracy of MMRMs, necessitating the use of realistic, smaller test sets for reliable evaluation.
MMRMs' utility is greatest for exploring known chemical spaces, while their performance on novel chemical entities requires further investigation.