Understanding covariate shift in model performance.

Georgia McGaughey¹, W Patrick Walters¹, Brian Goldman¹

¹Modeling & Informatics, Vertex Pharmaceuticals, Boston, MA, USA.

November 5, 2016

Summary

This summary is machine-generated.

Logistic regression and covariate shift reweighting both outperformed k-nearest neighbors (k-NN) on the datasets examined. Reweighting the training data to correct for covariate shift showed no clear performance advantage in this study.

Keywords:

ChEMBL; covariate shift; k-NN; logistic regression; model building


Area of Science:

Machine Learning
Data Science
Statistical Modeling

Background:

Dataset analysis often involves choosing appropriate machine learning methods.
Covariate shift can impact model performance when training and testing data distributions differ.
Evaluating different algorithms under various data conditions is crucial for robust predictions.
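
For reference, the notion of covariate shift mentioned above can be written out as follows. This is the standard textbook formulation, not a formula quoted from the study; the symbols $p_{\text{train}}$, $p_{\text{test}}$, $w$, $\ell$ and $f$ are notation assumed here for illustration.

```latex
% Covariate shift: the input distribution changes, the labelling rule does not.
p_{\text{train}}(x) \neq p_{\text{test}}(x),
\qquad
p_{\text{train}}(y \mid x) = p_{\text{test}}(y \mid x)

% Importance weight of a training point x:
w(x) = \frac{p_{\text{test}}(x)}{p_{\text{train}}(x)}

% Reweighted training loss over n training pairs (x_i, y_i):
\hat{R}_w(f) = \frac{1}{n} \sum_{i=1}^{n} w(x_i)\, \ell\bigl(f(x_i), y_i\bigr)
```

When every region where $p_{\text{test}}$ has mass is also covered by $p_{\text{train}}$, the reweighted loss estimates the loss the model would incur on the test distribution.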

Purpose of the Study:

To compare the performance of logistic regression, covariate shift reweighting, and k-nearest neighbors (k-NN).
To assess the effectiveness of covariate shift for reweighting training data.
To evaluate algorithm performance on both internal and external datasets, including those with covariate shift.

Main Methods:

Applied logistic regression, covariate shift reweighting, and k-nearest neighbors (k-NN).
Utilized five internal datasets and one external public dataset.
Analyzed datasets exhibiting covariate shift.
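
The reweighting step listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the summary does not say how the weights were estimated, so the sketch uses one common option (a domain classifier whose odds approximate the density ratio). The helper names `fit_logistic` and `importance_weights`, the one-dimensional feature, and the synthetic data are all hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, weights=None, lr=0.5, epochs=300):
    """Fit 1-D logistic regression by (optionally weighted) gradient descent."""
    if weights is None:
        weights = [1.0] * len(xs)
    total = sum(weights)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y, s in zip(xs, ys, weights):
            err = sigmoid(w * x + b) - y  # gradient of the log-loss w.r.t. the logit
            grad_w += s * err * x
            grad_b += s * err
        w -= lr * grad_w / total
        b -= lr * grad_b / total
    return w, b


def importance_weights(train_x, test_x):
    """Estimate p_test(x) / p_train(x) for each training point.

    A domain classifier separates training (label 0) from test (label 1)
    inputs; its odds, rescaled by the sample-size ratio, estimate the ratio.
    """
    xs = list(train_x) + list(test_x)
    domain = [0.0] * len(train_x) + [1.0] * len(test_x)
    w, b = fit_logistic(xs, domain)
    size_ratio = len(train_x) / len(test_x)
    return [size_ratio * math.exp(w * x + b) for x in train_x]


# Hypothetical synthetic data: test inputs are shifted relative to training inputs.
rng = random.Random(0)
train_x = [rng.gauss(0.0, 1.0) for _ in range(200)]
test_x = [rng.gauss(1.0, 1.0) for _ in range(200)]
train_y = [1.0 if x + rng.gauss(0.0, 0.5) > 0.5 else 0.0 for x in train_x]

weights = importance_weights(train_x, test_x)
plain_model = fit_logistic(train_x, train_y)             # unweighted baseline
shifted_model = fit_logistic(train_x, train_y, weights)  # covariate-shift reweighted
print("unweighted (w, b):", plain_model)
print("reweighted (w, b):", shifted_model)
```

Training points that resemble the test inputs receive larger weights, so the reweighted fit concentrates on the region the model will be evaluated in. Whether that changes predictive performance depends on the data; in the study summarized here it reportedly did not help.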

Main Results:

k-NN performance was consistently inferior to both logistic regression and covariate shift reweighting.
Logistic regression and covariate shift reweighting performed comparably.
No clear performance improvement was observed from reweighting the training data for covariate shift across the examined datasets.

Conclusions:

Logistic regression and covariate shift reweighting were more effective than k-NN on the tested datasets.
The benefit of covariate shift reweighting may be dataset-dependent and requires further investigation.
Algorithm choice affected performance on datasets with covariate shift: k-NN trailed both other methods.