



Understanding covariate shift in model performance.

Georgia McGaughey¹, W Patrick Walters¹, Brian Goldman¹

  • ¹Modeling & Informatics, Vertex Pharmaceuticals, Boston, MA, USA.

F1000Research | November 5, 2016
Summary
This summary is machine-generated.

Logistic regression and covariate shift reweighting both outperformed k-nearest neighbors (k-NN) on the datasets examined. Reweighting the training data to correct for covariate shift showed no clear performance advantage in this study.

Keywords:
ChEMBL, covariate shift, k-NN, logistic regression, model building


Area of Science:

  • Machine Learning
  • Data Science
  • Statistical Modeling

Background:

  • Dataset analysis often involves choosing appropriate machine learning methods.
  • Covariate shift can impact model performance when training and testing data distributions differ.
  • Evaluating different algorithms under various data conditions is crucial for robust predictions.
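
For readers who want the idea in symbols, the following is the standard textbook formulation of covariate shift and importance weighting. It is offered as background only; the notation (`p_train`, `p_test`, `w`, the loss `\ell`, and the model `f`) is assumed here and may differ from what the study itself uses.

```latex
% Covariate shift: the input distribution changes between training and
% testing, while the conditional distribution of the label stays the same.
\[
  p_{\mathrm{train}}(x) \neq p_{\mathrm{test}}(x),
  \qquad
  p_{\mathrm{train}}(y \mid x) = p_{\mathrm{test}}(y \mid x).
\]

% Importance weighting: the expected test loss can be rewritten as a
% weighted expectation over the training distribution.
\[
  \mathbb{E}_{\mathrm{test}}\bigl[\ell(f(x), y)\bigr]
  = \mathbb{E}_{\mathrm{train}}\bigl[w(x)\,\ell(f(x), y)\bigr],
  \qquad
  w(x) = \frac{p_{\mathrm{test}}(x)}{p_{\mathrm{train}}(x)}.
\]
```

The identity holds wherever `p_train(x) > 0` on the support of the test distribution. In practice `w(x)` has to be estimated, which is one reason the benefit of reweighting can vary from dataset to dataset.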

Purpose of the Study:

  • To compare the performance of logistic regression, covariate shift reweighting, and k-nearest neighbors (k-NN).
  • To assess whether reweighting the training data to correct for covariate shift improves performance.
  • To evaluate algorithm performance on both internal and external datasets, including those with covariate shift.

Main Methods:

  • Applied logistic regression, covariate shift reweighting, and k-nearest neighbors (k-NN).
  • Utilized five internal datasets and one external public dataset.
  • Analyzed datasets exhibiting covariate shift.
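
The reweighting step named in the methods above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the summary does not say how the weights were estimated, so the sketch uses one common approach, a domain classifier that separates training inputs from test inputs. The function names (`fit_logistic`, `covariate_shift_weights`, `make_data`), the learning-rate settings, and the synthetic one-feature data are all hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(coef, x):
    # coef holds one weight per feature followed by the intercept.
    return sigmoid(sum(c * v for c, v in zip(coef, x)) + coef[-1])


def fit_logistic(X, y, weights=None, lr=0.5, epochs=300):
    """Weighted logistic regression fitted by batch gradient descent."""
    n, d = len(X), len(X[0])
    if weights is None:
        weights = [1.0] * n
    total = sum(weights)
    coef = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi, wi in zip(X, y, weights):
            err = wi * (predict_proba(coef, xi) - yi)
            for j in range(d):
                grad[j] += err * xi[j]
            grad[d] += err
        coef = [c - lr * g / total for c, g in zip(coef, grad)]
    return coef


def covariate_shift_weights(X_train, X_test):
    """Estimate w(x) ~ p_test(x) / p_train(x) with a domain classifier."""
    X = X_train + X_test
    domain = [0] * len(X_train) + [1] * len(X_test)  # 1 = test input
    coef = fit_logistic(X, domain)
    prior_ratio = len(X_train) / len(X_test)
    weights = []
    for x in X_train:
        # Clip to avoid division by zero for extreme probabilities.
        p = min(max(predict_proba(coef, x), 1e-6), 1.0 - 1e-6)
        weights.append(prior_ratio * p / (1.0 - p))
    return weights


def make_data(n, center):
    # Synthetic data (assumption): p(y | x) is shared, only p(x) moves.
    X = [[random.gauss(center, 1.0)] for _ in range(n)]
    y = [1 if random.random() < sigmoid(2.0 * x[0] - 1.0) else 0 for x in X]
    return X, y


def accuracy(coef, X, y):
    hits = sum((predict_proba(coef, x) >= 0.5) == bool(yi) for x, yi in zip(X, y))
    return hits / len(y)


random.seed(0)
X_train, y_train = make_data(200, 0.0)
X_test, y_test = make_data(200, 1.5)  # test inputs are shifted to the right

w = covariate_shift_weights(X_train, X_test)
plain = fit_logistic(X_train, y_train)
reweighted = fit_logistic(X_train, y_train, weights=w)

print("plain logistic regression:", accuracy(plain, X_test, y_test))
print("reweighted logistic regression:", accuracy(reweighted, X_test, y_test))
```

Training points that look more like test inputs receive larger weights. Whether this helps depends on the data: when the model is well specified, as in this synthetic example, the two fits may score about the same, which is in line with the study finding no clear advantage from reweighting.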

Main Results:

  • k-NN performance was consistently inferior to both logistic regression and covariate shift reweighting.
  • Logistic regression and covariate shift reweighting demonstrated comparable performance.
  • No clear performance improvement was observed from reweighting the training data for covariate shift across the examined datasets.
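
For reference, the k-NN baseline mentioned in these results can be sketched as a plain majority vote among the nearest training points. The summary does not state the value of k, the distance metric, or the features used, so `k=3`, Euclidean distance, the function name `knn_predict`, and the toy data below are all assumptions for illustration.

```python
import math
from collections import Counter


def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(
        zip(X_train, y_train),
        key=lambda pair: math.dist(pair[0], x),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data (assumption): two well-separated clusters with labels 0 and 1.
X_train = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]]
y_train = [0, 0, 0, 1, 1, 1]

print(knn_predict(X_train, y_train, [0.15, 0.1]))  # query near the first cluster
print(knn_predict(X_train, y_train, [1.0, 0.95]))  # query near the second cluster
```

Because k-NN predicts purely from local neighbors, a test point that lies far from the training data is still assigned the label of whatever training points happen to be closest, which is one plausible reason such a baseline can struggle under covariate shift.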

Conclusions:

  • Logistic regression and covariate shift reweighting were more effective than k-NN on the tested datasets.
  • The benefit of covariate shift reweighting may be dataset-dependent and requires further investigation.
  • Algorithm choice significantly impacts performance, especially in the presence of covariate shift.