Related Concept Videos

Biostatistics: Overview

Biostatistics plays a crucial role in understanding and analyzing data in healthcare and biology. Biostatisticians conduct experiments, gather evidence, and draw meaningful conclusions using statistical methods and techniques. Different variables form the foundation of biostatistical analysis, allowing researchers to understand and interpret data effectively. These variables are classified into different types, each serving a specific purpose in statistical analysis.
Discrete variables are...

Multiple Regression

Multiple regression assesses a linear relationship between one response or dependent variable and two or more independent variables. It has many practical applications.
Farmers can use multiple regression to determine the crop yield based on more than one factor, such as water availability, fertilizer, soil properties, etc. Here, the crop yield is the response or dependent variable as it depends on the other independent variables. The analysis requires the construction of a scatter plot...
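A minimal sketch of the idea, assuming ordinary least squares as the fitting method and using simulated data. The predictor names (water, fertilizer) and the coefficients used to generate crop_yield are hypothetical, chosen only so the fitted coefficients can be compared with known values.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
water = rng.uniform(10, 50, n)        # hypothetical predictor, e.g. water supplied
fertilizer = rng.uniform(0, 5, n)     # hypothetical predictor, e.g. fertilizer applied
# Simulated response with known coefficients: 2 + 0.3*water + 1.5*fertilizer + noise.
crop_yield = 2.0 + 0.3 * water + 1.5 * fertilizer + rng.normal(0.0, 1.0, n)

# Design matrix: a column of ones for the intercept, then one column per predictor.
X = np.column_stack([np.ones(n), water, fertilizer])
# Ordinary least squares: choose coefficients that minimise the sum of squared residuals.
coef, *_ = np.linalg.lstsq(X, crop_yield, rcond=None)
print("intercept, b_water, b_fertilizer:", np.round(coef, 2))
```

Each fitted slope estimates the change in yield per unit change in that predictor while the other predictor is held fixed.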

Survival Tree

Survival trees are a non-parametric method used in survival analysis to model the relationship between a set of covariates and the time until an event of interest occurs, often referred to as the "time-to-event" or "survival time." This method is particularly useful when dealing with censored data, where the event has not occurred for some individuals by the end of the study period, or when the exact time of the event is unknown.
Building a Survival Tree
Constructing a...

Variation

An important characteristic of any set of data is the variation in the data. In some data sets, the data values are concentrated closely near the mean; in other data sets, the data values are more widely spread out from the mean. The most common measure of variation, or spread, is the standard deviation, which is the square root of variance.
When independent and dependent variables are plotted on a scatter plot, the slope of a line is a value that describes the rate of change between the two...
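For illustration, the two hypothetical data sets below share a mean of 10 but differ in spread. The helper names are illustrative; they compute the sample variance with an n - 1 denominator and take its square root to obtain the standard deviation (a population variance would divide by n instead).

```python
import numpy as np


def sample_variance(x):
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    x = np.asarray(x, dtype=float)
    return np.sum((x - x.mean()) ** 2) / (len(x) - 1)


def sample_sd(x):
    """Standard deviation as the square root of the sample variance."""
    return np.sqrt(sample_variance(x))


tight = [9.8, 10.1, 10.0, 9.9, 10.2]    # values concentrated near the mean of 10
spread = [4.0, 16.0, 10.0, 1.0, 19.0]   # same mean, values far from it
for name, x in [("tight", tight), ("spread", spread)]:
    print(name, "variance:", round(sample_variance(x), 3), "sd:", round(sample_sd(x), 3))
```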

Regression Toward the Mean

Regression toward the mean (“RTM”) is a phenomenon in which extremely high or low values (for example, an individual’s blood pressure at a particular moment) appear closer to a group’s average when remeasured. Although this statistical peculiarity results from random error and chance, it has been problematic across various medical, scientific, financial, and psychological applications. In particular, RTM, if not taken into account, can interfere when...
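A small simulation can make the mechanism concrete. All values are hypothetical assumptions: true blood pressure averages 120 mmHg with SD 10, and each reading adds independent measurement error with SD 8. People selected for a high first reading show a lower average on the second reading even though nothing about them has changed.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
true_bp = rng.normal(120.0, 10.0, n)          # each person's stable underlying level
first = true_bp + rng.normal(0.0, 8.0, n)     # first reading = true level + random error
second = true_bp + rng.normal(0.0, 8.0, n)    # re-measurement with fresh, independent error

high = first > 140.0                          # select people on an extreme first reading
print("selected:", int(high.sum()))
print("mean first reading: ", round(float(first[high].mean()), 1))
print("mean second reading:", round(float(second[high].mean()), 1))
print("population mean:    ", round(float(true_bp.mean()), 1))
```

The second-reading average moves back toward 120 but not all the way, because part of each selected person's high first reading reflects a genuinely high underlying level rather than error alone.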

Distributions to Estimate Population Parameter

The true values of population parameters such as the population proportion, population mean, and population standard deviation (or variance) are usually unknown. These are fixed values that can only be estimated from data collected in samples. The corresponding estimates are the sample proportion, the sample mean, and the sample standard deviation (or variance). To obtain the values of these sample statistics, data are required that have a particular distribution and central...
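The sketch below uses a hypothetical population whose parameters are set by hand, so the estimates can be checked against them. It computes the sample proportion, sample mean, and sample standard deviation from one sample, then repeats the sampling to show that the sample mean varies around the fixed population mean.

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical population parameters; in a real study these are unknown.
mu, sigma, prop = 50.0, 12.0, 0.3
n = 40

values = rng.normal(mu, sigma, n)        # one sample of a continuous measurement
has_trait = rng.uniform(size=n) < prop   # one sample of a yes/no characteristic

x_bar = values.mean()                    # sample mean, estimates mu
s = values.std(ddof=1)                   # sample standard deviation, estimates sigma
p_hat = has_trait.mean()                 # sample proportion, estimates prop
print(f"x_bar = {x_bar:.2f}, s = {s:.2f}, p_hat = {p_hat:.2f}")

# Repeated sampling: the sample mean varies from sample to sample around mu,
# with a spread close to sigma / sqrt(n).
many_means = rng.normal(mu, sigma, (1000, n)).mean(axis=1)
print(f"average of 1000 sample means = {many_means.mean():.2f}")
print(f"SD of the sample means = {many_means.std(ddof=1):.2f} "
      f"(sigma / sqrt(n) = {sigma / np.sqrt(n):.2f})")
```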

You might also read

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

Sustained Reduction in Cardiopulmonary Fitness in Long COVID: A Report from the RECOVER-adult Cohort Study.

JACC. Advances·2026

Same author

Metabolomic Profiles of Inflammation Associated With Incident Ischemic Stroke Risk in Women.

Neurology·2026

Same author

Associations of Combined Genetic and Lifestyle Risks With Incident Type 2 Diabetes in the UK Biobank.

Diabetes·2026

Same author

School Difficulties and Long COVID in Children and Adolescents.

Academic pediatrics·2026

Same author

A Randomized Trial of Vitamin D Supplementation and COVID-19 Clinical Outcomes and Long COVID: The Vitamin D for COVID-19 Trial.

The Journal of nutrition·2026

Same author

A plasma metabolomic fingerprint of moderate or severe hearing loss.

Metabolomics : Official journal of the Metabolomic Society·2026

Same journal

Interpretable SHAP-based machine learning framework for patient satisfaction prediction: a case study in Thammasat University Hospital.

BMC medical informatics and decision making·2026

Same journal

Automated generation of structured breast ultrasound reports using BreastViT and ChatGPT.

BMC medical informatics and decision making·2026

Same journal

Shared decision-making and medication adherence among community adults with chronic diseases: a cross-sectional study in Hubei Province, China.

BMC medical informatics and decision making·2026

Same journal

Classification of periapical radiographic findings for root canal therapy decision support using deep neural networks.

BMC medical informatics and decision making·2026

Same journal

Machine learning-based risk assessment of neonatal perinatal adverse outcomes of anemia during pregnancy: a modeling study.

BMC medical informatics and decision making·2026

Same journal

Intelligent differentiation between Parkinson's disease and essential tremor using wearable sensors and machine learning: a temporal validation study.

BMC medical informatics and decision making·2026

Related Experiment Video

Updated: Dec 9, 2025

Selecting Multiple Biomarker Subsets with Similarly Effective Binary Classification Performances

Published on: October 11, 2018

Bayesian variable selection for high dimensional predictors and self-reported outcomes.

Xiangdong Gu¹, Mahlet G Tadesse², Andrea S Foulkes³

¹Department of Biostatistics and Epidemiology, University of Massachusetts, Amherst, MA, USA.

BMC Medical Informatics and Decision Making

September 7, 2020

Summary

This summary is machine-generated.

This study introduces a new statistical method to improve variable selection for diseases like type 2 diabetes when outcomes are self-reported with errors. The approach enhances accuracy compared to methods ignoring reporting inaccuracies.

Keywords:

Bayesian variable selection; High-dimensional data; Self-reports

More Related Videos

Measuring the Subjective Value of Risky and Ambiguous Options using Experimental Economics and Functional MRI Methods

Published on: September 19, 2012

Lexical Decision Task for Studying Written Word Recognition in Adults with and without Dementia or Mild Cognitive Impairment

Published on: June 25, 2019

Area of Science:

Genetics
Biostatistics
Epidemiology

Background:

Silent diseases, such as type 2 diabetes, are often monitored using self-reported outcomes in large studies.
Self-reported data are cost-effective but prone to errors, potentially leading to misdiagnosis.
Accurate outcome ascertainment is crucial for identifying disease risk factors in high-dimensional data.
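A short calculation shows why self-report error matters. Assuming that true cases report the disease with sensitivity Se and non-cases deny it with specificity Sp, the probability of a positive self-report is Se·p + (1 - Sp)(1 - p). With hypothetical values and the same Se and Sp in both groups (non-differential error), the odds ratio estimated from self-reports is pulled toward 1, which can hide real risk factors.

```python
def observed_prevalence(p, se, sp):
    """P(self-report = yes) when the true prevalence is p.

    True cases say 'yes' with probability se (sensitivity);
    non-cases say 'yes' with probability 1 - sp (false positives).
    """
    return se * p + (1.0 - sp) * (1.0 - p)


def odds(q):
    return q / (1.0 - q)


se, sp = 0.90, 0.95                    # hypothetical accuracy of self-reports
p_unexposed, p_exposed = 0.10, 0.20    # hypothetical true risks in two groups

true_or = odds(p_exposed) / odds(p_unexposed)
obs_or = (odds(observed_prevalence(p_exposed, se, sp))
          / odds(observed_prevalence(p_unexposed, se, sp)))
print(f"true OR = {true_or:.2f}, OR from self-reports = {obs_or:.2f}")
# → true OR = 2.25, OR from self-reports = 1.81
```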

Purpose of the Study:

To develop and evaluate a statistical approach for variable selection in high-dimensional datasets with error-prone outcomes.
To adapt the spike and slab Bayesian Variable Selection method for self-reported, potentially inaccurate health data.
To identify genetic risk factors for type 2 diabetes in diverse populations while accounting for outcome measurement error.

Main Methods:

Adapted the spike and slab Bayesian Variable Selection algorithm for error-prone, self-reported outcomes.
Conducted simulation studies comparing the performance of the proposed method with a naive approach that ignores self-report error.
Applied the method to the Women's Health Initiative dataset, analyzing over 900,000 SNPs and phenotypic data from 9,873 women.
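As a rough illustration of combining a spike-and-slab prior with an error-prone outcome, the sketch below is a small stochastic-search sampler. It is not the authors' algorithm or their R package. Several choices are assumptions made here for simplicity: a binary logistic model for the true outcome, self-reports with known sensitivity and specificity, a continuous normal "spike" (narrow prior, sd tau0) instead of a point mass, single-site random-walk Metropolis updates, and hypothetical names and tuning values (ssvs_misclassified, tau0, tau1, step). The output is the posterior inclusion probability of each predictor.

```python
import numpy as np


def expit(z):
    """Logistic (inverse-logit) function."""
    return 1.0 / (1.0 + np.exp(-z))


def loglik(eta, s, se, sp):
    """Log-likelihood of observed 0/1 self-reports s given linear predictor eta.

    Assumption: P(true disease) = expit(eta), and a self-report is positive with
    probability se for true cases and 1 - sp for non-cases.
    """
    p = expit(eta)
    q = se * p + (1.0 - sp) * (1.0 - p)          # P(self-report = 1)
    return np.sum(s * np.log(q) + (1.0 - s) * np.log1p(-q))


def log_normal(x, sd):
    """Log N(0, sd^2) density, dropping the constant -0.5*log(2*pi)."""
    return -np.log(sd) - 0.5 * (x / sd) ** 2


def ssvs_misclassified(X, s, se, sp, n_iter=2000, burn_in=500,
                       tau0=0.05, tau1=2.0, prior_incl=0.5, step=0.15, seed=0):
    """Posterior inclusion probabilities (hypothetical helper).

    Uses a continuous normal spike N(0, tau0^2), an assumption; the study's
    prior and outcome model may differ.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    alpha, beta = 0.0, np.zeros(p)
    gamma = np.ones(p, dtype=bool)               # start with every predictor included
    eta = np.full(n, alpha)                      # linear predictor alpha + X @ beta
    ll = loglik(eta, s, se, sp)
    counts = np.zeros(p)

    for it in range(n_iter):
        # Intercept: random-walk Metropolis step under a vague N(0, 10^2) prior.
        a_new = alpha + step * rng.standard_normal()
        eta_new = eta + (a_new - alpha)
        ll_new = loglik(eta_new, s, se, sp)
        log_r = ll_new - ll + log_normal(a_new, 10.0) - log_normal(alpha, 10.0)
        if np.log(rng.uniform()) < log_r:
            alpha, eta, ll = a_new, eta_new, ll_new

        for j in range(p):
            # beta_j: Metropolis step under its current prior, the narrow spike
            # N(0, tau0^2) if gamma_j is 0 or the wide slab N(0, tau1^2) if it is 1.
            sd = tau1 if gamma[j] else tau0
            b_new = beta[j] + step * rng.standard_normal()
            eta_new = eta + X[:, j] * (b_new - beta[j])
            ll_new = loglik(eta_new, s, se, sp)
            log_r = ll_new - ll + log_normal(b_new, sd) - log_normal(beta[j], sd)
            if np.log(rng.uniform()) < log_r:
                beta[j], eta, ll = b_new, eta_new, ll_new

            # gamma_j: exact Gibbs draw given beta_j (slab versus spike density).
            log_in = np.log(prior_incl) + log_normal(beta[j], tau1)
            log_out = np.log1p(-prior_incl) + log_normal(beta[j], tau0)
            gamma[j] = rng.uniform() < expit(log_in - log_out)

        if it >= burn_in:
            counts += gamma

    return counts / (n_iter - burn_in)


# Hypothetical simulated data: 10 predictors, only the first two truly matter.
rng = np.random.default_rng(1)
n, p = 1000, 10
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[:2] = [2.0, -2.0]
d = rng.uniform(size=n) < expit(-0.5 + X @ true_beta)  # true (unobserved) status
se, sp = 0.90, 0.95                                    # assumed known accuracy
u = rng.uniform(size=n)
s = np.where(d, u < se, u >= sp).astype(float)         # error-prone self-reports
pip = ssvs_misclassified(X, s, se, sp)
print("posterior inclusion probabilities:", np.round(pip, 2))
```

Passing se=1.0 and sp=1.0 reduces the likelihood to an ordinary logistic likelihood, which mimics a naive analysis that treats self-reports as error-free. An analysis on the scale of hundreds of thousands of SNPs would need a far more efficient sampler than this single-site random walk.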

Main Results:

The proposed method demonstrated improved sensitivity in variable selection compared to approaches that ignore self-report error.
Identified several single nucleotide polymorphisms (SNPs) associated with type 2 diabetes risk in African American and Hispanic American women.
Observed limited overlap in top-ranking SNPs between racial groups, highlighting the need for race/ethnicity-specific genetic analyses.
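Both kinds of result can be summarised with simple quantities. The sketch below uses hypothetical helper names and made-up inputs (not values from the study). It computes selection sensitivity as the share of truly associated variables that were selected, as in a simulation where the truth is known, and the Jaccard overlap between the top-k variables of two subgroup analyses, here ranked by an assumed score such as posterior inclusion probability.

```python
def selection_sensitivity(selected, truly_associated):
    """Fraction of truly associated variables that the method selected."""
    truly_associated = set(truly_associated)
    return len(set(selected) & truly_associated) / len(truly_associated)


def top_k_overlap(scores_a, scores_b, k):
    """Jaccard overlap between the k highest-scoring variables of two analyses."""
    top_a = set(sorted(scores_a, key=scores_a.get, reverse=True)[:k])
    top_b = set(sorted(scores_b, key=scores_b.get, reverse=True)[:k])
    return len(top_a & top_b) / len(top_a | top_b)


# Hypothetical simulation truth and selections.
truth = ["v1", "v2", "v3", "v4"]
print(selection_sensitivity(["v1", "v2", "v7"], truth))   # → 0.5

# Hypothetical inclusion probabilities from two subgroup analyses.
group_a = {"snp1": 0.92, "snp2": 0.81, "snp3": 0.40, "snp4": 0.12}
group_b = {"snp1": 0.10, "snp2": 0.77, "snp3": 0.15, "snp4": 0.88}
print(top_k_overlap(group_a, group_b, k=2))   # → 0.3333333333333333
```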

Conclusions:

The adapted Bayesian variable selection algorithm improves accuracy when dealing with error-prone self-reported outcomes.
The findings underscore the importance of accounting for measurement error in epidemiological studies.
The developed R package and source code are available for broader application in genetic association studies.