


Bayesian variable selection for high dimensional predictors and self-reported outcomes.

Xiangdong Gu (1), Mahlet G Tadesse (2), Andrea S Foulkes (3)

  • 1Department of Biostatistics and Epidemiology, University of Massachusetts, Amherst, MA, USA.

BMC Medical Informatics and Decision Making | September 7, 2020
Summary
This summary is machine-generated.

This study introduces a statistical method that improves variable selection for diseases such as type 2 diabetes when outcomes are self-reported and therefore error-prone. The approach selects relevant variables more accurately than methods that ignore reporting error.

Keywords:
Bayesian variable selection; High dimensional data; Self-reports


Area of Science:

  • Genetics
  • Biostatistics
  • Epidemiology

Background:

  • Silent diseases, such as type 2 diabetes, are often monitored using self-reported outcomes in large studies.
  • Self-reported data are cost-effective but error-prone, so participants' disease status may be misclassified.
  • Accurate outcome ascertainment is crucial for identifying disease risk factors in high-dimensional data.

Purpose of the Study:

  • To develop and evaluate a statistical approach for variable selection in high-dimensional datasets with error-prone outcomes.
  • To adapt the spike-and-slab Bayesian variable selection method to self-reported, potentially inaccurate health data.
  • To identify genetic risk factors for type 2 diabetes in diverse populations while accounting for outcome measurement error.
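
For orientation, a spike-and-slab prior is commonly written as below. This is the generic textbook form, not necessarily the parameterization used in the paper; the symbols \gamma_j (inclusion indicator), \tau^2 (slab variance), and \pi (prior inclusion probability) are notation assumed here for illustration.

```latex
\beta_j \mid \gamma_j \;\sim\; (1-\gamma_j)\,\delta_0 \;+\; \gamma_j\,\mathcal{N}(0,\tau^2),
\qquad
\gamma_j \;\sim\; \mathrm{Bernoulli}(\pi),
\qquad j = 1,\dots,p
```

Here \delta_0 is a point mass at zero (the "spike"), so a predictor with \gamma_j = 0 is excluded from the model, and the posterior probability that \gamma_j = 1 serves as a ranking score for predictor j.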

Main Methods:

  • Adapted the spike-and-slab Bayesian variable selection algorithm to error-prone, self-reported outcomes.
  • Conducted simulation studies to assess the performance of the proposed method against a naive approach.
  • Applied the method to the Women's Health Initiative dataset, analyzing over 900,000 SNPs and phenotypic data from 9,873 women.
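
One common way to account for error-prone binary reports is to adjust the likelihood using the sensitivity and specificity of the self-report. The sketch below illustrates that general idea for a logistic model. It is not the authors' implementation or their R package: the function name `misclassified_loglik` and the assumption of constant, covariate-independent `sens` and `spec` are hypothetical choices made for this example.

```python
import numpy as np

def misclassified_loglik(beta, X, y_obs, sens, spec):
    """Log-likelihood of error-prone binary reports under a logistic model.

    Assumption (for illustration only): the self-report has constant
    sensitivity `sens` and specificity `spec`, independent of covariates.
    """
    eta = X @ beta
    p_true = 1.0 / (1.0 + np.exp(-eta))  # P(true disease = 1 | x)
    # P(report = 1 | x): true positives plus false positives
    p_obs = sens * p_true + (1.0 - spec) * (1.0 - p_true)
    p_obs = np.clip(p_obs, 1e-12, 1 - 1e-12)  # guard against log(0)
    return float(np.sum(y_obs * np.log(p_obs)
                        + (1 - y_obs) * np.log(1 - p_obs)))
```

With `sens = spec = 1` this reduces to the ordinary logistic log-likelihood, which corresponds to a naive analysis that treats the self-report as error-free.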

Main Results:

  • The proposed method demonstrated improved sensitivity in variable selection compared to approaches that ignore self-report error.
  • Identified several single nucleotide polymorphisms (SNPs) associated with type 2 diabetes risk in African American and Hispanic American women.
  • Observed limited overlap in top-ranking SNPs between racial groups, highlighting the need for race/ethnicity-specific genetic analyses.
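
In a simulation study, where the truly associated predictors are known, selection sensitivity can be computed as follows. This is a minimal sketch with hypothetical names (`selection_metrics`, `n_candidates`); it does not reproduce the paper's simulation design or results.

```python
def selection_metrics(selected, truly_associated, n_candidates):
    """Return (sensitivity, specificity) of a variable-selection result.

    `selected` and `truly_associated` are collections of predictor indices;
    `n_candidates` is the total number of predictors considered.
    """
    selected = set(selected)
    truth = set(truly_associated)
    tp = len(selected & truth)   # truly associated and selected
    fp = len(selected - truth)   # selected but not associated
    fn = len(truth - selected)   # associated but missed
    tn = n_candidates - tp - fp - fn
    sensitivity = tp / (tp + fn) if truth else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity

# Toy example: 100 candidate predictors, 4 truly associated, 3 selected
sens, spec = selection_metrics([0, 3, 7], [0, 3, 5, 9], 100)
```

In this toy example two of the four truly associated predictors are recovered, so the sensitivity is 0.5.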

Conclusions:

  • The adapted Bayesian variable selection algorithm improves accuracy when dealing with error-prone self-reported outcomes.
  • The findings underscore the importance of accounting for measurement error in epidemiological studies.
  • The developed R package and source code are available for broader application in genetic association studies.