
Related Concept Videos

Multiple Regression

Multiple regression assesses a linear relationship between one response or dependent variable and two or more independent variables. It has many practical applications.
Farmers can use multiple regression to predict crop yield from more than one factor, such as water availability, fertilizer, and soil properties. Here, crop yield is the response (dependent) variable because it depends on the independent variables. The analysis requires the construction of a scatter plot...
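A minimal sketch of fitting a multiple regression by ordinary least squares, assuming NumPy is available. The function name and the crop-yield values are hypothetical; the data are noise-free, generated from yield = 2 + 0.5·water + 1.5·fertilizer, so the fit should reproduce those coefficients. With real data the coefficients would only be estimates.

```python
import numpy as np


def fit_multiple_regression(X, y):
    """Ordinary least squares for y = b0 + b1*x1 + ... + bp*xp.

    X has shape (n, p), one column per independent variable.
    Returns the coefficient vector [b0, b1, ..., bp].
    """
    design = np.column_stack([np.ones(len(X)), X])  # prepend a column of 1s for the intercept b0
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Hypothetical, noise-free data generated from yield = 2 + 0.5*water + 1.5*fertilizer
water = np.array([10.0, 12.0, 15.0, 11.0, 14.0, 9.0])
fertilizer = np.array([1.0, 2.0, 1.5, 3.0, 2.5, 0.5])
crop_yield = 2.0 + 0.5 * water + 1.5 * fertilizer

coef = fit_multiple_regression(np.column_stack([water, fertilizer]), crop_yield)
print(np.round(coef, 3))  # approximately [2.0, 0.5, 1.5]
```

`np.linalg.lstsq` is used instead of inverting XᵀX directly because its SVD-based solver is more stable when predictors are nearly collinear.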

Outliers and Influential Points

An outlier is an observation that does not fit the rest of the data. It is sometimes called an extreme value. When you graph an outlier, it will not appear to fit the pattern of the graph. Some outliers are due to mistakes (for example, writing down 50 instead of 500), while others may indicate that something unusual is happening. Outliers lie far from the least-squares line in the vertical direction. They have large "errors," where the "error" or residual is the...
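The sketch below, assuming NumPy, fits a least-squares line, computes each residual (observed y minus predicted y), and flags points whose absolute residual is at least two residual standard deviations, where s = sqrt(SSE / (n - 2)). The 2s cutoff is a common rule of thumb assumed here rather than a universal definition, and the data are made up: ten points on the line y = 3 + 2x, with one value mis-recorded.

```python
import numpy as np


def flag_outliers(x, y, k=2.0):
    """Fit a least-squares line; flag points with |residual| >= k * s,
    where s = sqrt(SSE / (n - 2)) is the residual standard deviation."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (intercept + slope * x)  # vertical distances from the line
    s = np.sqrt(np.sum(residuals ** 2) / (len(x) - 2))
    return np.flatnonzero(np.abs(residuals) >= k * s), residuals


x = np.arange(1, 11)
y = 3 + 2 * x.astype(float)  # points exactly on a line...
y[7] = 40.0                  # ...except one recording error (true value 19)
outliers, residuals = flag_outliers(x, y)
print(outliers)  # expected: [7], the mis-recorded point
```

Whether a flagged point is a mistake or a genuinely unusual observation still has to be judged from context.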

Regression Analysis

Regression analysis is a statistical tool that describes a mathematical relationship between a dependent variable and one or more independent variables.
In regression analysis, the regression equation is determined from the line of best fit, the line that best fits the data points plotted on a graph. This line is also called the regression line, and its algebraic equation is called the regression equation. It is represented as:
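A minimal sketch, assuming NumPy and the common notation ŷ = a + bx for the regression line, where the least-squares slope is b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and the intercept is a = ȳ − b·x̄. The function name and the five data points are illustrative only.

```python
import numpy as np


def least_squares_line(x, y):
    """Return (a, b) for the least-squares line y_hat = a + b*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    b = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)  # slope
    a = y_bar - b * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return a, b


x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
a, b = least_squares_line(x, y)
print(f"y_hat = {a:.2f} + {b:.2f}x")  # prints: y_hat = 0.09 + 1.97x
```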

Regression Toward the Mean

Regression toward the mean (“RTM”) is a phenomenon in which extremely high or low values (for example, an individual’s blood pressure at a particular moment) appear closer to a group’s average upon remeasurement. Although this statistical peculiarity is the result of random error and chance, it has caused problems across various medical, scientific, financial, and psychological applications. In particular, RTM, if not taken into account, can interfere when...
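A small simulation, assuming NumPy, makes the "random error and chance" explanation concrete. Each hypothetical person has a stable true value, and every measurement adds independent noise. People selected for an extreme first reading are extreme partly because of that reading's noise, which does not repeat, so their second reading drifts back toward the mean with no intervention at all. The mean of 120, the standard deviations of 10, and the cutoff of 140 are arbitrary illustrative choices; with equal true-value and noise variances, the second reading is expected to land about halfway back toward the mean.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 100_000
mu = 120.0  # hypothetical population mean (e.g., systolic blood pressure, mmHg)

true_value = rng.normal(mu, 10.0, size=n)            # each person's stable underlying level
first = true_value + rng.normal(0.0, 10.0, size=n)   # reading 1 = truth + random error
second = true_value + rng.normal(0.0, 10.0, size=n)  # reading 2 = same truth + fresh error

extreme = first > 140.0  # select people who looked extreme on the first reading
m1 = first[extreme].mean()
m2 = second[extreme].mean()
print(f"first reading mean:  {m1:.1f}")
print(f"second reading mean: {m2:.1f}")  # closer to mu, although nothing changed
```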

Variation

An important characteristic of any set of data is the variation in the data. In some data sets, the data values are concentrated closely near the mean; in other data sets, the data values are more widely spread out from the mean. The most common measure of variation, or spread, is the standard deviation, which is the square root of variance.
When independent and dependent variables are plotted on a scatter plot, the slope of a line is a value that describes the rate of change between the two...
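A worked example in plain Python showing the standard deviation as the square root of the variance. The data set is illustrative; the sample version divides the sum of squared deviations by n − 1, the population version by n.

```python
import math


def variance(data, sample=True):
    """Mean squared deviation from the mean: divide by n - 1 (sample) or n (population)."""
    n = len(data)
    mean = sum(data) / n
    ss = sum((v - mean) ** 2 for v in data)  # sum of squared deviations
    return ss / (n - 1 if sample else n)


def std_dev(data, sample=True):
    """Standard deviation = square root of the variance."""
    return math.sqrt(variance(data, sample))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, sum of squared deviations 32
print(std_dev(data, sample=False))  # population SD: 2.0
print(round(std_dev(data), 3))      # sample SD: 2.138
```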

Hindsight Biases

Hindsight bias leads you to believe that an event you just experienced was predictable, even though it really wasn’t. In other words, you feel that you knew all along that things would turn out the way they did. Can you relate this to the phrase "Hindsight is 20/20" now?

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

Bayesian Machine Learning Tools for Alcohol Use Disorder Research: The bpaup R Package.

Multivariate behavioral research·2026

Same author

Serum Endogenous Opioid Levels are Associated with Self-Injury Severity in Adolescents with Non-Suicidal Self-Injury and Comorbid Depression.

Neuroscience bulletin·2026

Same author

Prognostic Impact of KRAS and SMARCA4 Mutations and Co-Mutations on Survival in Non-Small Cell Lung Cancer: Insights from the AACR GENIE BPC Dataset.

Biomedicines·2025

Same author

Improving thermostability of α-L-fucosidase from Pedobacter sp. via consensus-guided engineering and directed evolution.

Journal of biotechnology·2025

Same author

Intergenerational Associations Between Maternal Diet and Childhood Adiposity: A Bayesian Regularized Mediation Analysis.

Statistics in biosciences·2025

Same author

Aldolase A accelerates hepatocarcinogenesis by refactoring c-Jun transcription.

Journal of pharmaceutical analysis·2025

Same journal

A joint model for a longitudinal outcome and a progressive multistate model under a mixed observation scheme.

Statistical methods in medical research·2026

Same journal

Efficient semi-supervised estimation of optimal individualized treatment regimes with survival outcome.

Statistical methods in medical research·2026

Same journal

Asymptotic online FWER control for dependent test statistics.

Statistical methods in medical research·2026

Same journal

Regression analysis of misclassified current status data with potentially unknown test accuracy.

Statistical methods in medical research·2026

Same journal

Bayesian multivariate linear mixed-effects models with varied association structures.

Statistical methods in medical research·2026

Same journal

Inference about the ratio of age-standardized rates between two overlapping populations.

Statistical methods in medical research·2026

Related Experiment Video

Updated: Jul 13, 2025

Probing the Limits of Egg Recognition Using Egg Rejection Experiments Along Phenotypic Gradients

Published on: August 22, 2018

SIGHR: Side information guided high-dimensional regression.

Yuan Yang¹, Christopher S McMahan¹, Yu-Bo Wang¹

¹School of Mathematical and Statistical Sciences, Clemson University, Clemson, SC, USA.

Statistical Methods in Medical Research

October 12, 2023

Summary

This summary is machine-generated.

This study introduces a new Bayesian regression method for variable selection in high-dimensional data. It effectively uses side information to improve the identification of important genetic markers for nicotine dependence.

Keywords:

Biomarker; conditional means prior; nicotine metabolite ratio; side information; spike and slab prior

More Related Videos

A Psychophysics Paradigm for the Collection and Analysis of Similarity Judgments

Published on: March 1, 2022

Development of an Individual-Tree Basal Area Increment Model using a Linear Mixed-Effects Approach

Published on: July 3, 2020

Area of Science:

Statistics
Genetics
Bioinformatics

Background:

High-dimensional data presents challenges for variable selection.
Existing methods may not fully utilize available side information.
Identifying genetic markers for nicotine dependence is crucial for smoking cessation.

Purpose of the Study:

To develop a novel Bayesian regression framework for variable selection in high-dimensional settings.
To incorporate side information into the sparsity structure of regression coefficients.
To identify genetic markers associated with the nicotine metabolite ratio.

Main Methods:

A Bayesian regression framework using a spike and slab prior.
Incorporation of side information via a binary regression model for inclusion probabilities.
Development of a computationally efficient Markov chain Monte Carlo (MCMC) algorithm.
Data augmentation steps for efficient posterior sampling.
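The authors' own algorithm is not given here. As a rough illustration of the ingredients listed above, the sketch below is a generic Gibbs sampler, written with NumPy and the Python standard library, for a spike-and-slab linear regression in which each predictor's prior inclusion probability is Φ(z_jᵀα), a probit regression on that predictor's side information z_j. The probit link, the normal prior on α (the paper's keywords mention a conditional means prior, which is not implemented here), the inverse-gamma prior on σ², and all function and parameter names are assumptions. The data-augmentation step is the standard latent-variable trick for probit models (Albert and Chib), which makes the α update Gaussian.

```python
import math
from statistics import NormalDist

import numpy as np

STD_NORMAL = NormalDist()


def phi(x):
    """Standard normal CDF; erfc keeps the lower tail accurate."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def truncated_normal(mean, positive, rng):
    """Draw w ~ N(mean, 1) restricted to w > 0 (positive=True) or w <= 0, by inverse CDF."""
    s = 1.0 if positive else -1.0
    u = max(rng.uniform(0.0, phi(s * mean)), 1e-300)  # keep inv_cdf argument > 0
    return s * (s * mean - STD_NORMAL.inv_cdf(u))


def gibbs_spike_slab(y, X, Z, n_iter=1000, tau2=1.0, c_alpha=10.0, a0=1.0, b0=1.0, seed=0):
    """Posterior inclusion probabilities under a hypothetical model:
    y = X beta + eps, eps ~ N(0, sigma2 I); beta_j = 0 if gamma_j = 0, else N(0, tau2);
    gamma_j = 1{w_j > 0}, w_j ~ N(z_j' alpha, 1); alpha ~ N(0, c_alpha I);
    sigma2 ~ InvGamma(a0, b0)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    q = Z.shape[1]
    beta, gamma = np.zeros(p), np.zeros(p, dtype=bool)
    alpha, sigma2 = np.zeros(q), 1.0
    xtx = np.sum(X * X, axis=0)  # x_j' x_j for every column
    V_alpha = np.linalg.inv(Z.T @ Z + np.eye(q) / c_alpha)
    L_alpha = np.linalg.cholesky((V_alpha + V_alpha.T) / 2)
    resid = y - X @ beta
    burn, incl_count = n_iter // 2, np.zeros(p)

    for it in range(n_iter):
        # 1) (gamma_j, beta_j) one predictor at a time, with w integrated out.
        for j in range(p):
            pi_j = min(max(phi(Z[j] @ alpha), 1e-12), 1 - 1e-12)
            r_j = resid + X[:, j] * beta[j]  # residual with predictor j removed
            v = 1.0 / (xtx[j] / sigma2 + 1.0 / tau2)
            m = v * (X[:, j] @ r_j) / sigma2
            log_bf = 0.5 * math.log(v / tau2) + 0.5 * m * m / v  # slab vs. spike
            log_odds = math.log(pi_j / (1 - pi_j)) + log_bf
            prob = 1.0 / (1.0 + math.exp(-min(max(log_odds, -50.0), 50.0)))
            gamma[j] = rng.random() < prob
            beta[j] = rng.normal(m, math.sqrt(v)) if gamma[j] else 0.0
            resid = r_j - X[:, j] * beta[j]
        # 2) Data augmentation: latent w_j given gamma_j and alpha.
        w = np.array([truncated_normal(mu, g, rng) for mu, g in zip(Z @ alpha, gamma)])
        # 3) alpha | w is Gaussian (conjugate probit-regression update).
        alpha = V_alpha @ (Z.T @ w) + L_alpha @ rng.standard_normal(q)
        # 4) sigma2 | beta, y is inverse-gamma.
        rate = b0 + 0.5 * resid @ resid
        sigma2 = 1.0 / rng.gamma(a0 + n / 2, 1.0 / rate)
        if it >= burn:
            incl_count += gamma
    return incl_count / (n_iter - burn)


# Simulated example: 3 true signals among 20 predictors.
data_rng = np.random.default_rng(1)
n, p = 200, 20
X = data_rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[[0, 1, 2]] = [1.5, -1.5, 1.0]
y = X @ true_beta + data_rng.standard_normal(n)
annotated = np.zeros(p)
annotated[:5] = 1.0  # hypothetical side information: a binary annotation flag
Z = np.column_stack([np.ones(p), annotated])  # intercept + annotation
pip = gibbs_spike_slab(y, X, Z)
print(np.round(pip, 2))
```

The annotation here is made up and marks the first five predictors, three of which carry true signal, so the sampler can learn to raise prior inclusion probabilities for annotated predictors. Updating one (γ_j, β_j) pair at a time with the latent w integrated out, and drawing w immediately afterward, keeps every step a closed-form conditional.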

Main Results:

The proposed method effectively leverages side information for variable selection.
Numerical simulations demonstrate strong finite sample performance.
Successful identification of genetic markers linked to the nicotine metabolite ratio.
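Simulation studies of variable-selection methods are often summarized by how well the selected set matches the true one. The criteria used in this study are not stated here, so the sketch below is only a generic example, assuming NumPy: it selects predictors whose posterior inclusion probability is at least 0.5 (the median probability model) and reports the true and false positive rates. The probabilities are invented for illustration and are not results from the paper.

```python
import numpy as np


def selection_metrics(pip, true_support, threshold=0.5):
    """Select predictors with posterior inclusion probability >= threshold
    (0.5 gives the median probability model) and compare with the truth."""
    selected = np.asarray(pip) >= threshold
    truth = np.asarray(true_support, dtype=bool)
    tp = np.sum(selected & truth)   # true signals that were selected
    fp = np.sum(selected & ~truth)  # nulls that were selected
    tpr = tp / truth.sum()          # sensitivity
    fpr = fp / (~truth).sum()       # 1 - specificity
    return selected, tpr, fpr


pip = [0.98, 0.91, 0.40, 0.62, 0.05, 0.02]  # made-up values for illustration
truth = [True, True, True, False, False, False]
selected, tpr, fpr = selection_metrics(pip, truth)
print(selected, tpr, fpr)
```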

Conclusions:

The novel Bayesian framework offers an improved approach to variable selection in high-dimensional data.
The method's ability to integrate side information enhances the identification of relevant predictors.
This approach has significant potential for applications in genetic association studies, such as those for nicotine dependence.