Bayesian Variable Selection in Linear Regression in One Pass for Large Data Sets.

Carlos Ordonez¹, Carlos Garcia-Alvarado¹, Veerabhadran Baladandayuthapani²

¹University of Houston.

ACM Transactions on Knowledge Discovery From Data

April 4, 2023

Summary

This summary is machine-generated.

This study introduces a faster Bayesian approach to variable selection in linear regression, built on an optimized Gibbs sampler. The method is orders of magnitude faster than the R package it is compared against and scales linearly with dataset size, which makes Bayesian variable selection practical for large data sets.

Area of Science:

Computational Statistics
Machine Learning
Database Systems

Background:

Bayesian models often rely on Markov Chain Monte Carlo (MCMC) methods for computation.
MCMC methods require numerous iterations, posing challenges for large datasets.
Variable selection in linear regression is computationally intensive due to its combinatorial nature.
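To make the combinatorial cost concrete, a short worked count may help. The symbols $d$ (number of candidate predictors) and $\gamma$ (a binary inclusion vector) are notation assumed here, not taken from the summary.

```latex
\gamma = (\gamma_1, \dots, \gamma_d) \in \{0,1\}^d,
\qquad
\bigl|\{0,1\}^d\bigr| = 2^d,
\qquad
\text{e.g. } d = 30 \;\Rightarrow\; 2^{30} = 1{,}073{,}741{,}824 \text{ candidate models.}
```

Exhaustive enumeration is therefore out of reach beyond a few dozen variables, which is why sampling-based searches over $\gamma$ are used instead.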

Purpose of the Study:

To accelerate Bayesian model computation for variable selection in linear regression.
To develop an efficient algorithm that overcomes the limitations of traditional MCMC methods.
To integrate Bayesian variable selection into database management systems.

Main Methods:

Developed a fast Gibbs sampler algorithm with optimizations for Bayesian variable selection.
Used non-informative and conjugate prior distributions, which allow the dataset to be summarized efficiently.
Employed sparse binary vectors for efficient matrix projections and hash tables for variable subset probabilities.
Integrated the algorithm into a database management system (DBMS) using User-Defined Functions and stored procedures.
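The methods listed above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a Zellner g-prior with fixed `g` (one possible conjugate choice), a uniform prior over variable subsets, and an intercept-only baseline model. The function names `summarize`, `log_bayes_factor`, and `gibbs_select` are hypothetical. The sketch shows the three ideas in miniature: the data are read once to build a summary, each candidate subset is scored by projecting that summary onto the selected columns, and scores are cached in a hash table keyed by the subset.

```python
import numpy as np


def summarize(X, y):
    """Single pass over the rows: count n, column sums L, cross-products Q of Z = [X | y]."""
    Z = np.column_stack([X, y])
    return Z.shape[0], Z.sum(axis=0), Z.T @ Z


def log_bayes_factor(idx, n, S, g):
    """Log Bayes factor of the model using columns idx versus the intercept-only model.

    Assumes a Zellner g-prior with fixed g; S holds centered cross-products of [X | y].
    """
    k = len(idx)
    if k == 0:
        return 0.0
    sel = list(idx)
    Sxx = S[np.ix_(sel, sel)]  # projection of the summary onto the chosen subset
    sxy = S[sel, -1]
    r2 = sxy @ np.linalg.solve(Sxx, sxy) / S[-1, -1]
    return 0.5 * (n - 1 - k) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2))


def gibbs_select(n, L, Q, iters=2000, burn_in=500, g=None, seed=0):
    """Gibbs sampler over inclusion vectors; returns inclusion frequencies and the cache."""
    rng = np.random.default_rng(seed)
    d = Q.shape[0] - 1
    g = float(n) if g is None else g   # unit-information choice, an assumption
    S = Q - np.outer(L, L) / n         # centered cross-products, no second data pass
    cache = {}                         # hash table: subset -> log Bayes factor

    def score(gamma):
        key = tuple(np.flatnonzero(gamma))  # sparse form of the binary vector
        if key not in cache:
            cache[key] = log_bayes_factor(key, n, S, g)
        return cache[key]

    gamma = np.zeros(d, dtype=bool)
    counts = np.zeros(d)
    for t in range(iters):
        for j in range(d):
            gamma[j] = True
            l1 = score(gamma)
            gamma[j] = False
            l0 = score(gamma)
            # P(gamma_j = 1 | rest) under a uniform prior over subsets
            p1 = np.exp(-np.logaddexp(0.0, l0 - l1))
            gamma[j] = rng.random() < p1
        if t >= burn_in:
            counts += gamma
    return counts / (iters - burn_in), cache
```

Calling `summarize(X, y)` on an `(n, d)` array and passing the result to `gibbs_select` yields an estimated inclusion probability per variable. After the summary is built, the sampler never touches the raw rows again, so its cost depends on `d` and the number of iterations rather than on `n`.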

Main Results:

The proposed algorithm achieves accuracy comparable to existing methods.
Its running time scales linearly with dataset size.
It is orders of magnitude faster than the R package for Bayesian variable selection.
Data summarization and matrix manipulation run efficiently in parallel within a DBMS.
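One reason summarization can run in parallel and scale linearly is that count, sum, and cross-product summaries are additive across row partitions. The sketch below illustrates this in plain NumPy. It does not reproduce the DBMS User-Defined Functions described in the source, and the names `summarize_block`, `merge`, and `summarize_partitioned` are hypothetical.

```python
from functools import reduce

import numpy as np


def summarize_block(Z):
    """Summary of one row partition: row count, column sums, cross-products."""
    return Z.shape[0], Z.sum(axis=0), Z.T @ Z


def merge(a, b):
    """Combine two partial summaries; each component simply adds."""
    return a[0] + b[0], a[1] + b[1], a[2] + b[2]


def summarize_partitioned(Z, parts):
    """Split the rows into `parts` blocks, summarize each, then merge the results.

    The blocks are processed sequentially here; because merge is associative,
    they could equally be handled by independent workers.
    """
    blocks = np.array_split(Z, parts, axis=0)
    return reduce(merge, (summarize_block(b) for b in blocks))
```

Each row is visited exactly once, so the work grows linearly with the number of rows, and the merged result equals the summary computed over the whole table up to floating-point rounding.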

Conclusions:

The optimized Gibbs sampler significantly accelerates Bayesian model computation for variable selection.
Integrating the algorithm into a DBMS enhances performance and scalability.
This approach offers a practical and efficient solution for variable selection in large-scale Bayesian analyses.