Sparse regression and marginal testing using cluster prototypes.

Stephen Reid¹, Robert Tibshirani²

¹Department of Statistics, Stanford University, 390 Serra Mall, Stanford, CA 94305, USA; sreid@stanford.edu

Biostatistics (Oxford, England)

November 29, 2015

Summary

This summary is machine-generated.

This study introduces a sparse regression and feature selection method for correlated features. It clusters the features, selects prototypes from the clusters, and applies post-selection inference to obtain exact p-values and confidence intervals, with the knockoff method providing false discovery rate control.

Keywords:

Clustering; Correlated predictors; Knockoff; Lasso; Marginal screening; Post-selection inference

Area of Science:

Statistics
Machine Learning
Bioinformatics

Background:

Correlated features pose challenges in sparse regression and significance testing.
Existing methods may not accurately account for feature selection processes.

Purpose of the Study:

To develop a robust method for sparse regression and marginal testing with correlated features.
To provide accurate statistical inference, including p-values and confidence intervals, that accounts for feature selection.

Main Methods:

Feature clustering to identify informative prototypes.
Application of sparse regression (LASSO) and marginal significance testing on prototypes.
Utilizing post-selection inference theory for exact p-values and confidence intervals.
Employing the "knockoff" method for false discovery rate (FDR) control.
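
The first two steps listed above (clustering, then fitting the lasso on prototypes) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes clusters are formed by thresholding absolute correlations, that each cluster's prototype is the feature most correlated with the response, and that a plain coordinate-descent lasso stands in for whatever solver the paper uses. The function names, the threshold of 0.7, the penalty `lam=0.3`, and the simulated data are all hypothetical. The post-selection inference step is omitted.

```python
import numpy as np

def correlation_clusters(X, threshold=0.7):
    """Label features by connected components of the graph that links
    two features when their absolute correlation is >= threshold."""
    p = X.shape[1]
    adj = np.abs(np.corrcoef(X, rowvar=False)) >= threshold
    labels = -np.ones(p, dtype=int)
    current = 0
    for start in range(p):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            j = stack.pop()
            for k in np.flatnonzero(adj[j]):
                if labels[k] < 0:
                    labels[k] = current
                    stack.append(k)
        current += 1
    return labels

def select_prototypes(X, y, labels):
    """Per cluster, keep the feature with the largest absolute
    marginal correlation with the response (an assumed rule)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.array([
        np.flatnonzero(labels == c)[np.argmax(corr[labels == c])]
        for c in np.unique(labels)
    ])

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1
    on standardized columns and a centered response."""
    n, p = X.shape
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    r = y - y.mean()                      # residual for beta = 0
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r += Xs[:, j] * beta[j]       # remove feature j's contribution
            rho = Xs[:, j] @ r / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0)  # soft-threshold
            r -= Xs[:, j] * beta[j]       # add the updated contribution back
    return beta

# Hypothetical simulated data: three groups of three correlated features;
# only the first two groups affect the response.
rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=(n, 3))
X = np.repeat(z, 3, axis=1) + 0.3 * rng.normal(size=(n, 9))
y = 2.0 * z[:, 0] - 1.5 * z[:, 1] + 0.5 * rng.normal(size=n)

labels = correlation_clusters(X)
protos = select_prototypes(X, y, labels)
beta = lasso_cd(X[:, protos], y, lam=0.3)
```

Because the prototypes are chosen by looking at the response, naive p-values for the fitted coefficients would be too optimistic. That selection effect is what the post-selection inference step in the list above is meant to correct.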

Main Results:

The proposed method effectively handles correlated features in regression and testing.
Exact p-values and confidence intervals are computed, correctly adjusting for prototype selection.
The "knockoff" approach ensures finite sample FDR control for the regression procedure.
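
As a rough illustration of how a knockoff filter turns feature statistics into a selection with FDR control, the sketch below computes a data-dependent threshold. It assumes the "knockoff+" rule, in which the threshold is the smallest t with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q. The summary does not say which variant the paper uses, and the statistics `W` here are made-up values, not results from the study.

```python
def knockoff_threshold(W, q=0.1, offset=1):
    """Return the smallest t among the nonzero |W_j| for which the
    estimated false discovery proportion is at most q, or infinity
    if no such t exists. offset=1 gives the assumed knockoff+ rule."""
    candidates = sorted({abs(w) for w in W if w != 0})
    for t in candidates:
        neg = sum(1 for w in W if w <= -t)   # proxy count of false discoveries
        pos = sum(1 for w in W if w >= t)    # features that would be selected
        if (offset + neg) / max(1, pos) <= q:
            return t
    return float("inf")

# Hypothetical statistics: large positive values suggest real signals,
# values near zero (of either sign) behave like nulls.
W = [5.0, 4.2, 3.9, 3.1, 2.8, 2.5, 2.2, 2.0, 1.7, 1.5, -0.4, 0.3, -0.2, 0.1]
t = knockoff_threshold(W, q=0.2)
selected = [j for j, w in enumerate(W) if w >= t]
```

Negative statistics act as a stand-in for false positives among the positive ones, which is why the ratio in the loop serves as an estimate of the false discovery proportion at each candidate threshold.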

Conclusions:

The novel approach offers improved accuracy and reliability for sparse regression and marginal testing.
This method provides a statistically sound framework for analyzing complex, correlated datasets.
The method's effectiveness is demonstrated on both simulated and real-world data.