



Sparse regression and marginal testing using cluster prototypes.

Stephen Reid¹, Robert Tibshirani²

  • ¹ Department of Statistics, Stanford University, 390 Serra Mall, Stanford, CA 94305, USA. sreid@stanford.edu

Biostatistics (Oxford, England) | November 29, 2015
Summary
This summary is machine-generated.

This study introduces a sparse regression and feature selection method for data with correlated features. It clusters the features, selects a prototype from each cluster, and applies post-selection inference to obtain exact p-values and confidence intervals, with the "knockoff" method providing false discovery rate control.

Keywords:
Clustering, Correlated predictors, Knockoff, Lasso, Marginal screening, Post-selection inference


Area of Science:

  • Statistics
  • Machine Learning
  • Bioinformatics

Background:

  • Correlated features pose challenges in sparse regression and significance testing.
  • Existing methods may not accurately account for feature selection processes.
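To illustrate why ignoring the selection step matters, here is a minimal simulation sketch. It is not taken from the paper: it assumes independent null features whose marginal z-statistics are standard normal, picks the feature with the largest absolute statistic, and then computes an ordinary two-sided p-value as if that feature had been chosen in advance. The function name `naive_rejection_rate` and all parameter values are hypothetical.

```python
import random
from statistics import NormalDist


def naive_rejection_rate(k, n_trials, alpha=0.05, seed=0):
    """Fraction of trials in which the naive p-value of the selected feature is below alpha.

    Assumption: all k features are null and their z-statistics are
    independent N(0, 1) draws, so every rejection is a false positive.
    """
    rng = random.Random(seed)
    phi = NormalDist().cdf
    rejections = 0
    for _ in range(n_trials):
        z = [rng.gauss(0, 1) for _ in range(k)]
        z_star = max(z, key=abs)                  # selection step: keep the strongest feature
        p_naive = 2 * (1 - phi(abs(z_star)))      # p-value that ignores the selection
        rejections += p_naive < alpha
    return rejections / n_trials


rate_one = naive_rejection_rate(k=1, n_trials=20000)   # no selection
rate_ten = naive_rejection_rate(k=10, n_trials=20000)  # best of ten
print(f"k=1:  {rate_one:.3f}")
print(f"k=10: {rate_ten:.3f}")
```

Under these assumptions the false positive rate with ten candidates should be close to 1 - 0.95^10 ≈ 0.40 rather than the nominal 0.05, which is the kind of distortion that inference accounting for selection is meant to remove.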

Purpose of the Study:

  • To develop a robust method for sparse regression and marginal testing with correlated features.
  • To provide accurate statistical inference, including p-values and confidence intervals, that accounts for feature selection.

Main Methods:

  • Cluster the features and select an informative prototype from each cluster.
  • Apply sparse regression (lasso) and marginal significance testing to the prototypes.
  • Use post-selection inference theory to obtain exact p-values and confidence intervals.
  • Use the "knockoff" method to control the false discovery rate (FDR).
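The first two steps above can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation. It makes several assumptions that the summary does not state: features are grouped by linking any pair whose absolute correlation is at least 0.7, the prototype of a cluster is the member most correlated with the response, and the lasso is fitted by a small hand-written coordinate descent with penalty 0.3. The data are simulated, and all function names are hypothetical. No post-selection inference is performed here.

```python
import math
import random


def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def cluster_features(cols, threshold):
    """Group features into connected components of the graph |corr| >= threshold."""
    parent = list(range(len(cols)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if abs(corr(cols[i], cols[j])) >= threshold:
                parent[find(i)] = find(j)
    clusters = {}
    for j in range(len(cols)):
        clusters.setdefault(find(j), []).append(j)
    return list(clusters.values())


def pick_prototypes(cols, y, clusters):
    """Assumed rule: in each cluster keep the feature most correlated with y."""
    return [max(c, key=lambda j: abs(corr(cols[j], y))) for c in clusters]


def standardize(v):
    """Center to mean 0 and scale to unit (population) standard deviation."""
    m = sum(v) / len(v)
    s = math.sqrt(sum((x - m) ** 2 for x in v) / len(v))
    return [(x - m) / s for x in v]


def lasso(cols, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1 on standardized columns."""
    n = len(y)
    X = [standardize(c) for c in cols]
    ybar = sum(y) / n
    resid = [v - ybar for v in y]
    beta = [0.0] * len(X)
    for _ in range(n_iter):
        for j, xj in enumerate(X):
            rho = sum(x * r for x, r in zip(xj, resid)) / n + beta[j]
            new = math.copysign(max(abs(rho) - lam, 0.0), rho)  # soft-thresholding
            if new != beta[j]:
                delta = new - beta[j]
                resid = [r - delta * x for r, x in zip(resid, xj)]
                beta[j] = new
    return beta


# Simulated data: two latent factors, three noisy copies of each; y depends on the first.
random.seed(0)
n = 200
z1 = [random.gauss(0, 1) for _ in range(n)]
z2 = [random.gauss(0, 1) for _ in range(n)]
cols = [[z + 0.3 * random.gauss(0, 1) for z in z1] for _ in range(3)]
cols += [[z + 0.3 * random.gauss(0, 1) for z in z2] for _ in range(3)]
y = [2.0 * a + random.gauss(0, 1) for a in z1]

clusters = cluster_features(cols, threshold=0.7)
prototypes = pick_prototypes(cols, y, clusters)
beta = lasso([cols[j] for j in prototypes], y, lam=0.3)
print("clusters:", clusters)
print("prototypes:", prototypes)
print("lasso coefficients:", beta)
```

In this toy setting the six features should collapse into two clusters, so the lasso sees two prototypes instead of six highly correlated columns. Because the prototypes are chosen using the response, p-values computed afterwards need the selection-aware adjustment described in the remaining bullets.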

Main Results:

  • The proposed method effectively handles correlated features in regression and testing.
  • Exact p-values and confidence intervals are computed, correctly adjusting for prototype selection.
  • The "knockoff" approach ensures finite sample FDR control for the regression procedure.
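As a rough illustration of how a knockoff-style procedure turns feature statistics into a selection, the sketch below applies the standard knockoff+ threshold rule to a vector of statistics W, where a large positive W_j is evidence for feature j. How the paper builds these statistics for cluster prototypes is not described in this summary, so the W values here are invented for illustration and the function names are hypothetical.

```python
def knockoff_plus_threshold(W, q):
    """Smallest t in {|W_j| > 0} with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q."""
    for t in sorted({abs(w) for w in W if w != 0}):
        neg = sum(1 for w in W if w <= -t)   # used to estimate the number of false discoveries
        pos = sum(1 for w in W if w >= t)    # features that would be selected at t
        if (1 + neg) / max(1, pos) <= q:
            return t
    return float("inf")                      # no threshold meets the target level


def knockoff_select(W, q):
    """Indices of features whose statistic reaches the knockoff+ threshold."""
    t = knockoff_plus_threshold(W, q)
    return [j for j, w in enumerate(W) if w >= t]


# Made-up statistics: six clearly positive values and four small ones of mixed sign.
W = [5.0, 4.2, 3.9, 3.1, 2.8, 2.5, -0.4, 0.3, -0.2, 0.1]
print(knockoff_plus_threshold(W, q=0.2))   # 2.5
print(knockoff_select(W, q=0.2))           # [0, 1, 2, 3, 4, 5]
print(knockoff_select(W, q=0.1))           # []
```

The added 1 in the numerator is what distinguishes knockoff+ from the plain knockoff rule. It also means that at a target level of q = 0.1, six positive statistics are too few to select anything in this example.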

Conclusions:

  • The novel approach offers improved accuracy and reliability for sparse regression and marginal testing.
  • This method provides a statistically sound framework for analyzing complex, correlated datasets.
  • The method's effectiveness is demonstrated on both simulated and real-world data.