A statistical view of column subset selection.

Anav Sood¹, Trevor Hastie¹

¹Department of Statistics, Stanford University, Sequoia Hall, 390 Jane Stanford Way, Stanford, CA 94305, USA.

Journal of the Royal Statistical Society. Series B, Statistical Methodology

July 28, 2025

Summary

This summary is machine-generated.

This study unifies column subset selection (CSS) and principal variable identification, demonstrating their equivalence through maximum-likelihood estimation. It establishes conditions for consistent CSS in high dimensions and offers efficient methods for its application.

Keywords:

column subset selection; high-dimensional statistics; interpretable dimensionality reduction; principal components analysis; principal variables; probabilistic modelling

Area of Science:

Statistics
Computer Science
Data Analysis

Background:

Dimensionality reduction is crucial for large datasets.
Column Subset Selection (CSS) and principal variable identification are common approaches.
These methods have traditionally been viewed separately.
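
For orientation, the CSS problem is commonly written as the least-squares objective below. This is the standard textbook formulation, not necessarily the notation of the paper; the symbols X (an n × p data matrix), S (the selected column indices), k (the subset size), and Σ (the second-moment matrix) are assumptions introduced here for illustration, and the second identity assumes the selected columns are linearly independent.

```latex
% Standard CSS objective: reconstruct all columns of X from the columns indexed by S
\min_{S \subseteq \{1,\dots,p\},\; |S| = k}
  \bigl\lVert X - X_S \,(X_S^\top X_S)^{-1} X_S^\top X \bigr\rVert_F^2

% The same objective expressed through \Sigma = X^\top X / n alone
\bigl\lVert X - X_S \,(X_S^\top X_S)^{-1} X_S^\top X \bigr\rVert_F^2
  = n \,\operatorname{tr}\!\bigl( \Sigma - \Sigma_{\cdot S}\, \Sigma_{SS}^{-1}\, \Sigma_{S \cdot} \bigr)
```

The second line shows that the objective depends on the data only through Σ, which is one way to see why subset selection can work from summary statistics.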

Purpose of the Study:

To demonstrate the equivalence between CSS and principal variable identification.
To formalize both approaches within a unified semi-parametric maximum-likelihood model.
To develop efficient and robust methods for variable selection.

Main Methods:

Maximum-likelihood estimation within a semi-parametric model.
Analysis of consistency in high-dimensional data under the proportional asymptotic regime.
Development of methods utilizing summary statistics and handling missing/censored data.
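
To illustrate how subset selection can operate on summary statistics alone, here is a minimal sketch of a generic greedy heuristic that works only from a covariance (or second-moment) matrix. It is not the algorithm proposed in the paper; the function name `greedy_css`, its parameters, and the tolerance are hypothetical choices made for this example.

```python
import numpy as np


def greedy_css(cov, k, tol=1e-12):
    """Greedily choose up to k column indices using only a covariance matrix.

    Illustrative heuristic (an assumption of this sketch, not the paper's
    method): at each step, add the column that removes the most residual
    variance, then regress that column out of the residual covariance.
    """
    resid = np.array(cov, dtype=float, copy=True)
    p = resid.shape[0]
    available = np.ones(p, dtype=bool)
    selected = []
    for _ in range(k):
        diag = np.diag(resid).copy()
        # Only unselected columns with non-negligible residual variance qualify.
        ok = available & (diag > tol)
        if not ok.any():
            break
        # Trace reduction from adding column j: ||resid[:, j]||^2 / resid[j, j].
        gain = np.full(p, -np.inf)
        gain[ok] = np.sum(resid[:, ok] ** 2, axis=0) / diag[ok]
        j = int(np.argmax(gain))
        selected.append(j)
        available[j] = False
        # Schur-complement update: remove the part explained by column j.
        col = resid[:, j].copy()
        resid -= np.outer(col, col) / diag[j]
    # The trace of the residual is the variance left unexplained.
    return selected, float(np.trace(resid))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.standard_normal((200, 2))
    # Four columns that all lie in a two-dimensional space.
    x = np.column_stack([z[:, 0], z[:, 1], z[:, 0] + z[:, 1], z[:, 0] - z[:, 1]])
    subset, unexplained = greedy_css(x.T @ x / x.shape[0], k=2)
    print(subset, unexplained)
```

Greedy selection is not guaranteed to find the best subset of a given size; it is shown here only because it needs nothing beyond the covariance matrix.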

Main Results:

Column Subset Selection (CSS) and principal variable identification are shown to be equivalent.
Conditions for consistent CSS in high dimensions are established.
Efficient algorithms for CSS are proposed, including those for incomplete datasets.

Conclusions:

A unified theoretical framework connects computer science and statistical approaches to variable selection.
The proposed methods offer efficient and consistent solutions for dimensionality reduction.
The findings facilitate practical application of variable selection in diverse data scenarios.