Generic Feature Selection with Short Fat Data.

B Clarke¹, J-H Chu²

¹Department of Statistics, University of Nebraska-Lincoln, Lincoln, NE, 68583, USA.

Journal of the Indian Society of Agricultural Statistics. Indian Society of Agricultural Statistics

October 28, 2014

Summary

This summary is machine-generated.

This study introduces a variable selection method for regression with more variables than data points (p >> n). Clustering variables into blocks and regressing on block statistics improves coefficient estimation when data is limited.

Keywords:

Bridge Clustering LASSO Large p small n Ridge Summary statistics Variance-bias tradeoff

Area of Science:

Statistics
Machine Learning
Bioinformatics

Background:

High-dimensional regression (p >> n) poses challenges for accurate inference.
Traditional methods struggle with numerous explanatory variables relative to data points.
Variable selection is crucial for reliable model estimation in such scenarios.

Purpose of the Study:

To develop and evaluate a novel approach for variable selection in high-dimensional regression.
To improve the estimation of regression coefficients when the number of predictors exceeds the sample size.
To explore how the choice of clustering method, block statistic, and penalty term affects model performance.

Main Methods:

Grouping numerous explanatory variables (p) into blocks using clustering algorithms.
Evaluating block statistics to represent variable groups.
Regressing the response variable on these block statistics using a penalized error criterion.
Examining performance across various choices of sample size (n), number of variables (p), statistics, clustering methods, penalty terms, and data types.
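The three steps above (cluster, summarize, regress) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the choice of k-means as the clustering algorithm, the block mean as the block statistic, a ridge (squared-coefficient) penalty, the toy data, and all function names (`kmeans_columns`, `block_means`, `ridge`) are assumptions made here for illustration. For simplicity it uses a single statistic per block.

```python
def kmeans_columns(X, K, iters=20):
    """Cluster the p columns of X (n rows) into K blocks with plain k-means."""
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    # Deterministic start (an assumption): K evenly spaced columns as centers.
    centers = [cols[(k * p) // K][:] for k in range(K)]
    labels = [0] * p
    for _ in range(iters):
        for j, c in enumerate(cols):
            dists = [sum((a - b) ** 2 for a, b in zip(c, ctr)) for ctr in centers]
            labels[j] = dists.index(min(dists))
        for k in range(K):
            members = [cols[j] for j in range(p) if labels[j] == k]
            if members:  # keep the old center if a block becomes empty
                centers[k] = [sum(v) / len(members) for v in zip(*members)]
    return labels

def block_means(X, labels, K):
    """One statistic per block: the within-block mean for each observation."""
    S = []
    for row in X:
        stats = []
        for k in range(K):
            vals = [row[j] for j in range(len(row)) if labels[j] == k]
            stats.append(sum(vals) / len(vals) if vals else 0.0)
        S.append(stats)
    return S

def ridge(S, y, lam):
    """Solve (S'S + lam*I) b = S'y by Gauss-Jordan elimination with pivoting."""
    m = len(S[0])
    # Augmented matrix [S'S + lam*I | S'y].
    A = [[sum(r[a] * r[b] for r in S) + (lam if a == b else 0.0) for b in range(m)]
         + [sum(r[a] * yi for r, yi in zip(S, y))] for a in range(m)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(m):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [x - f * z for x, z in zip(A[r], A[c])]
    return [A[a][m] / A[a][a] for a in range(m)]

# Toy "short fat" data (hypothetical): n = 5 observations, p = 8 variables.
# The first four variables track z1, the last four track z2.
n, p, K = 5, 8, 2
z1 = [1.0, -1.0, 0.5, 2.0, -0.5]
z2 = [0.0, 1.0, 1.0, -1.0, 2.0]
X = [[(z1[i] if j < p // 2 else z2[i]) + 0.01 * ((i + j) % 3 - 1)
      for j in range(p)] for i in range(n)]
y = [2.0 * z1[i] - 1.0 * z2[i] for i in range(n)]

labels = kmeans_columns(X, K)   # step 1: group variables into blocks
S = block_means(X, labels, K)   # step 2: one statistic per block
beta = ridge(S, y, lam=0.1)     # step 3: penalized regression on the statistics
print(labels, beta)
```

Regressing on K block statistics instead of p raw variables replaces an underdetermined problem (p > n) with a small one that the penalized criterion can estimate from the few observations available.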

Main Results:

The proposed block-based regression approach enhances coefficient estimation in high-dimensional settings.
The results suggest that performance is best when regressing on approximately n/K statistics, where n is the sample size and K is the number of clusters.
Deviations from this optimum occur when block sizes vary widely and when the penalty term is an L_q norm with high q.
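For readers unfamiliar with the role of q, a penalized criterion of the bridge type (one of the listed keywords) can be written as below. The exact criterion used in the paper is not given on this page, so this form is an assumption; the symbols s_{ij} (the j-th block statistic for observation i), m (the number of statistics), and \lambda (the penalty weight) are introduced here for illustration only.

```latex
\hat{\beta} \;=\; \arg\min_{\beta \in \mathbb{R}^{m}}
  \sum_{i=1}^{n} \Bigl( y_i - \sum_{j=1}^{m} \beta_j \, s_{ij} \Bigr)^{2}
  \;+\; \lambda \sum_{j=1}^{m} \lvert \beta_j \rvert^{q},
  \qquad \lambda > 0,\; q > 0 .
```

Setting q = 1 gives the LASSO penalty and q = 2 gives the ridge penalty; larger q penalizes large coefficients more heavily while barely shrinking small ones.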

Conclusions:

Clustering explanatory variables into blocks offers an effective strategy for variable selection in p >> n regression.
The number of statistics used for regression should be carefully chosen, guided by the ratio of data points to clusters.
The choice of penalty term significantly influences the effectiveness of the method.