Generic Feature Selection with Short Fat Data.

B Clarke¹, J-H Chu²

  • ¹Department of Statistics, University of Nebraska-Lincoln, Lincoln, NE, 68583, USA.

Journal of the Indian Society of Agricultural Statistics, October 28, 2014
Summary
This summary is machine-generated.

This study introduces a variable selection method for regression with many more variables than data points (p >> n). Clustering the variables into blocks and regressing the response on block statistics improves coefficient estimation when data are limited.

Keywords:
Bridge; Clustering; LASSO; Large p small n; Ridge; Summary statistics; Variance-bias tradeoff

Area of Science:

  • Statistics
  • Machine Learning
  • Bioinformatics

Background:

  • High-dimensional regression (p >> n) poses challenges for accurate inference.
  • Traditional methods struggle when explanatory variables greatly outnumber data points.
  • Variable selection is crucial for reliable model estimation in such scenarios.
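A standard linear-algebra argument (general background, not taken from the paper) shows why ordinary least squares breaks down when p > n. For an n × p design matrix X and response vector y:

```latex
\operatorname{rank}(X^\top X) = \operatorname{rank}(X) \le \min(n, p) = n < p
\quad\Longrightarrow\quad X^\top X \in \mathbb{R}^{p \times p} \text{ is singular.}

% The normal equations X^T X beta = X^T y are consistent but have infinitely
% many solutions. A ridge penalty lambda * ||beta||_2^2 with lambda > 0 gives
(X^\top X + \lambda I_p)\,\hat\beta = X^\top y,
\qquad X^\top X + \lambda I_p \succ 0 \ \text{(hence invertible)}.
```

Penalization restores a unique solution at the cost of some bias, which is the variance-bias tradeoff listed among the keywords.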

Purpose of the Study:

  • To develop and evaluate a novel approach for variable selection in high-dimensional regression.
  • To improve the estimation of regression coefficients when the number of predictors exceeds the sample size.
  • To explore how the choice of clustering method, block statistic, and penalty term affects model performance.

Main Methods:

  • Grouping numerous explanatory variables (p) into blocks using clustering algorithms.
  • Computing a summary statistic on each block to represent its group of variables.
  • Regressing the response variable on these block statistics using a penalized error criterion.
  • Examining performance across various choices of sample size (n), number of variables (p), statistics, clustering methods, penalty terms, and data types.
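The first three steps listed above (cluster the variables, summarize each block, regress on the block statistics under a penalty) can be sketched in Python as follows. This is a minimal, standard-library-only sketch under stated assumptions, not the authors' implementation: the clustering rule (a hypothetical greedy "leader" rule that groups positively correlated columns), the block statistic (the mean of the standardized columns in a block), and the penalty (ridge, i.e. squared L2) are illustrative choices made here, whereas the paper compares several alternatives for each. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def standardize_columns(X: Matrix) -> Matrix:
    """Center each column and scale it to unit (population) standard deviation."""
    n, p = len(X), len(X[0])
    Z = [list(map(float, row)) for row in X]
    for j in range(p):
        col = [X[i][j] for i in range(n)]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n) or 1.0  # guard constant columns
        for i in range(n):
            Z[i][j] = (X[i][j] - mean) / sd
    return Z


def correlation(Z: Matrix, a: int, b: int) -> float:
    """Pearson correlation of two already-standardized columns."""
    return sum(row[a] * row[b] for row in Z) / len(Z)


def cluster_columns(Z: Matrix, threshold: float = 0.7) -> List[List[int]]:
    """Hypothetical leader clustering: column j joins the first block whose
    leader (first member) it correlates with at r >= threshold; otherwise
    it starts a new block."""
    blocks: List[List[int]] = []
    for j in range(len(Z[0])):
        for block in blocks:
            if correlation(Z, block[0], j) >= threshold:
                block.append(j)
                break
        else:
            blocks.append([j])
    return blocks


def block_means(Z: Matrix, blocks: List[List[int]]) -> Matrix:
    """One summary statistic per block: the row-wise mean of its columns."""
    return [[sum(row[j] for j in b) / len(b) for b in blocks] for row in Z]


def ridge(S: Matrix, y: List[float], lam: float) -> List[float]:
    """Minimize ||yc - S beta||^2 + lam * ||beta||^2 (yc = centered y) by
    solving (S^T S + lam I) beta = S^T yc with Gauss-Jordan elimination."""
    n, k = len(S), len(S[0])
    ybar = sum(y) / n
    yc = [v - ybar for v in y]
    # Augmented system [S^T S + lam I | S^T yc]; positive definite when lam > 0.
    A = [[sum(S[i][a] * S[i][b] for i in range(n)) + (lam if a == b else 0.0)
          for b in range(k)] + [sum(S[i][a] * yc[i] for i in range(n))]
         for a in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [x - f * z for x, z in zip(A[r], A[c])]
    return [A[c][k] / A[c][c] for c in range(k)]


def block_regression(X: Matrix, y: List[float], lam: float = 1.0,
                     threshold: float = 0.7) -> Tuple[List[List[int]], List[float]]:
    """Cluster the p columns, summarize each block, then ridge-regress y."""
    Z = standardize_columns(X)
    blocks = cluster_columns(Z, threshold)
    S = block_means(Z, blocks)  # block means of centered columns are centered too
    return blocks, ridge(S, y, lam)


if __name__ == "__main__":
    # p = 6 variables, n = 4 observations: columns 0-2 track one signal,
    # columns 3-5 track another.
    X = [[1, 1.1, 2, 3, 3.0, 9],
         [2, 2.0, 4, 1, 1.1, 3],
         [3, 2.9, 6, 4, 4.0, 12],
         [4, 4.0, 8, 1, 0.9, 3]]
    y = [-1, 3, 2, 7]
    blocks, beta = block_regression(X, y)
    print(blocks, beta)
```

Here six variables collapse into two block statistics, so the penalized fit estimates two coefficients from four observations instead of six. Swapping `block_means` for another statistic, or `ridge` for a different penalty, changes one function without touching the rest of the pipeline.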

Main Results:

  • The proposed block-based regression approach enhances coefficient estimation in high-dimensional settings.
  • Computations suggest that performance is best when the response is regressed on approximately n/K statistics, where K is the number of clusters.
  • Deviations from this optimum occur when block sizes vary widely and when the penalty is an L_q norm with large q.
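The penalty terms indexed by q can be written as the bridge family, which appears among the keywords. The criterion below is a sketch in our own notation (s_ij is the j-th block statistic for observation i, m is the number of statistics, and the response and statistics are assumed centered so no intercept appears); whether the paper writes it exactly this way is an assumption.

```latex
\hat\beta = \arg\min_{\beta \in \mathbb{R}^{m}}
  \sum_{i=1}^{n} \Bigl( y_i - \sum_{j=1}^{m} s_{ij}\,\beta_j \Bigr)^{2}
  + \lambda \sum_{j=1}^{m} \lvert \beta_j \rvert^{q},
\qquad \lambda > 0,\; q > 0.

% q = 1: LASSO (L1 penalty);  q = 2: ridge regression (squared L2 penalty).
```

Smaller q shrinks small coefficients to exactly zero (q ≤ 1), while larger q penalizes large coefficients more heavily relative to small ones.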

Conclusions:

  • Clustering explanatory variables into blocks offers an effective strategy for variable selection in p >> n regression.
  • The number of statistics used in the regression should be chosen carefully, guided by the ratio n/K of data points to clusters.
  • The choice of penalty term significantly influences the effectiveness of the method.