A Simple Information Criterion for Variable Selection in High-Dimensional Regression.

Matthieu Pluntz¹, Cyril Dalmasso², Pascale Tubert-Bitter¹

  • ¹ High-Dimensional Biostatistics for Drug Safety and Genomics, CESP, Université Paris-Saclay, UVSQ, Université Paris-Sud, Inserm, Villejuif, France.

Statistics in Medicine | December 12, 2024
Summary
This summary is machine-generated.

We introduce the extended AIC (EAIC), a novel criterion for sparse model selection in high-dimensional regression. Unlike AIC and BIC, EAIC controls the rate of false-positive selections, which improves variable selection accuracy.

Keywords:
FWER control, LASSO, high-dimensional regression, information criterion, pharmacovigilance, variable selection

Area of Science:

  • Statistics
  • Bioinformatics
  • Pharmacovigilance

Background:

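  • High-dimensional regression, common in genomics and drug studies, requires selecting a sparse subset of regressors from a large pool of candidates.
  • Existing criteria such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) are too liberal in this setting because they do not account for the multiple testing implicit in variable selection.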
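
A rough way to see why fixed penalties become too liberal is the following sketch. It rests on simplifying assumptions that are not taken from the paper: each null candidate is judged by a one-degree-of-freedom likelihood-ratio comparison against the criterion's per-variable penalty, and the candidates behave independently. The function names and the sample size `n = 100` are illustrative only.

```python
import math

def per_variable_level(threshold):
    """P(chi2_1 > threshold): chance that a null variable reduces -2 log L
    by more than the per-variable penalty (heuristic, 1 degree of freedom)."""
    return math.erfc(math.sqrt(threshold / 2.0))

def fwer_independent(level, p):
    """Probability of at least one false positive among p independent
    null candidates, each wrongly accepted with probability `level`."""
    return 1.0 - (1.0 - level) ** p

n = 100                                      # hypothetical sample size
aic_level = per_variable_level(2.0)          # AIC penalty: 2 per parameter
bic_level = per_variable_level(math.log(n))  # BIC penalty: log(n) per parameter

for p in (10, 100, 1000):
    print(p,
          round(fwer_independent(aic_level, p), 3),
          round(fwer_independent(bic_level, p), 3))
```

Under these assumptions the per-variable levels stay fixed while the number of candidates grows, so the chance of at least one false positive approaches 1 for both criteria.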

Purpose of the Study:

  • To propose a new information criterion, the extended AIC (EAIC), for robust sparse model selection in high-dimensional regressions.
  • To guarantee asymptotic control of the family-wise error rate (FWER), the probability of selecting at least one false-positive variable.
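
In symbols, asymptotic FWER control can be written as below. The notation is assumed here for illustration and is not taken from the paper: $S^*$ is the set of truly active regressors, $\hat{S}_n$ is the set selected from $n$ observations, and $\alpha$ is the FWER target.

```latex
\mathrm{FWER}_n = \Pr\bigl(\hat{S}_n \setminus S^* \neq \emptyset\bigr),
\qquad
\limsup_{n \to \infty} \mathrm{FWER}_n \le \alpha .
```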

Main Methods:

  • Derived the extended AIC (EAIC) formula, which combines the model's log-likelihood, the model size, the total number of candidate regressors, and the FWER target.
  • Evaluated EAIC against AIC, BIC, mBIC, mAIC, and EBIC using LASSO in simulations across linear and logistic regression settings.
  • Applied EAIC to detect adverse drug reaction signals in pharmacovigilance data.
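
The comparison of criteria can be pictured with a minimal sketch that scores a sequence of candidate models, such as those visited by LASSO, and keeps the one with the lowest criterion value. The AIC, BIC, and EBIC formulas are the standard ones; the EAIC penalty is not given in this summary, so it is deliberately not reproduced. The candidate models, their log-likelihoods, and the values of `n` and `p` are made-up toy numbers.

```python
import math

def aic(loglik, k):
    """Akaike information criterion for a model with k selected regressors."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    """Bayesian information criterion with n observations."""
    return -2.0 * loglik + k * math.log(n)

def ebic(loglik, k, n, p, gamma=1.0):
    """Extended BIC: BIC plus 2 * gamma * log C(p, k), p = candidate regressors."""
    return bic(loglik, k, n) + 2.0 * gamma * math.log(math.comb(p, k))

def select(path, criterion):
    """Return the candidate model minimising criterion(loglik, model_size)."""
    return min(path, key=lambda m: criterion(m["loglik"], len(m["support"])))

# Hypothetical candidate models (toy values, not results from the study).
n, p = 100, 1000
path = [
    {"support": [],              "loglik": -150.0},
    {"support": [3],             "loglik": -140.0},
    {"support": [3, 7],          "loglik": -138.5},
    {"support": [3, 7, 42],      "loglik": -137.0},
    {"support": [3, 7, 42, 99],  "loglik": -135.8},
]

by_aic = select(path, aic)
by_bic = select(path, lambda ll, k: bic(ll, k, n))
by_ebic = select(path, lambda ll, k: ebic(ll, k, n, p))
print(by_aic["support"], by_bic["support"], by_ebic["support"])
```

On these toy numbers AIC keeps the largest model while BIC and EBIC stop at a single regressor, which illustrates how the size of the penalty drives the number of selected variables.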

Main Results:

  • EAIC demonstrated effective FWER control across diverse regression settings, unlike AIC and BIC, which produced numerous false positives.
  • Simulation studies confirmed EAIC's superior variable selection performance compared to other criteria.
  • The method proved effective in identifying potential adverse drug reactions from real-world pharmacovigilance data.
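
Quantities of this kind are typically summarised across simulation replicates as an empirical FWER and a sensitivity. The sketch below shows one way to compute them; the helper names and the toy replicates are hypothetical and do not reproduce the study's simulations.

```python
def false_positives(selected, truth):
    """Selected variables that are not in the true support."""
    return set(selected) - set(truth)

def empirical_fwer(selections, truth):
    """Share of replicates with at least one false-positive selection."""
    hits = sum(1 for s in selections if false_positives(s, truth))
    return hits / len(selections)

def mean_sensitivity(selections, truth):
    """Average share of true variables recovered (truth must be non-empty)."""
    truth = set(truth)
    return sum(len(truth & set(s)) / len(truth) for s in selections) / len(selections)

# Toy replicates: indices of the variables selected in each simulated dataset.
truth = {3, 7}
selections = [[3, 7], [3], [3, 7, 42], [3, 7], [7, 99, 3]]

print(empirical_fwer(selections, truth))    # 2 of 5 replicates contain a false positive
print(mean_sensitivity(selections, truth))
```

A criterion that controls the FWER at a target level would be expected to keep the first quantity at or below that level as the sample size grows.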

Conclusions:

  • The extended AIC (EAIC) provides a statistically sound method for sparse model selection in high-dimensional regression.
  • EAIC offers improved accuracy and control over false positives, crucial for applications like genomic analysis and pharmacovigilance.
  • EAIC is a valuable tool for automated signal detection and reliable variable selection in complex datasets.