
Related Concept Videos

Cluster Sampling Method

Appropriate sampling methods ensure that samples are drawn without bias and accurately represent the population. Because measuring the entire population in a study is not practical, researchers use samples to represent the population of interest.
To choose a cluster sample, divide the population into clusters (groups) and then randomly select some of the clusters. All the members from these clusters are in the cluster sample. For example, if you randomly sample four departments from your...
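A minimal Python sketch of the three steps just described: group the population by cluster, randomly choose some clusters, and keep every member of each chosen cluster. The function name `cluster_sample`, its `cluster_of` key function, and the department data in the usage lines are hypothetical illustrations, not part of the original material.

```python
import random


def cluster_sample(population, cluster_of, n_clusters, seed=None):
    """Return {cluster_label: [members]} for n_clusters randomly chosen clusters.

    population -- iterable of members
    cluster_of -- hypothetical helper mapping a member to its cluster label
    """
    rng = random.Random(seed)
    # Step 1: divide the population into clusters (groups).
    clusters = {}
    for member in population:
        clusters.setdefault(cluster_of(member), []).append(member)
    # Step 2: randomly select some of the clusters, without replacement.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Step 3: every member of a selected cluster belongs to the cluster sample.
    return {label: clusters[label] for label in chosen}


# Hypothetical usage: staff members grouped by department.
staff = [("Ana", "HR"), ("Bo", "HR"), ("Cy", "IT"), ("Di", "IT"), ("Ed", "Ops"), ("Fa", "Sales")]
print(cluster_sample(staff, lambda m: m[1], n_clusters=2, seed=7))
```

Whole clusters are kept or dropped together. Individual members are never sampled within a selected cluster, which is the difference from stratified sampling.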

Comparing the Survival Analysis of Two or More Groups

Survival analysis is a cornerstone of medical research, used to evaluate the time until an event of interest occurs, such as death, disease recurrence, or recovery. Unlike many standard statistical methods, survival analysis is designed to handle censored data: cases where the event has not occurred for some participants by the end of the study or otherwise goes unobserved. To address these unique challenges, specialized techniques like the Kaplan-Meier estimator, log-rank test, and...
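A short Python sketch of the Kaplan-Meier estimator named above shows how censored observations are handled. It uses the standard product-limit formula S(t) = ∏ (1 − dᵢ/nᵢ) over event times tᵢ ≤ t, where dᵢ is the number of events and nᵢ the number at risk. A censored subject stays in the risk set up to its censoring time but never counts as an event. The function name and input layout are assumptions made for illustration. The sketch produces no confidence intervals.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function S(t).

    times  -- follow-up time for each subject
    events -- 1 if the event was observed at that time, 0 if the subject was censored
    Returns a list of (t, S(t)) pairs at each distinct time with at least one event.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        # Group all subjects sharing this time (tied events and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            # Subjects censored at time t are still counted in the risk set here.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censorings leave the risk set after time t.
        at_risk -= leaving
    return curve


print(kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1]))
```

For these hypothetical data the estimates are 5/6, 2/3, 4/9 and 0 at times 1, 2, 3 and 5. The subject censored at time 4 shrinks the risk set without producing a drop in the curve.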

Quantifying and Rejecting Outliers: The Grubbs Test

Sometimes a data set contains a recorded numerical observation that deviates greatly from the rest of the data. Assuming that the data are normally distributed, a statistical method called the Grubbs test can be used to determine whether that observation is truly an outlier. To perform a two-tailed Grubbs test, first calculate the absolute difference between the suspected outlier and the sample mean. Then calculate the ratio between this difference and the sample standard deviation. This...
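The calculation described above can be sketched in Python as G = max|xᵢ − x̄| / s, where s is the sample standard deviation. The function name and the example data are hypothetical. In the usual procedure, G is then compared with a tabulated critical value that depends on the sample size and the chosen significance level. That table is not reproduced here.

```python
import statistics


def grubbs_statistic(data):
    """Two-tailed Grubbs statistic G = max|x_i - mean| / s.

    Returns (suspected outlier, G). The data are assumed to be approximately normal.
    """
    if len(data) < 3:
        raise ValueError("the Grubbs test needs at least three observations")
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (n - 1 in the denominator)
    # The suspected outlier is the observation farthest from the mean.
    suspect = max(data, key=lambda x: abs(x - mean))
    return suspect, abs(suspect - mean) / sd


suspect, g = grubbs_statistic([10, 11, 12, 11, 10, 30])
print(suspect, round(g, 3))
```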

Goodness-of-Fit Test

The goodness-of-fit test is a type of hypothesis test that determines whether the data "fit" a particular distribution. For example, one may suspect that some anonymous data fit a binomial distribution. A chi-square test (meaning the distribution for the hypothesis test is chi-square) can be used to determine whether there is a fit. The null and alternative hypotheses may be written in sentences or stated as equations or inequalities. The test statistic for a goodness-of-fit test is given as...
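As a sketch, here is the standard Pearson form of the chi-square goodness-of-fit statistic, Σ (Oᵢ − Eᵢ)² / Eᵢ over the k categories. It is assumed, not confirmed, to match the form the video goes on to give. The function name and the counts in the example are hypothetical.

```python
def chi_square_gof(observed, expected):
    """Pearson chi-square goodness-of-fit statistic and its degrees of freedom.

    Degrees of freedom are k - 1 for k categories, assuming no distribution
    parameters were estimated from the data (each estimated parameter removes one more).
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of categories")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1


print(chi_square_gof([8, 12, 9, 11, 10, 10], [10] * 6))
```

The statistic is then compared with a chi-square critical value, or converted to a p-value, for the returned degrees of freedom.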

Expected Frequencies in Goodness-of-Fit Tests

A goodness-of-fit test is conducted to determine whether the observed frequencies are statistically consistent with the frequencies expected for the dataset. Suppose the expected frequencies are all equal, as when predicting how often each face appears when a die is cast. In that case, the expected frequency for each category is the ratio of the total number of observations (n) to the number of categories (k).
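As a worked example with hypothetical numbers, suppose a die is rolled n = 60 times. There are k = 6 faces, so the expected frequency for each face is:

```latex
E = \frac{n}{k} = \frac{60}{6} = 10
```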

One-Way ANOVA: Unequal Sample Sizes

One-way ANOVA can be performed on three or more samples of unequal sizes. However, the calculations become more complicated when the sample sizes differ. So, when performing ANOVA with unequal sample sizes, the following equation is used:
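For reference, the textbook one-way ANOVA F statistic handles unequal group sizes by weighting each group's squared deviation from the grand mean by its size nᵢ. It is F = [Σ nᵢ(x̄ᵢ − x̄)² / (k − 1)] / [Σ (nᵢ − 1)sᵢ² / (N − k)], where N is the total number of observations. It is assumed here, not confirmed, to be equivalent to the equation referred to above. The Python sketch below uses a hypothetical function name and data, and each group needs at least two observations.

```python
from statistics import mean, variance


def one_way_anova_f(*groups):
    """F statistic for one-way ANOVA; groups may have different sizes.

    Returns (F, (df_between, df_within)).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group variation: each group's squared deviation is weighted by its size n_i.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: sample variances pooled with weights n_i - 1.
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, (df_between, df_within)


print(one_way_anova_f([1, 2, 3], [4, 5], [6, 7, 8, 9]))
```

For these three groups of sizes 3, 2 and 4, the sketch gives F = 21 with (2, 6) degrees of freedom.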


Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

Development of a signal quality evaluation of dynamic versus static ¹⁸FDG-PET in focal epilepsy via Bayesian regional estimated signal quality analysis.

AJNR. American journal of neuroradiology·2026
Same author

From urge to behavior: An investigation of the temporal relationship between eating disorder urges and engagement in eating disorder behaviors.

Behaviour research and therapy·2026
Same author

Handling Missing Data in Longitudinal Rehabilitation Research: A Methodological Demonstration With Functional Trajectories of Older Adults With TBI.

The Journal of head trauma rehabilitation·2026
Same author

Mesocorticolimbic connectivity and motivational sensitivity: sex-specific effects of puberty in early adolescence.

Neuropsychopharmacology : official publication of the American College of Neuropsychopharmacology·2026
Same author

Ordinal Outcome State-Space Models for Intensive Longitudinal Data.

Psychometrika·2026
Same author

Does lumbar vertebra bone microstructure relate to combined loading fracture tolerance and inform fracture initiation site?

Bone·2026
Same journal

A joint model for a longitudinal outcome and a progressive multistate model under a mixed observation scheme.

Statistical methods in medical research·2026
Same journal

Efficient semi-supervised estimation of optimal individualized treatment regimes with survival outcome.

Statistical methods in medical research·2026
Same journal

Asymptotic online FWER control for dependent test statistics.

Statistical methods in medical research·2026
Same journal

Regression analysis of misclassified current status data with potentially unknown test accuracy.

Statistical methods in medical research·2026
Same journal

Bayesian multivariate linear mixed-effects models with varied association structures.

Statistical methods in medical research·2026
Same journal

Inference about the ratio of age-standardized rates between two overlapping populations.

Statistical methods in medical research·2026

Related Experiment Video

Updated: Sep 19, 2025

Large-scale Reconstructions and Independent, Unbiased Clustering Based on Morphological Metrics to Classify Neurons in Selective Populations

Published on: February 15, 2017

Fast leave-one-cluster-out cross-validation using clustered network information criterion.

Jiaxing Qiu¹˒², Douglas E Lake², Pavel Chernyavskiy²

  • ¹ School of Data Science, School of Medicine, University of Virginia, Charlottesville, VA, USA.

Statistical Methods in Medical Research | June 19, 2025
Summary
This summary is machine-generated.

A new clustered estimator of the network information criterion (CNIC) accurately assesses prediction model generalizability for clustered data. CNIC is a faster, more reliable alternative to cluster-based cross-validation, especially with strong clustering.

Keywords:
Fisher information matrix; Predictive modeling; cluster-based cross-validation; clustered data; network information criterion

More Related Videos

Modeling the Functional Network for Spatial Navigation in the Human Brain

Published on: October 13, 2023

JUMPn: A Streamlined Application for Protein Co-Expression Clustering and Network Analysis in Proteomics

Published on: October 19, 2021


Area of Science:

  • Statistics
  • Machine Learning
  • Biostatistics

Background:

  • Prediction models developed on clustered data require cluster-based validation to assess their generalizability.
  • Existing criteria such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) may not adequately account for cluster heterogeneity.
  • Leave-one-cluster-out cross-validation is a robust but computationally intensive validation technique.
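To make the baseline concrete, here is a minimal sketch of leave-one-cluster-out cross-validation for an ordinary least-squares model, written in Python with NumPy. Each cluster is held out once and the model is refit on the rest, so the cost grows with the number of clusters. The function name, the use of held-out squared error as a stand-in for Gaussian deviance, and the simulated random-intercept data are illustrative assumptions, not the study's setup.

```python
import numpy as np


def loco_cv_sse(X, y, clusters):
    """Leave-one-cluster-out cross-validation for ordinary least squares.

    Each cluster is held out in turn, the model is refit on the remaining clusters,
    and squared prediction errors on the held-out cluster are accumulated.
    Returns the total held-out sum of squared errors (proportional to the Gaussian
    deviance when the error variance is treated as known).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    clusters = np.asarray(clusters)
    total = 0.0
    for c in np.unique(clusters):
        held_out = clusters == c
        # One full refit per cluster: this is the cost that grows with the number of clusters.
        beta, *_ = np.linalg.lstsq(X[~held_out], y[~held_out], rcond=None)
        resid = y[held_out] - X[held_out] @ beta
        total += float(resid @ resid)
    return total


# Hypothetical clustered data: 10 clusters of 20 observations with random intercepts.
rng = np.random.default_rng(0)
groups = np.repeat(np.arange(10), 20)
x = rng.normal(size=groups.size)
y = 1.0 + 2.0 * x + rng.normal(size=10)[groups] + rng.normal(size=groups.size)
X = np.column_stack([np.ones_like(x), x])
print(loco_cv_sse(X, y, groups))
```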

Purpose of the Study:

  • Introduce a clustered estimator of the network information criterion (CNIC) as a fast approximation to leave-one-cluster-out deviance.
  • Develop a method to assess model generalizability for prediction models with clustered data.
  • Provide a more accurate model selection criterion for clustered data compared to AIC and BIC.

Main Methods:

  • Derived a clustered network information criterion by modifying the standard network information criterion with a clustering-adjusted Fisher information matrix.
  • Applied the CNIC to standard regression models with Gaussian or binomial responses for clustered data.
  • Evaluated CNIC performance using simulation studies and an empirical example, comparing it to cluster-based cross-validation, AIC, and BIC.
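The summary does not state the CNIC formula. The Python sketch below is therefore an assumption about what a clustering adjustment to the network information criterion can look like for a Gaussian linear model. It uses the NIC (Takeuchi-type) form −2 log L + 2 tr(J⁻¹K). Here J is the observed information for the regression coefficients and K is built from per-observation score vectors. The scores are summed within each cluster before K is formed, as in a cluster-robust sandwich variance estimator. The function name, the Gaussian setting with the error variance held at its maximum-likelihood estimate, and this particular adjustment are illustrative choices, not the authors' published estimator.

```python
import numpy as np


def clustered_nic_gaussian(X, y, clusters):
    """Illustrative clustered NIC-style criterion for a Gaussian linear model.

    Returns (criterion, penalty). ASSUMPTION: the clustering adjustment sums
    per-observation scores within clusters before forming K; this is a sketch,
    not the published CNIC estimator.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    clusters = np.asarray(clusters)
    n = y.size
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = float(resid @ resid) / n            # MLE of the error variance, held fixed below
    neg2_loglik = n * np.log(2 * np.pi * sigma2) + n
    J = X.T @ X / sigma2                         # observed information for beta
    scores = X * (resid / sigma2)[:, None]       # per-observation score vectors for beta
    K = np.zeros_like(J)
    for c in np.unique(clusters):
        s_c = scores[clusters == c].sum(axis=0)  # ASSUMPTION: scores summed within each cluster
        K += np.outer(s_c, s_c)
    penalty = 2.0 * float(np.trace(np.linalg.solve(J, K)))
    return float(neg2_loglik) + penalty, penalty


X = np.ones((6, 1))                              # intercept-only toy model
y = np.array([1.0, 1.0, 1.0, 5.0, 5.0, 5.0])
print(clustered_nic_gaussian(X, y, [0, 0, 0, 1, 1, 1])[1])   # two clusters of three
print(clustered_nic_gaussian(X, y, [0, 1, 2, 3, 4, 5])[1])   # every observation its own cluster
```

In this toy example, treating each observation as its own cluster gives a penalty of 2. That matches AIC's 2p for one coefficient. Grouping the strongly clustered values into two clusters raises the penalty to 6. This mirrors, under the stated assumption, the reported behavior that the penalty grows with stronger clustering.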

Main Results:

  • The clustered network information criterion (CNIC) provides a more accurate approximation to leave-one-cluster-out deviance than AIC and BIC.
  • CNIC results in more accurate model size and variable selection, particularly when data exhibit strong clustering.
  • CNIC imposes a greater penalty for stronger clustering, effectively preventing over-parameterization.

Conclusions:

  • CNIC is a computationally efficient and accurate tool for model selection and validation in prediction models with clustered data.
  • CNIC outperforms traditional criteria such as AIC and BIC when clusters are heterogeneous.
  • The proposed method enhances the reliability of prediction models developed on clustered datasets.