



Minimum sample size for developing a multivariable prediction model using multinomial logistic regression.

Alexander Pate¹, Richard D Riley², Gary S Collins³,⁴

¹Division of Informatics, Imaging and Data Science, Faculty of Biology, Medicine and Health, University of Manchester, Manchester Academic Health Science Centre, Manchester, UK.

Statistical Methods in Medical Research

January 20, 2023

Summary

This summary is machine-generated.

Researchers can now determine appropriate sample sizes for multinomial logistic regression models using three new criteria. These criteria help minimize overfitting and ensure precise risk estimation for outcomes with multiple categories.

Keywords:

Clinical prediction models; multinomial logistic regression; sample size; shrinkage


Area of Science:

Statistics
Biostatistics
Epidemiology

Background:

Multinomial logistic regression models predict outcomes with more than two categories.
An adequate sample size (n) is crucial for model validity and must be judged relative to the number of events (Eₖ) and the number of predictor parameters (pₖ) for each outcome category k.
Existing sample size criteria are primarily for binary outcomes.

Purpose of the Study:

Propose three criteria for minimum sample size in multinomial logistic regression.
Address sample size determination for models with >2 outcome categories.
Extend existing criteria for binary outcomes to multinomial settings.

Main Methods:

Developed three criteria for sample size calculation.
Criterion (i) focuses on minimizing model overfitting using Cox-Snell R² from sub-models.
Criteria (ii) and (iii) extend existing methods for binary outcomes.
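
As a rough illustration of how criterion (i) might be computed, the sketch below applies the shrinkage-based formula known from the binary-outcome literature, n = p / ((S − 1) ln(1 − R²cs / S)), to each pairwise ('one-to-one') sub-model and then scales the result up to the full cohort. The scaling step (dividing by the combined prevalence of the two categories) and all function and parameter names are assumptions made for illustration; this is not the pmsampsize implementation, and the inputs in any real calculation would need to come from prior studies.

```python
import math


def n_binary_shrinkage(p, r2_cs, shrinkage=0.9):
    """Minimum n for one logistic sub-model so that expected shrinkage is at
    least `shrinkage`, given p predictor parameters and an anticipated
    Cox-Snell R-squared (binary-outcome formula, used here per sub-model)."""
    if not 0.0 < shrinkage < 1.0:
        raise ValueError("shrinkage must lie strictly between 0 and 1")
    if not 0.0 < r2_cs < shrinkage:
        raise ValueError("r2_cs must lie strictly between 0 and shrinkage")
    return p / ((shrinkage - 1.0) * math.log(1.0 - r2_cs / shrinkage))


def n_multinomial_criterion_i(p, r2_cs_pairs, prevalences, shrinkage=0.9):
    """Hypothetical helper: size each pairwise sub-model, then scale to the
    whole cohort and keep the largest requirement.

    p            -- predictor parameters per sub-model (assumed equal here)
    r2_cs_pairs  -- {(k, r): anticipated Cox-Snell R2 of the k-vs-r model}
    prevalences  -- {category: anticipated proportion of the cohort}
    """
    needed = {}
    for (k, r), r2_cs in r2_cs_pairs.items():
        n_pair = n_binary_shrinkage(p, r2_cs, shrinkage)
        # Assumption: only a fraction prev[k] + prev[r] of the cohort
        # contributes to the k-vs-r sub-model, so scale up accordingly.
        needed[(k, r)] = n_pair / (prevalences[k] + prevalences[r])
    return math.ceil(max(needed.values())), needed


if __name__ == "__main__":
    # Made-up inputs, purely to show the call pattern.
    prev = {"A": 0.5, "B": 0.3, "C": 0.2}
    r2 = {("A", "B"): 0.10, ("A", "C"): 0.10, ("B", "C"): 0.10}
    n_min, per_pair = n_multinomial_criterion_i(10, r2, prev)
    print(n_min, per_pair)
```

Taking the maximum over pairs reflects the idea that every sub-model should meet the overfitting target, so the rarest pair of categories tends to drive the requirement.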

Main Results:

Criterion (i) was validated via simulation, confirming its ability to control overfitting.
Simulation showed that the sample size must be based on the Cox-Snell R² of the distinct 'one-to-one' models.
Criteria (ii) and (iii) are direct extensions and did not require simulation.
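
The page does not spell out criteria (ii) and (iii). As one hedged illustration of what extending a binary-outcome rule can look like, the sketch below takes the familiar margin-of-error bound for an estimated proportion, n = (1.96 / δ)² φ(1 − φ), and applies it to each outcome category's anticipated prevalence, keeping the largest result. Treating this as the precision criterion, the default margin δ = 0.05, and the function names are all assumptions for illustration rather than the authors' stated method.

```python
import math


def n_for_proportion_precision(phi, margin=0.05, z=1.96):
    """Sample size so that an approximate 95% confidence interval for a
    proportion phi has half-width of at most `margin`."""
    if not 0.0 < phi < 1.0:
        raise ValueError("phi must lie strictly between 0 and 1")
    return (z / margin) ** 2 * phi * (1.0 - phi)


def n_multinomial_precision(prevalences, margin=0.05):
    """Hypothetical extension: apply the bound to every category and keep
    the most demanding one."""
    per_category = {
        k: n_for_proportion_precision(phi, margin)
        for k, phi in prevalences.items()
    }
    return math.ceil(max(per_category.values())), per_category


if __name__ == "__main__":
    # Made-up prevalences, purely to show the call pattern.
    n_min, detail = n_multinomial_precision({"A": 0.5, "B": 0.3, "C": 0.2})
    print(n_min, detail)
```

Because φ(1 − φ) peaks at φ = 0.5, the category whose prevalence is closest to one half sets the requirement under this rule.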

Conclusions:

The proposed criteria provide a framework for sample size determination in multinomial logistic regression.
Implementation is demonstrated with a worked example for ovarian tumor type prediction.
The criteria will be integrated into the pmsampsize R library and Stata modules.