Related Concept Videos

Test for Homogeneity

The goodness-of-fit test can be used to decide whether a population fits a given distribution, but it does not suffice to decide whether two populations follow the same unknown distribution. A different test, called the test for homogeneity, can be used to decide whether two populations have the same distribution. To calculate the test statistic for a test for homogeneity, follow the same procedure as for the test of independence. The hypotheses for the test for homogeneity can...
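The homogeneity statistic uses the same arithmetic as the test of independence. As a minimal sketch, assuming each row of a contingency table is one population and each column one category, the expected count for a cell is (row total × column total) / grand total, and the statistic is the sum of (O − E)² / E over all cells, with (r − 1)(c − 1) degrees of freedom. The function name `chi_square_homogeneity` and the counts are hypothetical.

```python
def chi_square_homogeneity(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table
    of observed counts: one row per population, one column per category."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under H0: all populations share one distribution.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical counts: two populations, three categories.
observed = [[30, 50, 20],
            [20, 50, 30]]
stat, df = chi_square_homogeneity(observed)
print(stat, df)
```

With these counts every expected value is 25 or 50, so the statistic is 4.0 with 2 degrees of freedom. A p-value could then be read from the chi-square distribution, for example with `scipy.stats.chi2.sf(stat, df)`.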

Reliability and Validity

Reliability and validity are two important considerations that must be made with any type of data collection. Reliability refers to the ability to consistently produce a given result. In the context of psychological research, this would mean that any instruments or tools used to collect data do so in consistent, reproducible ways.

Goodness-of-Fit Test

The goodness-of-fit test is a type of hypothesis test that determines whether the data "fit" a particular distribution. For example, one may suspect that some anonymous data fit a binomial distribution. A chi-square test (meaning the distribution for the hypothesis test is chi-square) can be used to determine whether there is a fit. The null and alternative hypotheses may be written in sentences or stated as equations or inequalities. The test statistic for a goodness-of-fit test is given as...
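For the binomial example, here is a minimal sketch of the chi-square goodness-of-fit statistic: the sum over categories of (O − E)² / E, where E is the sample size times the hypothesized probability of that category. It assumes a fully specified binomial distribution (a hypothetical 3 trials with p = 0.5), so the degrees of freedom are k − 1; each parameter estimated from the data would remove one more degree of freedom. The counts and function names are illustrative only.

```python
from math import comb


def binomial_probs(n_trials, p):
    """Hypothesized probabilities of 0..n_trials successes under Binomial(n_trials, p)."""
    return [comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k)
            for k in range(n_trials + 1)]


def chi_square_gof(observed, probs):
    """Return (chi-square statistic, degrees of freedom) for observed category counts."""
    if len(observed) != len(probs):
        raise ValueError("one probability is needed per category")
    total = sum(observed)
    stat = 0.0
    for o, p in zip(observed, probs):
        expected = total * p  # expected count if the data follow the hypothesized distribution
        stat += (o - expected) ** 2 / expected
    # k - 1 degrees of freedom, assuming no parameters were estimated from the data.
    return stat, len(observed) - 1


# Hypothetical data: number of successes (0, 1, 2, 3) observed in 80 repetitions.
observed = [12, 28, 30, 10]
stat, df = chi_square_gof(observed, binomial_probs(3, 0.5))
print(round(stat, 4), df)
```

Here the expected counts are 10, 30, 30, and 10, so the statistic is about 0.5333 on 3 degrees of freedom, a small value that would not lead to rejecting the binomial fit at conventional significance levels.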

Two-Way ANOVA

The two-way ANOVA is an extension of the one-way ANOVA. It is a statistical test performed on three or more samples categorized by two factors: a row factor and a column factor. Ronald Fisher mentioned it in 1925 in his book 'Statistical Methods for Research Workers.'
The two-way ANOVA begins by testing the null hypothesis that there is no interaction effect between the two factors of a dataset. This effect can be visualized using line segments formed by joining the...
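A minimal sketch of the sums of squares behind a two-way ANOVA with interaction, assuming a balanced design in which every row-by-column cell holds the same number r ≥ 2 of observations. The interaction F ratio, MS_interaction / MS_error with (a − 1)(b − 1) and ab(r − 1) degrees of freedom, tests the null hypothesis of no interaction; the row and column F ratios are formed the same way. The function name `two_way_anova` and the data are hypothetical.

```python
def two_way_anova(cells):
    """F ratios for a balanced two-way ANOVA with interaction.

    cells[i][j] is the list of observations at row level i and column level j.
    """
    a, b = len(cells), len(cells[0])
    r = len(cells[0][0])
    if a < 2 or b < 2 or r < 2 or any(len(row) != b for row in cells) or any(
        len(cell) != r for row in cells for cell in row
    ):
        raise ValueError("expected a balanced design: >= 2 levels per factor, >= 2 obs per cell")

    cell_means = [[sum(cell) / r for cell in row] for row in cells]
    grand_mean = sum(sum(row) for row in cell_means) / (a * b)
    row_means = [sum(row) / b for row in cell_means]
    col_means = [sum(cell_means[i][j] for i in range(a)) / a for j in range(b)]

    # Sums of squares for the row factor, column factor, interaction, and error.
    ss_row = b * r * sum((m - grand_mean) ** 2 for m in row_means)
    ss_col = a * r * sum((m - grand_mean) ** 2 for m in col_means)
    ss_int = r * sum(
        (cell_means[i][j] - row_means[i] - col_means[j] + grand_mean) ** 2
        for i in range(a) for j in range(b)
    )
    ss_err = sum(
        (x - cell_means[i][j]) ** 2
        for i in range(a) for j in range(b) for x in cells[i][j]
    )

    df_row, df_col = a - 1, b - 1
    df_int, df_err = df_row * df_col, a * b * (r - 1)
    ms_err = ss_err / df_err
    return {
        "F_row": (ss_row / df_row) / ms_err,
        "F_col": (ss_col / df_col) / ms_err,
        "F_interaction": (ss_int / df_int) / ms_err,
        "df_interaction": (df_int, df_err),
    }


# Hypothetical 2 x 2 design with 2 observations per cell.
cells = [[[4, 6], [7, 9]],
         [[5, 7], [12, 14]]]
print(two_way_anova(cells))
```

With these made-up cells the interaction F ratio is 4.0 on (1, 4) degrees of freedom. Unbalanced designs need a different decomposition (for example, Type II or Type III sums of squares), which this sketch does not attempt.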

One-Way ANOVA: Equal Sample Sizes

One-way ANOVA can be performed on three or more samples with equal or unequal sample sizes. When one-way ANOVA is performed on two datasets with samples of equal sizes, it can be easily observed that the computed F statistic is highly sensitive to the sample means.
Different sample means result in different values of one variance estimate, the variance between samples. This is because the variance between samples is calculated as the product of the sample size and the variance between the...
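With k samples of equal size n, the F statistic can be written as F = n·s_x̄² / s_p², where the numerator (the variance between samples) is n times the variance of the k sample means and the denominator (the variance within samples) is the mean of the k sample variances. A minimal sketch using Python's `statistics` module, with a hypothetical function name and made-up data, shows how moving one sample mean changes F while the within-sample variance stays fixed:

```python
from statistics import mean, variance


def one_way_anova_equal_n(samples):
    """F statistic for one-way ANOVA when every sample has the same size n."""
    n = len(samples[0])
    if len(samples) < 2 or n < 2 or any(len(s) != n for s in samples):
        raise ValueError("need >= 2 samples, all of the same size n >= 2")
    sample_means = [mean(s) for s in samples]
    # Variance between samples: n times the variance of the sample means.
    var_between = n * variance(sample_means)
    # Variance within samples: the mean of the individual sample variances.
    var_within = mean(variance(s) for s in samples)
    return var_between / var_within


# Same within-sample spread in both datasets; only the third sample's mean moves.
f_close = one_way_anova_equal_n([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
f_apart = one_way_anova_equal_n([[1, 2, 3], [4, 5, 6], [10, 11, 12]])
print(f_close, f_apart)
```

Each sample variance is 1 in both datasets, so F rises from 27 to 63 purely because the sample means spread further apart.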

Self-Evaluation Maintenance Model

The Self-Evaluation Maintenance (SEM) model offers a psychological framework to understand how individuals’ self-esteem is influenced by the achievements of others, particularly those with whom they share close personal bonds. The SEM model operates when personal rather than social identity guides individuals. Central to this model is the notion that individuals have an inherent desire to preserve a favorable self-image, which is continuously shaped by interpersonal comparisons and...

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

Potential Impact of the OPTN Status Escalation Policy for Adult Heart Transplant Candidates With Durable LVADs.

Circulation. Heart failure·2026

Same author

BRIDGE: benchmarking large language models for understanding real-world clinical practice texts.

Nature biomedical engineering·2026

Same author

Synaptonemal complex SUMOylation is maintained by Nup60-dependent docking of Ulp1 at the nuclear periphery.

Cell reports·2026

Same author

Quantifying the Serum Magnesium Response and Predictors of Response Following Intravenous Magnesium Replacement in Critically Ill Patients.

Pharmacotherapy·2026

Same author

A serological survey of hepatitis B among migrant workers at a construction site in Qingdao, China.

Journal of infection in developing countries·2026

Same author

Distinct and overlapping roles of MutLγ, Mus81-Mms4, and STR in meiotic Holliday junction processing.

Nature communications·2026

Same journal

Comparative Evaluation of Pretrained Large Language Models for Suicide Risk Prediction from Clinical Notes in U.S. Veterans.

medRxiv : the preprint server for health sciences·2026

Same journal

Nocturnal Respiratory Rate and Variability Predict Long-term Mortality in Stable Outpatients with Cardiovascular Disease.

medRxiv : the preprint server for health sciences·2026

Same journal

MOSAIC: Methylation-Oriented Site Analysis and Information Classifier for Robust Epigenomic Classification of Acute Leukemia in Clinical Cohorts with Variable Tumor Purity.

medRxiv : the preprint server for health sciences·2026

Same journal

Risk beliefs, intensive digital information and demand for a new preventative health product in public clinics: Evidence from an experiment in Zimbabwe.

medRxiv : the preprint server for health sciences·2026

Same journal

Development of an automated, imaging-based preoperative screening model for early identification of malnutrition in an abdominal surgery cohort.

medRxiv : the preprint server for health sciences·2026

Same journal

A Pilot Project Leveraging Large Language Models for Automated Screening and Variable Extraction in Observational Studies.

medRxiv : the preprint server for health sciences·2026

Related Experiment Video

Updated: Apr 11, 2026

Development of a Virtual Reality Assessment of Everyday Living Skills

Published on: April 23, 2014

Evaluation of SOFA-2 Score Performance Across Demographic Subgroups: An External Validation Study Using MIMIC-IV.

Jacob Ellen¹,², Sicheng Hao², Catherine A Gao³

¹Harvard Medical School, 25 Shattuck Street, Boston, MA 02108, USA.

medRxiv: the preprint server for health sciences, April 10, 2026

Summary

This summary is machine-generated.

The Sequential Organ Failure Assessment (SOFA)-2 score shows good overall discrimination for ICU mortality (AUROC 0.77), but its performance declines significantly in older patients and is lower in non-English speakers, highlighting the need for equity evaluations.

More Related Videos

Validation of a Psychosocial Intervention on Body Image in Older People: An Experimental Design

Published on: May 31, 2021

Author Spotlight: Validation of SICOLE-R for Assessing Cognitive and Reading Skills in Spanish-Speaking Children and Its Role in Personalized Education

Published on: August 16, 2024

Area of Science:

Critical Care Medicine
Health Services Research
Biostatistics

Background:

The Sequential Organ Failure Assessment (SOFA)-2 score is a validated tool for predicting intensive care unit (ICU) mortality.
Previous validation did not assess performance across diverse demographic subgroups.
Evaluating performance across subgroups is crucial for equitable clinical decision-making.

Purpose of the Study:

To assess the discrimination and calibration of the SOFA-2 score for ICU mortality prediction across demographic subgroups.
To identify variations in SOFA-2 performance based on age, sex, race/ethnicity, primary language, and insurance status.

Main Methods:

Retrospective cohort study using the MIMIC-IV database (2008-2022).
Included the first ICU admission of each adult patient (n = 64,015).
Calculated first-day SOFA-2 scores and assessed discrimination (AUROC) and calibration across subgroups.
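The summary does not say how AUROC was computed. As a minimal sketch, assuming one first-day SOFA-2 total and one binary ICU-mortality outcome per patient, AUROC equals the Mann-Whitney probability that a randomly chosen patient who died scored higher than a randomly chosen survivor, with ties counted as one half. Because SOFA-2 totals are small integers, counting patients per score value avoids comparing every pair. The function names, subgroup labels, and toy data are hypothetical and not taken from the study or MIMIC-IV.

```python
from collections import Counter, defaultdict


def auroc(scores, died):
    """Mann-Whitney AUROC: P(score of a death > score of a survivor), ties count 1/2."""
    pos = Counter(s for s, d in zip(scores, died) if d)
    neg = Counter(s for s, d in zip(scores, died) if not d)
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    neg_below = 0  # survivors whose score is strictly below the current value
    for value in sorted(set(pos) | set(neg)):
        # Deaths at this score beat every lower-scoring survivor and tie with equal ones.
        wins += pos[value] * (neg_below + 0.5 * neg[value])
        neg_below += neg[value]
    return wins / (n_pos * n_neg)


def auroc_by_subgroup(scores, died, groups):
    """AUROC computed separately within each subgroup label."""
    by_group = defaultdict(lambda: ([], []))
    for s, d, g in zip(scores, died, groups):
        by_group[g][0].append(s)
        by_group[g][1].append(d)
    return {g: auroc(s, d) for g, (s, d) in by_group.items()}


# Toy data: first-day scores, death indicator, and a hypothetical age-band label.
scores = [1, 2, 3, 8, 9, 2, 5, 5, 6]
died = [0, 0, 0, 1, 1, 0, 1, 0, 1]
groups = ["18-44"] * 5 + ["75+"] * 4
print(auroc_by_subgroup(scores, died, groups))
```

In this toy data the score separates deaths perfectly in the younger band (AUROC 1.0) but not in the older band (0.875), the kind of subgroup gap the study reports.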

Main Results:

Overall AUROC for ICU mortality was 0.77.
Discrimination declined significantly with age (AUROC 0.85 for patients aged 18-44 vs. 0.72 for those aged 75+).
Mortality was underpredicted in older patients; discrimination was lower in non-English speakers.
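The summary reports underprediction in older patients but not how calibration was quantified. One common summary, calibration-in-the-large, compares the observed number of deaths in a subgroup with the sum of predicted risks; an observed-to-expected (O/E) ratio above 1 means mortality is underpredicted. This sketch assumes each patient already has a predicted mortality probability derived from the SOFA-2 score (for example from a logistic model fit on the whole cohort). That mapping, the function name, and the toy data are assumptions, not the study's method.

```python
from collections import defaultdict


def observed_expected_by_group(pred_probs, died, groups):
    """Return {group: (observed deaths, expected deaths, O/E ratio)}.

    Expected deaths are the sum of predicted probabilities in the group;
    an O/E ratio above 1 means the model underpredicts mortality there.
    """
    observed = defaultdict(float)
    expected = defaultdict(float)
    for p, d, g in zip(pred_probs, died, groups):
        observed[g] += d
        expected[g] += p
    return {g: (observed[g], expected[g], observed[g] / expected[g]) for g in observed}


# Toy data: hypothetical predicted risks, outcomes, and age-band labels.
pred_probs = [0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.5, 0.5, 0.5]
died = [1, 0, 0, 0, 1, 1, 1, 0, 1]
groups = ["18-44"] * 4 + ["75+"] * 5
print(observed_expected_by_group(pred_probs, died, groups))
```

Here the younger band is well calibrated (O/E 1.0), while the older band has twice as many deaths as predicted (O/E 2.0), which is what underprediction looks like in this summary.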

Conclusions:

SOFA-2 shows good overall ICU mortality prediction but exhibits significant performance variations across demographic subgroups.
A notable decline in discrimination with advancing age and poorer performance in non-English speakers were observed.
Routine equity evaluation of clinical prediction tools is essential before widespread implementation to ensure fairness.