



Evaluation of SOFA-2 Score Performance Across Demographic Subgroups: An External Validation Study Using MIMIC-IV.

Jacob Ellen1,2, Sicheng Hao2, Catherine A Gao3

  • 1Harvard Medical School, 25 Shattuck Street, Boston, MA 02108, USA.

medRxiv: The Preprint Server for Health Sciences, April 10, 2026
Summary
This summary is machine-generated.

The Sequential Organ Failure Assessment (SOFA)-2 score accurately predicts ICU mortality overall, but its performance significantly decreases in older patients and non-English speakers, highlighting the need for equity evaluations.


Area of Science:

  • Critical Care Medicine
  • Health Services Research
  • Biostatistics

Background:

  • The Sequential Organ Failure Assessment (SOFA)-2 score is a validated tool for predicting intensive care unit (ICU) mortality.
  • Previous validation did not assess performance across diverse demographic subgroups.
  • Evaluating performance across subgroups is crucial for equitable clinical decision-making.

Purpose of the Study:

  • To assess the discrimination and calibration of the SOFA-2 score for ICU mortality prediction across demographic subgroups.
  • To identify variations in SOFA-2 performance based on age, sex, race/ethnicity, primary language, and insurance status.

Main Methods:

  • Retrospective cohort study using the MIMIC-IV database (2008-2022).
  • Included each adult patient's first ICU admission (n = 64,015).
  • Calculated first-day SOFA-2 scores and assessed discrimination (AUROC) and calibration across subgroups.
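
Subgroup discrimination of this kind can be illustrated with a short sketch. The code below is not the authors' analysis code: it computes AUROC from its rank-based definition (the probability that a patient who died has a higher score than a patient who survived, with ties counted as one half) and repeats the calculation within each subgroup. The field names `score`, `died`, and `age_group`, and the toy records, are assumptions made up for illustration and are not MIMIC-IV data.

```python
from collections import defaultdict


def auroc(scores, labels):
    """AUROC as P(score of a positive > score of a negative), ties count 0.5.

    Uses a simple pairwise comparison, which is O(n_pos * n_neg); fine for a
    small illustration, too slow for a full cohort.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None  # AUROC is undefined unless both outcomes are present
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auroc_by_subgroup(records, group_key):
    """Group records by `group_key` and compute AUROC within each group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record)
    return {
        group: auroc([r["score"] for r in rows], [r["died"] for r in rows])
        for group, rows in groups.items()
    }


# Hypothetical toy records (NOT study data): first-day score and ICU outcome.
toy_records = [
    {"age_group": "18-44", "score": 2, "died": 0},
    {"age_group": "18-44", "score": 3, "died": 0},
    {"age_group": "18-44", "score": 9, "died": 1},
    {"age_group": "75+", "score": 5, "died": 0},
    {"age_group": "75+", "score": 8, "died": 0},
    {"age_group": "75+", "score": 6, "died": 1},
    {"age_group": "75+", "score": 8, "died": 1},
]

subgroup_auroc = auroc_by_subgroup(toy_records, "age_group")
for group, value in subgroup_auroc.items():
    print(group, value)
```

In the toy data, scores separate outcomes perfectly in the younger group and only partially in the older group, which is the pattern a lower subgroup AUROC reflects.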

Main Results:

  • Overall AUROC for ICU mortality was 0.77.
  • Discrimination declined significantly with age (AUROC 0.85 for ages 18-44 vs. 0.72 for ages 75+).
  • Mortality was underpredicted in older patients; discrimination was lower in non-English speakers.
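
One common way to summarize underprediction is the observed-to-expected (O/E) ratio: observed deaths divided by the sum of predicted death probabilities, where a value above 1 means mortality was underpredicted. The summary does not state which calibration measure the authors used or how scores were converted to probabilities, so the sketch below is only an assumption-laden illustration. The `pred` field (a predicted probability of death) and the toy records are hypothetical and are not study data.

```python
from collections import defaultdict


def observed_expected_ratio(pred_probs, outcomes):
    """Observed deaths divided by expected deaths (sum of predicted risks).

    > 1 suggests underprediction, < 1 suggests overprediction.
    """
    expected = sum(pred_probs)
    if expected == 0:
        return None  # ratio is undefined when no deaths are expected
    return sum(outcomes) / expected


def oe_by_subgroup(records, group_key):
    """Group records by `group_key` and compute the O/E ratio in each group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record)
    return {
        group: observed_expected_ratio(
            [r["pred"] for r in rows], [r["died"] for r in rows]
        )
        for group, rows in groups.items()
    }


# Hypothetical toy records (NOT study data): predicted risk and ICU outcome.
toy_calibration_records = [
    {"age_group": "18-44", "pred": 0.5, "died": 1},
    {"age_group": "18-44", "pred": 0.5, "died": 0},
    {"age_group": "75+", "pred": 0.25, "died": 0},
    {"age_group": "75+", "pred": 0.25, "died": 1},
    {"age_group": "75+", "pred": 0.5, "died": 1},
]

oe_by_group = oe_by_subgroup(toy_calibration_records, "age_group")
for group, ratio in oe_by_group.items():
    print(group, ratio)
```

In the toy data the older group has twice as many deaths as the predicted risks imply, which is what underprediction looks like under this measure.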

Conclusions:

  • SOFA-2 shows good overall ICU mortality prediction but exhibits significant performance variations across demographic subgroups.
  • A notable decline in discrimination with advancing age and poorer performance in non-English speakers were observed.
  • Routine equity evaluation of clinical prediction tools is essential before widespread implementation to ensure fairness.