Updated: Apr 1, 2026

Key Measures for Evaluating Diagnostic Accuracy in Multi-Class Classification: An Overview and Simulation-Based

Leeha Ryu¹, Kyunghwa Han²,³, Inkyung Jung⁴

¹Department of Biostatistics and Computing, Yonsei University Graduate School, Seoul, Republic of Korea.

Korean Journal of Radiology

March 31, 2026

Summary

This summary is machine-generated.

An evaluation of multi-class classification metrics for AI models shows that most perform well on balanced data, whereas the M-index and the polytomous discrimination index remain more stable on imbalanced datasets, a property that matters for medical predictive modeling.

Keywords:

Accuracy, Index, Measure, Metrics, Multiclass classification, Performance, Polytomous outcome prediction

Area of Science:

Artificial Intelligence
Medical Informatics
Statistical Modeling

Background:

AI advancements drive predictive modeling in medicine.
Increasingly complex systems call for robust multi-class classification metrics.
Limited comparative studies on multi-class metrics under varied data conditions.

Purpose of the Study:

To provide an overview of common multi-class classification accuracy metrics.
To systematically evaluate diagnostic accuracy measures via simulation.
To offer practical guidance for metric selection in multi-class tasks.

Main Methods:

Overview of established multi-class classification metrics.
Simulation study across diverse scenarios (3- and 5-class, balanced/imbalanced data, varying predictor distributions).
Assessment of bias and 95% confidence interval coverage for each metric.
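
The bias and coverage assessment listed above can be illustrated with a small sketch. It is not the study's simulation design, which this summary does not specify. As assumptions for illustration only, it uses plain accuracy as the metric, a hypothetical 3-class classifier that is correct with a fixed probability `true_acc`, and a Wald-type 95% interval; the function names, sample sizes, and replication count are likewise made up.

```python
import math
import random


def simulate_accuracy(n, true_acc, n_classes, rng):
    """Estimate accuracy on n simulated cases (hypothetical classifier)."""
    correct = 0
    for _ in range(n):
        truth = rng.randrange(n_classes)
        if rng.random() < true_acc:
            pred = truth
        else:
            # Pick one of the wrong classes uniformly at random.
            pred = (truth + rng.randrange(1, n_classes)) % n_classes
        correct += (pred == truth)
    return correct / n


def bias_and_coverage(n_reps=2000, n=300, true_acc=0.8, n_classes=3, seed=0):
    """Return (bias, coverage) of the accuracy estimator over n_reps replications."""
    rng = random.Random(seed)
    estimates = []
    covered = 0
    for _ in range(n_reps):
        est = simulate_accuracy(n, true_acc, n_classes, rng)
        # Wald 95% interval: estimate +/- 1.96 * standard error.
        se = math.sqrt(est * (1 - est) / n)
        lower, upper = est - 1.96 * se, est + 1.96 * se
        covered += (lower <= true_acc <= upper)
        estimates.append(est)
    bias = sum(estimates) / n_reps - true_acc   # mean estimate minus true value
    coverage = covered / n_reps                 # share of intervals containing the truth
    return bias, coverage


bias, coverage = bias_and_coverage()
print(f"bias={bias:+.4f}, coverage={coverage:.3f}")
```

A metric behaves well in this sense when the bias is close to zero and the coverage is close to the nominal 0.95. Swapping in another metric only requires replacing the estimator and its interval.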

Main Results:

Most metrics showed stable, unbiased performance under balanced conditions.
Imbalanced conditions revealed greater bias; the M-index and the polytomous discrimination index remained more stable.
The micro-averaged area under the ROC curve consistently showed higher bias under class imbalance.
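
To make two of the metrics named above concrete, here is a minimal sketch of how they are commonly computed. It assumes the M-index refers to Hand and Till's measure, the average of pairwise AUCs over all class pairs, and that the micro-averaged area pools every (case, class) pair into a single binary problem. The helper names and the toy data are illustrative and do not come from the study.

```python
from itertools import combinations


def auc(pos, neg):
    """Mann-Whitney estimate of P(pos score > neg score); ties count as 1/2."""
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def m_index(y, probs):
    """Average pairwise AUC over all class pairs (classes are 0..c-1)."""
    classes = sorted(set(y))
    pair_aucs = []
    for i, j in combinations(classes, 2):
        # A(i|j): class-i probability, class-i cases versus class-j cases.
        a_ij = auc([p[i] for t, p in zip(y, probs) if t == i],
                   [p[i] for t, p in zip(y, probs) if t == j])
        # A(j|i): class-j probability, class-j cases versus class-i cases.
        a_ji = auc([p[j] for t, p in zip(y, probs) if t == j],
                   [p[j] for t, p in zip(y, probs) if t == i])
        pair_aucs.append((a_ij + a_ji) / 2)
    return sum(pair_aucs) / len(pair_aucs)


def micro_auc(y, probs):
    """Pool all (case, class) pairs: true-class scores versus other-class scores."""
    pos = [p[t] for t, p in zip(y, probs)]
    neg = [p[c] for t, p in zip(y, probs) for c in range(len(p)) if c != t]
    return auc(pos, neg)


# Toy data: six cases, three classes, predicted class probabilities per case.
y = [0, 0, 1, 1, 2, 2]
perfect = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1],
           [0.1, 0.8, 0.1], [0.2, 0.7, 0.1],
           [0.1, 0.1, 0.8], [0.2, 0.1, 0.7]]
uninformative = [[1 / 3, 1 / 3, 1 / 3] for _ in y]

print(m_index(y, perfect), micro_auc(y, perfect))              # 1.0 1.0
print(m_index(y, uninformative), micro_auc(y, uninformative))  # 0.5 0.5
```

Because the pairwise average weights every class pair equally while the pooled version weights every case equally, the two can diverge when class sizes differ. This sketch does not itself demonstrate the bias pattern reported above.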

Conclusions:

Metric performance varies significantly with data balance.
M-index and polytomous discrimination index are recommended for imbalanced multi-class medical data.
Systematic evaluation aids informed metric selection in AI-driven medical diagnostics.