
Updated: Jul 25, 2025


BenchMetrics Prob: benchmarking of probabilistic error/loss performance evaluation instruments for binary

Gürol Canbek¹

¹Pointr, Ankara, Turkey.

International Journal of Machine Learning and Cybernetics

June 26, 2023

Summary

This summary is machine-generated.

This study evaluates probabilistic performance metrics for binary classification, finding Mean Absolute Error (MAE) most robust for general use and Root Mean Squared Error (RMSE) best when large errors matter most. Avoid less reliable metrics like LogLoss and MAPE.

Keywords:

Binary classification; Performance measures; Probabilistic error/loss; Regression; Squared error; Time series forecasting


Area of Science:

Machine Learning
Statistical Modeling
Computer Science

Background:

Probabilistic error/loss metrics, common in regression, are increasingly used for binary classification.
Existing methods lack systematic evaluation for their suitability in classification tasks.

Purpose of the Study:

To systematically assess probabilistic instruments for binary classification performance evaluation.
To identify weaknesses of current metrics and determine the most robust options.

Main Methods:

A two-stage benchmarking method, BenchMetrics Prob, was developed.
The method utilized five criteria and fourteen simulation cases with synthetic datasets.
Thirty-one instruments and instrument variants were tested.

Main Results:

Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) were identified as the most robust metrics.
MAE is recommended for general purposes due to its interpretability and [0, 1] range.
RMSE is preferred when emphasizing larger errors.
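
As a rough illustration of the two recommended metrics, the sketch below computes MAE and RMSE between binary labels and predicted probabilities using their standard definitions. The function names and toy data are hypothetical and are not taken from the study. Both prediction sets have the same MAE, but the one with fewer, larger errors has the higher RMSE.

```python
import math

def mae(labels, probs):
    """Mean absolute error between 0/1 labels and predicted probabilities."""
    return sum(abs(y - p) for y, p in zip(labels, probs)) / len(labels)

def rmse(labels, probs):
    """Root mean squared error between 0/1 labels and predicted probabilities."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(labels, probs)) / len(labels))

# Hypothetical toy data (assumption, not from the paper).
labels = [1, 0, 1, 0]
probs_even = [0.5, 0.5, 0.5, 0.5]  # four moderate errors of 0.5
probs_skew = [1.0, 0.0, 0.0, 1.0]  # two perfect predictions, two errors of 1.0

for name, probs in [("even", probs_even), ("skew", probs_skew)]:
    print(name, "MAE =", mae(labels, probs), "RMSE =", round(rmse(labels, probs), 4))
```

With labels in {0, 1} and probabilities in [0, 1], every individual error lies in [0, 1], so both averages stay in that range as well.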

Conclusions:

Researchers should carefully select robust probabilistic metrics for binary classification performance evaluation.
Metrics like LogLoss, MAPE, sMAPE, and MRAE demonstrated lower robustness and should be avoided.
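
Two well-known properties of LogLoss and MAPE may help explain why they behave poorly on binary labels, although these are not necessarily the criteria the study applied. The sketch below uses the standard definitions with hypothetical function names: LogLoss grows without bound as a confident prediction becomes wrong, and MAPE divides by the true label, which is 0 for every negative instance.

```python
import math

def log_loss(labels, probs):
    """Standard binary cross-entropy, averaged over instances."""
    return -sum(
        math.log(p if y == 1 else 1.0 - p) for y, p in zip(labels, probs)
    ) / len(labels)

def mape(labels, probs):
    """Mean absolute percentage error; divides by the true label."""
    return sum(abs((y - p) / y) for y, p in zip(labels, probs)) / len(labels)

# LogLoss exceeds 1 and keeps growing as the predicted probability
# for a true positive approaches 0.
for p in (0.1, 0.01, 0.001):
    print("p =", p, "LogLoss =", round(log_loss([1], [p]), 3))

# MAPE is undefined whenever the true label is 0.
try:
    mape([0], [0.2])
except ZeroDivisionError:
    print("MAPE is undefined for a negative (label 0) instance")
```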