BenchMetrics Prob: benchmarking of probabilistic error/loss performance evaluation instruments for binary

Gürol Canbek¹

  • ¹ Pointr, Ankara, Turkey.

International Journal of Machine Learning and Cybernetics, June 26, 2023
Summary
This summary is machine-generated.

This study evaluates probabilistic performance metrics for binary classification. It finds Mean Absolute Error (MAE) the most robust choice for general use and Root Mean Squared Error (RMSE) the better choice when large errors matter most, and it recommends avoiding less robust metrics such as LogLoss and MAPE.

Keywords:
Binary classification; Performance measures; Probabilistic error/loss; Regression; Squared error; Time series forecasting

Area of Science:

  • Machine Learning
  • Statistical Modeling
  • Computer Science

Background:

  • Probabilistic error/loss metrics, common in regression, are increasingly used for binary classification.
  • The suitability of these metrics for classification tasks has not been systematically evaluated.

Purpose of the Study:

  • To systematically assess probabilistic instruments for binary classification performance evaluation.
  • To identify weaknesses of current metrics and determine the most robust options.

Main Methods:

  • A two-stage benchmarking method, BenchMetrics Prob, was developed.
  • The method utilized five criteria and fourteen simulation cases with synthetic datasets.
  • Thirty-one instruments and instrument variants were tested.

Main Results:

  • Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) were identified as the most robust metrics.
  • MAE is recommended for general purposes due to its interpretability and [0, 1] range.
  • RMSE is preferred when emphasizing larger errors.
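
As a rough illustration of the two recommended metrics, the sketch below computes MAE and RMSE between 0/1 class labels and predicted probabilities of the positive class. It uses the standard textbook definitions, which may differ from the exact variants benchmarked in the study; the function names and the toy data are assumptions made for this example only.

```python
import math

def mae(y_true, p_pred):
    """Mean absolute error between 0/1 labels and predicted probabilities."""
    return sum(abs(y - p) for y, p in zip(y_true, p_pred)) / len(y_true)

def rmse(y_true, p_pred):
    """Root mean squared error between 0/1 labels and predicted probabilities."""
    squared = sum((y - p) ** 2 for y, p in zip(y_true, p_pred))
    return math.sqrt(squared / len(y_true))

# Hypothetical toy data: four instances, labels are 0 or 1.
labels = [1, 0, 1, 0]

# Every prediction is off by the same amount (0.3).
even = [0.7, 0.3, 0.7, 0.3]

# Same total absolute error, but concentrated in one badly wrong prediction.
uneven = [1.0, 0.0, 0.7, 0.9]

for name, preds in [("even", even), ("uneven", uneven)]:
    print(name, round(mae(labels, preds), 4), round(rmse(labels, preds), 4))
```

With labels in {0, 1} and probabilities in [0, 1], each absolute error lies in [0, 1], so both metrics stay in that range. The two prediction sets have the same MAE, while RMSE is higher for the set with one large error, which is the sense in which RMSE emphasizes larger errors.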

Conclusions:

  • Researchers should carefully select robust probabilistic metrics for binary classification performance evaluation.
  • Metrics like LogLoss, MAPE, sMAPE, and MRAE demonstrated lower robustness and should be avoided.
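
The summary does not state why these metrics scored lower, so the following is only a sketch of two well-known mathematical properties that may be relevant, not the authors' analysis. Using standard textbook definitions (an assumption; the study may benchmark other variants), LogLoss grows without bound as the probability assigned to the true class approaches zero, and MAPE divides by the actual value, which is zero for every negative instance when labels are coded 0/1. The function names and inputs are hypothetical.

```python
import math

def log_loss(y_true, p_pred):
    """Mean negative log-likelihood of the true class (textbook definition).

    math.log raises ValueError if the true class is given probability 0.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p_true_class = p if y == 1 else 1.0 - p
        total += -math.log(p_true_class)
    return total / len(y_true)

def mape(y_true, p_pred):
    """Mean absolute percentage error as a fraction (textbook definition).

    Dividing by the actual value fails for any label equal to 0.
    """
    return sum(abs(y - p) / abs(y) for y, p in zip(y_true, p_pred)) / len(y_true)

# One confidently wrong prediction: the absolute error is just under 1,
# but the log loss is far outside the [0, 1] range.
print(log_loss([1], [1e-6]))

# A negative instance (label 0) makes the MAPE denominator zero.
try:
    mape([0], [0.2])
except ZeroDivisionError:
    print("MAPE is undefined when the actual label is 0")
```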