Statistical inference for agreement between multiple raters on a binary scale.

Sophie Vanbelle¹

¹Department of Methodology and Statistics, CAPHRI, Maastricht University, Maastricht, The Netherlands.

The British Journal of Mathematical and Statistical Psychology

January 17, 2024

Summary

This summary is machine-generated.

This study introduces improved statistical methods for agreement studies with multiple raters. New procedures offer better statistical performance and sample size calculations for reliable agreement analysis.

Keywords:

confidence interval; credibility interval; dichotomous; raters; sample size

Area of Science:

Statistics
Biostatistics
Psychometrics

Background:

Agreement studies are crucial for assessing reliability in various fields.
Traditional agreement measures often struggle with more than two raters or repeated measures.
Existing methods may lack robust statistical inference procedures for complex agreement scenarios.

Purpose of the Study:

To generalize agreement measures for binary scales to studies with multiple raters.
To propose and evaluate novel statistical inference procedures for enhanced agreement analysis.
To provide tools for determining optimal sample sizes in multi-rater agreement studies.

Main Methods:

Development of Wald confidence intervals using the delta method for standard error estimation.
Implementation of Bayesian statistical inference without requiring specialized Bayesian software.
Derivation of analytical formulas for sample size determination based on the number of raters.
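
To make the first method concrete, here is a minimal Python sketch of a Wald interval whose standard error comes from the delta method. It is an illustration only, not the paper's estimator: the agreement measure (mean proportion of agreeing rater pairs per subject), the logit transformation, the function names, and the example data are all assumptions introduced here.

```python
import math
from statistics import NormalDist, mean, stdev


def pairwise_agreement(positives, n_raters):
    """Proportion of rater pairs that agree on one subject (binary scale)."""
    k, r = positives, n_raters
    agreeing_pairs = k * (k - 1) + (r - k) * (r - k - 1)
    return agreeing_pairs / (r * (r - 1))


def expit(x):
    """Inverse of the logit transformation."""
    return 1.0 / (1.0 + math.exp(-x))


def wald_ci_logit(positive_counts, n_raters, level=0.95):
    """Wald interval for mean pairwise agreement, built on the logit scale.

    positive_counts: number of raters giving a positive rating, per subject.
    Assumed (hypothetical) setup: every subject is rated by all n_raters.
    """
    p_i = [pairwise_agreement(k, n_raters) for k in positive_counts]
    p_hat = mean(p_i)
    if not 0.0 < p_hat < 1.0:
        raise ValueError("logit is undefined when agreement is exactly 0 or 1")
    se = stdev(p_i) / math.sqrt(len(p_i))  # standard error of the mean
    # Delta method: Var[g(p)] is approximately g'(p)^2 * Var[p];
    # for g = logit, g'(p) = 1 / (p * (1 - p)).
    se_logit = se / (p_hat * (1.0 - p_hat))
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    centre = math.log(p_hat / (1.0 - p_hat))
    return p_hat, expit(centre - z * se_logit), expit(centre + z * se_logit)


# Made-up data: 10 subjects, 4 raters, count of positive ratings per subject.
counts = [4, 0, 3, 4, 1, 2, 4, 0, 3, 4]
p_hat, lower, upper = wald_ci_logit(counts, n_raters=4)
print(f"agreement = {p_hat:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Building the interval on the logit scale and back-transforming keeps both limits inside (0, 1), which a Wald interval computed directly on the agreement scale does not guarantee.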

Main Results:

The proposed Wald and Bayesian methods demonstrate superior statistical behavior compared to previously suggested confidence intervals.
New procedures offer more reliable agreement assessment in multi-rater settings.
Analytical formulas facilitate efficient study planning by determining the minimum required sample size.
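
As a rough illustration of planning a study around a target precision, the sketch below uses the generic Wald-based rule n >= (z * sd / d)^2 for the number of subjects. This is a textbook approximation, not the paper's analytical formulas: the function name, the pilot standard deviation, and the target half-width are hypothetical. The number of raters enters only indirectly here, through the standard deviation of the per-subject agreement values.

```python
import math
from statistics import NormalDist


def subjects_for_half_width(sd_agreement, half_width, level=0.95):
    """Smallest number of subjects so that a Wald interval for the mean
    per-subject agreement has at most the requested half-width.

    sd_agreement: assumed standard deviation of per-subject agreement,
                  for example taken from a pilot study (hypothetical input).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return math.ceil((z * sd_agreement / half_width) ** 2)


n_subjects = subjects_for_half_width(sd_agreement=0.25, half_width=0.05)
print(n_subjects)
```

Halving the target half-width roughly quadruples the required number of subjects, because the half-width shrinks with the square root of the sample size.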

Conclusions:

The novel statistical inference procedures provide a more robust framework for analyzing agreement in studies with multiple raters.
The developed methods and accompanying R package (simpleagree) and Shiny app enhance the practical application of agreement studies.
This work contributes to more accurate and reliable assessment of inter-rater or repeated-measures agreement.