Statistical inference for agreement between multiple raters on a binary scale.

Sophie Vanbelle¹

  • ¹ Department of Methodology and Statistics, CAPHRI, Maastricht University, Maastricht, The Netherlands.

The British Journal of Mathematical and Statistical Psychology | January 17, 2024
Summary
This summary is machine-generated.

This study introduces improved statistical inference procedures for agreement studies with multiple raters on a binary scale. The new procedures behave better than previously proposed confidence intervals and come with sample size calculations for planning agreement studies.

Keywords:
confidence interval, credibility interval, dichotomous, raters, sample size

Area of Science:

  • Statistics
  • Biostatistics
  • Psychometrics

Background:

  • Agreement studies are crucial for assessing reliability in various fields.
  • Many traditional agreement measures are defined for two raters and do not extend easily to studies with more than two raters or with repeated measurements.
  • Existing statistical inference procedures (such as confidence intervals) for these multi-rater settings may not perform well.

Purpose of the Study:

  • To generalize agreement measures for binary scales to studies with multiple raters.
  • To propose and evaluate novel statistical inference procedures for enhanced agreement analysis.
  • To provide analytical tools for determining the minimum sample size needed in multi-rater agreement studies.
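
As a concrete illustration of generalizing binary agreement measures to more than two raters, the sketch below counts agreement over all rater pairs within each subject. It assumes the overall proportion of agreement and the proportion of positive agreement as the measures of interest, using a pairwise generalization that reduces to the usual two-rater formulas (for two raters, positive agreement becomes 2a/(2a+b+c)). The study's exact definitions are not stated here, and the function name `multi_rater_agreement` is hypothetical.

```python
import numpy as np


def multi_rater_agreement(ratings):
    """Overall and positive proportions of agreement for binary ratings.

    `ratings` is an (N subjects x m raters) array of 0/1 values. Agreement is
    counted over the m(m - 1) ordered rater pairs within each subject, an
    assumed pairwise generalization of the two-rater measures.
    """
    x = np.asarray(ratings, dtype=int)
    n_subjects, m = x.shape
    if m < 2:
        raise ValueError("need at least two raters")
    k = x.sum(axis=1)                      # positive ratings per subject
    pos_pairs = k * (k - 1)                # ordered pairs agreeing on 1
    neg_pairs = (m - k) * (m - k - 1)      # ordered pairs agreeing on 0
    first_pos = k * (m - 1)                # ordered pairs whose first rating is 1

    # Overall agreement: agreeing pairs over all pairs, pooled across subjects.
    p_agree = (pos_pairs + neg_pairs).sum() / (n_subjects * m * (m - 1))
    # Positive agreement: undefined (nan) if no subject received a positive rating.
    denom = first_pos.sum()
    p_pos = pos_pairs.sum() / denom if denom > 0 else float("nan")
    return float(p_agree), float(p_pos)


# Four subjects rated by three raters (made-up illustrative data).
print(multi_rater_agreement([[1, 1, 1], [1, 1, 0], [0, 0, 0], [1, 0, 0]]))
```

Pooling pairs across subjects (rather than averaging per-subject ratios) keeps both measures well defined when some subjects receive no positive ratings.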

Main Methods:

  • Development of Wald confidence intervals using the delta method for standard error estimation.
  • Implementation of Bayesian statistical inference without requiring specialized Bayesian software.
  • Derivation of analytical formulas for sample size determination based on the number of raters.
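
The two inference approaches listed above can be sketched for the same pairwise measures, assuming the data are summarized as counts of subjects with exactly k positive ratings (k = 0, ..., m) and that these counts follow a multinomial distribution. The Wald interval applies the delta method to that multinomial; the Bayesian interval places an assumed symmetric Dirichlet prior on the category probabilities and takes quantiles of Monte Carlo draws from the conjugate Dirichlet posterior, which needs only NumPy. This is an illustration under those assumptions, not the estimators or priors of the paper or of the simpleagree package; `wald_ci`, `bayes_ci` and `pair_weights` are hypothetical names.

```python
import numpy as np
from statistics import NormalDist


def pair_weights(m):
    """Ordered-pair counts for a subject with k = 0..m positive ratings."""
    if m < 2:
        raise ValueError("need at least two raters")
    k = np.arange(m + 1)
    pos = k * (k - 1)                 # pairs agreeing on positive
    neg = (m - k) * (m - k - 1)       # pairs agreeing on negative
    first_pos = k * (m - 1)           # pairs whose first rating is positive
    return pos, neg, first_pos


def wald_ci(counts, level=0.95):
    """Delta-method Wald intervals for overall and positive agreement.

    counts[k] = number of subjects with exactly k positive ratings out of
    m = len(counts) - 1 raters (assumed multinomial sampling).
    """
    counts = np.asarray(counts, dtype=float)
    n, m = counts.sum(), len(counts) - 1
    pi = counts / n
    pos, neg, first_pos = pair_weights(m)
    z = NormalDist().inv_cdf(0.5 + level / 2)

    # Overall agreement is linear in pi, so its gradient is the weight vector a.
    a = (pos + neg) / (m * (m - 1))
    p_o = pi @ a
    # Multinomial covariance (diag(pi) - pi pi^T) / n gives g'Cov g below.
    var_o = (pi @ a**2 - p_o**2) / n

    # Positive agreement R = A / B; quotient rule: dR/dpi_k = (pos_k - R*first_pos_k) / B.
    # (nan if there are no positive ratings at all.)
    big_a, big_b = pi @ pos, pi @ first_pos
    r = big_a / big_b
    g = (pos - r * first_pos) / big_b
    var_r = (pi @ g**2 - (pi @ g) ** 2) / n

    out = {}
    for name, est, var in (("agreement", p_o, var_o), ("positive", r, var_r)):
        half = z * np.sqrt(var)
        # Clip the symmetric Wald interval to the [0, 1] range of a proportion.
        out[name] = (float(est), float(max(0.0, est - half)), float(min(1.0, est + half)))
    return out


def bayes_ci(counts, level=0.95, prior=1.0, draws=20000, seed=0):
    """Monte Carlo credible intervals from an assumed Dirichlet posterior."""
    counts = np.asarray(counts, dtype=float)
    m = len(counts) - 1
    pos, neg, first_pos = pair_weights(m)
    rng = np.random.default_rng(seed)
    # Symmetric Dirichlet(prior) prior + multinomial counts -> Dirichlet posterior.
    pi = rng.dirichlet(prior + counts, size=draws)   # shape (draws, m + 1)
    p_o = pi @ ((pos + neg) / (m * (m - 1)))
    r = (pi @ pos) / (pi @ first_pos)
    tails = [(1 - level) / 2, (1 + level) / 2]
    return {
        "agreement": tuple(np.quantile(p_o, tails).tolist()),
        "positive": tuple(np.quantile(r, tails).tolist()),
    }


# Made-up example: 40 subjects, 3 raters; counts of subjects with 0..3 positives.
counts = [10, 3, 4, 23]
print(wald_ci(counts))
print(bayes_ci(counts))
```

Because overall agreement is linear in the category probabilities, its delta-method gradient is just the weight vector; positive agreement is a ratio, so its gradient comes from the quotient rule, which is where the delta method does real work.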

Main Results:

  • The proposed Wald and Bayesian methods show better statistical behavior than the previously suggested confidence intervals.
  • New procedures offer more reliable agreement assessment in multi-rater settings.
  • Analytical formulas facilitate efficient study planning by determining the minimum required sample size.
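
For study planning, a precision-based sample size has the closed form N = z²σ²/ω², where ω is the desired half-width of the Wald interval for overall agreement and σ² is the per-subject variance of the pairwise agreement under an anticipated distribution of ratings. The sketch below derives that anticipated distribution from a hypothetical latent-class planning model (true prevalence `prevalence`, each rater independently correct with probability `accuracy`). Both the planning model and the function names are assumptions for illustration; the paper's analytical formulas may be derived differently.

```python
import math
from statistics import NormalDist


def planning_distribution(m, prevalence, accuracy):
    """Anticipated P(k positive ratings), k = 0..m, under an assumed latent-class model."""
    def binom(k, prob):
        return math.comb(m, k) * prob**k * (1 - prob) ** (m - k)
    # Truly positive subjects: k ~ Bin(m, accuracy); truly negative: k ~ Bin(m, 1 - accuracy).
    return [
        prevalence * binom(k, accuracy) + (1 - prevalence) * binom(k, 1 - accuracy)
        for k in range(m + 1)
    ]


def subjects_needed(m, prevalence, accuracy, half_width, level=0.95):
    """Smallest N whose Wald interval for overall agreement has the target half-width."""
    pi = planning_distribution(m, prevalence, accuracy)
    # Per-subject pairwise agreement for a subject with k positive ratings.
    a = [(k * (k - 1) + (m - k) * (m - k - 1)) / (m * (m - 1)) for k in range(m + 1)]
    p_o = sum(p * w for p, w in zip(pi, a))
    sigma2 = sum(p * w * w for p, w in zip(pi, a)) - p_o**2
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.ceil(z**2 * sigma2 / half_width**2)


# Required number of subjects for 2, 3, 5 and 10 raters (hypothetical inputs).
for m in (2, 3, 5, 10):
    print(m, subjects_needed(m, prevalence=0.3, accuracy=0.9, half_width=0.05))
```

For two raters the per-subject agreement is a Bernoulli variable, so the formula reduces to the familiar sample size for a single proportion; looping over m shows how the required number of subjects changes with the number of raters under these planning assumptions.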

Conclusions:

  • The novel statistical inference procedures provide a more robust framework for analyzing agreement in studies with multiple raters.
  • The methods are implemented in an accompanying R package (simpleagree) and a Shiny app, which make them easier to apply in practice.
  • This work contributes to more accurate and reliable assessment of inter-rater or repeated measure agreement.