Analyzing script concordance test scoring methods and items by difficulty and type.

Adam B Wilson¹, Gary R Pike, Aloysius J Humbert

¹ Department of Surgery, Indiana University, Indianapolis, Indiana, USA.

Teaching and Learning in Medicine

April 8, 2014

Summary

This summary is machine-generated.

Script Concordance Tests (SCTs) effectively measure data interpretation skills, with 5-point scoring methods proving more reliable than 3-point. Performance on these clinical reasoning assessments improves with experience across all difficulty levels.

Area of Science:

Medical Education
Clinical Reasoning Assessment
Psychometrics

Background:

Script Concordance Tests (SCTs) are used to assess data interpretation, a key clinical reasoning skill.
Existing research on SCTs is extensive, but questions about best practices and gaps in the evidence persist.
This study evaluates the psychometric properties of SCT scoring methods and their ability to differentiate training levels.

Purpose of the Study:

To test the psychometric properties of six different SCT scoring methods.
To determine if SCT items, categorized by difficulty and type, can distinguish between different medical training levels.

Main Methods:

Data from two SCTs, one on problem-solving (SCT-PS; n = 522) and one on emergency medicine (SCT-EM; n = 1,040), were analyzed.
Item analyses were conducted, and items were categorized by difficulty and type.
Statistical analyses included correlational analyses and several analyses of variance (MANOVA, repeated-measures ANOVA, and one-way ANOVA).
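
The kind of item analysis and group comparison listed above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the function names, the difficulty cut-offs (0.4 and 0.7), and the example scores are all assumptions introduced here. It treats item difficulty as the mean credit earned on an item, sorts items into difficulty bands, and computes the F statistic of a one-way ANOVA across training-level groups.

```python
from statistics import mean


def item_difficulty(scores_by_item):
    """Return the mean credit (0 to 1) earned on each item; higher means easier."""
    return {item: mean(scores) for item, scores in scores_by_item.items()}


def categorize(difficulty, hard_below=0.4, easy_above=0.7):
    """Place an item in a difficulty band.

    The cut-offs are hypothetical and are not taken from the study.
    """
    if difficulty < hard_below:
        return "hard"
    if difficulty > easy_above:
        return "easy"
    return "moderate"


def one_way_anova_f(groups):
    """Return the F statistic of a one-way ANOVA over lists of scores."""
    all_scores = [x for group in groups for x in group]
    grand_mean = mean(all_scores)
    k = len(groups)       # number of groups (e.g. training levels)
    n = len(all_scores)   # total number of observations
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Made-up example data: credit per examinee on two items,
# and total scores for three training levels.
difficulties = item_difficulty({"q1": [1.0, 0.5, 0.0], "q2": [1.0, 1.0, 1.0]})
bands = {item: categorize(d) for item, d in difficulties.items()}
f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

A large F value indicates that mean scores differ more between training levels than would be expected from the spread within each level; a p-value would additionally require the F distribution, which a statistics package can supply.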

Main Results:

All six scoring methods successfully differentiated between medical training levels.
Longitudinal analysis of SCT-PS data showed that fourth-year medical students (MS4s) had improved significantly since their second year (MS2).
Cross-sectional analysis of SCT-EM data revealed significant differences between experienced physicians, residents, and MS4s.

Conclusions:

Five-point scoring methods for SCTs provide more reliable measures of data interpretation than three-point methods.
Data interpretation ability is directly related to experience at all item difficulty levels.
Categorizing SCT items by type demonstrated discriminatory power, supporting construct validity.
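
To make the five-point versus three-point distinction concrete, the sketch below uses aggregate scoring, a commonly described way to score SCT items in which a response earns credit in proportion to how many reference-panel members chose it. This is an assumption for illustration only: the summary does not specify the six scoring methods the study compared, and the function names and panel data here are hypothetical. Responses are coded from -2 to +2, and the three-point variant collapses them to -1, 0, and +1.

```python
from collections import Counter


def aggregate_key(panel_responses):
    """Map each response to its credit: panel count divided by the modal count."""
    counts = Counter(panel_responses)
    modal_count = max(counts.values())
    return {response: count / modal_count for response, count in counts.items()}


def collapse_to_three(response):
    """Collapse a five-point response (-2..+2) to its sign (-1, 0, or +1)."""
    return (response > 0) - (response < 0)


def score(examinee_response, panel_responses, three_point=False):
    """Credit for one item; responses no panelist chose earn 0."""
    if three_point:
        panel_responses = [collapse_to_three(r) for r in panel_responses]
        examinee_response = collapse_to_three(examinee_response)
    return aggregate_key(panel_responses).get(examinee_response, 0.0)


# Hypothetical panel of ten experts answering one item.
panel = [1, 1, 1, 1, 1, 2, 2, 2, 0, 0]
five_point_credit = score(2, panel)                     # 3 of 10 chose +2; modal count is 5
three_point_credit = score(2, panel, three_point=True)  # +1 and +2 merge into the modal answer
```

In this example the examinee's +2 response earns partial credit under five-point scoring but full credit under three-point scoring, because collapsing the scale discards the distinction between +1 and +2.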