Analyzing script concordance test scoring methods and items by difficulty and type.

Adam B Wilson (1), Gary R Pike, Aloysius J Humbert

  • (1) Department of Surgery, Indiana University, Indianapolis, Indiana, USA.

Teaching and Learning in Medicine | April 8, 2014
Summary
This summary is machine-generated.

Script Concordance Tests (SCTs) effectively measure data interpretation skills, with 5-point scoring methods proving more reliable than 3-point. Performance on these clinical reasoning assessments improves with experience across all difficulty levels.

Area of Science:

  • Medical Education
  • Clinical Reasoning Assessment
  • Psychometrics

Background:

  • Script Concordance Tests (SCTs) are used to assess data interpretation, a key clinical reasoning skill.
  • Research on SCTs is extensive, but questions about best practices and gaps in the evidence persist.
  • This study evaluates the psychometric properties of SCT scoring methods and their ability to differentiate training levels.

Purpose of the Study:

  • To test the psychometric properties of six different SCT scoring methods.
  • To determine if SCT items, categorized by difficulty and type, can distinguish between different medical training levels.
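
The summary does not say which six scoring methods were compared. As a rough illustration of what an SCT scoring method can look like, the sketch below uses aggregate (partial-credit) scoring, a commonly described approach in which each response earns credit in proportion to the number of expert panelists who chose it. The function names, the panel data, and the rule for collapsing a 5-point scale to 3 points are assumptions made for this example and are not taken from the study.

```python
from collections import Counter


def aggregate_key(panel_responses):
    """Build a partial-credit key for one item from expert panel answers.

    Credit for a response option = (panelists choosing it) divided by
    (panelists choosing the most popular option), so the modal answer
    earns 1.0 and options nobody chose are absent from the key.
    """
    counts = Counter(panel_responses)
    modal = max(counts.values())
    return {option: n / modal for option, n in counts.items()}


def collapse_to_three(response):
    """Collapse a 5-point response (-2..+2) to 3 points (-1, 0, +1).

    This sign-based rule is an assumption for illustration only.
    """
    return (response > 0) - (response < 0)


def score_item(response, key):
    """Look up an examinee's credit; unchosen options earn nothing."""
    return key.get(response, 0.0)


# Hypothetical panel of 10 experts answering one item on a -2..+2 scale.
panel = [1, 1, 1, 1, 1, 2, 2, 2, 0, 0]
key5 = aggregate_key(panel)
key3 = aggregate_key([collapse_to_three(r) for r in panel])

examinee_answer = 2
credit5 = score_item(examinee_answer, key5)                     # 0.6
credit3 = score_item(collapse_to_three(examinee_answer), key3)  # 1.0
```

With this made-up panel, an examinee who answers +2 earns 0.6 under the 5-point key but full credit under the collapsed 3-point key. This shows how the choice of scale can change the information each item carries.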

Main Methods:

  • SCT data from problem-solving (SCT-PS; n=522) and emergency medicine (SCT-EM; n=1,040) were analyzed.
  • Item analyses were conducted, and items were categorized by difficulty and type.
  • Statistical analyses included correlational analyses, MANOVA, repeated-measures ANOVA, and one-way ANOVA.
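
The methods listed above can be sketched in miniature. The example below computes a simple item difficulty index (mean credit earned on an item), assigns a difficulty label, and calculates a one-way ANOVA F statistic across training levels. The cut-offs, group labels, and scores are invented for illustration, and the study's actual procedures may differ.

```python
from statistics import mean


def item_difficulty(item_scores):
    """Mean credit earned on an item (0..1); higher means easier."""
    return mean(item_scores)


def difficulty_band(p, hard_below=0.4, easy_above=0.7):
    """Label an item by difficulty; the cut-offs are illustrative assumptions."""
    if p < hard_below:
        return "hard"
    if p > easy_above:
        return "easy"
    return "moderate"


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_scores = [x for g in groups for x in g]
    grand = mean(all_scores)
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up credits on a single item for four examinees.
p = item_difficulty([1.0, 0.5, 0.0, 0.5])   # 0.5
band = difficulty_band(p)                   # "moderate"

# Made-up total scores for three training levels.
ms4s = [1, 2, 3]
residents = [2, 3, 4]
physicians = [3, 4, 5]
result = one_way_anova_f([ms4s, residents, physicians])  # (3.0, 2, 6)
```

A real analysis would compare the F statistic against the F distribution with the returned degrees of freedom to obtain a p-value. That step is omitted here to keep the sketch dependency-free.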

Main Results:

  • All six scoring methods successfully differentiated between medical training levels.
  • Longitudinal analysis of SCT-PS data showed that MS4s (fourth-year medical students) had improved significantly since MS2 (their second year).
  • Cross-sectional analysis of SCT-EM data revealed significant differences between experienced physicians, residents, and MS4s.

Conclusions:

  • Five-point scoring methods for SCTs provide more reliable data interpretation measures than three-point methods.
  • Data interpretation ability is directly related to experience at all item difficulty levels.
  • Categorizing SCT items by type demonstrated discriminatory power, supporting construct validity.
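
The summary does not state which reliability statistic supported the comparison of 5-point and 3-point scoring. One widely used measure of internal consistency is Cronbach's alpha. The sketch below is an assumption about how such a comparison could be set up, not a description of the study's analysis, and the score matrix is invented.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix.

    scores: list of examinee rows, each a list of per-item credits.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Made-up data: three examinees, two items that rank examinees identically.
consistent = [[0, 0], [1, 1], [2, 2]]
alpha = cronbach_alpha(consistent)  # 1.0, since the items agree perfectly
```

Running the same function on scores produced by two different scoring keys for the same examinees would give one alpha per key, which is one way to compare their reliability.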