Multiple true-false items: a comparison of scoring algorithms.

Felicitas-Maria Lahner¹, Andrea Carolin Lörwald², Daniel Bauer³

¹Department of Assessment and Evaluation (AAE), Institute of Medical Education, University of Bern, Konsumstr 13, 3010, Bern, Switzerland. felicitas-maria.lahner@iml.unibe.ch.

Advances in Health Sciences Education : Theory and Practice

December 1, 2017

Summary

This summary is machine-generated.

Partial credit scoring, particularly the PS50 algorithm, offers superior reliability and item discrimination for multiple true-false (MTF) items compared to dichotomous scoring and Type A multiple-choice questions. This suggests PS50 is optimal for enhancing assessment quality.

Keywords:

Assessment; Medical education; Multiple choice; Multiple true-false; Scoring; Undergraduates

Area of Science:

Medical Education
Psychometrics
Educational Assessment

Background:

Multiple true-false (MTF) items are frequently used alongside single-best answer (Type A) multiple-choice questions in assessments.
Existing research on optimal scoring algorithms for MTF items has produced conflicting results, necessitating further investigation.
The psychometric properties of different scoring algorithms for MTF items require comparison with traditional Type A questions.

Purpose of the Study:

To determine the optimal scoring algorithm for MTF items based on reliability, difficulty index, and item discrimination.
To compare the psychometric characteristics of various MTF scoring algorithms against Type A questions within the same examinations.

Main Methods:

Data from 37 medical exams in 2015, comprising 998 MTF and 2163 Type A items, were analyzed.
Repeated measures analyses of variance (rANOVA) were employed to compare scoring algorithms.
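Scoring algorithms evaluated included dichotomous scoring (DS; credit only when every true/false rating in an item is correct), PS50 (partial credit when more than 50% of the ratings are correct), and PS1/n (1/n of the credit for each correct rating, where n is the number of true/false statements in the item).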
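The three scoring rules can be illustrated with a minimal Python sketch. The function names, the answer key, and the responses are hypothetical. The summary only says that PS50 gives partial credit above 50% correct, so the half-point value and the full point for an entirely correct item are assumptions made here for illustration.

```python
def count_correct(responses, key):
    """Number of true/false ratings that match the answer key."""
    return sum(r == k for r, k in zip(responses, key))


def score_ds(responses, key):
    """Dichotomous scoring: full credit only if every rating is correct."""
    return 1.0 if count_correct(responses, key) == len(key) else 0.0


def score_ps1n(responses, key):
    """PS1/n: each correct rating earns 1/n of the item's credit."""
    return count_correct(responses, key) / len(key)


def score_ps50(responses, key, partial=0.5):
    """PS50: partial credit when more than half of the ratings are correct.

    Assumption: `partial` is 0.5 and a fully correct item earns 1.0;
    the summary does not state these values.
    """
    n_correct = count_correct(responses, key)
    if n_correct == len(key):
        return 1.0
    if n_correct > len(key) / 2:
        return partial
    return 0.0


# Hypothetical four-statement MTF item; the examinee gets three ratings right.
key = [True, False, True, True]
responses = [True, False, False, True]

print(score_ds(responses, key))    # 0.0
print(score_ps1n(responses, key))  # 0.75
print(score_ps50(responses, key))  # 0.5
```

Under these assumptions, an examinee with exactly two of four ratings correct would receive 0.5 under PS1/n but nothing under PS50, which is where the two partial credit rules diverge.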

Main Results:

Partial credit scoring algorithms (PS1/n and PS50) demonstrated significantly higher reliability (α=0.75) than dichotomous scoring (α=0.70) and Type A (α=0.72).
Fewer items were needed to reach a reliability of 0.8 with PS1/n (74 items) and PS50 (75 items) than with DS (103 items) and Type A (87 items).
PS1/n and PS50 exhibited higher discrimination indices (r=0.33) than DS (r=0.30) and Type A (r=0.28). PS50 balanced item difficulty effectively.
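How a reliability coefficient relates to the number of items needed can be sketched in Python. The summary does not say which formulas the study used; the sketch assumes Cronbach's alpha for reliability and the standard Spearman-Brown prophecy formula for test lengthening. The function names and the score matrix are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a matrix with one row per examinee, one column per item."""
    n_items = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(n_items)]
    total_var = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


def spearman_brown_factor(current, target):
    """Factor by which a test must be lengthened to move reliability
    from `current` to `target` (Spearman-Brown prophecy formula)."""
    return target * (1 - current) / (current * (1 - target))


# Hypothetical partial-credit scores: 5 examinees x 4 items.
scores = [
    [1.0, 0.5, 1.0, 1.0],
    [0.5, 0.5, 0.0, 0.5],
    [1.0, 1.0, 1.0, 0.5],
    [0.0, 0.0, 0.5, 0.0],
    [0.5, 1.0, 0.5, 1.0],
]
print(round(cronbach_alpha(scores), 2))

# Lengthening factors needed to reach 0.8 from the reliabilities reported above.
print(round(spearman_brown_factor(0.75, 0.8), 2))  # 1.33 (PS1/n, PS50)
print(round(spearman_brown_factor(0.72, 0.8), 2))  # 1.56 (Type A)
print(round(spearman_brown_factor(0.70, 0.8), 2))  # 1.71 (DS)
```

Each factor is relative to the length of the exam on which that reliability was measured, so these values show the direction of the effect and are not expected to reproduce the reported item counts.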

Conclusions:

Partial credit scoring algorithms yield superior psychometric outcomes for MTF items compared to dichotomous scoring.
The PS50 algorithm is recommended for scoring MTF items due to its balanced difficulty range and strong psychometric performance.
PS50 offers an effective alternative for enhancing the quality of educational assessments using MTF formats.