



Multiple true-false items: a comparison of scoring algorithms.

Felicitas-Maria Lahner1, Andrea Carolin Lörwald2, Daniel Bauer3

  • 1Department of Assessment and Evaluation (AAE), Institute of Medical Education, University of Bern, Konsumstr 13, 3010, Bern, Switzerland. felicitas-maria.lahner@iml.unibe.ch.

Advances in Health Sciences Education: Theory and Practice | December 1, 2017
Summary
This summary is machine-generated.

Partial credit scoring, particularly the PS50 algorithm, offers superior reliability and item discrimination for multiple true-false (MTF) items compared to dichotomous scoring and Type A multiple-choice questions. This suggests PS50 is optimal for enhancing assessment quality.

Keywords:
Assessment, Medical education, Multiple choice, Multiple true–false, Scoring, Undergraduates


Area of Science:

  • Medical Education
  • Psychometrics
  • Educational Assessment

Background:

  • Multiple true-false (MTF) items are frequently used alongside single-best answer (Type A) multiple-choice questions in assessments.
  • Existing research on optimal scoring algorithms for MTF items has produced conflicting results, necessitating further investigation.
  • The psychometric properties of different scoring algorithms for MTF items require comparison with traditional Type A questions.

Purpose of the Study:

  • To determine the optimal scoring algorithm for MTF items based on reliability, difficulty index, and item discrimination.
  • To compare the psychometric characteristics of various MTF scoring algorithms against Type A questions within the same examinations.

Main Methods:

  • Data from 37 medical exams in 2015, comprising 998 MTF and 2163 Type A items, were analyzed.
  • Repeated measures analyses of variance (rANOVA) were employed to compare scoring algorithms.
  • Scoring algorithms evaluated included dichotomous scoring (DS), PS50 (partial credit for >50% correct), and PS1/n (partial credit per correct T/F).
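The three scoring rules can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names are hypothetical. The summary only says that PS50 gives partial credit when more than 50% of the true/false judgments are correct, so the half-point value and the full point for an all-correct response are assumptions, as is the all-or-nothing reading of dichotomous scoring.

```python
def count_correct(responses, key):
    """Number of true/false statements judged correctly."""
    if not key or len(responses) != len(key):
        raise ValueError("responses and key must be non-empty and of equal length")
    return sum(r == k for r, k in zip(responses, key))


def score_ds(responses, key):
    """Dichotomous scoring (DS): 1 point only if every statement is correct.

    ASSUMPTION: all-or-nothing interpretation of "dichotomous".
    """
    return 1.0 if count_correct(responses, key) == len(key) else 0.0


def score_ps1n(responses, key):
    """PS1/n: 1/n of a point for each correctly judged statement."""
    return count_correct(responses, key) / len(key)


def score_ps50(responses, key, partial_credit=0.5):
    """PS50: partial credit when more than half of the statements are correct.

    ASSUMPTION: partial_credit=0.5 and full credit (1.0) for an all-correct
    response; the summary does not state these values.
    """
    n = len(key)
    correct = count_correct(responses, key)
    if correct == n:
        return 1.0
    if 2 * correct > n:  # strictly more than 50% correct
        return partial_credit
    return 0.0


# Hypothetical MTF item with four statements; the examinee gets three right.
key = [True, False, True, True]
responses = [True, False, False, True]
print(score_ds(responses, key), score_ps1n(responses, key), score_ps50(responses, key))
```

For this hypothetical item the three rules give 0.0, 0.75, and 0.5 points respectively, which shows how the partial credit rules retain information that dichotomous scoring discards.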

Main Results:

  • Partial credit scoring algorithms (PS1/n and PS50) demonstrated significantly higher reliability (α=0.75) than dichotomous scoring (α=0.70) and Type A (α=0.72).
  • Fewer items are needed to achieve a reliability of 0.8 with PS1/n (74 items) and PS50 (75 items) compared to DS (103 items) and Type A (87 items).
  • PS1/n and PS50 exhibited higher discrimination indices (r=0.33) than DS (r=0.30) and Type A (r=0.28). PS50 balanced item difficulty effectively.
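The reliability figures above can be related to test length with a short Python sketch. The summary does not say how the item counts for a reliability of 0.8 were derived; the Spearman-Brown prophecy formula is the standard tool for such projections and is assumed here. The function names are hypothetical, and the sketch reports only a lengthening factor, because the per-exam item counts needed to reproduce the reported 74, 75, 87, and 103 items are not given in the summary.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for item scores (rows = examinees, columns = items)."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two examinees")
    item_variances = [variance(column) for column in zip(*scores)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def lengthening_factor(current_reliability, target_reliability):
    """Spearman-Brown: factor by which test length must change to hit the target."""
    return (target_reliability * (1 - current_reliability)) / (
        current_reliability * (1 - target_reliability)
    )


def projected_reliability(current_reliability, factor):
    """Spearman-Brown: reliability after changing test length by `factor`."""
    return factor * current_reliability / (1 + (factor - 1) * current_reliability)


# Reliabilities reported in the summary; the target of 0.8 is also from the summary.
for label, alpha in [("PS1/n and PS50", 0.75), ("Type A", 0.72), ("DS", 0.70)]:
    factor = lengthening_factor(alpha, 0.8)
    print(f"{label}: test must be {factor:.2f}x as long to reach 0.8")
```

Under this assumption the required lengthening is about 1.33x for the partial credit algorithms, 1.56x for Type A, and 1.71x for DS, which matches the ordering of the reported item counts.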

Conclusions:

  • Partial credit scoring algorithms yield superior psychometric outcomes for MTF items compared to dichotomous scoring.
  • The PS50 algorithm is recommended for scoring MTF items due to its balanced difficulty range and strong psychometric performance.
  • PS50 offers an effective alternative for enhancing the quality of educational assessments using MTF formats.