A Comparison of Three Empirical Reliability Estimates for Computerized Adaptive Testing (CAT) Using a Medical

Dong Gi Seo¹, Sunho Jung²

¹Department of Psychology, Hallym University, Chuncheon, South Korea.

Frontiers in Psychology | July 14, 2018

Summary

This summary is machine-generated.

A comparison of three methods for marginalizing observed standard errors to estimate computerized adaptive testing (CAT) reliability found that Jensen equality came closest to true reliability and is the recommended method. A small, fixed number of items should not be used as the CAT termination criterion when high reliability is required.

Keywords:

classical test theory, computerized adaptive testing, item response theory (IRT), measurement reliability

Area of Science:

Psychometrics
Educational Measurement
Statistical Modeling

Background:

Computerized adaptive testing (CAT) offers greater efficiency and accuracy than fixed-form tests.
Accurate reliability estimation is crucial for valid CAT results.
Marginalizing observed standard errors (OSEs) is a method to estimate CAT reliability.

Purpose of the Study:

To compare the accuracy of three methods (arithmetic mean, harmonic mean, and Jensen equality) for estimating CAT reliability.
To evaluate the impact of test length and ability distribution on CAT reliability estimates.
To provide recommendations for optimal CAT reliability estimation and termination criteria.

Main Methods:

Applied the arithmetic mean, the harmonic mean, and Jensen equality to marginalize OSEs for CAT reliability estimation.
Compared empirical CAT reliabilities derived from these methods against true reliabilities.
Analyzed results by test length (<40 vs. >40 items) and by the mean of the population ability distribution (zero vs. non-zero).
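The three marginalization approaches listed above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the ability scale has unit variance, so that reliability is approximated as 1 minus the marginal error variance, and it assumes a 2PL item set for the Jensen-style estimate based on test information at θ = 0. The function names (`info_2pl`, `reliability`, `marginal_reliabilities`), the OSE values, and the item parameters are all hypothetical.

```python
import math
from statistics import fmean, harmonic_mean


def info_2pl(theta, items):
    """Test information at theta for 2PL items given as (a, b) pairs."""
    total = 0.0
    for a, b in items:
        p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        total += a * a * p * (1.0 - p)
    return total


def reliability(error_variance, theta_variance=1.0):
    """Assumed form: reliability = 1 - error variance / ability variance."""
    return 1.0 - error_variance / theta_variance


def marginal_reliabilities(ose, info_at_zero):
    """Marginalize observed standard errors (OSEs) three ways."""
    err_var = [se * se for se in ose]  # per-examinee error variances
    return {
        # arithmetic mean of the error variances
        "arithmetic": reliability(fmean(err_var)),
        # harmonic mean of the error variances
        "harmonic": reliability(harmonic_mean(err_var)),
        # error variance taken as 1 / test information at theta = 0
        "jensen": reliability(1.0 / info_at_zero),
    }


# Hypothetical OSEs for four examinees and a hypothetical 20-item 2PL set.
ose = [0.30, 0.35, 0.40, 0.50]
items = [(1.2, -1.0), (1.0, 0.0), (1.5, 0.5), (0.8, 1.0)] * 5
estimates = marginal_reliabilities(ose, info_2pl(0.0, items))
for method, value in estimates.items():
    print(f"{method}: {value:.3f}")
```

In a real CAT each examinee sees a different item set, so the information value passed to the Jensen-style estimate would need to be defined for the operational design; a single fixed item set is used here only to keep the sketch short.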

Main Results:

All three methods underestimated true reliability for short test lengths (<40 items).
When the mean of the population ability distribution was zero, the CAT reliability estimates ranked Jensen equality > harmonic mean > arithmetic mean.
Jensen equality overestimated true reliability for test lengths >40 items when the mean of the ability distribution was zero.
Jensen equality proved closest to true reliability across different conditions and is easily computed using test information at θ = 0.

Conclusions:

Jensen equality is recommended for computing CAT reliability estimates because it stays closest to true reliability, irrespective of test length and the mean of the ability distribution.
Using a small, fixed number of items as a termination criterion for CAT is not advised, particularly for the 2-parameter logistic model (2PLM) and 3-parameter logistic model (3PLM), to maintain high reliability.