A Comparison of Three Empirical Reliability Estimates for Computerized Adaptive Testing (CAT) Using a Medical

Dong Gi Seo¹, Sunho Jung²

  • ¹Department of Psychology, Hallym University, Chuncheon, South Korea.

Frontiers in Psychology | July 14, 2018
Summary
This summary is machine-generated.

A comparison of methods for marginalizing observed standard errors to estimate computerized adaptive testing (CAT) reliability found that the Jensen equality method came closest to true reliability, including for shorter tests. Terminating a CAT after only a few items is not advised if high reliability is required.

Keywords:
classical test theory; computerized adaptive testing; item response theory (IRT); measurement; reliability

Area of Science:

  • Psychometrics
  • Educational Measurement
  • Statistical Modeling

Background:

  • Computerized adaptive testing (CAT) can be more efficient and more accurate than fixed-form tests.
  • Accurate reliability estimation is crucial for valid CAT results.
  • One way to estimate CAT reliability is to marginalize (average) examinees' observed standard errors (OSEs) over the tested population.
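A minimal formulation of this idea, assuming the usual IRT relation between an observed standard error and test information, and assuming reliability is defined as one minus the ratio of a marginal error variance to the ability variance (the study's exact definitions may differ):

```latex
\mathrm{SE}(\hat{\theta}_j) = \frac{1}{\sqrt{I(\hat{\theta}_j)}}, \qquad
\hat{\rho} = 1 - \frac{\bar{\sigma}^{2}_{e}}{\sigma^{2}_{\theta}}
```

Here I(θ̂_j) is the test information at examinee j's ability estimate, σ̄²_e is the error variance obtained by marginalizing the squared OSEs over examinees, and σ²_θ is the variance of the population ability distribution.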

Purpose of the Study:

  • To compare the accuracy of three methods (Arithmetic mean, Harmonic mean, Jensen equality) for estimating CAT reliability.
  • To evaluate the impact of test length and ability distribution on CAT reliability estimates.
  • To provide recommendations for optimal CAT reliability estimation and termination criteria.

Main Methods:

  • Applied Arithmetic mean, Harmonic mean, and Jensen equality to marginalize OSEs for CAT reliability estimation.
  • Compared empirical CAT reliabilities derived from these methods against true reliabilities.
  • Analyzed results by test length (fewer than 40 vs. more than 40 items) and by the mean of the population ability distribution (zero vs. non-zero).
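The three marginalization methods can be sketched in Python. This is our reading of the methods, not code from the study: we assume each examinee's squared OSE equals 1 / I(θ̂_j), that the harmonic-mean method takes the harmonic mean of those error variances, that the Jensen-equality method plugs in the test information at θ = 0, and that reliability is 1 minus the marginal error variance divided by an assumed ability variance. The function name `marginal_reliabilities` and the example inputs are hypothetical.

```python
from typing import Dict, Sequence


def marginal_reliabilities(info_at_estimates: Sequence[float],
                           info_at_zero: float,
                           ability_var: float = 1.0) -> Dict[str, float]:
    """Three ways to marginalize observed standard errors (OSEs) into one reliability.

    Assumptions (ours, not confirmed by the study):
      * squared OSE for examinee j is 1 / I(theta_hat_j);
      * reliability = 1 - marginal error variance / ability_var.
    """
    n = len(info_at_estimates)
    ose_sq = [1.0 / i for i in info_at_estimates]  # squared OSE per examinee
    err_arith = sum(ose_sq) / n                    # arithmetic mean of error variances
    err_harm = n / sum(1.0 / e for e in ose_sq)    # harmonic mean of error variances (= 1 / mean information)
    err_jensen = 1.0 / info_at_zero                # error variance implied by the information at theta = 0
    return {
        "arithmetic": 1.0 - err_arith / ability_var,
        "harmonic": 1.0 - err_harm / ability_var,
        "jensen": 1.0 - err_jensen / ability_var,
    }


if __name__ == "__main__":
    # Hypothetical test information at three examinees' ability estimates, with I(0) = 10.
    print(marginal_reliabilities([4.0, 5.0, 10.0], info_at_zero=10.0))
```

With these hypothetical inputs the sketch gives roughly 0.817 (arithmetic), 0.842 (harmonic), and 0.9 (Jensen). Because an arithmetic mean is never smaller than a harmonic mean, the arithmetic-mean estimate can never exceed the harmonic-mean estimate under these assumptions, which is consistent with the ordering reported in the results.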

Main Results:

  • All three methods underestimated true reliability for short test lengths (<40 items).
  • When the mean of the population ability distribution is zero, the CAT reliability estimates ranked Jensen equality > Harmonic mean > Arithmetic mean.
  • Jensen equality overestimated true reliability for tests longer than 40 items when the mean of the ability distribution was zero.
  • Jensen equality proved closest to true reliability across different conditions and is easily computed using test information at θ = 0.
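The Jensen-equality shortcut needs only the test information at θ = 0. Below is a minimal sketch for a set of 2PL items, assuming the logistic metric (scaling constant D = 1), an ability variance of 1, and reliability defined as 1 − (1 / I(0)) / σ²_θ; that formula is an assumption, not the study's stated one. Item parameters and function names are hypothetical.

```python
import math
from typing import List, Tuple


def p_2pl(theta: float, a: float, b: float) -> float:
    """2PL probability of a correct response, logistic metric (D = 1 assumed)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def total_information_2pl(theta: float, items: List[Tuple[float, float]]) -> float:
    """Test information at theta: sum of 2PL item informations a^2 * P * (1 - P)."""
    info = 0.0
    for a, b in items:
        p = p_2pl(theta, a, b)
        info += a * a * p * (1.0 - p)
    return info


def jensen_reliability(items: List[Tuple[float, float]], ability_var: float = 1.0) -> float:
    """Reliability from the test information at theta = 0 (assumed form: 1 - (1 / I(0)) / ability_var)."""
    return 1.0 - (1.0 / total_information_2pl(0.0, items)) / ability_var


if __name__ == "__main__":
    items = [(1.0, 0.0)] * 20  # hypothetical: 20 items with a = 1, b = 0
    print(jensen_reliability(items))
```

For 20 identical items with a = 1 and b = 0, each item contributes 0.25 at θ = 0, so I(0) = 5 and the sketch returns 0.8; halving the test to 10 items drops it to 0.6.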

Conclusions:

  • Jensen equality is recommended for computing CAT reliability estimates because it came closest to true reliability, irrespective of test length and of the mean of the ability distribution.
  • Using a small, fixed number of items as a termination criterion for CAT is not advised, particularly for the 2-parameter logistic model (2PLM) and 3-parameter logistic model (3PLM), to maintain high reliability.
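The 2PLM and 3PLM differ in how much information each item carries. The sketch below implements Birnbaum's standard 3PL item information function, assuming the logistic metric (D = 1); the parameter values are hypothetical. It shows that a nonzero guessing parameter c lowers an item's information, so under these assumptions more items are needed to reach the same test information.

```python
import math


def p_3pl(theta: float, a: float, b: float, c: float) -> float:
    """3PL probability of a correct response, logistic metric (D = 1 assumed)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def item_info_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Birnbaum 3PL item information; with c = 0 it reduces to the 2PL a^2 * P * (1 - P)."""
    p = p_3pl(theta, a, b, c)
    q = 1.0 - p
    return a * a * (q / p) * ((p - c) / (1.0 - c)) ** 2


if __name__ == "__main__":
    # Hypothetical item with a = 1, b = 0, evaluated at theta = 0.
    print(item_info_3pl(0.0, 1.0, 0.0, 0.0))  # 0.25 (2PL case)
    print(item_info_3pl(0.0, 1.0, 0.0, 0.2))  # about 0.1667 (with guessing)
```

With a = 1, b = 0, and c = 0.2, the item carries about 1/6 unit of information at θ = 0 versus 0.25 without guessing, so roughly 1.5 times as many such items would be needed to reach the same test information.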