Related Concept Videos

Detection of Gross Error: The Q Test

When one or more data points appear far from the rest of the data, there is a need to determine whether they are outliers and whether they should be eliminated from the data set to ensure an accurate representation of the measured value. In many cases, outliers arise from gross errors (or human errors) and do not accurately reflect the underlying phenomenon. In some cases, however, these apparent outliers reflect true phenomenological differences. In these cases, we can use statistical methods...

Comparing Experimental Results: Student's t-Test

The t-test is a statistical method used to compare a sample mean with a population mean, or to compare the means of two data sets. The test statistic is calculated from the mean, standard deviation, and number of measurements in the data set and is then compared with a tabulated critical value at the selected confidence level. If the test statistic is smaller than the critical value, the null hypothesis is retained (it cannot be rejected). In this case, we state that the difference between the...

Test for Homogeneity

The goodness-of-fit test can be used to decide whether a population fits a given distribution, but it is not sufficient for deciding whether two populations follow the same unknown distribution. A different test, called the test for homogeneity, can be used to decide whether two populations have the same distribution. To calculate the test statistic for a test for homogeneity, follow the same procedure as for the test of independence. The hypotheses for the test for homogeneity can...

One-Way ANOVA: Unequal Sample Sizes

One-way ANOVA can be performed on three or more samples of unequal sizes. However, the calculations become more involved when the sample sizes differ, so when performing ANOVA with unequal sample sizes, the following equation is used:

Expected Frequencies in Goodness-of-Fit Tests

A goodness-of-fit test is conducted to determine whether the observed frequencies are statistically similar to the frequencies expected for the dataset. If the expected frequencies are all equal, as when predicting how often each number appears when a fair die is cast, the expected frequency for each category is the total number of observations (n) divided by the number of categories (k).

One-Way ANOVA: Equal Sample Sizes

One-way ANOVA can be performed on three or more samples with equal or unequal sample sizes. When one-way ANOVA is performed on two datasets with samples of equal sizes, it is easy to see that the computed F statistic is highly sensitive to the sample means.
Different sample means result in different values of the between-sample variance estimate. This is because the variance between samples is calculated as the product of the sample size and the variance between the...
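For equal sample sizes, the relationship in the snippet above can be sketched as follows: with k samples of size n, the between-sample variance is n times the (sample) variance of the k sample means, and the F statistic divides it by the within-sample variance, which for equal n is simply the mean of the k sample variances. This is a minimal illustration with hypothetical data using only Python's standard `statistics` module; the function name is hypothetical, and for real analyses `scipy.stats.f_oneway` computes the same one-way ANOVA F statistic.

```python
from statistics import mean, variance
from typing import Sequence


def one_way_anova_f_equal_n(samples: Sequence[Sequence[float]]) -> float:
    """F statistic for one-way ANOVA when every sample has the same size n (hypothetical helper)."""
    n = len(samples[0])
    if len(samples) < 2 or any(len(s) != n for s in samples):
        raise ValueError("need at least two samples of equal size")
    sample_means = [mean(s) for s in samples]
    # Between-sample variance: sample size times the variance of the sample means.
    var_between = n * variance(sample_means)
    # Within-sample variance: with equal n, the pooled variance is the mean of the sample variances.
    var_within = mean([variance(s) for s in samples])
    return var_between / var_within


# Hypothetical data: three samples of size 3.
print(one_way_anova_f_equal_n([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```

Here the sample means are 2, 3 and 6, whose variance is 13/3; multiplying by n = 3 gives 13, and each sample variance is 1, so F works out to 13 (up to floating-point rounding). Shifting any one sample mean changes the between-sample term directly, which is the sensitivity the snippet describes.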

You might also read

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

How Extreme Is It Anyways?: An Empirical Investigation Into the Prevalence and Strength of Extreme Response Style.

Educational and psychological measurement·2026

Same author

Balancing stability and flexibility: investigating a dynamic K value approach for the Elo rating system in adaptive learning environments.

User modeling and user-adapted interaction·2025

Same author

Distinguishing Between Models for Extreme and Midpoint Response Styles as Opposite Poles of a Single Dimension versus Two Separate Dimensions: A Simulation Study.

Applied psychological measurement·2025

Same author

Posterior predictive checks for the detection of extreme response style.

Behavior research methods·2025

Same author

Keeping Elo alive: Evaluating and improving measurement properties of learning systems based on Elo ratings.

The British journal of mathematical and statistical psychology·2025

Same author

Modeling Within- and Between-Person Differences in the Use of the Middle Category in Likert Scales.

Applied psychological measurement·2025

Same journal

Characterizing facilitators and barriers to Hypoglycemic Confidence among patients with diabetes: a qualitative descriptive study.

Frontiers in psychology·2026

Same journal

Psychometric evaluation and refinement of the 7DHW questionnaire for the German population.

Frontiers in psychology·2026

Same journal

Editorial: Ethical leadership and workplace equity: mediating and moderating mechanisms in emotional labor and well-being.

Frontiers in psychology·2026

Same journal

How organizational support promotes teacher professional recognition: a perspective on teachers' autonomous learning and teaching abilities.

Frontiers in psychology·2026

Same journal

From "performance competition arena" to "psychological exemption zone": psychological safety mechanisms in reverse mobility.

Frontiers in psychology·2026

Same journal

General and sport-specific mental toughness in university students: associations with personality traits and physical activity.

Frontiers in psychology·2026

Related Experiment Video

Updated: Mar 27, 2026

Author Spotlight: Validation of SICOLE-R for Assessing Cognitive and Reading Skills in Spanish-Speaking Children and Its Role in Personalized Education

Published on: August 16, 2024

Can IRT Solve the Missing Data Problem in Test Equating?

Maria Bolsinova¹, Gunter Maris²

¹Department of Methodology and Statistics, Utrecht University, Utrecht, Netherlands; Psychometric Research Center, Dutch National Institute for Educational Measurement (Cito), Arnhem, Netherlands.

Frontiers in Psychology

January 19, 2016

Summary

This summary is machine-generated.

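Test equating is framed as a missing data problem: item response theory (IRT) is used to impute the unobserved scores of the reference population on the new test, so that a new cutscore can be set that yields approximately the same failure rate as on the reference test. Taking the non-identifiability of the score distribution into account helps minimize bias.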
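The final step described in the summary, choosing a new cutscore so that the failure rate stays about the same, can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the reference population's score distribution on the new test has already been obtained (for example from imputed responses), and the helper names and toy distributions are hypothetical.

```python
from typing import Sequence


def failure_rate(score_probs: Sequence[float], cutscore: int) -> float:
    """P(score < cutscore), where score_probs[s] is the probability of sum score s."""
    return sum(score_probs[:cutscore])


def match_cutscore(ref_probs: Sequence[float], new_probs: Sequence[float], ref_cutscore: int) -> int:
    """Hypothetical helper: new-test cutscore whose failure rate is closest to the reference one."""
    target = failure_rate(ref_probs, ref_cutscore)
    # A cutscore c means "fail if score < c", so c ranges over 0..max_score + 1.
    candidates = range(len(new_probs) + 1)
    return min(candidates, key=lambda c: abs(failure_rate(new_probs, c) - target))


# Hypothetical score distributions (scores 0..4) of the reference population.
ref_probs = [0.10, 0.20, 0.30, 0.25, 0.15]  # observed, reference test
new_probs = [0.05, 0.10, 0.20, 0.30, 0.35]  # imputed, new test
print(match_cutscore(ref_probs, new_probs, ref_cutscore=2))
```

With these numbers, 30% of the reference population fails the reference test at cutscore 2; on the new test, cutscore 3 gives a failure rate of 35%, the closest attainable value, so 3 is returned. Because sum scores are discrete, the failure rates can usually be matched only approximately.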

Keywords:

incomplete design, item response theory, marginal Rasch model, missing data, non-identifiability, test equating

More Related Videos

Applying an eMASS Customization Program as a Research Tool to Evaluate Consumer Benefits

Published on: September 27, 2019

Using the Race Model Inequality to Quantify Behavioral Multisensory Integration Effects

Published on: May 10, 2019

Area of Science:

Psychometrics
Educational Measurement
Statistics

Background:

Test equating is crucial for score comparability across different test forms.
Traditional equating methods can be sensitive to assumptions about score distributions.
Viewing equating as a missing data problem offers a novel perspective.

Purpose of the Study:

To investigate the use of item response theory (IRT) for imputing missing responses in test equating.
To assess whether score distributions are identifiable without parametric assumptions about the ability distribution.
To evaluate potential biases in test equating when non-identifiability is ignored.

Main Methods:

Framing test equating as a missing data imputation problem.
Utilizing item response theory (IRT) to model response data.
Investigating identifiability of score distributions from observed data.
Analyzing simulated and empirical data to assess bias.
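The imputation idea behind these methods can be sketched under the Rasch model, P(X_i = 1 | θ) = exp(θ − b_i) / (1 + exp(θ − b_i)). This is a simplified illustration, not the authors' procedure: it assumes the abilities θ and the new-test item difficulties b_i are already known, whereas in the paper they must be estimated and the ability distribution is exactly what raises the identifiability question. Drawing abilities from a standard normal below is purely for illustration (the study cautions that assuming normal ability can bias equating), and all names and values are hypothetical.

```python
import math
import random
from typing import List, Sequence


def rasch_prob(theta: float, b: float) -> float:
    """Rasch model: probability of a correct response given ability theta and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def impute_scores(thetas: Sequence[float], difficulties: Sequence[float], rng: random.Random) -> List[int]:
    """Draw each person's unobserved responses to the new items and return imputed sum scores."""
    scores = []
    for theta in thetas:
        # Each missing item response is a Bernoulli draw with the Rasch probability.
        scores.append(sum(rng.random() < rasch_prob(theta, b) for b in difficulties))
    return scores


def score_distribution(scores: Sequence[int], max_score: int) -> List[float]:
    """Relative frequency of each sum score 0..max_score."""
    counts = [0] * (max_score + 1)
    for s in scores:
        counts[s] += 1
    return [c / len(scores) for c in counts]


rng = random.Random(2016)
# Hypothetical inputs: abilities of the reference population and difficulties of the new items.
thetas = [rng.gauss(0.0, 1.0) for _ in range(1000)]
new_difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]
print(score_distribution(impute_scores(thetas, new_difficulties, rng), len(new_difficulties)))
```

The resulting imputed score distribution is the kind of input the cutscore step needs; repeating the draws gives a sense of the Monte Carlo variability of the imputation.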

Main Results:

IRT can impute unobserved responses for test equating.
The score distribution on the new test is not fully identifiable, but the resulting uncertainty is very small.
Ignoring non-identifiability and assuming a normal ability distribution can introduce bias.

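Why the ability distribution matters can be made concrete: under the Rasch model, the score distribution on the new test is a mixture, over abilities, of conditional score distributions, so it depends on the ability distribution that is assumed or estimated. The sketch below uses the Lord-Wingersky recursion, a standard algorithm for sum-score probabilities that is not necessarily the computation used in the paper, and assumes a discrete ability distribution with known support points and weights. The grid, weights, and difficulties are hypothetical; the two weightings only illustrate that changing the ability distribution changes the implied new-test score distribution, and hence any cutscore derived from it.

```python
import math
from typing import List, Sequence


def rasch_prob(theta: float, b: float) -> float:
    """Rasch model probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def conditional_score_dist(theta: float, difficulties: Sequence[float]) -> List[float]:
    """Lord-Wingersky recursion: P(sum score = s | theta) for s = 0..len(difficulties)."""
    dist = [1.0]  # before any item is added, score 0 has probability 1
    for b in difficulties:
        p = rasch_prob(theta, b)
        updated = [0.0] * (len(dist) + 1)
        for s, prob in enumerate(dist):
            updated[s] += prob * (1.0 - p)  # new item answered incorrectly
            updated[s + 1] += prob * p      # new item answered correctly
        dist = updated
    return dist


def marginal_score_dist(points: Sequence[float], weights: Sequence[float],
                        difficulties: Sequence[float]) -> List[float]:
    """New-test score distribution, mixing over a discrete ability distribution."""
    total = [0.0] * (len(difficulties) + 1)
    for theta, w in zip(points, weights):
        for s, prob in enumerate(conditional_score_dist(theta, difficulties)):
            total[s] += w * prob
    return total


# Hypothetical new-test difficulties and two assumed ability distributions on the same grid.
difficulties = [-1.0, 0.0, 1.0]
grid = [-1.0, 0.0, 1.0]
symmetric = marginal_score_dist(grid, [0.25, 0.50, 0.25], difficulties)
skewed = marginal_score_dist(grid, [0.10, 0.50, 0.40], difficulties)
print(symmetric)
print(skewed)
```

The recursion adds one item at a time, so it costs on the order of (number of items)² operations per ability point instead of enumerating all response patterns.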
Conclusions:

Test equating can be effectively treated as a missing data problem using IRT.
The proposed IRT approach minimizes bias in equating by addressing unobserved responses.
Careful consideration of identifiability is necessary to avoid potential biases in educational measurement.