Can IRT Solve the Missing Data Problem in Test Equating?

Maria Bolsinova1, Gunter Maris2

  • 1 Department of Methodology and Statistics, Utrecht University, Utrecht, Netherlands; Psychometric Research Center, Dutch National Institute for Educational Measurement (Cito), Arnhem, Netherlands.

Frontiers in Psychology | January 19, 2016
Summary
This summary is machine-generated.

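Test equating is framed as a missing data problem: the reference population's responses to the new test are unobserved. Item response theory (IRT) is used to impute these responses, so that a new cutscore can be set at which roughly the same proportion of the reference population would fail as on the reference test. Accounting for non-identifiability of the score distribution helps avoid bias in the equating.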

Keywords:
incomplete design, item response theory, marginal Rasch model, missing data, non-identifiability, test equating

Area of Science:

  • Psychometrics
  • Educational Measurement
  • Statistics

Background:

  • Test equating is crucial for score comparability across different test forms.
  • Traditional equating methods can be sensitive to assumptions about score distributions.
  • Viewing equating as a missing data problem offers a novel perspective.

Purpose of the Study:

  • To investigate the use of item response theory (IRT) for imputing missing responses in test equating.
  • To assess the identifiability of score distributions without parametric assumptions on ability.
  • To evaluate potential biases in test equating when non-identifiability is ignored.

Main Methods:

  • Framing test equating as a missing data imputation problem.
  • Utilizing item response theory (IRT) to model response data.
  • Investigating identifiability of score distributions from observed data.
  • Analyzing simulated and empirical data to assess bias.
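
The framing above can be made concrete with a minimal sketch. It is not the authors' procedure: it assumes a Rasch model, treats the abilities and item difficulties as known (in practice they must be estimated), and uses hypothetical names and values throughout (`theta`, `delta_ref`, `delta_new`, a reference cutscore of 10). It imputes the reference population's scores on the new test and picks the new cutscore whose failure rate is closest to the reference one.

```python
import numpy as np

def rasch_prob(theta, delta):
    """P(correct) under the Rasch model; theta has shape (N,), delta has shape (K,)."""
    return 1.0 / (1.0 + np.exp(-(theta[:, None] - delta[None, :])))

def impute_scores(theta, delta, rng):
    """Draw one set of item responses from the model and return the sum scores."""
    p = rasch_prob(theta, delta)
    return (rng.random(p.shape) < p).sum(axis=1)

def equated_cutscore(ref_scores, ref_cut, new_scores, n_new_items):
    """Return the new cutscore whose failure rate is closest to the reference one.

    A student fails when the score is strictly below the cutscore.
    """
    ref_fail = np.mean(ref_scores < ref_cut)
    candidates = np.arange(n_new_items + 2)  # cutscores 0 .. n_new_items + 1
    new_fail = np.array([np.mean(new_scores < c) for c in candidates])
    best = int(np.argmin(np.abs(new_fail - ref_fail)))
    return best, ref_fail, new_fail[best]

rng = np.random.default_rng(0)
theta = rng.normal(0.0, 1.0, size=5000)   # hypothetical abilities of the reference population
delta_ref = np.linspace(-1.5, 1.5, 20)    # hypothetical reference-test difficulties
delta_new = delta_ref + 0.3               # hypothetical, slightly harder new test

# Stand-in for the observed reference-test scores.
ref_scores = impute_scores(theta, delta_ref, rng)
# Imputed scores: the reference group never answered the new test.
new_scores = impute_scores(theta, delta_new, rng)

new_cut, ref_fail, new_fail = equated_cutscore(ref_scores, 10, new_scores, len(delta_new))
print(new_cut, round(ref_fail, 3), round(new_fail, 3))
```

Because sum scores are discrete, the two failure rates can only be matched approximately. The sketch also sidesteps the identifiability question by fixing the ability distribution in advance.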

Main Results:

  • IRT can impute unobserved responses for test equating.
  • The score distribution on the new test is not fully identifiable, but the resulting uncertainty is small.
  • Ignoring non-identifiability and assuming a normal ability distribution can introduce bias in equating.
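
One way to see where the non-identifiability comes from is the standard marginal form of the Rasch model. The notation below is an assumption for illustration and is not taken from the summary: $\theta$ is ability with distribution $F$, $\delta_i$ is the difficulty of item $i$, $x_+ = \sum_i x_i$ is the sum score on $n$ items, $t = e^{\theta}$, and $b_i = e^{-\delta_i}$.

```latex
\begin{align}
P(X_i = 1 \mid \theta) &= \frac{\exp(\theta - \delta_i)}{1 + \exp(\theta - \delta_i)}
  = \frac{t\,b_i}{1 + t\,b_i}, \\
P(\mathbf{X} = \mathbf{x}) &= \int \prod_{i=1}^{n} \frac{(t\,b_i)^{x_i}}{1 + t\,b_i}\, dF(\theta)
  = \Bigl(\prod_{i=1}^{n} b_i^{x_i}\Bigr)\, \lambda_{x_+}, \\
\lambda_s &= \int \frac{t^{s}}{\prod_{i=1}^{n} (1 + t\,b_i)}\, dF(\theta), \qquad s = 0, 1, \dots, n.
\end{align}
```

The observed responses therefore depend on $F$ only through the $n + 1$ quantities $\lambda_0, \dots, \lambda_n$. Different ability distributions that share these values fit the observed data equally well, yet they need not imply the same distribution for responses that were never observed.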

Conclusions:

  • Test equating can be effectively treated as a missing data problem using IRT.
  • The proposed IRT approach minimizes bias in equating by addressing unobserved responses.
  • Careful consideration of identifiability is necessary to avoid potential biases in educational measurement.