Evaluating Equating Transformations in IRT Observed-Score and Kernel Equating Methods.

Waldir Leôncio¹,², Marie Wiberg³, Michela Battauz⁴

¹Department of Statistical Sciences, University of Padua, Padua, Italy.

Applied Psychological Measurement

March 6, 2023

Summary

This summary is machine-generated.

This study compares three equating transformations used to make scores from different test forms comparable: Item Response Theory (IRT) Observed-Score Equating (IRTOSE), Kernel Equating (KE), and IRT Kernel Equating (IRTKE). The IRT-based methods generally outperform KE, even when the data are not generated from an IRT model, though KE is considerably faster to compute.

Keywords:

classical test theory; equating; item response theory; psychometrics; simulation; statistics

Area of Science:

Psychometrics
Educational Measurement
Statistical Modeling

Background:

Test equating is crucial for score comparability across different test forms.
Existing equating methods are based on Classical Test Theory (CTT) and Item Response Theory (IRT) frameworks.
Comparing different equating methodologies is essential for understanding their performance and applicability.

Purpose of the Study:

To compare the performance of three equating transformations: IRT Observed-Score Equating (IRTOSE), Kernel Equating (KE), and IRT Kernel Equating (IRTKE).
To evaluate these methods under various data-generating scenarios, including a novel simulation procedure.
To assess the impact of data properties like distribution skewness and item difficulty on equating accuracy.
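
As background for the IRT-based transformations named above: observed-score equating under IRT is commonly built on model-implied score distributions, which can be obtained with the Lord-Wingersky recursion. The sketch below is illustrative only and is not the study's code. It assumes a two-parameter logistic (2PL) model, a standard normal ability distribution approximated on an evenly spaced grid, and hypothetical item parameters; the function names are made up for this example.

```python
import math


def p_correct_2pl(theta, a, b):
    """Probability of a correct answer under a 2PL model (assumed here)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def lord_wingersky(p_items):
    """Distribution of the number-correct score given per-item success
    probabilities, built up one item at a time."""
    dist = [1.0]  # with zero items, a score of 0 has probability 1
    for p in p_items:
        new = [0.0] * (len(dist) + 1)
        for score, prob in enumerate(dist):
            new[score] += prob * (1.0 - p)   # item answered incorrectly
            new[score + 1] += prob * p       # item answered correctly
        dist = new
    return dist


def marginal_score_dist(items, n_points=41, lo=-4.0, hi=4.0):
    """Average the conditional score distributions over a normalized
    standard normal grid (a simple stand-in for proper quadrature)."""
    step = (hi - lo) / (n_points - 1)
    thetas = [lo + i * step for i in range(n_points)]
    weights = [math.exp(-0.5 * t * t) for t in thetas]
    total = sum(weights)
    weights = [w / total for w in weights]

    out = [0.0] * (len(items) + 1)
    for theta, w in zip(thetas, weights):
        cond = lord_wingersky([p_correct_2pl(theta, a, b) for a, b in items])
        for score, prob in enumerate(cond):
            out[score] += w * prob
    return out


# Hypothetical (discrimination, difficulty) pairs for a 3-item form.
form_x = [(1.0, -0.5), (1.2, 0.0), (0.8, 0.5)]
score_probs_x = marginal_score_dist(form_x)
```

Running the same computation for a second form gives two model-implied score distributions, which an equipercentile or kernel step can then link.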

Main Methods:

Comparison of equating transformations derived from IRT Observed-Score Equating (IRTOSE), Kernel Equating (KE), and IRT Kernel Equating (IRTKE).
Development of a new data-generation procedure for simulating test data without IRT parameters.
Simulation studies to control for test score properties such as distribution skewness and item difficulty.
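
To make the kernel equating step concrete, here is a minimal sketch of Gaussian-kernel continuization followed by the equipercentile-type mapping e_Y(x) = G⁻¹(F(x)). It is not the authors' implementation. It assumes the score probabilities are already available (in practice they would come from pre-smoothing or from an IRT model), uses an arbitrary bandwidth of 0.6, and inverts the target distribution by bisection; all function names and numbers are hypothetical.

```python
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal CDF


def moments(scores, probs):
    """Mean and variance of a discrete score distribution."""
    mu = sum(s * p for s, p in zip(scores, probs))
    var = sum((s - mu) ** 2 * p for s, p in zip(scores, probs))
    return mu, var


def continuized_cdf(x, scores, probs, h):
    """Gaussian-kernel continuization of a discrete score distribution.
    The shrinkage factor a keeps the mean and variance unchanged."""
    mu, var = moments(scores, probs)
    a = (var / (var + h * h)) ** 0.5
    return sum(
        p * _PHI((x - a * s - (1.0 - a) * mu) / (a * h))
        for s, p in zip(scores, probs)
    )


def kernel_equate(x, scores_x, probs_x, scores_y, probs_y, h_x=0.6, h_y=0.6):
    """Map a form-X score to the form-Y scale by matching continuized CDFs."""
    target = continuized_cdf(x, scores_x, probs_x, h_x)
    # Bracket wide enough to contain the solution, then bisect.
    lo = min(scores_y) - 10.0 * h_y
    hi = max(scores_y) + 10.0 * h_y
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if continuized_cdf(mid, scores_y, probs_y, h_y) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical score distributions for two 4-item forms.
scores = [0, 1, 2, 3, 4]
probs_x = [0.10, 0.20, 0.40, 0.20, 0.10]
probs_y = [0.05, 0.15, 0.30, 0.30, 0.20]  # form Y is easier in this example
equated = [kernel_equate(x, scores, probs_x, scores, probs_y) for x in scores]
```

Rerunning the mapping with different bandwidths or different input probabilities is one simple way to see how sensitive the equated scores are to these choices.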

Main Results:

IRT-based methods generally yielded better results than Kernel Equating (KE), even when the data were not generated from an IRT model.
KE could produce satisfactory results when appropriate pre-smoothing was applied, and it was considerably faster than the IRT methods.
The choice of equating method impacts results; sensitivity analyses are recommended for practical applications.

Conclusions:

IRT-based equating methods are often more robust, particularly when data assumptions are not fully met.
Kernel Equating (KE) can be a viable, faster alternative if pre-smoothing techniques are effectively implemented.
For practical test equating, it is vital to consider model fit, assumption adherence, and the sensitivity of results to the chosen methodology.