Evaluating Equating Transformations in IRT Observed-Score and Kernel Equating Methods.

Waldir Leôncio (1, 2), Marie Wiberg (3), Michela Battauz (4)

  • (1) Department of Statistical Sciences, University of Padua, Padua, Italy.

Applied Psychological Measurement | March 6, 2023
Summary
This summary is machine-generated.

This study compares three equating transformations for making scores on different test forms comparable: Item Response Theory (IRT) Observed-Score Equating (IRTOSE), Kernel Equating (KE), and IRT Kernel Equating (IRTKE). IRT methods generally outperformed KE, even when the data were not generated from an IRT model, though KE was considerably faster.

Keywords: classical test theory, equating, item response theory, psychometrics, simulation, statistics

Area of Science:

  • Psychometrics
  • Educational Measurement
  • Statistical Modeling

Background:

  • Test equating is crucial for score comparability across different test forms.
  • Existing equating methods are based on Classical Test Theory (CTT) and Item Response Theory (IRT) frameworks.
  • Comparing different equating methodologies is essential for understanding their performance and applicability.

Purpose of the Study:

  • To compare the performance of three equating transformations: IRT Observed-Score Equating (IRTOSE), Kernel Equating (KE), and IRT Kernel Equating (IRTKE).
  • To evaluate these methods under various data-generating scenarios, including a novel simulation procedure.
  • To assess the impact of data properties like distribution skewness and item difficulty on equating accuracy.
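
To make the IRT side of this comparison concrete, here is a minimal Python sketch of how an IRT model implies an observed-score distribution, which is the input that IRT observed-score equating works from. The summary does not say which IRT model or algorithm the authors used, so everything specific here is an assumption for illustration: the two-parameter logistic (2PL) model, the Lord-Wingersky recursion, the equally spaced ability grid, and all function names and item parameters.

```python
import math


def p_correct_2pl(theta, a, b):
    """Probability of a correct answer under an assumed 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def lord_wingersky(item_probs):
    """Number-correct score distribution given per-item success probabilities.

    Items are added one at a time; each item either leaves the score
    unchanged (probability 1 - p) or raises it by one (probability p).
    """
    dist = [1.0]  # with zero items, score 0 has probability 1
    for p in item_probs:
        new = [0.0] * (len(dist) + 1)
        for score, mass in enumerate(dist):
            new[score] += mass * (1.0 - p)
            new[score + 1] += mass * p
        dist = new
    return dist


def normal_grid(n_points=41, low=-4.0, high=4.0):
    """Equally spaced ability points with normalized standard-normal weights."""
    step = (high - low) / (n_points - 1)
    thetas = [low + i * step for i in range(n_points)]
    raw = [math.exp(-0.5 * t * t) for t in thetas]
    total = sum(raw)
    return thetas, [r / total for r in raw]


def marginal_score_dist(items, thetas, weights):
    """Average the conditional score distributions over the ability grid."""
    out = [0.0] * (len(items) + 1)
    for theta, w in zip(thetas, weights):
        cond = lord_wingersky([p_correct_2pl(theta, a, b) for a, b in items])
        for score, mass in enumerate(cond):
            out[score] += w * mass
    return out


# Hypothetical (discrimination, difficulty) pairs for a three-item form.
form_x = [(1.0, -0.5), (1.2, 0.0), (0.8, 0.7)]
thetas, weights = normal_grid()
print(marginal_score_dist(form_x, thetas, weights))
```

Running the same computation for a second form gives two model-implied score distributions; an equipercentile-type transformation between them is then what an observed-score equating would report.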

Main Methods:

  • Comparison of equating transformations derived from IRT Observed-Score Equating (IRTOSE), Kernel Equating (KE), and IRT Kernel Equating (IRTKE).
  • Development of a new data-generation procedure for simulating test data without IRT parameters.
  • Simulation studies to control for test score properties such as distribution skewness and item difficulty.
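
As a rough illustration of the kernel equating idea named above, the following Python sketch continuizes two discrete score distributions with a Gaussian kernel and maps a score on form X to the form Y score with the same cumulative probability. It is a sketch under stated assumptions, not the authors' implementation: it assumes an equivalent-groups design, skips pre-smoothing, fixes the bandwidths at an arbitrary 0.6 instead of selecting them, and uses hypothetical function names and toy probabilities.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def moments(scores, probs):
    """Mean and variance of a discrete score distribution."""
    mean = sum(x * p for x, p in zip(scores, probs))
    var = sum((x - mean) ** 2 * p for x, p in zip(scores, probs))
    return mean, var


def kernel_cdf(x, scores, probs, h):
    """Gaussian-kernel continuized CDF with bandwidth h."""
    mean, var = moments(scores, probs)
    # Shrinkage factor so the continuized distribution keeps the
    # mean and variance of the discrete one.
    a = math.sqrt(var / (var + h * h))
    return sum(
        p * normal_cdf((x - a * xj - (1.0 - a) * mean) / (a * h))
        for xj, p in zip(scores, probs)
    )


def kernel_quantile(u, scores, probs, h, tol=1e-10):
    """Invert the continuized CDF by bisection."""
    _, var = moments(scores, probs)
    pad = 10.0 * (math.sqrt(var) + h)
    lo, hi = min(scores) - pad, max(scores) + pad
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kernel_cdf(mid, scores, probs, h) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def kernel_equate(x, scores_x, probs_x, scores_y, probs_y, h_x=0.6, h_y=0.6):
    """Equated score: the Y quantile at the X cumulative probability of x."""
    u = kernel_cdf(x, scores_x, probs_x, h_x)
    return kernel_quantile(u, scores_y, probs_y, h_y)


# Toy score probabilities for two five-point forms (made up for illustration).
scores = [0, 1, 2, 3, 4]
probs_x = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]
probs_y = [0.10, 0.20, 0.30, 0.25, 0.15]
for x in scores:
    print(x, round(kernel_equate(x, scores, probs_x, scores, probs_y), 3))
```

In practice the score probabilities would first be pre-smoothed and the bandwidths chosen by a selection criterion; both steps are left out here to keep the core transformation visible.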

Main Results:

  • IRT methods generally yielded better results than KE, even when the data were not generated from an IRT model.
  • KE showed potential for satisfactory results when appropriate pre-smoothing was applied, and it ran considerably faster than the IRT methods.
  • The choice of equating method impacts results; sensitivity analyses are recommended for practical applications.

Conclusions:

  • IRT-based equating methods are often more robust, even when the data were not generated from an IRT model.
  • Kernel Equating (KE) can be a viable, faster alternative if pre-smoothing techniques are effectively implemented.
  • For practical test equating, it is vital to consider model fit, assumption adherence, and the sensitivity of results to the chosen methodology.