Updated: May 14, 2026

Detecting uniform differential item functioning for continuous response computerized adaptive testing.

Chun Wang¹, Ruoyi Zhu¹

¹University of Washington, WA, USA.

Applied Psychological Measurement

February 8, 2024

Summary

This summary is machine-generated.

We developed two methods to detect differential item functioning (DIF) in computerized adaptive testing (CAT) with continuous responses and severely sparse data. In simulation studies, both methods identified uniform DIF effectively, which supports fair measurement in this testing scenario.

Keywords:

SIBTEST, computerized adaptive test, continuous response, differential item functioning

Area of Science:

Psychometrics
Educational Measurement
Computerized Adaptive Testing (CAT)

Background:

Ensuring measurement fairness requires evaluating items for differential item functioning (DIF).
Continuous response items offer more information than dichotomous items, particularly in performance-based tasks.
Severe data sparsity is common in computerized adaptive testing (CAT) when items are machine-generated.

Purpose of the Study:

To propose and evaluate two novel methods for detecting uniform DIF in the specific context of continuous response, severely sparse CAT.
To assess the effectiveness of these methods in identifying DIF under challenging data conditions.
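
For readers new to the term, uniform DIF can be written down under a simple linear model for a continuous response. The model and the symbols below ($a_j$, $b_j$, $\gamma_j$, $G_i$) are assumptions made for illustration; this summary does not state the model used in the paper.

```latex
% Assumed linear model for the continuous response Y_{ij} of examinee i on item j.
% theta_i is the latent trait; G_i = 0 marks the reference group, G_i = 1 the focal group.
\[
  \mathbb{E}\left[ Y_{ij} \mid \theta_i, G_i \right]
    = b_j + a_j \theta_i + \gamma_j G_i ,
  \qquad G_i \in \{0, 1\}
\]
% Uniform DIF on item j: the gap between groups is the same at every trait level.
\[
  \mathbb{E}\left[ Y_{ij} \mid \theta, G = 1 \right]
  - \mathbb{E}\left[ Y_{ij} \mid \theta, G = 0 \right]
  = \gamma_j \neq 0 \quad \text{for all } \theta
\]
```

Under this sketch, an item is free of uniform DIF when $\gamma_j = 0$, so examinees with the same $\theta$ have the same expected response regardless of group.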

Main Methods:

A modified non-parametric CAT-SIBTEST method, independent of item response theory (IRT) model assumptions.
A parametric, model-based regularization method.
Simulation studies were conducted to evaluate method performance.
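
The two detection ideas listed above can be illustrated with a minimal Python sketch. It is not the authors' CAT-SIBTEST modification or their regularization procedure, neither of which is specified in this summary. The function names are hypothetical, the ability estimate `theta_hat` is treated as known, the stratified statistic uses pooled-count weights with no regression correction, and the penalized model is an assumed linear one, `y = b + a*theta + gamma*group + error`, with a lasso penalty on `gamma` only.

```python
import math
import random
from statistics import mean, variance


def sibtest_style_beta(y, group, theta_hat, n_strata=5, min_per_cell=2):
    """Stratified mean difference (reference minus focal) with a z statistic.

    group is 0 for reference and 1 for focal; theta_hat is the matching
    variable (assumed here to be a CAT ability estimate).
    """
    if min_per_cell < 2:
        raise ValueError("min_per_cell must be at least 2 to estimate variances")
    # Sort examinees by the matching variable and cut into equal-sized strata.
    order = sorted(range(len(y)), key=lambda i: theta_hat[i])
    size = math.ceil(len(order) / n_strata)
    strata = [order[s:s + size] for s in range(0, len(order), size)]

    num, var, total_w = 0.0, 0.0, 0
    for members in strata:
        ref = [y[i] for i in members if group[i] == 0]
        foc = [y[i] for i in members if group[i] == 1]
        if len(ref) < min_per_cell or len(foc) < min_per_cell:
            continue  # too sparse to compare the groups in this stratum
        w = len(ref) + len(foc)  # weight each stratum by its pooled count
        num += w * (mean(ref) - mean(foc))
        var += w ** 2 * (variance(ref) / len(ref) + variance(foc) / len(foc))
        total_w += w
    if total_w == 0:
        raise ValueError("no stratum has enough examinees in both groups")
    beta, se = num / total_w, math.sqrt(var) / total_w
    return beta, se, beta / se


def _residualize(v, theta):
    """Residuals of v after least-squares regression on an intercept and theta."""
    n = len(v)
    mean_t, mean_v = sum(theta) / n, sum(v) / n
    sxx = sum((t - mean_t) ** 2 for t in theta)
    sxy = sum((t - mean_t) * (x - mean_v) for t, x in zip(theta, v))
    slope = sxy / sxx
    return [x - mean_v - slope * (t - mean_t) for t, x in zip(theta, v)]


def lasso_dif_effect(y, group, theta_hat, lam):
    """Lasso estimate of gamma in y = b + a*theta + gamma*group + error.

    Only gamma is penalized; b and a are profiled out by residualizing.
    """
    g_res = _residualize(group, theta_hat)
    y_res = _residualize(y, theta_hat)
    gy = sum(g * r for g, r in zip(g_res, y_res))
    gg = sum(g * g for g in g_res)
    shrunk = math.copysign(max(abs(gy) - lam, 0.0), gy)  # soft threshold
    return shrunk / gg


def simulate_item(n=2000, gamma=-0.6, seed=0):
    """Toy data: y = 0.8*theta + gamma*group + N(0, 1) noise."""
    rng = random.Random(seed)
    theta = [rng.gauss(0, 1) for _ in range(n)]
    group = [rng.randrange(2) for _ in range(n)]
    y = [0.8 * t + gamma * g + rng.gauss(0, 1) for t, g in zip(theta, group)]
    return y, group, theta
```

On the toy data, `sibtest_style_beta(*simulate_item())` should return a clearly positive z statistic, while `lasso_dif_effect(*simulate_item(gamma=0.0), lam=120)` should shrink the group effect to zero. The stratified statistic makes no model assumption beyond matching on `theta_hat`, whereas the penalized estimate depends on the assumed linear model and on the choice of `lam`.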

Main Results:

Both proposed methods accurately identified items exhibiting uniform DIF.
The simulation studies indicated that both techniques are robust in the continuous response, severely sparse CAT scenario.

Conclusions:

The developed CAT-SIBTEST modification and regularization method are suitable for detecting uniform DIF in continuous response, severely sparse CAT.
These methods contribute to ensuring measurement fairness in advanced, data-intensive testing environments.
A real data analysis is presented to illustrate practical application and potential limitations.