Differential Item Functioning via Robust Scaling.

Psychometrika·2024

Updated: Feb 26, 2026

Differential Item Functioning via Robust Scaling.

Peter F Halpin¹

¹University of North Carolina at Chapel Hill.

February 25, 2026

Summary

This summary is machine-generated.

This study introduces a novel method for detecting differential item functioning (DIF) in item response theory (IRT) models without needing anchor items. The approach reformulates DIF as outlier detection using robust statistics, offering a more flexible and effective analysis.

Keywords:

differential item functioning; item response theory; robust statistics; test scaling and equating

Area of Science:

Psychometrics
Educational Measurement
Statistics

Background:

Detecting Differential Item Functioning (DIF), where an item behaves differently across groups of respondents with the same ability, is crucial for test fairness.
Current DIF detection methods often require pre-specified anchor items, limiting their applicability.
Item Response Theory (IRT) provides a framework for analyzing item and person characteristics.
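
To make the background concrete, here is one common way to write down an IRT model and what DIF means within it. The summary does not say which IRT model the study uses, so the two-parameter logistic (2PL) form and the symbols below (respondent j, item i, group g, ability \theta_j, discrimination a_{ig}, difficulty b_{ig}) are assumptions chosen for illustration only.

```latex
% Assumed 2PL model with group-specific item parameters (g = 0: reference, g = 1: focal)
P(X_{ij} = 1 \mid \theta_j, g) = \frac{1}{1 + \exp\{-a_{ig}(\theta_j - b_{ig})\}}

% Item i shows no DIF if, once both groups are on a common scale,
a_{i0} = a_{i1} \quad \text{and} \quad b_{i0} = b_{i1}
```

Under this reading, an anchor item is one assumed in advance to satisfy the no-DIF condition, which is the pre-specification the proposed method aims to avoid.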

Purpose of the Study:

To propose a novel method for assessing DIF in IRT models.
To develop a DIF detection approach that does not require anchor items.
To enhance the robustness and efficiency of DIF analysis.

Main Methods:

Re-formulating DIF as an outlier detection problem within IRT scaling.
Utilizing robust statistics, specifically a redescending M-estimator, for parameter estimation.
Tuning the estimator to control the asymptotic type I error rate for DIF detection.
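
The general idea behind these steps can be sketched in a few lines of Python. This is a minimal illustration, not the paper's estimator: it assumes each item yields one between-group difference in an estimated parameter, that items without DIF share a common shift, and that Tukey's bisquare is the redescending weight function. The function names, the MAD-based scale, the example data, and the tuning constant c = 4.685 (a conventional default for the bisquare, not the type I error tuning described above) are all assumptions made for this sketch.

```python
from statistics import median


def bisquare_weight(u, c):
    """Tukey bisquare weight: decreases smoothly and is exactly 0 for |u| >= c."""
    if abs(u) >= c:
        return 0.0
    return (1.0 - (u / c) ** 2) ** 2


def robust_shift(diffs, c=4.685, max_iter=100, tol=1e-10):
    """Estimate the common shift of item-level differences with a redescending
    M-estimator, computed by iteratively reweighted averaging.

    Returns (shift, weights). A weight of 0 means the item was treated as an outlier.
    """
    mu = median(diffs)  # robust starting value
    # Fixed robust scale: median absolute deviation, rescaled for normal data.
    scale = 1.4826 * median(abs(d - mu) for d in diffs)
    if scale == 0:
        raise ValueError("scale is zero; the differences are too concentrated")

    for _ in range(max_iter):
        weights = [bisquare_weight((d - mu) / scale, c) for d in diffs]
        total = sum(weights)
        if total == 0:
            raise ValueError("all items were down-weighted to zero")
        new_mu = sum(w * d for w, d in zip(weights, diffs)) / total
        converged = abs(new_mu - mu) < tol
        mu = new_mu
        if converged:
            break

    # Final weights at the converged estimate.
    weights = [bisquare_weight((d - mu) / scale, c) for d in diffs]
    return mu, weights


# Hypothetical differences (focal minus reference) for eight items;
# the last item is deliberately far from the rest.
diffs = [0.48, 0.52, 0.50, 0.47, 0.53, 0.51, 0.49, 1.60]
shift, weights = robust_shift(diffs)
flagged = [i for i, w in enumerate(weights) if w == 0.0]
print("estimated common shift:", round(shift, 4))
print("items flagged as possible DIF (0-based):", flagged)
```

Because the weight falls to exactly zero beyond c scale units, an item with a large difference stops influencing the estimated shift altogether, which is what lets the outlying items be read as DIF candidates without naming anchor items beforehand. A smaller c flags more items and a larger c flags fewer, so the choice of c governs the false-positive rate.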

Main Results:

The proposed redescending M-estimator demonstrates efficiency in the absence of DIF and robustness in its presence.
Simulation studies indicate favorable comparisons with existing DIF detection methods.
A real data example showcases the method's practical application where anchor items are not feasible.

Conclusions:

The proposed method offers a viable alternative for DIF assessment, particularly when anchor items are unavailable.
This robust statistical approach enhances the reliability of DIF detection in IRT.
The findings have implications for improving the fairness and validity of educational and psychological assessments.