Score-Based Tests of Differential Item Functioning via Pairwise Maximum Likelihood Estimation.

Ting Wang¹, Carolin Strobl², Achim Zeileis³

¹Department of Psychological Sciences, University of Missouri, Columbia, MO, USA. twb8d@mail.missouri.edu.

November 19, 2017

Summary

This summary is machine-generated.

This study introduces novel score-based tests to detect violations of measurement invariance in item response theory models. These tests effectively identify problematic parameters without needing prior information, improving scale interpretation.

Keywords:

differential item functioning
item response theory
pairwise maximum likelihood
score-based test

Area of Science:

Psychometrics
Statistical modeling
Educational measurement

Background:

Measurement invariance is crucial in item response theory (IRT) for accurate latent construct assessment.
Violations can lead to misinterpretations and systematic bias, particularly in diverse populations.
Existing detection methods often require prior information that is unavailable in practice, such as which item parameters are problematic and which respondent groups are affected.

Purpose of the Study:

To extend recently developed score-based tests for detecting measurement invariance violations.
To adapt these tests to two-parameter item response models estimated by pairwise maximum likelihood.
To evaluate the tests' efficacy in identifying problematic item parameters.
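
For orientation, the two ingredients named above can be written out as follows. This is a sketch under stated assumptions: the summary does not say which link function the two-parameter model uses, so the logistic form below is an assumption (a normal-ogive link is an equally common choice), and the symbols a_j (discrimination), b_j (difficulty), theta_i (person ability), and gamma (the full parameter vector) are introduced here for illustration.

```latex
% Two-parameter model for person i and item j (logistic form assumed here)
P(X_{ij} = 1 \mid \theta_i)
  = \frac{\exp\{a_j(\theta_i - b_j)\}}{1 + \exp\{a_j(\theta_i - b_j)\}}

% Pairwise log-likelihood: sum of bivariate log-probabilities over all item pairs j < k,
% each bivariate probability being marginal over the ability distribution
p\ell(\boldsymbol{\gamma})
  = \sum_{i=1}^{n} \sum_{j<k}
    \log P(X_{ij} = x_{ij},\; X_{ik} = x_{ik};\ \boldsymbol{\gamma})
```

Pairwise maximum likelihood replaces the full joint likelihood of all items with this sum over item pairs, which only requires bivariate probabilities.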

Main Methods:

Using score-based tests derived from casewise derivatives of the likelihood function.
Estimating only the null model (assuming measurement invariance holds).
Applying tests to two-parameter IRT models with pairwise maximum likelihood estimation.
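
The general idea behind the methods listed above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the score covariance is estimated by the outer product of the casewise scores, and a one-parameter Bernoulli model stands in for the fitted two-parameter null model so that the casewise scores have a closed form. No critical values or p-values are computed.

```python
import numpy as np


def cumulative_score_process(scores, order_var):
    """Decorrelated cumulative sum of casewise scores, ordered by a covariate.

    scores    : (n, k) array, derivative of each case's log-likelihood
                contribution w.r.t. the k parameters, at the null estimate.
    order_var : (n,) array, the auxiliary variable (e.g. age) along which
                parameter instability is suspected.
    """
    scores = np.asarray(scores, dtype=float)
    n = scores.shape[0]
    ordered = scores[np.argsort(order_var, kind="stable")]
    # Outer-product estimate of the score covariance (an assumption of this sketch).
    cov = ordered.T @ ordered / n
    # Inverse symmetric square root via the eigendecomposition.
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    return np.cumsum(ordered @ inv_sqrt, axis=0) / np.sqrt(n)


def double_max(process):
    """Largest absolute fluctuation over all cases and all parameters."""
    return np.abs(process).max()


def cramer_von_mises(process):
    """Average squared norm of the process over all cases."""
    return (process ** 2).sum(axis=1).mean()


# Toy stand-in for a fitted null model: one item, one parameter p.
# The casewise score of a Bernoulli log-likelihood is (x - p) / (p * (1 - p)).
x = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
age = np.arange(8)  # hypothetical ordering covariate
p_hat = x.mean()    # maximum likelihood estimate under the null model
scores = ((x - p_hat) / (p_hat * (1 - p_hat)))[:, None]

B = cumulative_score_process(scores, age)
print(double_max(B), cramer_von_mises(B))
```

Because the scores sum to zero at the estimate, the process returns to zero at the last case. Large excursions in between suggest that the parameter changes along the ordering covariate, which is how only the null model needs to be estimated.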

Main Results:

In simulations, the proposed score-based tests effectively identify problematic item parameters.
The study details the theoretical underpinnings and practical implementation of these novel tests.
An empirical example illustrates how the measurement invariance tests are applied to real data.

Conclusions:

Score-based tests offer a practical alternative for detecting measurement invariance violations in IRT.
These tests are valuable for ensuring scale validity and fairness across different groups.
The extension to two-parameter models broadens the applicability of these statistical tools.