Score-Based Tests of Differential Item Functioning via Pairwise Maximum Likelihood Estimation.

Ting Wang¹, Carolin Strobl², Achim Zeileis³

  • ¹ Department of Psychological Sciences, University of Missouri, Columbia, MO, USA. twb8d@mail.missouri.edu

Psychometrika | November 19, 2017
Summary
This summary is machine-generated.

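This study extends recently developed score-based tests for detecting violations of measurement invariance to two-parameter item response theory models estimated by pairwise maximum likelihood. The tests identify problematic item parameters without requiring prior information about which parameters or groups are affected, which supports sound interpretation of the scale.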

Keywords: differential item functioning, item response theory, pairwise maximum likelihood, score-based test

Area of Science:

  • Psychometrics
  • Statistical modeling
  • Educational measurement

Background:

  • Measurement invariance is crucial in item response theory (IRT) for accurate latent construct assessment.
  • Violations can lead to misinterpretations and systematic bias, particularly in diverse populations.
  • Existing detection methods often require unavailable prior information on item parameters and groups.
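
To make the idea of a measurement invariance violation concrete, the sketch below simulates responses to a single item whose difficulty differs between two groups of equally able persons. It is an illustration only: the logistic two-parameter form, the function names `prob_correct` and `simulate_group`, and all parameter values are assumptions chosen for this example, not taken from the study.

```python
import math
import random


def prob_correct(theta, a, b):
    """Two-parameter response probability in logistic form (assumed parameterization).

    theta: person ability, a: item discrimination, b: item difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def simulate_group(n, a, b, rng):
    """Draw n binary responses to one item; abilities are standard normal."""
    responses = []
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)
        responses.append(1 if rng.random() < prob_correct(theta, a, b) else 0)
    return responses


if __name__ == "__main__":
    rng = random.Random(2017)
    # Same ability distribution in both groups, but the item is harder
    # (larger b) for the focal group: the item is not invariant.
    reference = simulate_group(2000, a=1.2, b=0.0, rng=rng)
    focal = simulate_group(2000, a=1.2, b=0.5, rng=rng)
    print("proportion correct, reference group:", sum(reference) / len(reference))
    print("proportion correct, focal group:   ", sum(focal) / len(focal))
```

Because both groups share the same ability distribution, any systematic gap in the proportion correct here comes from the item parameter, which is the kind of bias that invariance tests are meant to flag.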

Purpose of the Study:

  • To extend recently developed score-based tests for detecting measurement invariance violations.
  • To adapt these tests to two-parameter item response models estimated by pairwise maximum likelihood.
  • To evaluate the tests' efficacy in identifying problematic item parameters.
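
For orientation, a two-parameter item response model and a pairwise log-likelihood can be written as follows. This is a generic sketch: the logistic form and the symbols (ability \theta_i, discrimination a_j, difficulty b_j, item parameter vector \psi, n persons) are assumptions for illustration, and the parameterization used in the study may differ (for example, a normal-ogive form).

```latex
% Two-parameter item response function (logistic form; assumed parameterization)
P(X_{ij} = 1 \mid \theta_i) = \frac{\exp\{a_j(\theta_i - b_j)\}}{1 + \exp\{a_j(\theta_i - b_j)\}}

% Pairwise log-likelihood: sum of bivariate log-probabilities over all item pairs (j, k)
p\ell(\psi) = \sum_{i=1}^{n} \sum_{j < k} \log P(X_{ij} = x_{ij},\, X_{ik} = x_{ik};\ \psi)
```

Pairwise estimation replaces the full joint likelihood of all items with a sum over item pairs, so only bivariate probabilities have to be evaluated.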

Main Methods:

  • Utilizing score-based tests derived from casewise derivatives of the likelihood function.
  • Estimating only the null model (assuming measurement invariance holds).
  • Applying tests to two-parameter IRT models with pairwise maximum likelihood estimation.
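
The general mechanics of a score-based test can be sketched in a few lines. The example below is not the study's implementation: it swaps in a one-parameter Bernoulli model for the two-parameter IRT model with pairwise maximum likelihood, so that the casewise scores have a closed form. Only the null model (one common success probability) is fitted, the casewise scores are cumulated along the ordering of a hypothetical auxiliary variable, and the maximum absolute deviation is compared with the supremum of a Brownian bridge. All function names are hypothetical.

```python
import math


def casewise_scores(x):
    """Casewise derivatives of the Bernoulli log-likelihood at the MLE.

    Assumes x contains both 0s and 1s, so that 0 < p_hat < 1.
    """
    p_hat = sum(x) / len(x)
    # d/dp log f(x_i; p) = (x_i - p) / (p * (1 - p))
    return [(xi - p_hat) / (p_hat * (1.0 - p_hat)) for xi in x]


def cumulative_score_process(scores):
    """Scaled cumulative sums of the scores.

    Cases must already be sorted by the auxiliary variable (e.g. age).
    """
    n = len(scores)
    # Outer-product estimate of the information (a scalar in this toy model)
    info = sum(s * s for s in scores) / n
    scale = 1.0 / math.sqrt(n * info)
    process, running = [], 0.0
    for s in scores:
        running += s
        process.append(scale * running)
    return process


def max_abs_test(process, terms=100):
    """Maximum absolute deviation and its asymptotic p-value.

    For a single parameter the limiting process is a Brownian bridge, whose
    supremum follows the Kolmogorov distribution (truncated series below).
    """
    stat = max(abs(b) for b in process)
    tail = 2.0 * sum(
        (-1) ** (k + 1) * math.exp(-2.0 * k * k * stat * stat)
        for k in range(1, terms + 1)
    )
    return stat, min(max(tail, 0.0), 1.0)


if __name__ == "__main__":
    # Toy data ordered by the auxiliary variable: the success rate jumps
    # from 30% to 70% halfway through, so invariance is violated.
    x = [1 if i % 10 < 3 else 0 for i in range(200)]
    x += [1 if i % 10 < 7 else 0 for i in range(200)]
    stat, p_value = max_abs_test(cumulative_score_process(casewise_scores(x)))
    print("statistic:", stat, "p-value:", p_value)
```

Under invariance the cumulative process fluctuates around zero; a systematic drift along the ordering variable produces a large statistic. With several parameters, one such process would be tracked per parameter, which is what lets this family of tests point to the parameters that are affected.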

Main Results:

  • In simulations, the proposed score-based tests identify the problematic item parameters.
  • The study details the theoretical underpinnings and practical implementation of these novel tests.
  • An empirical example showcases the real-world application of the measurement invariance tests.

Conclusions:

  • Score-based tests offer a practical alternative for detecting measurement invariance violations in IRT.
  • These tests are valuable for ensuring scale validity and fairness across different groups.
  • The extension to two-parameter models broadens the applicability of these statistical tools.