Differential Item Functioning Effect Size Use for Validity Information.

W Holmes Finch¹, Maria Dolores Hidalgo Montesinos², Brian F French³

¹Ball State University, Muncie, IN, USA.

Educational and Psychological Measurement

November 25, 2024

Summary

This summary is machine-generated.

Effect sizes help quantify differential item functioning (DIF) magnitude. The log odds ratio and Mantel-Haenszel log odds ratio variance accurately identified which assessment had more DIF in a simulation study.

Keywords:

differential item functioning, effect size, validity

Area of Science:

Psychometrics
Educational Measurement
Statistical Analysis

Background:

Statistical significance testing for differential item functioning (DIF) flags items but does not indicate how large the DIF is.
Effect sizes are crucial for understanding the practical significance of detected DIF.
Various effect size measures and interpretation guidelines exist for DIF analysis.

Purpose of the Study:

To compare the performance of DIF effect size measures in quantifying and comparing DIF across two assessments.
To evaluate if effect sizes accurately capture aggregate DIF and identify assessments with less DIF.
To identify robust DIF effect size measures under various simulated conditions.

Main Methods:

A simulation study was conducted manipulating factors influencing effect sizes and DIF detection.
Performance of different DIF effect size measures was compared.
Effect sizes were applied to a real dataset for practical illustration.

Main Results:

The log odds ratio of fixed effects and the variance of the Mantel-Haenszel log odds ratio demonstrated high accuracy.
These measures effectively identified which assessment exhibited a greater amount of DIF.
Several effect sizes showed reliable performance across diverse simulated scenarios.
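
To make the Mantel-Haenszel log odds ratio concrete, here is a minimal sketch of how it could be computed for a single item from 2x2 tables of examinees matched on total score. It uses the standard Mantel-Haenszel common odds ratio formula; the function name, the tuple layout, and all counts are hypothetical illustrations, not data or code from the study, and the aggregate variance estimator used by the authors is not reproduced here.

```python
import math


def mh_log_odds_ratio(strata):
    """Mantel-Haenszel common log odds ratio for one item.

    Each stratum is a tuple
    (ref_correct, ref_incorrect, focal_correct, focal_incorrect)
    for examinees matched at one total-score level.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip score levels with no examinees
        num += a * d / n  # reference correct x focal incorrect
        den += b * c / n  # reference incorrect x focal correct
    if num <= 0 or den <= 0:
        raise ValueError("odds ratio undefined: a cross-product sum is zero")
    return math.log(num / den)


# Hypothetical counts: two items, three matched score levels each.
item_no_dif = [(10, 10, 10, 10), (30, 10, 15, 5), (8, 2, 16, 4)]
item_dif = [(20, 10, 10, 20), (30, 10, 15, 20), (18, 2, 12, 8)]

log_ors = [mh_log_odds_ratio(item_no_dif), mh_log_odds_ratio(item_dif)]
print(log_ors)
```

A value near zero means the matched groups have similar odds of answering correctly; values farther from zero indicate more DIF on that item. An assessment-level measure such as the variance of these per-item values summarizes how much they spread across the items of one test.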

Conclusions:

The log odds ratio of fixed effects and the variance of the Mantel-Haenszel log odds ratio are recommended for quantifying DIF magnitude and comparing DIF levels between assessments.
These effect sizes provide valuable insights beyond statistical significance in DIF analysis.
Further research should focus on effect sizes to enhance understanding of DIF magnitude.