Differential Item Functioning Effect Size Use for Validity Information.

W Holmes Finch¹, Maria Dolores Hidalgo Montesinos², Brian F French³

  • ¹Ball State University, Muncie, IN, USA.

Educational and Psychological Measurement | November 25, 2024
PubMed
Summary
This summary is machine-generated.

Effect sizes help quantify differential item functioning (DIF) magnitude. The log odds ratio and Mantel-Haenszel log odds ratio variance accurately identified which assessment had more DIF in a simulation study.

Keywords:
differential item functioning, effect size, validity

Area of Science:

  • Psychometrics
  • Educational Measurement
  • Statistical Analysis

Background:

  • Statistical significance testing for differential item functioning (DIF) indicates whether DIF is present but does not convey its magnitude.
  • Effect sizes are crucial for understanding the practical significance of detected DIF.
  • Various effect size measures and interpretation guidelines exist for DIF analysis.

Purpose of the Study:

  • To compare the performance of DIF effect size measures in quantifying and comparing DIF across two assessments.
  • To evaluate whether effect sizes accurately capture aggregate DIF and identify assessments with less DIF.
  • To identify robust DIF effect size measures under various simulated conditions.

Main Methods:

  • A simulation study manipulated factors that influence effect sizes and DIF detection.
  • Performance of different DIF effect size measures was compared.
  • Effect sizes were applied to a real dataset for practical illustration.

Main Results:

  • The log odds ratio of fixed effects and the variance of the Mantel-Haenszel log odds ratio demonstrated high accuracy.
  • These measures effectively identified which assessment exhibited a greater amount of DIF.
  • Several effect sizes showed reliable performance across diverse simulated scenarios.
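
To make the Mantel-Haenszel log odds ratio concrete, the following is a minimal sketch of how it can be computed for a single item. It assumes examinees are grouped into strata by matched total score and that each stratum supplies four counts (reference correct, reference incorrect, focal correct, focal incorrect). The function name and the example counts are hypothetical and do not come from the study.

```python
import math


def mh_log_odds_ratio(strata):
    """Mantel-Haenszel common log odds ratio for one item.

    `strata` is an iterable of 4-tuples, one per matched score level:
    (ref_correct, ref_incorrect, focal_correct, focal_incorrect).
    A value near 0 suggests no DIF; a positive value means the item
    favors the reference group after matching on score.
    """
    numerator = 0.0
    denominator = 0.0
    for ref_ok, ref_bad, foc_ok, foc_bad in strata:
        n = ref_ok + ref_bad + foc_ok + foc_bad
        if n == 0:
            continue  # an empty stratum contributes nothing
        numerator += ref_ok * foc_bad / n
        denominator += ref_bad * foc_ok / n
    if numerator <= 0 or denominator <= 0:
        raise ValueError("odds ratio is undefined for these counts")
    return math.log(numerator / denominator)


# Hypothetical counts for two score strata (illustration only).
item_without_dif = [(30, 20, 30, 20), (40, 10, 40, 10)]
item_with_dif = [(30, 20, 20, 30), (40, 10, 30, 20)]

print(mh_log_odds_ratio(item_without_dif))
print(mh_log_odds_ratio(item_with_dif))
```

In this sketch the first item yields a log odds ratio of zero because both groups perform identically within each stratum, while the second yields a positive value because the reference group outperforms the focal group at every matched score level.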

Conclusions:

  • The log odds ratio of fixed effects and the variance of the Mantel-Haenszel log odds ratio are recommended for quantifying DIF magnitude and comparing DIF levels between assessments.
  • These effect sizes provide valuable insights beyond statistical significance in DIF analysis.
  • Further research should focus on effect sizes to enhance understanding of DIF magnitude.
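
As an illustration of comparing DIF levels between assessments, the sketch below estimates the between-item variance of item-level log odds ratios for two tests. It uses a simple moment-style estimator (observed spread of the log odds ratios minus their average sampling variance, truncated at zero). This estimator is an assumption made for illustration and may differ from the one used in the study; the function name and all numbers are hypothetical.

```python
from statistics import fmean


def dif_variance(log_odds_ratios, sampling_variances):
    """Moment-style estimate of between-item DIF variance.

    Assumed form (for illustration only):
        (sum of squared deviations from the mean log odds ratio
         - sum of the items' sampling variances) / number of items,
    truncated at 0 because a variance cannot be negative.
    """
    n_items = len(log_odds_ratios)
    if n_items == 0 or n_items != len(sampling_variances):
        raise ValueError("need one sampling variance per item")
    mean_lor = fmean(log_odds_ratios)
    spread = sum((x - mean_lor) ** 2 for x in log_odds_ratios)
    estimate = (spread - sum(sampling_variances)) / n_items
    return max(0.0, estimate)


# Hypothetical item-level log odds ratios for two five-item assessments.
assessment_a = [0.0, 0.1, -0.1, 0.05, -0.05]
assessment_b = [0.8, -0.7, 0.1, -0.1, 0.9]
variances = [0.01] * 5  # hypothetical sampling variance per item

var_a = dif_variance(assessment_a, variances)
var_b = dif_variance(assessment_b, variances)
print("Assessment with more DIF:", "A" if var_a > var_b else "B")
```

A larger estimate indicates that item-level DIF effects vary more widely across the assessment, which is the sense in which one test can be said to show more DIF than another.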