Related Concept Videos

Friedman Two-way Analysis of Variance by Ranks

Friedman's Two-Way Analysis of Variance by Ranks is a nonparametric test for detecting differences across multiple related measurements, such as repeated test attempts, when the normality and equal-variance assumptions of conventional ANOVA do not hold. Because it analyzes ranks rather than raw scores, Friedman's test is well suited to ordinal or non-normally distributed data and to dependent samples, such as matched subjects over time or repeated measures from...
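The rank-based statistic behind Friedman's test can be sketched in a few lines. This is a minimal illustration, assuming complete blocks (each subject measured once under each of k conditions) and omitting the usual correction for ties; the function names and scores are hypothetical. Scores are ranked within each subject, rank sums R_j are accumulated per condition, and Q = 12 / (n k (k + 1)) * sum(R_j^2) - 3n(k + 1) is compared with a chi-square distribution on k - 1 degrees of freedom.

```python
def average_ranks(values):
    """Rank values 1..k within one block; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(blocks):
    """Friedman Q statistic (no tie correction) and per-condition rank sums.

    blocks: n rows (subjects), each holding k scores (one per condition).
    """
    n = len(blocks)
    k = len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    return q, rank_sums


# Hypothetical scores: 4 subjects (rows) x 3 test attempts (columns).
scores = [[1, 2, 3], [2, 3, 4], [3, 5, 6], [1, 4, 5]]
q, rank_sums = friedman_statistic(scores)
print("rank sums:", rank_sums)
print("Q =", q, "with", len(scores[0]) - 1, "degrees of freedom")
```

With these made-up scores every subject improves on each attempt, so the rank sums are as unequal as they can be for n = 4 and k = 3.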

Self-Report Tests of Personality

Self-report inventories are objective personality assessments that use multiple-choice items or numbered rating scales, typically ranging from 1 (strongly disagree) to 5 (strongly agree). Such rating scales are often called Likert scales, after Rensis Likert. These inventories are widely used because they are easy to administer and cost-effective. One of the most prominent examples is the Minnesota Multiphasic Personality Inventory (MMPI), originally developed in the 1940s to assess abnormal personality traits.

Identifying Statistically Significant Differences: The F-Test

The F-test compares two sample variances with each other, or a sample variance with a population variance, to decide whether the difference between them can be explained by indeterminate (random) error alone. The test assumes that the data set or sets are normally distributed and that the data sets are independent of each other. The test statistic F is calculated by dividing one variance by another. In other words, the square of one standard...
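Computing F for two samples takes only a few lines. The sketch below assumes the common convention of placing the larger sample variance in the numerator, which the truncated text above does not state; the function name and measurements are hypothetical. Critical values would still come from an F table, or a statistics library, using the returned degrees of freedom.

```python
from statistics import variance


def f_statistic(sample_a, sample_b):
    """Return (F, df_numerator, df_denominator) with the larger sample variance on top."""
    var_a = variance(sample_a)  # sample variance (divides by n - 1)
    var_b = variance(sample_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("both samples need a nonzero variance")
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1


# Hypothetical replicate measurements from two methods.
method_1 = [2.0, 4.0, 6.0]
method_2 = [1.0, 2.0, 3.0]
f, df_num, df_den = f_statistic(method_1, method_2)
print(f"F = {f:.2f} with ({df_num}, {df_den}) degrees of freedom")
```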

Factorial Design

Factorial analysis is an experimental design that applies analysis of variance (ANOVA) to examine how a dependent variable changes in response to more than one independent variable, also known as factors. For example, worker productivity might plausibly be influenced by salary and by other conditions, such as skill level. One way to test this hypothesis is to categorize salary into three levels (low, moderate, and high) and skill sets into two levels (entry level...
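A full factorial design crosses every level of each factor, so the salary-by-skill example yields 3 x 2 = 6 treatment combinations. The short sketch below enumerates them; because the source is cut off after the first skill level, the second label is a hypothetical placeholder.

```python
from itertools import product

# Salary levels come from the example above; "skill level 2" is a hypothetical
# placeholder because the source text is truncated after "entry level".
salary_levels = ["low", "moderate", "high"]
skill_levels = ["entry level", "skill level 2"]

# A full factorial design crosses every level of one factor with every level of the other.
cells = list(product(salary_levels, skill_levels))

for salary, skill in cells:
    print(f"salary={salary:<8} skill={skill}")
print(len(cells), "treatment combinations")
```

A two-way ANOVA on data collected in these six cells would test the main effect of salary, the main effect of skill level, and their interaction.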

Expected Frequencies in Goodness-of-Fit Tests

A goodness-of-fit test determines whether the observed frequencies in a dataset are statistically consistent with the frequencies expected for it. When every category is expected to occur equally often, as when predicting how often each face appears when casting a fair die, the expected frequency of each category is the total number of observations (n) divided by the number of categories (k): E = n/k.
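The equal-expectation case can be sketched directly: compute E = n/k for every category, then the Pearson statistic sum((O - E)^2 / E), which is compared with a chi-square distribution on k - 1 degrees of freedom. The function names and die counts below are hypothetical.

```python
def equal_expected_frequencies(observed):
    """Expected count per category when all k categories are equally likely: E = n / k."""
    n = sum(observed)
    k = len(observed)
    return [n / k] * k


def chi_square_statistic(observed, expected):
    """Pearson goodness-of-fit statistic: sum of (O - E)^2 / E over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts of each face in 60 rolls of a six-sided die.
rolls = [8, 12, 9, 11, 10, 10]
expected = equal_expected_frequencies(rolls)  # 60 / 6 per face
stat = chi_square_statistic(rolls, expected)
print("expected:", expected)
print(f"chi-square = {stat:.2f} on {len(rolls) - 1} degrees of freedom")
```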

Test for Homogeneity

The goodness-of-fit test can be used to decide whether a population fits a given distribution, but it cannot decide whether two populations follow the same unknown distribution. A different test, called the test for homogeneity, can be used to conclude whether two populations have the same distribution. The test statistic for a test for homogeneity is calculated with the same procedure as for the test of independence. The hypotheses for the test for homogeneity can be stated as...
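Because the procedure is shared with the test of independence, the statistic can be sketched as follows: each expected count is (row total x column total) / grand total, and the statistic is compared with a chi-square distribution on (r - 1)(c - 1) degrees of freedom. The function name and tables are hypothetical.

```python
def homogeneity_chi_square(table):
    """Chi-square statistic and degrees of freedom for an r x c table of counts.

    Each row holds one population's counts across the same c categories.
    Expected counts: E_ij = row_total_i * col_total_j / grand_total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


# Hypothetical counts: two populations (rows) across two response categories.
stat, df = homogeneity_chi_square([[20, 30], [30, 20]])
print(f"chi-square = {stat:.2f} on {df} degree(s) of freedom")
```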

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

Amplifications of EVX2 and HOXD9-HOXD13 on 2q31 in mature cystic teratomas of the ovary identified by array comparative genomic hybridization may explain teratoma characteristics in chondrogenesis and osteogenesis.

Journal of ovarian research·2024

Same author

Assessing ChatGPT's capacity for clinical decision support in pediatrics: A comparative study with pediatricians using KIDMAP of Rasch analysis.

Medicine·2023

Same author

An Iterative Scale Purification Procedure on l_z for the Detection of Aberrant Responses.

Multivariate behavioral research·2023

Same author

DUSP5 and PHLDA1 mutations in mature cystic teratomas of the ovary identified on whole-exome sequencing may explain teratoma characteristics.

Human genomics·2022

Same author

Corrigendum: Methylation Statuses of H19DMR and KvDMR at WT2 in Wilms Tumors in Taiwan.

Pathology oncology research : POR·2022

Same author

Computerized Adaptive Testing for Ipsative Tests with Multidimensional Pairwise-Comparison Items: Algorithm Development and Applications.

Applied psychological measurement·2022

Same journal

Development of a Short Form of the CPAI-A (Form B) with Rasch Analyses.

Journal of applied measurement·2021

Same journal

Evaluating the Impact of Multidimensionality on Type I and Type II Error Rates using the Q-Index Item Fit Statistic for the Rasch Model.

Journal of applied measurement·2021

Same journal

Diabetes Distress in Emerging Adults: Refining the Problem Areas in Diabetes-Emerging Adult Version using Rasch Analysis.

Journal of applied measurement·2021

Same journal

A Psychometric Replication of Fan (1998) Item Response Theory and Classical Test Theory: An Empirical Comparison of their Item/Person Statistics.

Journal of applied measurement·2021

Same journal

The Development of the Mental Toughness Situational Judgment Test: A Novel Approach to Assessing Mental Toughness.

Journal of applied measurement·2021

Same journal

Using the Rasch Model to Measure Comprehension of Fraction Addition.

Journal of applied measurement·2021

Related Experiment Video

Advancing Dyslexia Assessment in Children Through Computerized Testing

Published on: August 16, 2024

Assessment of differential item functioning.

Wen-Chung Wang¹

¹Hong Kong Institute of Education, Department of Educational Psychology, Counseling and Learning Needs, 10 Lo Ping Road, Tai Po, New Territories, Hong Kong. wcwang@ied.edu.hk

Journal of Applied Measurement

December 19, 2008

Summary

This summary is machine-generated.

This study enhances differential item functioning (DIF) assessment by detailing methods for establishing a common metric, recommending the constant-item (CI) method for superior accuracy in detecting DIF across various item types.

More Related Videos

Computerized Adaptive Testing System of Functional Assessment of Stroke

Published on: January 7, 2019

Measuring the Functional Abilities of Children Aged 3-6 Years Old with Observational Methods and Computer Tools

Published on: June 20, 2020

Area of Science:

Psychometrics
Educational Measurement
Statistical Modeling

Background:

Detecting differential item functioning (DIF) is crucial for unbiased assessment.
Accurate DIF detection requires a common metric across test-taker groups.
Existing methods for establishing common metrics have limitations.

Purpose of the Study:

To review and evaluate methods for establishing a common metric in DIF assessment.
To introduce and demonstrate the effectiveness of the constant-item (CI) method.
To discuss the practical significance of DIF at item and test levels.

Main Methods:

Review of three methods for establishing a common metric: equal-mean-difficulty, all-other-item, and constant-item (CI).
Simulation studies comparing the CI method with other approaches.
Development and illustration of a method for identifying DIF-free anchor items.
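The three common-metric methods listed above can be contrasted with a deliberately simplified sketch. It assumes Rasch item difficulties have already been estimated separately in a reference and a focal group and links the two scales by a mean shift; real analyses usually impose these constraints during model calibration instead, so this is not the estimation procedure used in the study. Function names and difficulty values are hypothetical.

```python
from statistics import mean


def dif_equal_mean_difficulty(b_ref, b_foc):
    """Equal-mean-difficulty: center each group's difficulties at zero, then subtract."""
    m_ref, m_foc = mean(b_ref), mean(b_foc)
    return [(f - m_foc) - (r - m_ref) for r, f in zip(b_ref, b_foc)]


def dif_constant_item(b_ref, b_foc, anchors):
    """Constant-item: align the two scales using only the designated anchor items."""
    shift = mean([b_foc[i] - b_ref[i] for i in anchors])
    return [(f - shift) - r for r, f in zip(b_ref, b_foc)]


def dif_all_other_item(b_ref, b_foc):
    """All-other-item: for each studied item, every other item serves as an anchor."""
    n = len(b_ref)
    return [
        dif_constant_item(b_ref, b_foc, [j for j in range(n) if j != i])[i]
        for i in range(n)
    ]


# Hypothetical difficulties: the focal scale has an arbitrary origin shift of +0.3,
# and only the last item carries true DIF (+1.0, i.e., harder for the focal group).
b_ref = [-1.0, -0.5, 0.0, 0.5, 1.0]
b_foc = [-0.7, -0.2, 0.3, 0.8, 2.3]

for name, dif in [
    ("equal-mean-difficulty", dif_equal_mean_difficulty(b_ref, b_foc)),
    ("all-other-item", dif_all_other_item(b_ref, b_foc)),
    ("constant-item (anchors 0, 1)", dif_constant_item(b_ref, b_foc, [0, 1])),
]:
    print(f"{name:<30}", [round(d, 2) for d in dif])
```

Under these assumptions the constant-item version recovers zero DIF for the clean items and +1.0 for the last item, whereas the other two methods assign spurious negative DIF to the DIF-free items, and equal-mean-difficulty also understates the last item. The constant-item result is only as good as its anchors.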

Main Results:

The constant-item (CI) method demonstrated superiority over alternative common metric approaches in simulations.
The proposed method for identifying DIF-free anchor items proved effective.
The study provides a framework for assessing the practical significance of DIF.
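The anchor-selection step can be illustrated with a simple rank-based heuristic: estimate provisional DIF for every item (for example, with the all-other-item method), then keep the items whose estimates are closest to zero as anchors. This is a sketch in the spirit of the results above, not necessarily the paper's exact procedure; the function name, the number of anchors, and the provisional values are hypothetical, and a fuller version might rank items by test statistics that account for standard errors rather than by raw estimates.

```python
def select_anchor_items(provisional_dif, n_anchors):
    """Return the indices of the n_anchors items with the smallest |DIF| estimates."""
    if not 0 < n_anchors <= len(provisional_dif):
        raise ValueError("n_anchors must be between 1 and the number of items")
    ranked = sorted(range(len(provisional_dif)), key=lambda i: abs(provisional_dif[i]))
    return sorted(ranked[:n_anchors])


# Hypothetical provisional DIF estimates, e.g. from the all-other-item method.
provisional = [0.05, -0.30, 0.10, 0.85, -0.02, 0.40]
anchors = select_anchor_items(provisional, n_anchors=3)
print("anchor items:", anchors)
```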

Conclusions:

The constant-item (CI) method is recommended for robust DIF assessment.
Effective identification of anchor items is key to the CI method's success.
Assessing practical significance is essential for interpreting DIF findings.
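One common way to gauge the practical significance of DIF at the test level, similar in spirit to differential test functioning (DTF) measures, is to compare the expected total score a person of given ability would obtain under reference-group and focal-group item difficulties. The sketch below assumes dichotomous Rasch items already on a common metric; it is not necessarily the framework used in the study, and all names and values are hypothetical. It also shows how item-level DIF in opposite directions can partly or fully cancel at the test level.

```python
import math


def rasch_p(theta, b):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def expected_score_gap(theta, b_ref, dif):
    """Reference-minus-focal expected total score at ability theta.

    Assumes focal difficulties are b_ref + dif on a common metric, so a
    positive dif makes an item harder for the focal group.
    """
    ref_score = sum(rasch_p(theta, b) for b in b_ref)
    foc_score = sum(rasch_p(theta, b + d) for b, d in zip(b_ref, dif))
    return ref_score - foc_score


b_ref = [-1.0, -0.5, 0.0, 0.5, 1.0]
one_item = [0.0, 0.0, 0.0, 0.0, 1.0]    # DIF on a single item
offsetting = [0.5, 0.0, 0.0, 0.0, -0.5]  # two items with DIF in opposite directions
for theta in (-1.0, 0.0, 1.0):
    print(theta,
          round(expected_score_gap(theta, b_ref, one_item), 3),
          round(expected_score_gap(theta, b_ref, offsetting), 3))
```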