Reliability for Tests With Items Having Different Numbers of Ordered Categories.

Seohyun Kim¹, Zhenqiu Lu¹, Allan S Cohen¹

¹University of Georgia, Athens, USA.

Applied Psychological Measurement

February 21, 2020

Summary

This summary is machine-generated.

A new structural equation modeling (SEM) approach estimates reliability for tests whose items have different numbers of ordered categories. In simulations, the proposed coefficient closely approximated population reliability across most conditions.

Keywords:

categorical data, reliability, structural equation modeling

Area of Science:

Psychometrics
Statistical Modeling
Educational Measurement

Background:

Traditional reliability coefficients such as coefficient alpha have limitations when a test's items differ in their numbers of ordered categories.
Assessing the reliability of tests with mixed-category items requires advanced statistical approaches.
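
For readers unfamiliar with coefficient alpha, the following is a minimal Python sketch of its standard formula, applied to a small score matrix that mixes dichotomous (0/1) and trichotomous (0/1/2) items. The function name `coefficient_alpha` and the data are hypothetical and serve only as an illustration; they are not taken from the study.

```python
import numpy as np


def coefficient_alpha(scores):
    """Coefficient (Cronbach's) alpha for an examinees-by-items score matrix."""
    scores = np.asarray(scores, dtype=float)
    n_items = scores.shape[1]
    # Sample variance of each item and of the total (sum) score
    item_variances = scores.var(axis=0, ddof=1)
    total_variance = scores.sum(axis=1).var(ddof=1)
    return (n_items / (n_items - 1)) * (1.0 - item_variances.sum() / total_variance)


# Hypothetical data (assumption): 6 examinees, two 0/1 items and two 0/1/2 items
example_scores = np.array([
    [0, 0, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 1, 2],
    [1, 1, 2, 2],
    [0, 0, 0, 0],
])
alpha = coefficient_alpha(example_scores)
print(f"coefficient alpha = {alpha:.3f}")
```

Alpha treats every item score as a number on a common linear scale, which is one reason it can be a poor fit when items differ in their numbers of categories.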

Purpose of the Study:

To introduce and evaluate a novel structural equation modeling (SEM) approach for estimating reliability in tests with items having different numbers of ordered categories.
To compare the performance of the proposed SEM reliability coefficient against coefficient alpha and population reliability.
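
As background for the comparison, model-based reliability in a linear one-factor model is commonly written in the form below (often called coefficient omega). This is the general textbook form, shown here only as orientation; it is not the coefficient proposed in the study, which targets ordered categorical items and is not specified in this summary. The symbols are assumptions of this sketch: \lambda_i is the loading of item i, \theta_i is its unique variance, and the factor variance is fixed at 1.

```latex
\omega = \frac{\left(\sum_{i=1}^{k} \lambda_i\right)^{2}}
              {\left(\sum_{i=1}^{k} \lambda_i\right)^{2} + \sum_{i=1}^{k} \theta_i}
```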

Main Methods:

Structural Equation Modeling (SEM) was employed to develop a new reliability coefficient.
A simulation study was conducted to compare reliability coefficients under various conditions, including different numbers of ordered categories, one-factor and bifactor structures, and score skewness.
An empirical example using a test with dichotomous and trichotomous items was analyzed.
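
A simulation condition of the kind described above might be sketched as follows: continuous latent responses are generated from a one-factor model and then cut at thresholds to produce ordered categories. Everything here is an assumption for illustration, including the function name `simulate_ordered_items`, the loadings, the thresholds, and the sample size; the study's actual design is not given in this summary.

```python
import numpy as np


def simulate_ordered_items(n_examinees, loadings, thresholds, rng):
    """Simulate ordered categorical item scores from a one-factor model.

    loadings: standardized factor loadings, one per item (illustrative values).
    thresholds: one increasing list of cut points per item; an item with m
        cut points has m + 1 ordered categories scored 0..m.
    """
    loadings = np.asarray(loadings, dtype=float)
    factor = rng.standard_normal(n_examinees)
    # Unique standard deviations chosen so each latent response has unit variance
    unique_sd = np.sqrt(1.0 - loadings**2)
    noise = rng.standard_normal((n_examinees, loadings.size))
    latent = factor[:, None] * loadings + noise * unique_sd
    # Category = number of cut points lying below the latent response
    return np.column_stack([
        np.searchsorted(np.asarray(cuts, dtype=float), latent[:, j])
        for j, cuts in enumerate(thresholds)
    ])


rng = np.random.default_rng(seed=0)
scores = simulate_ordered_items(
    n_examinees=2000,
    loadings=[0.7, 0.7, 0.6, 0.6],
    # Two dichotomous and two trichotomous items; off-center cut points skew the scores
    thresholds=[[0.0], [0.5], [-0.5, 0.5], [0.0, 1.0]],
    rng=rng,
)
print(scores.shape, scores.max(axis=0))
```

Moving the cut points away from zero is one simple way to introduce score skewness, and adding group factors to the latent model would give a bifactor variant.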

Main Results:

The proposed SEM reliability coefficient demonstrated strong performance, closely approximating population reliability across most simulated conditions.
The simulation results provided insights into the behavior of different reliability coefficients under varying psychometric properties.
The empirical example highlighted the practical application and performance differences of the coefficients.

Conclusions:

The proposed SEM-based reliability approach is a viable and accurate method for tests with items having different numbers of ordered categories.
This study offers a valuable tool for researchers and practitioners needing to assess reliability in complex measurement instruments.
The findings underscore the importance of using appropriate reliability estimation methods tailored to item characteristics.