

A Test Can Have Multiple Reliabilities.

Jules L Ellis¹

¹Behavioural Science Institute, Radboud University Nijmegen, P.O.B. 9104, 6500 HE Nijmegen, The Netherlands. jules.ellis@ru.nl.

September 9, 2021

Summary

This summary is machine-generated.

The generalizability theory interpretation of coefficient alpha is argued to be the preferred one for reliability estimation. Its domain-sampling true scores have a stronger empirical basis than those of alternative models, particularly in designs where both subjects and items are randomly sampled.

Keywords:

domain sampling, generalizability, indeterminacy, latent variable, reliability, stochastic subject, true score

Area of Science:

Psychometrics
Educational Measurement
Psychological Statistics

Background:

Coefficient alpha is a widely used measure of internal consistency in psychometric research.
Existing interpretations of coefficient alpha, such as those based on Lord and Novick's true score theory, rely on true scores with a weaker empirical basis.
Generalizability theory offers an alternative framework for interpreting coefficient alpha.

Purpose of the Study:

To advocate for the generalizability theory interpretation of coefficient alpha.
To compare the empirical basis of generalizability theory's true scores with alternative models.
To delineate conditions under which the generalizability interpretation is most appropriate.

Main Methods:

Conceptual analysis comparing different theoretical frameworks for coefficient alpha.
Examination of the assumptions underlying domain sampling versus stochastic subject models.
Discussion of latent variable models and their implications for reliability estimation.

Main Results:

Coefficient alpha is presented as a consistent, albeit slightly biased, estimate of the generalizability coefficient in a random subjects × items design.
The domain sampling true scores used in generalizability theory are argued to have a stronger empirical foundation.
The generalizability interpretation is favored over alternative models unless a latent variable model of proven validity applies (e.g., one whose reliability is given by McDonald's omega).
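As a hedged illustration of the first result, the sketch below computes coefficient alpha in two ways for an n × k score matrix: from the usual item-variance formula, and as the relative generalizability coefficient of a random subjects × items design, with variance components estimated from two-way ANOVA mean squares. The function names and the simulated data are hypothetical. Algebraically both reduce to (MS_p − MS_res)/MS_p (Hoyt's identity), so the sample values coincide; this does not remove the small bias mentioned above, which concerns how well this ratio of mean squares estimates the population coefficient.

```python
import numpy as np


def cronbach_alpha(x: np.ndarray) -> float:
    """Coefficient alpha for an (n_subjects, k_items) score matrix."""
    k = x.shape[1]
    item_vars = x.var(axis=0, ddof=1)        # sample variance of each item
    total_var = x.sum(axis=1).var(ddof=1)    # sample variance of the sum score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)


def g_coefficient(x: np.ndarray) -> float:
    """Relative G coefficient for a random subjects x items design, with
    variance components estimated from two-way ANOVA mean squares."""
    n, k = x.shape
    grand = x.mean()
    ss_p = k * ((x.mean(axis=1) - grand) ** 2).sum()   # subjects
    ss_i = n * ((x.mean(axis=0) - grand) ** 2).sum()   # items
    ss_res = ((x - grand) ** 2).sum() - ss_p - ss_i    # interaction + error
    ms_p = ss_p / (n - 1)
    ms_res = ss_res / ((n - 1) * (k - 1))
    var_p = (ms_p - ms_res) / k    # universe-score (subject) variance
    var_res = ms_res               # subject-by-item interaction plus error
    return var_p / (var_p + var_res / k)


# Hypothetical data: one true score per subject plus item-level noise.
rng = np.random.default_rng(0)
scores = rng.normal(size=(200, 1)) + rng.normal(scale=0.8, size=(200, 6))
print(cronbach_alpha(scores), g_coefficient(scores))  # equal up to rounding
```

The ANOVA route makes the design explicit: subjects and items are both treated as random facets, which is the setting in which the generalizability interpretation is argued to apply.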

Conclusions:

The generalizability theory interpretation of coefficient alpha provides a more robust and empirically grounded approach to reliability estimation.
This interpretation is particularly valuable in designs involving random sampling of both subjects and items.
Alternative interpretations are conditionally defensible when the assumptions of specific latent variable models are met, especially models implying essential tau-equivalence.
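The role of essential tau-equivalence can be made concrete with a standard population-level result: under a one-factor model, coefficient alpha equals McDonald's omega when the items are essentially tau-equivalent (equal loadings, possibly unequal unique variances) and falls below omega when the loadings differ (congeneric items). The sketch below assumes a single common factor with unit variance and uncorrelated unique errors; the loadings, unique variances, and function names are hypothetical and are not taken from the article.

```python
import numpy as np


def implied_cov(loadings, uniques) -> np.ndarray:
    """Covariance matrix implied by a one-factor model with unit factor
    variance and uncorrelated unique errors: Lambda Lambda' + diag(psi)."""
    lam = np.asarray(loadings, dtype=float)
    return np.outer(lam, lam) + np.diag(np.asarray(uniques, dtype=float))


def alpha_from_cov(sigma: np.ndarray) -> float:
    """Population coefficient alpha computed from a covariance matrix."""
    k = sigma.shape[0]
    return k / (k - 1) * (1 - np.trace(sigma) / sigma.sum())


def omega(loadings, uniques) -> float:
    """McDonald's omega for the sum score under the same one-factor model."""
    s = float(np.sum(loadings))
    return s**2 / (s**2 + float(np.sum(uniques)))


# Essentially tau-equivalent: equal loadings, unequal unique variances.
tau_l, tau_u = [0.7, 0.7, 0.7, 0.7], [0.3, 0.5, 0.7, 0.9]
# Congeneric: unequal loadings, unique variances chosen for unit item variances.
con_l = [0.9, 0.7, 0.5, 0.3]
con_u = [1 - lam**2 for lam in con_l]

print(alpha_from_cov(implied_cov(tau_l, tau_u)), omega(tau_l, tau_u))  # alpha equals omega
print(alpha_from_cov(implied_cov(con_l, con_u)), omega(con_l, con_u))  # alpha below omega
```

In the tau-equivalent case the trace and total-sum terms in alpha reduce exactly to omega's ratio, which is why the latent variable reading of alpha is defensible only under that condition.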