A Test Can Have Multiple Reliabilities.

Jules L Ellis¹

  • ¹ Behavioural Science Institute, Radboud University Nijmegen, P.O.B. 9104, 6500 HE Nijmegen, The Netherlands. jules.ellis@ru.nl

Psychometrika | September 9, 2021
Summary
This summary is machine-generated.

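The paper argues that the generalizability theory interpretation of coefficient alpha is the preferable basis for estimating reliability. The domain sampling true scores it relies on are argued to have a stronger empirical foundation than those of alternative interpretations, particularly in designs that randomly sample both subjects and items.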

Keywords:
domain sampling, generalizability, indeterminacy, latent variable, reliability, stochastic subject, true score

Area of Science:

  • Psychometrics
  • Educational Measurement
  • Psychological Statistics

Background:

  • Coefficient alpha is a widely used measure of internal consistency in psychometric research.
  • Existing interpretations of coefficient alpha, such as Lord and Novick's true score theory, face empirical limitations.
  • Generalizability theory offers an alternative framework for interpreting coefficient alpha.
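
For reference, coefficient alpha is usually written as below. This is the standard textbook definition rather than notation taken from the paper; the symbols k (number of items), X_i (score on item i), and X (sum score) are assumptions introduced here.

```latex
\alpha \;=\; \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^{2}_{X_i}}{\sigma^{2}_{X}}\right),
\qquad X = \sum_{i=1}^{k} X_i
```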

Purpose of the Study:

  • To advocate for the generalizability theory interpretation of coefficient alpha.
  • To compare the empirical basis of generalizability theory's true scores with alternative models.
  • To delineate conditions under which the generalizability interpretation is most appropriate.

Main Methods:

  • Conceptual analysis comparing different theoretical frameworks for coefficient alpha.
  • Examination of the assumptions underlying domain sampling versus stochastic subject models.
  • Discussion of latent variable models and their implications for reliability estimation.

Main Results:

  • Coefficient alpha is presented as a consistent, albeit slightly biased, estimate of the generalizability coefficient in a random subjects × items design.
  • The domain sampling true scores used in generalizability theory are argued to have a stronger empirical foundation.
  • The generalizability interpretation is favored over alternative models unless a latent variable model of proven validity (e.g., the model underlying McDonald's omega) is applicable.
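
The link between alpha and the generalizability coefficient in the first result can be illustrated numerically. The following is a minimal sketch, not the paper's own code: it assumes a fully crossed subjects × items score matrix with one observation per cell and uses the standard ANOVA estimates of the variance components. The function names and the example scores are hypothetical.

```python
import numpy as np


def coefficient_alpha(scores):
    """Coefficient alpha from a subjects x items score matrix."""
    x = np.asarray(scores, dtype=float)
    n_items = x.shape[1]
    item_vars = x.var(axis=0, ddof=1)          # variance of each item
    total_var = x.sum(axis=1).var(ddof=1)      # variance of the sum score
    return n_items / (n_items - 1) * (1 - item_vars.sum() / total_var)


def g_coefficient(scores):
    """Plug-in generalizability coefficient for a crossed subjects x items design.

    Variance components are estimated from the two-way ANOVA mean squares
    (an assumption of this sketch, not a procedure stated in the source).
    """
    x = np.asarray(scores, dtype=float)
    n_p, n_i = x.shape
    grand = x.mean()
    person_means = x.mean(axis=1)
    item_means = x.mean(axis=0)

    # Mean square for subjects and for the residual (interaction + error).
    ms_p = n_i * ((person_means - grand) ** 2).sum() / (n_p - 1)
    resid = x - person_means[:, None] - item_means[None, :] + grand
    ms_res = (resid ** 2).sum() / ((n_p - 1) * (n_i - 1))

    var_p = (ms_p - ms_res) / n_i              # subject variance component
    return var_p / (var_p + ms_res / n_i)      # relative error over n_i items


# Hypothetical data: 5 subjects (rows) answering 4 items (columns).
scores = [
    [3, 4, 3, 5],
    [2, 2, 3, 3],
    [4, 5, 4, 5],
    [1, 2, 2, 1],
    [3, 3, 4, 4],
]

alpha = coefficient_alpha(scores)
g_hat = g_coefficient(scores)
print(f"alpha = {alpha:.4f}, estimated G coefficient = {g_hat:.4f}")
```

With these estimators the two quantities coincide algebraically, since both reduce to 1 minus the ratio of the residual mean square to the subject mean square. The sketch only illustrates that identity; it says nothing about the bias or consistency properties discussed in the paper.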

Conclusions:

  • The generalizability theory interpretation of coefficient alpha provides a more robust and empirically grounded approach to reliability estimation than the alternative interpretations.
  • This interpretation is particularly valuable in designs involving random sampling of both subjects and items.
  • Alternative interpretations are conditionally defensible when the assumptions of specific latent variable models are met, especially models implying essential tau-equivalence.
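
As a reminder of what these conditions mean, the standard textbook forms are sketched below. The notation (T, b_i, E_i, λ_i, η, ψ_i) is assumed here and is not taken from the paper. Essential tau-equivalence requires every item to measure the same true score up to an additive constant, whereas the congeneric one-factor model behind McDonald's omega allows item-specific loadings.

```latex
% Essential tau-equivalence: common true score T, item-specific shift b_i
X_i = T + b_i + E_i, \qquad i = 1, \dots, k

% Congeneric one-factor model with Var(\eta) = 1 and uncorrelated errors
X_i = \lambda_i \eta + E_i, \qquad \operatorname{Var}(E_i) = \psi_i

% McDonald's omega for the sum score under the one-factor model
\omega = \frac{\left(\sum_{i=1}^{k} \lambda_i\right)^{2}}
              {\left(\sum_{i=1}^{k} \lambda_i\right)^{2} + \sum_{i=1}^{k} \psi_i}
```

Under essential tau-equivalence with uncorrelated errors, alpha equals the classical reliability of the sum score; when loadings differ across items, alpha is in general only a lower bound.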