Updated: Sep 12, 2025

Combining Propensity Scores and Common Items for Test Score Equating.

Inga Laukaityte¹, Gabriel Wallin², Marie Wiberg³

¹Department of Applied Educational Science, Umeå University, Sweden.

Applied Psychological Measurement | August 4, 2025

Summary

This summary is machine-generated.

This study introduces a new statistical method for fair test score comparisons. Combining propensity scores with common item data improves accuracy and reduces bias in educational testing.

Keywords:

academic admission; educational testing; equating; fairness; nonequivalent groups with anchor test design

Area of Science:

Statistics
Educational Measurement
Psychometrics

Background:

Ensuring score comparability across test forms and groups is a key challenge in educational testing.
Current methods for test score equating often rely on common items or assumptions of group similarity.
Novel approaches are needed to enhance the fairness and accuracy of test score comparisons.

Purpose of the Study:

To develop and evaluate a novel statistical method for test score equating.
To combine propensity scores, based on background covariates, with common item information for improved score comparability.
To assess the performance of this integrated method using empirical and simulation studies.

Main Methods:

Utilized propensity scores derived from test taker background covariates.
Integrated propensity scores with common item information using kernel smoothing techniques.
Conducted an empirical analysis on a high-stakes college admissions test and a simulation study.
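
To make the first step above concrete, here is a minimal sketch of how a propensity score can be estimated from background covariates and then used to group test takers. It is an illustration under stated assumptions, not the authors' implementation: the covariates are synthetic, the logistic regression is fitted with a plain Newton iteration in NumPy, and the function names `fit_propensity` and `stratify` are hypothetical. The kernel smoothing step and the combination with common item scores are not shown.

```python
import numpy as np

def fit_propensity(X, group, n_iter=25):
    """Fit a logistic regression by Newton's method.

    Returns the estimated probability of belonging to group 1
    given the covariates, one value per test taker.
    """
    Xd = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        w = p * (1.0 - p)
        grad = Xd.T @ (group - p)
        # Tiny ridge term keeps the Hessian invertible (numerical safeguard only).
        hess = (Xd * w[:, None]).T @ Xd + 1e-8 * np.eye(Xd.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-Xd @ beta))

def stratify(scores, n_strata=5):
    """Assign each test taker to a propensity-score quantile stratum (0..n_strata-1)."""
    edges = np.quantile(scores, np.linspace(0, 1, n_strata + 1)[1:-1])
    return np.searchsorted(edges, scores, side="right")

# Synthetic data (assumption): two covariates that influence which
# test form group a test taker ends up in.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 2))
true_logit = 0.8 * X[:, 0] - 0.5 * X[:, 1]
group = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

ps = fit_propensity(X, group)   # estimated propensity scores
strata = stratify(ps)           # stratum label per test taker
```

Within a stratum, the two groups have similar estimated propensity scores and therefore similar covariate profiles, which is the sense in which balancing on covariates can make score comparisons between nonequivalent groups fairer.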

Main Results:

The proposed method, integrating propensity scores and common item data, demonstrated reduced standard errors and bias compared to using either source alone.
Balancing groups on test-taker covariates was shown to enhance the fairness and accuracy of score comparisons.
The study highlighted the benefits of utilizing all available data for improved score comparability.
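
For readers unfamiliar with how bias and standard error are typically summarized in a simulation study, the sketch below shows the usual calculation across replications. Everything in it is assumed for illustration: the "true" equated score, the two hypothetical methods, and their error distributions are invented and do not reproduce any result from the study.

```python
import numpy as np

def bias_and_se(estimates, truth):
    """Summarize replicated estimates of one quantity.

    bias = mean of the estimates minus the true value
    se   = sample standard deviation of the estimates
    """
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean() - truth
    se = estimates.std(ddof=1)
    return bias, se

# Invented example (assumption): 500 replications of an equated score
# whose true value is 20, estimated by two hypothetical methods.
rng = np.random.default_rng(1)
truth = 20.0
method_a = truth + 0.5 + rng.normal(scale=1.0, size=500)  # shifted and noisier
method_b = truth + rng.normal(scale=0.5, size=500)        # centered and tighter

bias_a, se_a = bias_and_se(method_a, truth)
bias_b, se_b = bias_and_se(method_b, truth)
```

A method is preferred when both quantities are small: low bias means the estimates center on the true equated score, and a low standard error means they vary little from one sample to the next.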

Conclusions:

The novel approach effectively enhances test score comparability by integrating propensity scores and common item information.
This method offers a more robust and accurate way to ensure fairness in educational testing across diverse groups.
Considering all collected data, including background covariates, is crucial for improving the precision of test score equating.