



Combining Propensity Scores and Common Items for Test Score Equating.

Inga Laukaityte (1), Gabriel Wallin (2), Marie Wiberg (3)

  • (1) Department of Applied Educational Science, Umeå University, Sweden.

Applied Psychological Measurement | August 4, 2025
Summary
This summary is machine-generated.

This study introduces a new statistical method for fair test score comparisons. Combining propensity scores with common item data reduced standard errors and bias in test score equating relative to using either source alone.

Keywords: academic admission; educational testing; equating; fairness; nonequivalent groups with anchor test design


Area of Science:

  • Statistics
  • Educational Measurement
  • Psychometrics

Background:

  • Ensuring score comparability across test forms and groups is a key challenge in educational testing.
  • Current methods for test score equating often rely on common items or assumptions of group similarity.
  • Novel approaches are needed to enhance the fairness and accuracy of test score comparisons.

Purpose of the Study:

  • To develop and evaluate a novel statistical method for test score equating.
  • To combine propensity scores, based on background covariates, with common item information for improved score comparability.
  • To assess the performance of this integrated method using empirical and simulation studies.
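As an illustration of the propensity-score idea named above, here is a minimal sketch that estimates the probability of belonging to one test group from background covariates. It assumes a plain logistic regression fitted by gradient ascent; the summary does not say which model the authors used, and the function names, the single binary covariate, and the toy data are all hypothetical.

```python
import math

def sigmoid(t):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-t))

def fit_propensity(covariates, group, lr=0.5, epochs=5000):
    """Fit P(group = 1 | covariates) by gradient ascent on the log-likelihood.

    Returns the weights with the intercept first. This is an assumed,
    illustrative estimator, not the one used in the study.
    """
    n, k = len(covariates), len(covariates[0])
    weights = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for x, z in zip(covariates, group):
            p = sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))
            err = z - p
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        # Average the gradient so the step size does not depend on sample size.
        weights = [w + lr * g / n for w, g in zip(weights, grad)]
    return weights

def propensity_scores(covariates, weights):
    """Estimated probability of group membership for each test taker."""
    return [
        sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))
        for x in covariates
    ]

# Hypothetical data: one binary background covariate, two test groups (0 and 1).
covariates = [[0.0], [0.0], [0.0], [0.0], [1.0], [1.0], [1.0], [1.0]]
group = [0, 0, 0, 1, 0, 1, 1, 1]

weights = fit_propensity(covariates, group)
scores = propensity_scores(covariates, weights)
```

With one binary covariate the fitted scores approach the observed group proportions within each covariate level, which makes the toy example easy to check by hand.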

Main Methods:

  • Utilized propensity scores derived from test taker background covariates.
  • Integrated propensity scores with common item information using kernel smoothing techniques.
  • Conducted an empirical analysis on a high-stakes college admissions test and a simulation study.
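The kernel smoothing step can be sketched in its generic form: a discrete score distribution is continuized with a Gaussian kernel, and a score on one form is mapped to the other form by matching cumulative probabilities. This is a textbook-style illustration under stated assumptions, not the authors' implementation. The bandwidth `h`, the score probabilities, and all function names are hypothetical, and the way the study conditions on propensity scores and common items is not shown because the summary does not describe it.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def continuized_cdf(x, scores, probs, h):
    """Gaussian-kernel continuization of a discrete score distribution.

    The shrinkage factor `a` keeps the mean and variance of the smoothed
    distribution equal to those of the discrete one. Assumes the discrete
    distribution has nonzero variance.
    """
    mu = sum(s * p for s, p in zip(scores, probs))
    var = sum(p * (s - mu) ** 2 for s, p in zip(scores, probs))
    a = math.sqrt(var / (var + h * h))
    return sum(
        p * normal_cdf((x - a * s - (1.0 - a) * mu) / (a * h))
        for s, p in zip(scores, probs)
    )

def inverse_cdf(u, scores, probs, h, tol=1e-10):
    """Invert the continuized CDF by bisection."""
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep the target strictly inside (0, 1)
    lo, hi = min(scores) - 1.0, max(scores) + 1.0
    while continuized_cdf(lo, scores, probs, h) > u:
        lo -= 1.0
    while continuized_cdf(hi, scores, probs, h) < u:
        hi += 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if continuized_cdf(mid, scores, probs, h) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def equate(x, x_scores, x_probs, y_scores, y_probs, h_x, h_y):
    """Map score x on form X to the form Y scale by matching percentiles."""
    u = continuized_cdf(x, x_scores, x_probs, h_x)
    return inverse_cdf(u, y_scores, y_probs, h_y)

# Hypothetical score distributions: form Y is form X shifted up by 2 points.
x_scores, x_probs = [0, 1, 2, 3, 4], [0.1, 0.2, 0.4, 0.2, 0.1]
y_scores, y_probs = [2, 3, 4, 5, 6], [0.1, 0.2, 0.4, 0.2, 0.1]

equated = equate(1.5, x_scores, x_probs, y_scores, y_probs, h_x=0.6, h_y=0.6)
```

Because the two hypothetical forms differ only by a constant shift, the equated value of 1.5 should sit about 2 points higher on the form Y scale, which serves as a sanity check on the sketch.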

Main Results:

  • The proposed method, integrating propensity scores and common item data, demonstrated reduced standard errors and bias compared to using either source alone.
  • Balancing groups on test-taker covariates was shown to enhance the fairness and accuracy of score comparisons.
  • The study highlighted the benefits of utilizing all available data for improved score comparability.
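To make the two reported criteria concrete, the sketch below computes bias and standard error the way simulation studies commonly do: bias as the mean estimate minus a known true value, and standard error as the standard deviation of the estimates across replications. The summary does not state how the authors computed these quantities, so the definitions, the function name, and the numbers are assumptions for illustration only.

```python
import math

def bias_and_se(estimates, true_value):
    """Return (bias, standard error) of replicated estimates.

    bias: mean of the estimates minus the true value.
    standard error: sample standard deviation across replications.
    """
    n = len(estimates)
    mean = sum(estimates) / n
    bias = mean - true_value
    variance = sum((e - mean) ** 2 for e in estimates) / (n - 1)
    return bias, math.sqrt(variance)

# Hypothetical equated scores at one score point over five replications.
replicated_equated_scores = [20.4, 19.8, 20.1, 20.3, 19.9]
bias, se = bias_and_se(replicated_equated_scores, true_value=20.0)
```

Comparing such pairs across methods (propensity scores only, common items only, or both) is one way a reduction in bias and standard error could be demonstrated.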

Conclusions:

  • The novel approach effectively enhances test score comparability by integrating propensity scores and common item information.
  • This method offers a more robust and accurate way to ensure fairness in educational testing across diverse groups.
  • Considering all collected data, including background covariates, is crucial for improving the precision of test score equating.