
Calculating Bias in Test Score Equating in a NEAT Design.

Marie Wiberg1, Inga Laukaityte1

  • 1Umeå University, Sweden.

Applied Psychological Measurement | March 31, 2025
PubMed
Summary
This summary is machine-generated.

Comparing test score equating methods reveals that the chosen criterion function significantly impacts bias assessment. Understanding these differences is crucial for accurate standardized test score comparability.

Keywords:
chained equating, criterion function, frequency estimation

Area of Science:

  • Educational Measurement
  • Psychometrics
  • Statistical Analysis

Background:

  • Test score equating ensures comparability across different test forms, especially with non-equivalent groups.
  • The non-equivalent group with anchor test (NEAT) design is a common practical approach.
  • Evaluating bias in equating methods is critical for score interpretation.
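To illustrate how an anchor test links two non-equivalent groups, here is a minimal sketch of chained linear equating, a simple textbook variant of chained equating. It is not taken from the study, which may use a different (for example, equipercentile) form. The function names and the tiny data set are hypothetical: group P takes form X plus anchor A, and group Q takes form Y plus the same anchor A.

```python
import numpy as np

def linear_link(score, from_scores, to_scores):
    """Linear function that matches the mean and SD of from_scores to to_scores."""
    from_scores = np.asarray(from_scores, dtype=float)
    to_scores = np.asarray(to_scores, dtype=float)
    slope = to_scores.std(ddof=1) / from_scores.std(ddof=1)
    return to_scores.mean() + slope * (np.asarray(score, dtype=float) - from_scores.mean())

def chained_linear_equate(x, x_p, a_p, a_q, y_q):
    """Chain two linear links: X -> A in group P, then A -> Y in group Q."""
    anchor_equivalent = linear_link(x, x_p, a_p)      # form X to anchor, group P
    return linear_link(anchor_equivalent, a_q, y_q)   # anchor to form Y, group Q

# Hypothetical toy data (three test takers per group, for illustration only)
x_p = [10, 20, 30]   # form X scores, group P
a_p = [5, 10, 15]    # anchor scores, group P
a_q = [6, 10, 14]    # anchor scores, group Q
y_q = [12, 20, 28]   # form Y scores, group Q

equated = chained_linear_equate([20, 30], x_p, a_p, a_q, y_q)
print(equated)
```

The anchor is the only information shared by the two groups, so any difference in group ability enters the result through the anchor score distributions.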

Purpose of the Study:

  • To compare the amount of bias in chained equating versus frequency estimation under various conditions.
  • To investigate the influence of five different criterion functions on bias calculations.
  • To assess how factors like group ability differences and test form characteristics affect equating bias.
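For orientation, one definition of bias commonly used in simulation studies of equating is given below. It is offered as an assumption about what "bias" means here, not as the study's own formula. The symbols are illustrative: \(\hat{\varphi}_r(x)\) is the equated score estimated for raw score \(x\) in replication \(r\), \(\varphi(x)\) is the value given by the chosen criterion function, and \(R\) is the number of replications.

```latex
\[
\operatorname{Bias}(x) \;=\; \frac{1}{R}\sum_{r=1}^{R}\Bigl[\hat{\varphi}_r(x) - \varphi(x)\Bigr]
\]
```

Under this definition, changing the criterion \(\varphi\) changes the computed bias even when the estimates \(\hat{\varphi}_r\) stay the same.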

Main Methods:

  • Utilized real data from a college admissions test and simulated data.
  • Employed chained equating and frequency estimation with identity, linear, equipercentile, chained, and frequency estimation criterion functions.
  • Examined bias under conditions of varying group ability, item difficulty, test form length, correlations, and sample sizes.
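The role of the criterion function can be sketched in a few lines. The example below is a simplified illustration, not the study's procedure: it assumes an equivalent-groups setting with plain linear equating (no anchor test), normally distributed scores, and hypothetical population values. Bias is taken as the average difference between estimated and criterion equated scores over replications, and the same estimates are evaluated against an identity criterion and a linear criterion.

```python
import numpy as np

def linear_equate(x, mean_x, sd_x, mean_y, sd_y):
    """Linear equating: map score x on form X to the scale of form Y."""
    return mean_y + (sd_y / sd_x) * (x - mean_x)

def bias_per_score(estimates, criterion):
    """Mean of (estimated - criterion) equated scores over replications.

    estimates: array of shape (replications, n_scores)
    criterion: array of shape (n_scores,)
    """
    return np.mean(np.asarray(estimates) - np.asarray(criterion), axis=0)

rng = np.random.default_rng(0)
scores = np.arange(0, 41)  # hypothetical 40-item form, raw scores 0..40

# Hypothetical population moments (assumptions for this sketch)
MEAN_X, SD_X, MEAN_Y, SD_Y = 20.0, 6.0, 22.0, 5.0

estimates = []
for _ in range(200):  # 200 simulated replications
    x_sample = rng.normal(MEAN_X, SD_X, size=1000)
    y_sample = rng.normal(MEAN_Y, SD_Y, size=1000)
    estimates.append(
        linear_equate(scores, x_sample.mean(), x_sample.std(ddof=1),
                      y_sample.mean(), y_sample.std(ddof=1))
    )
estimates = np.array(estimates)

# Two candidate criterion functions evaluated on the same estimates
identity_criterion = scores.astype(float)
linear_criterion = linear_equate(scores, MEAN_X, SD_X, MEAN_Y, SD_Y)

bias_identity = bias_per_score(estimates, identity_criterion)
bias_linear = bias_per_score(estimates, linear_criterion)

print("max |bias| vs identity criterion:", np.abs(bias_identity).max())
print("max |bias| vs linear criterion:  ", np.abs(bias_linear).max())
```

Because the forms differ in this setup, the identity criterion reports substantial bias while the linear criterion reports almost none for the very same estimates, which is the kind of dependence on the criterion that the summary describes.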

Main Results:

  • The choice of criterion function significantly influences the evaluation of bias in equating methods.
  • In both the empirical and the simulated data, the amount of bias varied across conditions.
  • How bias is defined critically affects which equating method is preferred in a given scenario.

Conclusions:

  • The definition of bias is paramount when selecting equating methods for standardized tests.
  • In practice, standardized testing programs should choose the criterion function carefully when evaluating equating methods.
  • Recommendations are provided for calculating bias to evaluate equating transformations effectively.