Search research articles

ABOUT JoVE

Overview Leadership Blog JoVE Help Center

AUTHORS

Publishing Process Editorial Board Scope & Policies Peer Review FAQ Submit

LIBRARIANS

Testimonials Subscriptions Access Resources Library Advisory Board FAQ

RESEARCH

JoVE Journal Methods Collections JoVE Encyclopedia of Experiments Archive

EDUCATION

JoVE Core JoVE Business JoVE Science Education JoVE Lab Manual Faculty Resource Center Faculty Site

Terms & Conditions of Use

Related Concept Videos

Bias

Bias

Bias refers to any tendency that prevents a question from being considered unprejudiced. In research, bias occurs when one outcome or answer is selected or encouraged over others in sampling or testing. Bias can occur during any research phase, including study design, data collection, analysis, and publication.
In statistics, a sampling bias is created when a sample is collected from a population, and some members of the population are not as likely to be chosen as others (remember, each member...

Comparing Experimental Results: Student's t-Test

Comparing Experimental Results: Student's t-Test

The t-test is a statistical method used to compare the sample mean with a population mean or compare two means from two data sets. The test statistic is calculated from the standard deviation, mean, and number of measurements in the data set at a selected confidence interval and then compared to a table of critical values at this confidence level. If the test statistic is smaller than the critical value, the null hypothesis is accepted. In this case, we state that the difference between the...

Bias in Epidemiological Studies

Bias in Epidemiological Studies

Biases can arise at various stages of research, from study design and data collection to analysis and interpretation. Recognizing and addressing these biases is essential to ensure the validity and reliability of epidemiological findings.Broadly speaking, biases in epidemiology fall into three main categories: selection bias, information bias, and confounding. A more detailed description of possible biases is:

Reliability and Validity

Reliability and Validity

Reliability and validity are two important considerations that must be made with any type of data collection. Reliability refers to the ability to consistently produce a given result. In the context of psychological research, this would mean that any instruments or tools used to collect data do so in consistent, reproducible ways.

Bonferroni Test

Bonferroni Test

The Bonferroni test is a statistical test named after Carlo Emilio Bonferroni, an Italian mathematician best known for Bonferroni inequalities. This statistical test is a type of multiple comparison test to determine which means are different than the rest. Bonferroni test can minimize the Type 1 error by reducing the significance level alpha, which otherwise increases with sample pairs.
The means of different samples are first paired in all possible combinations.
The null hypothesis of the...

Weighted Mean

Weighted Mean

While taking the arithmetic, geometric, or harmonic mean of a sample data set, equal importance is assigned to all the data points. However, all the values may not always be equally important in some data sets. An intrinsic bias might make it more important to give more weightage to specific values over others.
For example, consider the number of goals scored in the matches of a tournament. While computing the average number of goals scored in the tournament, it may be more important to...

You might also read

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Sort by

Same author

Developing and validating a frailty score based on patient-reported outcome 3 months after stroke: A Riksstroke-based study.

PloS one·2026

Same author

The bit scale: A metric score scale for unidimensional item response theory models.

Psychometrika·2025

Same author

Combining Propensity Scores and Common Items for Test Score Equating.

Applied psychological measurement·2025

Same author

An Information Manifold Perspective for Analyzing Test Data.

Applied psychological measurement·2024

Same author

Efficiency Analysis of Item Response Theory Kernel Equating for Mixed-Format Tests.

Applied psychological measurement·2023

Same author

Evaluating Equating Transformations in IRT Observed-Score and Kernel Equating Methods.

Applied psychological measurement·2023

Same journal

The EM Algorithm and Its Variants in Cognitive Diagnostic Models: Comparing Their Propensity for Boundaries, Extremes, Convergence, and Suboptimal Solutions.

Applied psychological measurement·2026

Same journal

When Perceptions of Social Desirability Differ: Implications for the Multidimensional Nominal Response Model of Faking.

Applied psychological measurement·2026

Same journal

csemGT: An R Package for Estimating Raw-Score Conditional Standard Errors of Measurement in Generalizability Theory.

Applied psychological measurement·2026

Same journal

Confirmatory Factor Analysis with Adaptive Quadrature Estimator Using Four Link Functions.

Applied psychological measurement·2026

Same journal

Automatic Item Generation Measurement Models Respecting the Stochastic Sampling Space for Cross-Classified and Two-Level Sampling of Subjects and Incidentals.

Applied psychological measurement·2026

Same journal

Multistage Testing for Cognitive Diagnosis Based on Skill-Space Partitioning.

Applied psychological measurement·2026

See all related articles

Search research articles

Related Experiment Video

Updated: May 16, 2025

Author Spotlight: Validation of SICOLE-R for Assessing Cognitive and Reading Skills in Spanish-Speaking Children and Its Role in Personalized Education

Author Spotlight: Validation of SICOLE-R for Assessing Cognitive and Reading Skills in Spanish-Speaking Children and Its Role in Personalized Education

Published on: August 16, 2024

Calculating Bias in Test Score Equating in a NEAT Design.

Marie Wiberg¹, Inga Laukaityte¹

¹Umeå University, Sweden.

Applied Psychological Measurement

|March 31, 2025

Summary

This summary is machine-generated.

Comparing test score equating methods reveals that the chosen criterion function significantly impacts bias assessment. Understanding these differences is crucial for accurate standardized test score comparability.

Keywords:

chained equating criterion function frequency estimation

More Related Videos

Problem-Solving Before Instruction PS-I: A Protocol for Assessment and Intervention in Students with Different Abilities

Problem-Solving Before Instruction PS-I: A Protocol for Assessment and Intervention in Students with Different Abilities

Published on: September 11, 2021

Applying an eMASS Customization Program as a Research Tool to Evaluate Consumer Benefits

Applying an eMASS Customization Program as a Research Tool to Evaluate Consumer Benefits

Published on: September 27, 2019

Related Experiment Videos

Last Updated: May 16, 2025

Author Spotlight: Validation of SICOLE-R for Assessing Cognitive and Reading Skills in Spanish-Speaking Children and Its Role in Personalized Education

Author Spotlight: Validation of SICOLE-R for Assessing Cognitive and Reading Skills in Spanish-Speaking Children and Its Role in Personalized Education

Published on: August 16, 2024

Problem-Solving Before Instruction PS-I: A Protocol for Assessment and Intervention in Students with Different Abilities

Problem-Solving Before Instruction PS-I: A Protocol for Assessment and Intervention in Students with Different Abilities

Published on: September 11, 2021

Applying an eMASS Customization Program as a Research Tool to Evaluate Consumer Benefits

Applying an eMASS Customization Program as a Research Tool to Evaluate Consumer Benefits

Published on: September 27, 2019

Area of Science:

Educational Measurement
Psychometrics
Statistical Analysis

Background:

Test score equating ensures comparability across different test forms, especially with non-equivalent groups.
The non-equivalent group with anchor test (NEAT) design is a common practical approach.
Evaluating bias in equating methods is critical for score interpretation.

Purpose of the Study:

To compare bias amounts under various conditions using chained equating versus frequency estimation.
To investigate the influence of five different criterion functions on bias calculations.
To assess how factors like group ability differences and test form characteristics affect equating bias.

Main Methods:

Utilized real data from a college admissions test and simulated data.
Employed chained equating and frequency estimation with identity, linear, equipercentile, chained, and frequency estimation criterion functions.
Examined bias under conditions of varying group ability, item difficulty, test form length, correlations, and sample sizes.

Main Results:

The choice of criterion function significantly influences the evaluation of bias in equating methods.
Empirical and simulated data demonstrated that different conditions highlight varying levels of bias.
Bias definition critically affects the preference for specific equating methods in different scenarios.

Conclusions:

The definition of bias is paramount when selecting equating methods for standardized tests.
Practical implications for standardized testing include careful consideration of criterion functions.
Recommendations are provided for calculating bias to evaluate equating transformations effectively.