Evaluating Robust Scale Transformation Methods With Multiple Outlying Common Items Under IRT True Score Equating.

Yong He¹, Zhongmin Cui¹

¹ACT, Inc., Iowa City, IA, USA.

Applied Psychological Measurement

June 16, 2020

Summary

This summary is machine-generated.

This study shows that robust scale transformation methods effectively handle multiple outlying common items in test equating. These methods reduce the influence of outliers on scale transformation and equating while maintaining content balance.

Keywords:

equating; item response theory; multiple outliers; robust scale transformation

Area of Science:

Educational Measurement
Psychometrics
Statistics

Background:

Common item parameter estimates can change abnormally due to item overexposure or curriculum shifts.
Outlier common items deviate from the expected pattern of normally behaving common items.
Eliminating outliers improves equating accuracy but can disrupt content balance.

Purpose of the Study:

To examine the performance of robust scale transformation methods with multiple outlier common items.
To assess the effectiveness of these methods in reducing outlier influence on scale transformation and equating.
To compare robust methods against traditional outlier detection and elimination techniques.

Main Methods:

Simulation study design.
Application of robust scale transformation methods.
Analysis of multiple outlying common items.
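
This summary does not specify which robust estimators the study used, so the following is only a minimal sketch of the general idea, not the authors' method. It contrasts classical mean/sigma linking of common-item difficulty estimates with a hypothetical median/MAD analogue, on made-up data where two of nine common items have drifted. The function names, the item values, and the true coefficients (slope 1.2, intercept 0.5) are all assumptions chosen for illustration.

```python
from statistics import mean, median, pstdev


def mean_sigma(b_new, b_old):
    """Classical mean/sigma linking: (A, B) such that b_old ~ A * b_new + B."""
    slope = pstdev(b_old) / pstdev(b_new)
    intercept = mean(b_old) - slope * mean(b_new)
    return slope, intercept


def mad(values):
    """Median absolute deviation from the median (unscaled)."""
    center = median(values)
    return median([abs(v - center) for v in values])


def median_mad(b_new, b_old):
    """Illustrative robust analogue: medians and MADs replace means and SDs."""
    slope = mad(b_old) / mad(b_new)
    intercept = median(b_old) - slope * median(b_new)
    return slope, intercept


# Hypothetical common-item difficulties on the new form's scale.
b_new = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
# Old-form values follow 1.2 * b + 0.5, except the last two items,
# which are shifted by +1.5 to mimic outlying common items.
b_old = [-1.9, -1.3, -0.7, -0.1, 0.5, 1.1, 1.7, 3.8, 4.4]

print("mean/sigma :", mean_sigma(b_new, b_old))
print("median/MAD :", median_mad(b_new, b_old))
```

In this toy case the two drifted items pull both mean/sigma coefficients away from the assumed true values, while the median-based version is unaffected and all nine items stay in the common-item set. Real robust linking methods are more involved, and this sketch makes no claim about how they perform.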

Main Results:

Robust scale transformation methods successfully reduced the influence of multiple outliers on scale transformation and equating.
The robust methods demonstrated comparable performance to traditional outlier detection and elimination methods.
Adequate content balance was maintained when using robust methods.

Conclusions:

Robust scale transformation methods are effective for addressing multiple outlier common items in test equating.
These methods offer a viable alternative to traditional outlier handling, balancing accuracy and content integrity.