The NEAT Equating Via Chaining Random Forests in the Context of Small Sample Sizes: A Machine-Learning Method

Area of Science:

Psychometrics
Educational Measurement
Machine Learning

Background:

The nonequivalent groups with anchor test (NEAT) design is commonly used in educational measurement for test equating.
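In the NEAT design, each group takes only one test form plus the shared anchor items, so scores on the other form are missing by design; how these missing data are handled directly affects equating accuracy.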
Traditional equating methods can struggle when sample sizes are small and tests are short.
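
To make the missing-data framing concrete, the following is a minimal sketch of how NEAT data might be laid out as a single table. The function names, the column labels X, V, and Y, and the toy scores are all assumptions made for illustration; the study's own methods may work on item responses rather than total scores.

```python
def neat_score_table(group1, group2):
    """Stack two NEAT groups into one table; None marks a score missing by design.

    group1: list of (form_x_score, anchor_score) for examinees who took Form X.
    group2: list of (form_y_score, anchor_score) for examinees who took Form Y.
    (Column names X, V, Y are illustrative, not taken from the study.)
    """
    rows = []
    for x, v in group1:
        rows.append({"group": 1, "X": x, "V": v, "Y": None})
    for y, v in group2:
        rows.append({"group": 2, "X": None, "V": v, "Y": y})
    return rows


def missing_fraction(rows, column):
    """Share of examinees with no observed score in the given column."""
    return sum(r[column] is None for r in rows) / len(rows)


# Toy data: two examinees took Form X, three took Form Y.
table = neat_score_table([(14, 5), (18, 7)], [(16, 6), (12, 4), (20, 8)])
for name in ("X", "V", "Y"):
    print(name, missing_fraction(table, name))
```

Only the anchor column is observed for everyone, which is why it carries the information that links the two forms. An imputation-based approach fills in the None cells before equating.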

Purpose of the Study:

To introduce and evaluate a machine learning-based imputation technique, chaining random forests (CRF), for equating tasks within the NEAT design.
To propose seven CRF-based imputation equating methods using different data augmentation strategies.
To compare the performance of CRF-based methods against traditional equating methods under various simulation conditions.

Main Methods:

A simulation study was conducted to examine the equating performance of seven proposed CRF-based imputation equating methods.
Factors investigated included test length, sample size, anchor item ratio, group equivalence, and anchor type.
The CRF methods were compared with five traditional equating methods: Tucker, Levine, equipercentile, circle-arc, and Rasch concurrent calibration.
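
For readers unfamiliar with the Tucker method named above, here is a rough sketch of Tucker linear equating as it is usually presented in textbooks. It is not the study's implementation: the function name, argument names, toy data, and the default choice of synthetic-population weights (proportional to sample size) are assumptions.

```python
from statistics import fmean, pvariance


def _cov(a, b):
    """Population covariance of two equal-length score lists."""
    ma, mb = fmean(a), fmean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def tucker_linear(x1, v1, y2, v2, w1=None):
    """Return a function that maps Form X scores onto the Form Y scale.

    x1, v1: total and anchor scores for the group that took Form X.
    y2, v2: total and anchor scores for the group that took Form Y.
    w1: synthetic-population weight for group 1 (assumed default:
        proportional to sample size).
    """
    if w1 is None:
        w1 = len(x1) / (len(x1) + len(y2))
    w2 = 1.0 - w1

    # Regression slopes of each total score on the anchor score.
    g1 = _cov(x1, v1) / pvariance(v1)
    g2 = _cov(y2, v2) / pvariance(v2)

    # Differences between the groups on the anchor.
    d_mu = fmean(v1) - fmean(v2)
    d_var = pvariance(v1) - pvariance(v2)

    # Synthetic-population means and variances for both forms.
    mu_x = fmean(x1) - w2 * g1 * d_mu
    mu_y = fmean(y2) + w1 * g2 * d_mu
    var_x = pvariance(x1) - w2 * g1**2 * d_var + w1 * w2 * g1**2 * d_mu**2
    var_y = pvariance(y2) + w1 * g2**2 * d_var + w1 * w2 * g2**2 * d_mu**2

    slope = (var_y / var_x) ** 0.5
    return lambda x: slope * (x - mu_x) + mu_y


# Toy data: equivalent groups, with Form Y two points easier than Form X.
x1, v1 = [10, 12, 14, 16, 18], [3, 4, 5, 6, 7]
y2, v2 = [12, 14, 16, 18, 20], [3, 4, 5, 6, 7]
equate = tucker_linear(x1, v1, y2, v2)
print(equate(14))
```

The anchor scores enter only through the group differences in mean and variance. When the two groups perform identically on the anchor, as in the toy data, the method reduces to ordinary linear equating of the two forms.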

Main Results:

CRF-based methods, particularly those that incorporate the Tucker method's results (e.g., IMP_total_Tucker, IMP_pair_Tucker), performed best among the methods compared.
These ML-enhanced methods produced more robust and trustworthy estimates of the missing data used in equating.
CRF-based methods yielded accurate equated scores more consistently than the other approaches, especially under challenging conditions (short tests, small samples).

Conclusions:

Machine learning techniques, specifically CRF, offer significant advantages for test equating in NEAT designs.
CRF-based imputation methods are highly effective in addressing missing data, leading to improved equating accuracy.
The proposed CRF methods, especially when combined with the Tucker method, are recommended for practical applications involving short tests and limited sample sizes.