Factor Recovery in Binary Data Sets: A Simulation

Area of Science:

Psychometrics
Statistical analysis
Data science

Background:

Factor analysis is a statistical method that describes the variability among observed, correlated variables in terms of a potentially smaller number of unobserved variables called factors.
Binary data, which take only two possible values, pose particular challenges for factor analysis because of their discrete nature.
Assessing the performance of different correlation coefficients is crucial for accurate factor recovery in binary datasets.
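Two coefficients commonly used for binary items are the phi coefficient (the Pearson correlation of the 0/1 scores) and the tetrachoric correlation (the estimated correlation of normally distributed variables assumed to underlie the items). The sketch below computes both from a single 2x2 table of counts. It is illustrative only: the table is made up, the function names are hypothetical, and the tetrachoric value uses Pearson's cos-pi approximation rather than a full maximum-likelihood estimate, so it should be trusted mainly when both items are split near 50/50.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi coefficient for a 2x2 table of counts [[a, b], [c, d]]."""
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("a marginal total is zero; phi is undefined")
    return (a * d - b * c) / denom


def tetrachoric_cos_pi(a, b, c, d):
    """Cos-pi approximation to the tetrachoric correlation.

    Assumption: this is an approximation, not the maximum-likelihood
    estimate, and it degrades when the margins are far from 50/50.
    """
    if b * c == 0:
        raise ValueError("an off-diagonal cell is zero; approximation undefined")
    odds_ratio = (a * d) / (b * c)
    return math.cos(math.pi / (1 + math.sqrt(odds_ratio)))


# Made-up table: both items are split 50/50 and agree in 80 of 100 cases.
table = (40, 10, 10, 40)
phi = phi_coefficient(*table)
tet = tetrachoric_cos_pi(*table)
print(f"phi = {phi:.3f}, tetrachoric (approx.) = {tet:.3f}")
```

For the same table the tetrachoric value is larger in magnitude than phi, which is one reason the two coefficients can lead to different factor solutions.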

Purpose of the Study:

To compare the performance of phi coefficients and tetrachoric correlations in factor analysis of binary data.
To evaluate two key dimensions: accuracy of nontrivial factor identification and factor structure recovery.
To provide recommendations for the preferred method in practical factor analysis applications.

Main Methods:

The simulation study applied factor analysis to binary data sets with known factor structures.
Factor analyses based on phi coefficients were compared with those based on tetrachoric correlations.
Two primary evaluation criteria were used: identification of nontrivial factors and recovery of the known factor structure.
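A simulation of this general kind can be sketched as follows. This is a minimal illustration, not the study's design: the one-factor model, the loading of 0.7, the threshold of 0, the sample size, and the function names are all assumptions chosen for the example. Continuous latent responses are generated from a known factor model, dichotomized at a threshold, and then summarized in a phi matrix that a factor analysis could take as input.

```python
import math
import random


def simulate_binary_items(n_persons, loadings, threshold=0.0, seed=0):
    """Generate 0/1 item scores from a one-factor model (assumed design).

    Each latent response is loading * factor + sqrt(1 - loading^2) * noise,
    so it has unit variance; it is scored 1 when it exceeds the threshold.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_persons):
        factor = rng.gauss(0.0, 1.0)
        row = []
        for lam in loadings:
            noise = rng.gauss(0.0, 1.0)
            latent = lam * factor + math.sqrt(1.0 - lam ** 2) * noise
            row.append(1 if latent > threshold else 0)
        data.append(row)
    return data


def phi_matrix(data):
    """Phi coefficients, computed as Pearson correlations of 0/1 scores."""
    n = len(data)
    k = len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(k)]

    def cov(i, j):
        return sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / n

    sds = [math.sqrt(cov(j, j)) for j in range(k)]
    return [[cov(i, j) / (sds[i] * sds[j]) for j in range(k)] for i in range(k)]


loadings = [0.7, 0.7, 0.7, 0.7]  # hypothetical known structure
data = simulate_binary_items(2000, loadings, seed=42)
phis = phi_matrix(data)
for row in phis:
    print(" ".join(f"{value:5.2f}" for value in row))
```

With these settings the latent correlation between any two items is 0.7 × 0.7 = 0.49, while the phi coefficients come out noticeably smaller, because dichotomizing the responses attenuates the correlation.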

Main Results:

Both phi coefficients and tetrachoric correlations showed poor performance in identifying nontrivial factors, with phi coefficients performing marginally better.
Factor structure recovery was generally good when the correct number of factors was specified.
Phi coefficients demonstrated superior factor structure recovery and better prevention of item misclassification compared to tetrachoric correlations.
Tetrachoric correlations were more effective at including relevant items but produced more Heywood cases (improper solutions in which an estimated variance is negative).
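A Heywood case can be detected directly from a loading matrix. The sketch below assumes standardized items and orthogonal factors, so that each item's unique variance is 1 minus the sum of its squared loadings; the loading values and function names are hypothetical and are not taken from the study.

```python
def uniquenesses(loadings):
    """Unique variance per item: 1 minus the sum of its squared loadings.

    Assumption: standardized items and orthogonal (uncorrelated) factors.
    """
    return [1.0 - sum(value * value for value in row) for row in loadings]


def heywood_items(loadings):
    """Indices of items whose estimated unique variance is negative."""
    return [i for i, u in enumerate(uniquenesses(loadings)) if u < 0.0]


# Hypothetical two-factor solution; the second item loads above 1.
example_loadings = [
    [0.70, 0.20],
    [1.02, 0.10],
    [0.50, 0.40],
]
print(uniquenesses(example_loadings))
print(heywood_items(example_loadings))
```

In this made-up solution only the second item (index 1) is flagged, since its squared loadings sum to more than 1.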

Conclusions:

Phi coefficients are generally recommended over tetrachoric correlations for factor analysis of binary data due to better overall performance in structure recovery and fewer estimation issues.
While tetrachoric correlations may be better at item inclusion, the prevalence of Heywood cases suggests caution in their application.
The findings support the use of phi coefficients for more robust factor analysis with binary datasets in most practical scenarios.