K-nearest neighbor algorithm for imputing missing longitudinal prenatal alcohol data
Summary
This summary is machine-generated. The K Nearest Neighbor (k-NN) algorithm accurately imputes missing alcohol consumption data for pregnant women. This machine learning approach improves data completeness for longitudinal pregnancy studies.
Area Of Science
- Epidemiology
- Biostatistics
- Machine Learning
Background
- Longitudinal studies on pregnant women often face challenges with missing alcohol consumption data.
- Inaccurate or incomplete data can bias findings in studies of prenatal alcohol exposure.
- Robust imputation methods are crucial for reliable analysis of such data.
Purpose Of The Study
- To evaluate the effectiveness of the K Nearest Neighbor (k-NN) machine learning algorithm for imputing missing daily alcohol consumption data.
- To assess the accuracy of k-NN imputation in a large prospective cohort of pregnant women (Safe Passage study).
- To determine optimal parameters for k-NN imputation to minimize error.
Main Methods
- Used data from the Safe Passage study (n=11,083), in which 11.4% of alcohol consumption data were missing.
- Applied the k-NN algorithm, weighting distances and matching by day of week for imputation.
- Validated imputation accuracy by randomly deleting data segments and comparing imputed to actual values.
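The imputation step above can be illustrated with a minimal sketch. The summary does not state the distance metric, the weighting scheme, or how day-of-week matching was implemented, so everything here is an assumption: Euclidean distance over the days the participant did report, inverse-distance weights, and donor rows whose columns are already aligned to the same calendar days as the target (so each column shares a day of week). The function name `knn_impute_segment` and its parameters are hypothetical, not the study's code.

```python
import numpy as np

def knn_impute_segment(target, donors, k=5):
    """Fill NaN days in `target` from the k closest rows of `donors`.

    target: 1-D array of daily drinks, NaN where missing.
    donors: 2-D array (n_donors, n_days), fully observed, with columns
            aligned to the same calendar days as `target` (assumption:
            this alignment is what provides day-of-week matching).
    """
    target = np.asarray(target, dtype=float)
    donors = np.asarray(donors, dtype=float)
    missing = np.isnan(target)
    if not missing.any():
        return target.copy()
    observed = ~missing
    if not observed.any():
        raise ValueError("target has no observed days to match on")

    # Euclidean distance computed only on the days the target reported.
    diffs = donors[:, observed] - target[observed]
    dist = np.sqrt((diffs ** 2).sum(axis=1))

    # Indices of the k nearest donors (stable sort keeps ties deterministic).
    nearest = np.argsort(dist, kind="stable")[:k]

    # Inverse-distance weights (assumed scheme); epsilon avoids dividing by 0.
    weights = 1.0 / (dist[nearest] + 1e-9)
    weights /= weights.sum()

    # Each missing day becomes the weighted mean of the neighbors' values.
    filled = target.copy()
    filled[missing] = weights @ donors[nearest][:, missing]
    return filled
```

Under these assumptions, a call such as `knn_impute_segment(segment, donor_segments, k=5)` on a 55-day window would correspond to the parameter combination reported under Main Results.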
Main Results
- The k-NN algorithm with 5 nearest neighbors (K=5) and 55-day segments yielded the lowest imputation error.
- Imputed values closely matched actual values for 64% of deleted segments and were within ±1 drink/day for a further 31%.
- Imputation accuracy varied across study sites because of differences in drinking patterns and in the proportion of missing data.
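The delete-and-compare validation behind these figures can be sketched as follows. This is only an illustration under stated assumptions: the summary reports agreement per deleted segment without saying how a segment was scored, so this sketch scores each deleted day individually after rounding to whole drinks. The helper names `mask_random_segment` and `score_imputation` are hypothetical.

```python
import numpy as np

def mask_random_segment(series, seg_len, rng):
    """Return a copy of `series` with one random contiguous run set to NaN,
    plus the boolean mask marking the deleted days."""
    series = np.asarray(series, dtype=float)
    if seg_len > series.size:
        raise ValueError("segment longer than series")
    start = rng.integers(0, series.size - seg_len + 1)
    mask = np.zeros(series.size, dtype=bool)
    mask[start:start + seg_len] = True
    masked = series.copy()
    masked[mask] = np.nan
    return masked, mask

def score_imputation(actual, imputed, mask):
    """Return (share of deleted days imputed exactly, share that missed
    but landed within 1 drink/day), comparing rounded imputed values
    with the actual values. Per-day scoring is an assumption."""
    actual = np.asarray(actual, dtype=float)
    imputed = np.asarray(imputed, dtype=float)
    err = np.abs(np.round(imputed[mask]) - actual[mask])
    exact = float(np.mean(err == 0))
    within_one = float(np.mean((err > 0) & (err <= 1)))
    return exact, within_one
```

Repeating the mask, impute, and score cycle over many random segments, and over a grid of K values and segment lengths, is one way the lowest-error parameter combination could be located.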
Conclusions
- The K Nearest Neighbor (k-NN) algorithm offers a highly accurate method for imputing missing alcohol data in longitudinal pregnancy studies.
- k-NN imputation can improve the completeness and reliability of data in studies examining alcohol use during pregnancy.
- This machine learning approach provides a valuable tool for addressing data gaps in sensitive public health research.

