A data-centric machine learning approach to improve prediction of glioma grades using low-imbalance TCGA data

  • Fundación Estatal, Salud, Infancia y Bienestar Social, 28029, Madrid, Spain.

Summary

This summary is machine-generated.

This study aimed to improve glioma grade prediction by improving data quality rather than only refining algorithms. Standardizing the data and balancing the classes improved machine learning model performance, and classifier ensembles performed best.

Area Of Science

  • Computational Biology and Bioinformatics
  • Oncology and Cancer Research
  • Medical Machine Learning

Background

  • Accurate glioma grading is vital for patient prognosis and treatment planning.
  • Molecular biomarkers and machine learning show promise for improving diagnostic accuracy.
  • Existing research often prioritizes model-centric approaches over data quality.

Purpose Of The Study

  • To investigate a data-centric machine learning approach for predicting glioma grades.
  • To evaluate the impact of data standardization and class balancing on model performance.
  • To compare the effectiveness of standard machine learning models and classifier ensembles.

Main Methods

  • Applied a data-centric approach that improves data quality through standardization and class balancing rather than through model tuning.
  • Utilized six performance metrics to comprehensively evaluate model predictions.
  • Employed four feature ranking algorithms for attribute analysis on a clinical and molecular biomarker dataset.
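
To make the evaluation step concrete, the following minimal sketch computes six metrics for a binary grade classifier from its predictions. The summary does not name the six metrics used in the study, so the set chosen here (accuracy, precision, recall, specificity, F1, balanced accuracy) is an assumption based on measures commonly reported for imbalanced problems. The function name `binary_metrics`, the label encoding, and the toy labels are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return six common metrics for a binary classifier.

    The metric set is an assumption; the source does not list the six
    metrics used in the study.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        # Guard against empty denominators (e.g. no predicted positives).
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)          # sensitivity
    specificity = ratio(tn, tn + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": ratio(2 * precision * recall, precision + recall),
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Toy labels (hypothetical): 1 = one grade, 0 = the other.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
for name, value in binary_metrics(y_true, y_pred).items():
    print(f"{name}: {value:.3f}")
```

Reporting several metrics side by side matters when classes are imbalanced, because accuracy alone can look acceptable while recall or specificity for one class is poor.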

Main Results

  • Data standardization and oversampling of the minority class significantly improved prediction performance for four machine learning models and two classifier ensembles.
  • The two classifier ensembles outperformed three of the four standard prediction models.
  • Identified key statistical characteristics and informative attributes within the glioma dataset.
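
The two data-quality steps named in the results can be sketched as follows. This is a minimal illustration using only the Python standard library, not the study's implementation: it assumes z-score standardization and random duplication of minority-class rows, and the function names, the seed, and the toy values are hypothetical.

```python
import random
import statistics


def standardize(train_rows, other_rows):
    """Z-score every column using statistics estimated on train_rows only."""
    columns = list(zip(*train_rows))
    means = [statistics.fmean(col) for col in columns]
    # Population standard deviation; constant columns fall back to 1.0.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]

    def scale(rows):
        return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

    return scale(train_rows), scale(other_rows)


def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Toy training data (hypothetical): two numeric attributes, binary grade label.
train_X = [[40.0, 0.0], [50.0, 1.0], [60.0, 1.0], [70.0, 0.0], [30.0, 1.0]]
train_y = [1, 1, 1, 0, 0]
test_X = [[55.0, 1.0]]

train_z, test_z = standardize(train_X, test_X)
balanced_X, balanced_y = oversample_minority(train_z, train_y)
print(balanced_y.count(0), balanced_y.count(1))  # 3 3
```

Both steps are fitted on the training split only: the scaling statistics come from the training rows, and duplicated rows are never added to the test data, which avoids leaking information into the evaluation.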

Conclusions

  • A data-centric approach, emphasizing data quality, can enhance machine learning model performance in glioma grade prediction.
  • Data standardization and balancing are effective strategies for improving predictive accuracy.
  • Classifier ensembles show strong potential for accurate glioma grading using integrated clinical and molecular data.
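
As an illustration of how a classifier ensemble combines individual models, the sketch below applies hard (majority) voting to the predictions of three base models. The summary does not say which ensemble methods the study used, so majority voting is an assumption chosen for simplicity, and the function name and the prediction lists are hypothetical.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one list by hard (majority) voting."""
    combined = []
    for votes in zip(*predictions_per_model):
        # most_common(1) yields the most frequent label; on a tie it returns
        # the label encountered first, so an odd number of models is safer.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical predictions from three base models on five patients.
model_a = [1, 0, 1, 1, 0]
model_b = [1, 1, 0, 1, 0]
model_c = [0, 0, 1, 1, 1]

print(majority_vote([model_a, model_b, model_c]))  # [1, 0, 1, 1, 0]
```

Each base model makes one error in a different position here, and the vote cancels those errors out, which is the usual intuition for why ensembles can outperform their members.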