Challenges in Preprocessing Routine Laboratory Data for Machine Learning.

Katharina Wendt¹, Michael Marschollek¹, Thomas Illig²

¹Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Hannover Medical School, Hannover, Germany.

Studies in Health Technology and Informatics

July 3, 2026

Summary

This summary is machine-generated.

Preprocessing routine laboratory data is crucial for accurate post-Covid syndrome classification. Proper data harmonization and handling of missing values significantly improve the performance of machine learning models in identifying biological signatures in Long Covid patients.

Area of Science:

Biomedical data science
Clinical informatics
Longitudinal cohort studies

Background:

Post-Covid syndrome lacks specific biomarkers, so diagnosis is difficult and relies on exclusion criteria.
Routine laboratory data could help identify biological signatures, but data-quality problems complicate its use for machine learning.
The German NAPKON cohort provides a valuable dataset for investigating Long Covid patient characteristics.

Purpose of the Study:

To evaluate the impact of data preprocessing techniques on the classification performance of machine learning models for post-Covid syndrome.
To assess the influence of unit harmonization, missing value imputation, and inter-laboratory variability on diagnostic accuracy.
To determine optimal preprocessing strategies for utilizing routine laboratory data in Long Covid research.
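
Unit harmonization maps every measurement of a parameter onto one target unit before values from different laboratories are pooled. The sketch below assumes long-format data with hypothetical columns `parameter`, `value` and `unit`, and a hand-written conversion table. The two conversions (hemoglobin g/dL to g/L, creatinine mg/dL to µmol/L) are standard factors chosen for illustration; they are not the study's mapping.

```python
import pandas as pd

# Hypothetical conversion table: (parameter, source unit) -> (target unit, factor).
# The factors are standard conversions chosen for illustration only.
CONVERSIONS = {
    ("hemoglobin", "g/dL"): ("g/L", 10.0),
    ("hemoglobin", "g/L"): ("g/L", 1.0),
    ("creatinine", "mg/dL"): ("umol/L", 88.4),
    ("creatinine", "umol/L"): ("umol/L", 1.0),
}


def harmonize_units(df: pd.DataFrame) -> pd.DataFrame:
    """Convert long-format lab values (columns: parameter, value, unit) to target units.

    Rows whose (parameter, unit) pair is not in CONVERSIONS keep their original
    unit but get a NaN value, so they surface as missing instead of being mixed in.
    """
    out = df.copy()
    units, values = [], []
    for param, value, unit in zip(out["parameter"], out["value"], out["unit"]):
        rule = CONVERSIONS.get((param, unit))
        if rule is None:
            units.append(unit)
            values.append(float("nan"))
        else:
            target_unit, factor = rule
            units.append(target_unit)
            values.append(value * factor)
    out["unit"] = units
    out["value"] = values
    return out


# Synthetic example: the same parameters reported in different units by different labs.
raw = pd.DataFrame({
    "parameter": ["hemoglobin", "hemoglobin", "creatinine", "creatinine", "crp"],
    "value": [13.5, 140.0, 1.0, 80.0, 5.0],
    "unit": ["g/dL", "g/L", "mg/dL", "umol/L", "mg/L"],
})
harmonized = harmonize_units(raw)
```

Turning unmapped units into NaN rather than passing them through is a deliberate choice here: a silently unconverted value is harder to detect than an explicit gap.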

Keywords:

NAPKON cohort; Post-COVID syndrome; data harmonization; data preprocessing; laboratory data; machine learning; missing values

Main Methods:

Utilized 52 laboratory parameters from 1,292 participants (1,130 recovered, 162 post-Covid) in the German NAPKON cohort across four time points.
Implemented data preprocessing including unit harmonization, missing value handling, and statistical assessment of inter-laboratory variability using Kruskal-Wallis and Wilcoxon tests.
Investigated classification performance under varying missingness thresholds (up to 70% missing per sample and feature).
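
A missingness threshold can be applied per feature and per sample before imputation. This sketch assumes a wide table (one row per sample, one column per laboratory parameter). It drops features first and then samples whose missing fraction exceeds the threshold, and it fills the remaining gaps with per-feature medians. The ordering, the median imputation and the parameter names are assumptions; the abstract does not state which imputation method was used.

```python
import numpy as np
import pandas as pd


def filter_by_missingness(X: pd.DataFrame, max_missing: float = 0.7) -> pd.DataFrame:
    """Drop features, then samples, whose fraction of missing values exceeds max_missing."""
    X = X.loc[:, X.isna().mean(axis=0) <= max_missing]  # per-feature threshold
    return X.loc[X.isna().mean(axis=1) <= max_missing]  # per-sample threshold


def impute_median(X: pd.DataFrame) -> pd.DataFrame:
    """Fill remaining gaps with each feature's median (a simple placeholder strategy)."""
    return X.fillna(X.median())


# Synthetic example with hypothetical parameter names.
X = pd.DataFrame({
    "crp": [1.0, np.nan, 3.0, np.nan],
    "ldh": [np.nan, np.nan, np.nan, 200.0],  # 75% missing
    "ferritin": [50.0, np.nan, np.nan, 80.0],
})
kept = filter_by_missingness(X, max_missing=0.7)  # drops "ldh", then sample 1
imputed = impute_median(kept)
```

Varying `max_missing` (for example 0.3, 0.5, 0.7) and re-running the downstream classifier is one way to compare thresholds as described above.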

Main Results:

Significant differences in laboratory values were observed across units, even after harmonization (p < 0.05).
Optimal classification performance was achieved when allowing up to 70% missingness per sample and feature.
Data harmonization and effective handling of missing values were identified as critical factors influencing model performance.
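
Residual differences between laboratories can be screened with a Kruskal-Wallis test across all laboratories, followed by pairwise Wilcoxon tests. The sketch below uses SciPy and assumes independent samples per laboratory, hence the Wilcoxon rank-sum test rather than the signed-rank test. The Bonferroni correction, the laboratory names and the synthetic values are assumptions, not details from the study.

```python
from itertools import combinations

import numpy as np
from scipy import stats


def compare_laboratories(values_by_lab):
    """Return the Kruskal-Wallis p-value across all laboratories and
    Bonferroni-adjusted pairwise Wilcoxon rank-sum p-values."""
    _, p_global = stats.kruskal(*values_by_lab.values())
    pairs = list(combinations(values_by_lab, 2))
    pairwise = {}
    for a, b in pairs:
        _, p = stats.ranksums(values_by_lab[a], values_by_lab[b])
        pairwise[(a, b)] = min(p * len(pairs), 1.0)  # Bonferroni adjustment
    return p_global, pairwise


rng = np.random.default_rng(0)
# Synthetic values for one harmonized parameter from three hypothetical laboratories;
# lab_c is shifted to mimic a residual systematic offset.
labs = {
    "lab_a": rng.normal(5.0, 1.0, 200),
    "lab_b": rng.normal(5.0, 1.0, 200),
    "lab_c": rng.normal(6.0, 1.0, 200),
}
p_global, pairwise = compare_laboratories(labs)
significant = [pair for pair, p in pairwise.items() if p < 0.05]
```

A significant global test says only that at least one laboratory differs; the pairwise tests help locate which ones.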

Conclusions:

Effective preprocessing, particularly addressing missing values and data harmonization, is essential for reliable machine learning-based classification of post-Covid syndrome.
Despite preprocessing, residual variability in laboratory data persists, influenced by biological and technical factors.
Further research is needed to refine preprocessing methods and account for inherent data variability in Long Covid biomarker discovery.
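
Imputation can be fitted inside each cross-validation fold so that preprocessing does not inflate classification results. The sketch below is a scikit-learn pipeline on synthetic data that only mimics the cohort's class sizes (1,130 recovered, 162 post-Covid) and its 52 parameters. The median imputer, logistic regression with balanced class weights, 30% random missingness and the ROC AUC metric are all assumptions, not the study's setup.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Cohort sizes follow the abstract; all values below are synthetic.
n_recovered, n_post_covid, n_features = 1130, 162, 52
y = np.r_[np.zeros(n_recovered), np.ones(n_post_covid)]
X = rng.normal(size=(y.size, n_features))
X[y == 1, :5] += 0.8  # synthetic signal in five features
X[rng.random(X.shape) < 0.3] = np.nan  # ~30% of values missing completely at random

model = make_pipeline(
    SimpleImputer(strategy="median"),  # refitted on each training fold
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(f"ROC AUC: {scores.mean():.2f} +/- {scores.std():.2f}")
```

Because the imputer is part of the pipeline, `cross_val_score` refits it on each training fold, so test-fold values never influence the imputed medians.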