Challenges in Preprocessing Routine Laboratory Data for Machine Learning.

Katharina Wendt¹, Michael Marschollek¹, Thomas Illig²

  • ¹ Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Hannover Medical School, Hannover, Germany.

Studies in Health Technology and Informatics | July 3, 2026
Summary
This summary is machine-generated.

Preprocessing routine laboratory data is crucial for accurate post-Covid syndrome classification. Proper data harmonization and handling of missing values significantly improve machine learning model performance in identifying biological signatures for Long Covid patients.

Area of Science:

  • Biomedical data science
  • Clinical informatics
  • Longitudinal cohort studies

Background:

  • Post-Covid syndrome lacks specific biomarkers, complicating diagnosis and relying on exclusion criteria.
  • Routine laboratory data presents potential for identifying biological signatures but faces challenges with data quality for machine learning.
  • The German NAPKON cohort provides a valuable dataset for investigating Long Covid patient characteristics.

Purpose of the Study:

  • To evaluate the impact of data preprocessing techniques on the classification performance of machine learning models for post-Covid syndrome.
  • To assess the influence of unit harmonization, missing value imputation, and inter-laboratory variability on diagnostic accuracy.
  • To determine optimal preprocessing strategies for utilizing routine laboratory data in Long Covid research.

Keywords:

  • NAPKON cohort
  • Post-COVID syndrome
  • data harmonization
  • data preprocessing
  • laboratory data
  • machine learning
  • missing values

Main Methods:

  • Utilized 52 laboratory parameters from 1,292 participants (1,130 recovered, 162 post-Covid) in the German NAPKON cohort across four time points.
  • Implemented data preprocessing including unit harmonization, missing value handling, and statistical assessment of inter-laboratory variability using Kruskal-Wallis and Wilcoxon tests.
  • Investigated classification performance under varying missingness thresholds (up to 70% missing per sample and feature).
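
The unit harmonization step named above can be illustrated with a minimal sketch. The summary does not say which parameters or units were involved, so the analytes, source units, target units, and the `TO_TARGET` table below are illustrative assumptions rather than the study's mapping; only the conversion factors themselves are standard.

```python
# Conversion factors into one chosen target unit per analyte.
# Assumption: analyte names and unit pairs are illustrative, not the
# study's actual mapping.
TO_TARGET = {
    ("glucose", "mmol/L"): 1.0,
    ("glucose", "mg/dL"): 1 / 18.016,   # glucose molar mass ~180.16 g/mol
    ("hemoglobin", "g/dL"): 1.0,
    ("hemoglobin", "g/L"): 0.1,
    ("creatinine", "umol/L"): 1.0,
    ("creatinine", "mg/dL"): 88.4,      # creatinine molar mass ~113.12 g/mol
}


def harmonize(analyte, value, unit):
    """Return value in the analyte's target unit; raise on unknown units."""
    try:
        factor = TO_TARGET[(analyte, unit)]
    except KeyError:
        raise ValueError(f"no conversion for {analyte!r} in {unit!r}") from None
    return value * factor


# Made-up measurements reported in mixed units.
records = [
    ("glucose", 90.0, "mg/dL"),
    ("glucose", 5.0, "mmol/L"),
    ("hemoglobin", 140.0, "g/L"),
    ("creatinine", 1.0, "mg/dL"),
]
harmonized = [(a, round(harmonize(a, v, u), 2)) for a, v, u in records]
print(harmonized)
```

Raising on an unknown analyte and unit pair, instead of passing the value through, is a deliberate choice in this sketch: an unconverted value would otherwise look like an outlier later in the pipeline.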

Main Results:

  • Significant differences in laboratory values were observed across units, even after harmonization (p < 0.05).
  • Optimal classification performance was achieved when allowing up to 70% missingness per sample and feature.
  • Data harmonization and effective handling of missing values were identified as critical factors influencing model performance.
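
The missingness threshold reported above can be sketched as a simple filter. This is a hypothetical illustration, not the authors' pipeline: the function names, the toy values, and the choice to filter features before samples are all assumptions, since the summary states only the 70% limit per sample and feature.

```python
def missing_fraction(values):
    """Fraction of entries that are None (missing)."""
    return sum(v is None for v in values) / len(values)


def filter_by_missingness(rows, max_missing=0.70):
    """Drop features, then samples, whose missing fraction exceeds max_missing.

    rows: list of equal-length lists, one per sample; None marks a missing value.
    Returns (kept_sample_indices, kept_feature_indices, filtered_rows).
    Assumption: the feature-first order is a choice made for this sketch.
    """
    n_features = len(rows[0])
    keep_cols = [j for j in range(n_features)
                 if missing_fraction([r[j] for r in rows]) <= max_missing]
    reduced = [[r[j] for j in keep_cols] for r in rows]
    keep_rows = [i for i, r in enumerate(reduced)
                 if r and missing_fraction(r) <= max_missing]
    return keep_rows, keep_cols, [reduced[i] for i in keep_rows]


# Toy data: 4 samples x 4 laboratory parameters (values are made up).
data = [
    [5.1, None, 14.0, None],
    [None, None, None, None],
    [4.9, None, 13.2, 80.0],
    [5.5, None, 15.1, None],
]
samples, features, filtered = filter_by_missingness(data, max_missing=0.70)
print(samples, features)
```

In this toy input, the second and fourth parameters exceed the limit (100% and 75% missing) and are dropped first; the second sample is then dropped because none of its remaining values are present.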

Conclusions:

  • Effective preprocessing, particularly addressing missing values and data harmonization, is essential for reliable machine learning-based classification of post-Covid syndrome.
  • Despite preprocessing, residual variability in laboratory data persists, influenced by biological and technical factors.
  • Further research is needed to refine preprocessing methods and account for inherent data variability in Long Covid biomarker discovery.
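
Residual inter-laboratory variability of the kind described above is what the Kruskal-Wallis test named in the methods is designed to detect. The following is a dependency-free sketch of that test under stated assumptions: the laboratory groupings and values are made up, and the helper names `chi2_sf` and `kruskal_wallis` are hypothetical. In practice an established routine such as `scipy.stats.kruskal` would likely be preferred.

```python
import math
from itertools import chain


def chi2_sf(x, df):
    """Upper tail of the chi-square distribution via the incomplete-gamma recurrence."""
    z = x / 2
    if df % 2:   # odd df: start from Q(1/2, z) = erfc(sqrt(z))
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:        # even df: start from Q(1, z) = exp(-z)
        a, q = 1.0, math.exp(-z)
    while a < df / 2:
        q += z**a * math.exp(-z) / math.gamma(a + 1)
        a += 1
    return q


def kruskal_wallis(*groups):
    """Kruskal-Wallis H test with tie correction; returns (H, p_value).

    Assumes at least two groups and at least two distinct values overall.
    """
    pooled = sorted(chain.from_iterable(groups))
    n = len(pooled)
    rank_of, tie_term, i = {}, 0, 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2   # mean of ranks i+1 .. j
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    h = 12 / (n * (n + 1)) * sum(
        sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)
    h /= 1 - tie_term / (n**3 - n)             # tie correction
    return h, chi2_sf(h, len(groups) - 1)


# Made-up values of one harmonized parameter from three hypothetical laboratories.
lab_a = [4.1, 4.3, 4.0, 4.2, 4.4]
lab_b = [5.1, 5.3, 5.0, 5.2, 5.4]
lab_c = [6.1, 6.3, 6.0, 6.2, 6.4]
h_stat, p_value = kruskal_wallis(lab_a, lab_b, lab_c)
print(f"H = {h_stat:.2f}, p = {p_value:.4f}, differs at 0.05: {p_value < 0.05}")
```

A significant result here says only that at least one laboratory differs from the others; pairwise follow-up comparisons (the methods mention Wilcoxon tests) would be needed to locate the difference.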