A preprocessing method for improving data mining techniques. Application to a large medical diabetes database.

A Duhamel¹, M C Nuttens, P Devos

  • ¹ CERIM, Faculté de Médecine, 1 Place de Verdun, 59045 Lille, France.

Studies in Health Technology and Informatics | December 11, 2003
PubMed
Summary
This summary is machine-generated.

Data preprocessing is a crucial step of Knowledge Discovery in Databases (KDD) for clinical data mining. In a large medical diabetes database, decision tree imputation handled missing values more effectively than mode imputation for variables with few missing values and few categories.

Area of Science:

  • Data Science
  • Medical Informatics
  • Database Management

Background:

  • Knowledge Discovery in Databases (KDD) is valuable for analyzing large clinical datasets.
  • Data preprocessing, including cleaning and handling missing values, is critical for data mining accuracy and takes up approximately 80% of project time.

Purpose of the Study:

  • To analyze the data preprocessing step within the KDD methodology.
  • To develop and evaluate tools for handling inconsistent data and missing values in clinical databases.

Main Methods:

  • The study divided preprocessing into data cleaning, missing value analysis, and imputation method selection.
  • Logical rules and cluster analysis were used for data cleaning and identifying poorly filled records.
  • Multivariate statistical procedures were used to analyze the missing data mechanism, and mode imputation was compared with decision tree imputation.
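
The cleaning step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`age`, `birth_year`, `diagnosis_year`, `treatment`), the two rules, and the 0.5 threshold are hypothetical and do not come from the study. The study used cluster analysis to find poorly filled records; the sketch substitutes a simpler per-record missing fraction.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]

# Hypothetical logical rules: each returns True when the record is consistent.
# A rule is skipped (treated as satisfied) when a field it needs is missing.
RULES: Dict[str, Callable[[Record], bool]] = {
    "age_in_range": lambda r: r["age"] is None or 0 <= r["age"] <= 120,
    "diagnosis_not_before_birth": lambda r: (
        r["birth_year"] is None
        or r["diagnosis_year"] is None
        or r["diagnosis_year"] >= r["birth_year"]
    ),
}


def violated_rules(record: Record) -> List[str]:
    """Return the names of the logical rules that the record breaks."""
    return [name for name, rule in RULES.items() if not rule(record)]


def missing_fraction(record: Record) -> float:
    """Share of fields in the record that are empty (None)."""
    return sum(value is None for value in record.values()) / len(record)


def poorly_filled(records: List[Record], threshold: float = 0.5) -> List[int]:
    """Indices of records whose missing fraction exceeds the threshold.

    Simplified stand-in for the cluster analysis described in the study.
    """
    return [i for i, r in enumerate(records) if missing_fraction(r) > threshold]


# Toy records, invented for illustration only.
records = [
    {"age": 54, "birth_year": 1949, "diagnosis_year": 1995, "treatment": "insulin"},
    {"age": 230, "birth_year": 1950, "diagnosis_year": 1940, "treatment": "diet"},
    {"age": None, "birth_year": None, "diagnosis_year": None, "treatment": "oral"},
]

for index, record in enumerate(records):
    print(index, violated_rules(record), round(missing_fraction(record), 2))
print("poorly filled:", poorly_filled(records))
```

Keeping the rules in a named dictionary makes each violation traceable to a specific check, which helps when errors have to be corrected rather than just counted.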

Main Results:

  • A system of logical rules corrected essential data errors, and cluster analysis flagged 10% of patient files as poorly filled.
  • Multivariate analysis indicated that the missing data mechanism was random.
  • Decision tree imputation outperformed mode imputation for variables with <10% missing values and <4 categories.
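
To make the comparison between the two imputation methods concrete, here is a minimal Python sketch. It is not the study's implementation: the tree algorithm used by the authors is not stated in this summary, so the sketch grows a small categorical tree that splits on the feature with the fewest misclassified rows. The field names (`diabetes_type`, `obese`, `treatment`) and the toy rows are hypothetical, and the predictor fields are assumed to be complete.

```python
from collections import Counter
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def mode(values: List[str]) -> str:
    """Most frequent value; ties go to the value seen first."""
    return Counter(values).most_common(1)[0][0]


def build_tree(rows: List[Row], target: str, features: List[str]) -> dict:
    """Grow a small categorical tree; each leaf stores the mode of `target`."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1 or not features:
        return {"leaf": mode(labels)}

    def errors(feature: str) -> int:
        # Rows misclassified if each branch predicts its own majority label.
        groups: Dict[Optional[str], List[str]] = {}
        for r in rows:
            groups.setdefault(r[feature], []).append(r[target])
        return sum(len(g) - Counter(g).most_common(1)[0][1] for g in groups.values())

    best = min(features, key=errors)
    rest = [f for f in features if f != best]
    branches: Dict[Optional[str], List[Row]] = {}
    for r in rows:
        branches.setdefault(r[best], []).append(r)
    return {
        "feature": best,
        "default": mode(labels),  # used for feature values unseen in training
        "children": {v: build_tree(g, target, rest) for v, g in branches.items()},
    }


def predict(tree: dict, row: Row) -> str:
    """Walk the tree for one row and return the leaf's label."""
    while "leaf" not in tree:
        child = tree["children"].get(row[tree["feature"]])
        if child is None:
            return tree["default"]
        tree = child
    return tree["leaf"]


def impute_mode(rows: List[Row], target: str) -> List[Row]:
    """Fill every missing `target` with the overall mode of the observed values."""
    fill = mode([r[target] for r in rows if r[target] is not None])
    return [{**r, target: fill} if r[target] is None else dict(r) for r in rows]


def impute_tree(rows: List[Row], target: str, features: List[str]) -> List[Row]:
    """Fill every missing `target` with a tree prediction from the other fields."""
    tree = build_tree([r for r in rows if r[target] is not None], target, features)
    return [{**r, target: predict(tree, r)} if r[target] is None else dict(r) for r in rows]


# Toy data, invented for illustration; the last two rows lack a treatment.
rows: List[Row] = [
    {"diabetes_type": "type1", "obese": "no", "treatment": "insulin"},
    {"diabetes_type": "type1", "obese": "no", "treatment": "insulin"},
    {"diabetes_type": "type1", "obese": "yes", "treatment": "insulin"},
    {"diabetes_type": "type2", "obese": "yes", "treatment": "oral"},
    {"diabetes_type": "type2", "obese": "yes", "treatment": "oral"},
    {"diabetes_type": "type2", "obese": "no", "treatment": "diet"},
    {"diabetes_type": "type2", "obese": "no", "treatment": "diet"},
    {"diabetes_type": "type2", "obese": "yes", "treatment": "oral"},
    {"diabetes_type": "type1", "obese": "no", "treatment": None},
    {"diabetes_type": "type2", "obese": "no", "treatment": None},
]

filled_mode = impute_mode(rows, "treatment")
filled_tree = impute_tree(rows, "treatment", ["diabetes_type", "obese"])
for before, m, t in zip(rows, filled_mode, filled_tree):
    if before["treatment"] is None:
        print(before["diabetes_type"], before["obese"], "mode:", m["treatment"], "tree:", t["treatment"])
```

Mode imputation assigns the same value to every incomplete record, while the tree uses the other fields of each record, so the two methods can disagree whenever the missing variable depends on those fields.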

Conclusions:

  • Effective data cleaning and missing value handling are essential for reliable KDD results in clinical research.
  • Decision tree imputation outperforms simple mode imputation in specific missing data scenarios in large medical databases.