An efficient data preprocessing approach for large scale medical data mining.

Ya-Han Hu¹, Wei-Chao Lin², Chih-Fong Tsai³

  • ¹Department of Information Management, National Chung Cheng University, Taiwan.

Technology and Health Care : Official Journal of the European Society for Engineering and Medicine | December 18, 2014
Summary
This summary is machine-generated.

This study introduces an efficient data preprocessing (EDP) approach for instance selection on large medical datasets. Compared with state-of-the-art instance selection algorithms, EDP cuts computational cost by a factor of roughly two to three while maintaining classification accuracy.

Keywords:
Data preprocessing; breast cancer; instance selection; medical data mining; protein homology

Area of Science:

  • Medical data mining
  • Bioinformatics
  • Machine learning

Background:

  • Large medical datasets increase computational costs for data mining.
  • Instance selection is a crucial preprocessing step to reduce data size and maintain mining quality.
  • Existing instance selection methods are time-consuming for very large datasets.

Purpose of the Study:

  • Introduce an efficient data preprocessing (EDP) approach for large-scale medical datasets.
  • Reduce computational cost and time required for instance selection.
  • Maintain high classification accuracy after data preprocessing.

Main Methods:

  • The proposed EDP approach has two steps: first, instance selection is run on a small subset of the data and a model is trained on the result; second, that model is used to identify noisy data in the remaining, larger set.
  • Experiments used two large medical datasets (breast cancer and protein homology) with over 100,000 samples.
  • EDP was compared with established instance selection algorithms (IB3, DROP3, genetic algorithms), using CART, k-NN, and SVM as classification techniques.
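
The two-step idea above can be sketched in a few lines of plain Python. This is only one possible reading of the summary, not the authors' implementation. Everything named here is hypothetical: `edited_nn_flags` is a simple Wilson-style editing rule standing in for the selectors used in the study (IB3, DROP3, genetic algorithms), and the "useful vs. noisy" model in `edp_select` is assumed to be a k-NN vote among subset instances that share the query's class label.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def knn_vote(train, x, k):
    """Majority target among the k items of `train` nearest to x.

    `train` is a list of (features, target) pairs.
    """
    nearest = sorted(train, key=lambda item: sq_dist(item[0], x))[:k]
    return Counter(target for _, target in nearest).most_common(1)[0][0]


def edited_nn_flags(data, k=3):
    """Stand-in instance selector (assumption, not the paper's choice).

    Returns one flag per instance: True if its label agrees with the
    majority label of its k nearest neighbours, False if it looks noisy.
    Cost grows quadratically with len(data), so it is run on the subset only.
    """
    flags = []
    for i, (x, y) in enumerate(data):
        others = data[:i] + data[i + 1:]
        flags.append(knn_vote(others, x, k) == y)
    return flags


def edp_select(subset, rest, k=3):
    """Hypothetical two-step selection over (features, label) pairs."""
    # Step 1: run the expensive selector on the small subset only and
    # record, per class label, which instances were kept or rejected.
    keep_flags = edited_nn_flags(subset, k)
    by_label = {}
    for (x, y), keep in zip(subset, keep_flags):
        by_label.setdefault(y, []).append((x, keep))
    selected = [inst for inst, keep in zip(subset, keep_flags) if keep]

    # Step 2: screen the remaining data with the cheap keep/noisy model
    # instead of rerunning the selector on everything.
    for x, y in rest:
        peers = by_label.get(y)
        # Labels never seen in the subset are kept by default (design choice).
        if peers is None or knn_vote(peers, x, k):
            selected.append((x, y))
    return selected


# Toy data: class 0 near x = 0..9, class 1 near x = 50..59,
# plus two mislabelled points inside the class-0 region.
subset = (
    [((float(i),), 0) for i in range(10)]
    + [((float(i),), 1) for i in range(50, 60)]
    + [((2.5,), 1), ((6.5,), 1)]
)
rest = [((3.5,), 0), ((55.5,), 1), ((4.5,), 1)]  # last one is mislabelled
selected = edp_select(subset, rest)
print(len(subset) + len(rest), "instances in,", len(selected), "kept")
```

On the toy data, the mislabelled points are rejected in both steps while the clean ones pass through. Note that in this sketch the step-2 model can only recognise kinds of noise that already appeared in the subset, so how the subset is drawn matters.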

Main Results:

  • The EDP approach cut computational cost by a factor of roughly two to three compared with state-of-the-art algorithms.
  • The final classification accuracy of the models was maintained.
  • Together, these results indicate that EDP is both efficient and effective for large-scale instance selection.

Conclusions:

  • Directly applying existing instance selection algorithms to large medical datasets incurs high computational costs.
  • The proposed EDP approach effectively addresses this by training a model to distinguish between useful and noisy data.
  • EDP offers an efficient and effective solution for large-scale instance selection, balancing computational complexity and classification accuracy.