Interaction effect between data discretization and data resampling for class-imbalanced medical datasets.

Min-Wei Huang1,2,3, Chih-Fong Tsai4, Wei-Chao Lin5,6,7

  • 1Kaohsiung Municipal Kai-Syuan Psychiatric Hospital, Kaohsiung.

Technology and Health Care: Official Journal of the European Society for Engineering and Medicine | March 19, 2025
Summary
This summary is machine-generated.

Combining data discretization and resampling improves classifier performance on imbalanced medical data. Optimal strategies depend on dataset type, with oversampling often enhancing results compared to baseline methods.

Keywords:
class imbalance, data mining, data resampling, discretization, machine learning

Area of Science:

  • Data Mining
  • Machine Learning
  • Medical Informatics

Background:

  • Data discretization transforms continuous features to discrete ones, aiding specific data mining algorithms.
  • Class-imbalanced medical datasets pose challenges for accurate classification.
  • Data resampling techniques (oversampling, undersampling, hybrid) are used to balance training data.
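
To make the two background ideas concrete, here is a minimal sketch in plain Python. It assumes equal-width binning as a simple stand-in for discretization (it is not ChiMerge or MDLP, the algorithms used in the study) and measures imbalance as the ratio of the largest class to the smallest. The function names and toy data are hypothetical.

```python
from collections import Counter


def equal_width_bins(values, n_bins):
    """Map each continuous value to a bin index in 0..n_bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:  # constant feature: everything falls in one bin
        return [0] * len(values)
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Hypothetical toy data: one continuous feature and a 9:1 class split.
feature = [0.0, 2.5, 5.0, 7.5, 10.0]
labels = [0] * 9 + [1]

print(equal_width_bins(feature, 4))  # [0, 1, 2, 3, 3]
print(imbalance_ratio(labels))       # 9.0
```

Supervised discretizers such as ChiMerge and MDLP choose cut points using the class labels, so their bins would generally differ from these equal-width ones.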

Purpose of the Study:

  • To evaluate the impact of combining data discretization and resampling on classifier performance for imbalanced medical datasets.
  • To compare the order of applying discretization and resampling steps.
  • To identify optimal preprocessing strategies for improved classification accuracy.
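
The order of the two steps can matter because a discretizer that learns its cut points from the data sees a different distribution once the minority class has been resampled. The sketch below illustrates this under loud assumptions: a median split stands in for the discretizer and cyclic duplication of minority rows stands in for oversampling. Neither is the method used in the study, and all names and data are hypothetical.

```python
from statistics import median


def oversample(rows):
    """Duplicate rows of smaller classes in a cycle until all classes match the largest.

    rows is a list of (value, label) pairs.
    """
    by_label = {}
    for row in rows:
        by_label.setdefault(row[1], []).append(row)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group[i % len(group)] for i in range(target))
    return balanced


def median_cut(rows):
    """Learn a single cut point (the median) from the feature values."""
    return median(value for value, _ in rows)


# Hypothetical toy data: six majority rows (label 0), two minority rows (label 1).
rows = [(1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (6, 0), (10, 1), (12, 1)]

cut_before = median_cut(rows)             # discretize first, then resample
cut_after = median_cut(oversample(rows))  # resample first, then discretize

print(cut_before)  # 4.5
print(cut_after)   # 8.0
```

In this toy case the cut point moves from 4.5 to 8.0 simply because the duplicated minority rows shift the median, which is the kind of interaction the study sets out to measure.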

Main Methods:

  • Experiments conducted on 11 two-class and 3 multiclass imbalanced medical datasets.
  • Discretization algorithms: ChiMerge and Minimum Description Length Principle (MDLP).
  • Resampling algorithms: Tomek links undersampling, Synthetic Minority Oversampling Technique (SMOTE), and SMOTE-Tomek.
  • Classifiers: Support Vector Machine (SVM), C4.5 decision tree, and Random Forest (RF).
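
The resampling ideas listed above can be sketched in plain Python. This is an illustrative simplification, not the implementation used in the study: `tomek_links` finds pairs of opposite-class points that are each other's nearest neighbour, and `smote` creates synthetic minority points by interpolating between a minority point and one of its `k` nearest minority neighbours. The function names, the default `k=2`, and the toy points are assumptions. SMOTE-Tomek is commonly described as SMOTE followed by removal of Tomek links.

```python
import math
import random


def nearest(idx, points, candidates):
    """Index among candidates closest to points[idx], excluding idx itself."""
    return min((j for j in candidates if j != idx),
               key=lambda j: math.dist(points[idx], points[j]))


def tomek_links(points, labels):
    """Pairs (i, j), i < j, of opposite classes that are mutual nearest neighbours."""
    everyone = range(len(points))
    links = []
    for i in everyone:
        j = nearest(i, points, everyone)
        if labels[i] != labels[j] and i < j and nearest(j, points, everyone) == i:
            links.append((i, j))
    return links


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points; needs at least two minority samples."""
    rng = random.Random(seed)
    indices = range(len(minority))
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        # k nearest minority neighbours of the chosen sample
        neighbours = sorted((j for j in indices if j != i),
                            key=lambda j: math.dist(minority[i], minority[j]))[:k]
        j = rng.choice(neighbours)
        gap = rng.random()  # position along the segment between the two points
        synthetic.append(tuple(a + gap * (b - a)
                               for a, b in zip(minority[i], minority[j])))
    return synthetic


# Hypothetical toy data: points 1 and 2 straddle the class boundary.
points = [(0.0, 0.0), (1.0, 0.0), (1.2, 0.0), (5.0, 0.0)]
labels = [0, 0, 1, 1]
print(tomek_links(points, labels))  # [(1, 2)]

minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(smote(minority, n_new=3))
```

Tomek-link undersampling typically drops the majority-class member of each link, while SMOTE appends its synthetic points to the minority class. In practice a maintained library would be preferable to this sketch.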

Main Results:

  • Combined approaches yielded Area Under the ROC Curve (AUC) values 0.8%-3.5% higher than baseline methods on two-class datasets and 0.9%-2.5% higher on multiclass datasets.
  • For two-class data, MDLP discretization followed by SMOTE oversampling achieved the highest AUC with minimal computational cost.
  • For multiclass data, SMOTE or SMOTE-Tomek resampling before ChiMerge discretization provided the best performance.
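
Since the results are reported as AUC, a short sketch of how a two-class AUC can be computed may help. It uses the pairwise (Mann-Whitney) formulation: the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counted as half. The function name and toy scores are hypothetical, and how the study aggregated AUC for multiclass data is not stated here.

```python
def auc(scores, labels):
    """Two-class AUC: probability that a positive outranks a negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores and true labels.
scores = [0.9, 0.8, 0.3, 0.2]
labels = [1, 0, 1, 0]
print(auc(scores, labels))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why the metric is often preferred over accuracy on imbalanced data.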

Conclusions:

  • Oversampling techniques generally improve classifier performance over baseline methods.
  • Data discretization alone does not guarantee improved classifier performance.
  • Combining discretization and resampling offers potential for higher AUC rates on imbalanced medical datasets.