


SMOTE-CD: SMOTE for compositional data.

Teo Nguyen1,2, Kerrie Mengersen1,3, Damien Sous4,5

  • 1Laboratoire de Mathématiques et de leurs Applications, Université de Pau et des Pays de l'Adour, E2S UPPA, CNRS, Anglet, France.

PLOS ONE | June 29, 2023
Summary
This summary is machine-generated.

This study introduces SMOTE for Compositional Data (SMOTE-CD), a novel method to address class imbalance in compositional data. SMOTE-CD improves model performance across various metrics, particularly enhancing the F1-score for real-world datasets.

Area of Science:

  • Statistics
  • Machine Learning
  • Data Science

Background:

  • Compositional data, which represent relative proportions of a whole, are prevalent, but few solutions exist for handling imbalanced classes in such data.
  • Existing methods do not adequately handle the unique characteristics of imbalanced compositional data.

Purpose of the Study:

  • To propose an adaptation of Synthetic Minority Oversampling TEchnique (SMOTE) for imbalanced compositional data.
  • To introduce SMOTE for Compositional Data (SMOTE-CD) and evaluate its effectiveness.

Main Methods:

  • Developed SMOTE-CD by adapting the original SMOTE algorithm using compositional data operations.
  • Generated synthetic data points through linear combinations of existing data.
  • Tested SMOTE-CD with Gradient Boosting, Neural Networks, and Dirichlet regressors on real and synthetic datasets.
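
The idea of a "linear combination" of compositions can be illustrated with a minimal NumPy sketch. It assumes that the compositional data operations are the standard Aitchison perturbation and powering, and that all parts are strictly positive; the function names (`closure`, `perturb`, `power`, `interpolate`) are hypothetical and are not taken from the smote-cd package, so this should be read as an illustration rather than the authors' implementation.

```python
import numpy as np

def closure(x):
    """Rescale positive parts so they sum to 1 (back onto the simplex)."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def perturb(x, y):
    """Aitchison perturbation: the simplex analogue of vector addition."""
    return closure(np.asarray(x, dtype=float) * np.asarray(y, dtype=float))

def power(x, lam):
    """Aitchison powering: the simplex analogue of scalar multiplication."""
    return closure(np.asarray(x, dtype=float) ** lam)

def interpolate(x, neighbor, lam):
    """Point a fraction lam of the way from x to neighbor (assumed scheme)."""
    diff = perturb(neighbor, power(x, -1.0))  # neighbor "minus" x
    return perturb(x, power(diff, lam))

# Hypothetical minority-class point and one of its neighbours.
rng = np.random.default_rng(0)
x = np.array([0.2, 0.3, 0.5])
neighbor = np.array([0.6, 0.3, 0.1])

# SMOTE-style synthetic point: a random position between the two.
synthetic = interpolate(x, neighbor, rng.uniform())
```

Unlike ordinary SMOTE interpolation followed by renormalisation, every intermediate step here stays on the simplex, so the synthetic point is always a valid composition with positive parts summing to 1.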

Main Results:

  • SMOTE-CD demonstrated performance improvements across accuracy, cross-entropy, F1-score, R² score, and RMSE.
  • Oversampling consistently increased the F1-score, especially for real datasets.
  • The impact of oversampling varied by model and dataset; it sometimes decreased majority-class performance but yielded the best overall results on real data.
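
For readers unfamiliar with how such metrics apply to compositional targets, the following sketch shows one possible way to compute three of them with NumPy. It assumes, purely for illustration, that the class of a composition is its largest part; the summary does not state how classes are defined, and the arrays and function names below are hypothetical rather than taken from the study.

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean cross-entropy between true and predicted compositions."""
    return float(np.mean(-np.sum(y_true * np.log(y_pred + eps), axis=1)))

def rmse(y_true, y_pred):
    """Root mean squared error over all parts of all compositions."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def macro_f1(y_true, y_pred):
    """Macro F1, taking each composition's largest part as its class
    (an assumption made for this sketch)."""
    true_cls = y_true.argmax(axis=1)
    pred_cls = y_pred.argmax(axis=1)
    scores = []
    for c in range(y_true.shape[1]):
        tp = np.sum((pred_cls == c) & (true_cls == c))
        fp = np.sum((pred_cls == c) & (true_cls != c))
        fn = np.sum((pred_cls != c) & (true_cls == c))
        denom = 2 * tp + fp + fn
        # A class with no true or predicted members scores 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))

# Made-up true and predicted compositions (each row sums to 1).
y_true = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]])
y_pred = np.array([[0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.5, 0.3, 0.2]])

print(cross_entropy(y_true, y_pred), rmse(y_true, y_pred), macro_f1(y_true, y_pred))
```

In this toy example the third row is misclassified, which lowers the macro F1 much more than it moves RMSE; that difference is one reason a class-sensitive metric such as F1 is reported alongside regression-style errors.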

Conclusions:

  • SMOTE-CD is an effective technique for handling imbalanced compositional data.
  • The method shows promise for improving machine learning model performance in scenarios with skewed compositional data.
  • A Python package, smote-cd, is available for implementing the SMOTE-CD method.