Updated: Aug 29, 2025

Utilizing data sampling techniques on algorithmic fairness for customer churn prediction with data imbalance

Maw Maw¹, Su-Cheng Haw¹, Chin-Kuan Ho¹

¹Faculty of Computing and Informatics, Multimedia University, Cyberjaya, Selangor, 63100, Malaysia.

September 9, 2022

Summary

This summary is machine-generated.

Data sampling techniques used in customer churn prediction can introduce gender-based discrimination. Random Forest classifiers performed best, but some sampling methods exacerbated fairness issues, particularly for the female group.

Keywords:

Algorithmic fairness
Class imbalance problem
Customer churn prediction
Data sampling techniques

Area of Science:

Machine Learning
Data Science
Algorithmic Fairness

Background:

Customer churn prediction (CCP) is crucial for service providers, but often suffers from class imbalance problems (CIP).
Data sampling techniques (DSTs) are used to mitigate CIP, but their impact on algorithmic fairness is not well understood.
Algorithmic fairness, particularly regarding gender discrimination, is an increasingly important consideration in machine learning applications.
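To make the class imbalance problem concrete, here is a minimal sketch of random oversampling, one of the simplest data sampling techniques. It is not necessarily one of the four DSTs the study reviewed. The `random_oversample` helper, the `churn` and `gender` field names, and the toy customer records are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def random_oversample(rows, label_key="churn", seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Sample with replacement to fill the gap to the majority size.
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


# Hypothetical toy data: 8 non-churners and 2 churners.
customers = (
    [{"gender": "F", "churn": 0} for _ in range(4)]
    + [{"gender": "M", "churn": 0} for _ in range(4)]
    + [{"gender": "F", "churn": 1}, {"gender": "M", "churn": 1}]
)

balanced = random_oversample(customers)
class_counts = Counter(row["churn"] for row in balanced)
churner_gender_counts = Counter(
    row["gender"] for row in balanced if row["churn"] == 1
)
print("rows per class:", dict(class_counts))
print("gender mix among churners:", dict(churner_gender_counts))
```

The classes end up balanced, but the gender mix among the resampled churners depends on which rows happened to be duplicated. That is one plausible route by which a sampling step could shift how a model treats each group.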

Purpose of the Study:

To investigate the effect of DSTs on algorithmic fairness in CCP.
To compare the performance and fairness of classification models before and after applying DSTs.
To identify potential gender-based discrimination introduced or exacerbated by DSTs.

Main Methods:

Reviewed four common DSTs applied to three real-world imbalanced datasets.
Utilized six popular classification techniques for CCP.
Evaluated both classifier performance and algorithmic fairness using established metrics, focusing on gender disparities.
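As an illustration of how gender disparities can be quantified, the sketch below computes two widely used group fairness measures: statistical parity difference and equal opportunity difference. This summary does not say which metrics the study used, so these two are an assumption. The `group_fairness` function, the labels, the predictions, and the choice of "M" as the reference group are likewise hypothetical.

```python
def rate(values):
    """Mean of a list of 0/1 values, or NaN when the list is empty."""
    return sum(values) / len(values) if values else float("nan")


def group_fairness(y_true, y_pred, groups, privileged, unprivileged):
    """Compare predicted-positive rates and true-positive rates
    between two groups. Values near 0 indicate similar treatment."""

    def positive_rate(group):
        return rate([p for p, g in zip(y_pred, groups) if g == group])

    def true_positive_rate(group):
        return rate(
            [p for t, p, g in zip(y_true, y_pred, groups) if g == group and t == 1]
        )

    return {
        "statistical_parity_difference": positive_rate(unprivileged)
        - positive_rate(privileged),
        "equal_opportunity_difference": true_positive_rate(unprivileged)
        - true_positive_rate(privileged),
    }


# Hypothetical labels (1 = churn) and classifier predictions.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
gender = ["F", "F", "F", "F", "M", "M", "M", "M"]

report = group_fairness(y_true, y_pred, gender, privileged="M", unprivileged="F")
for name, value in report.items():
    print(f"{name}: {value:+.2f}")
```

Computing such a report once on a model trained with the original data and once on a model trained with resampled data gives a before-and-after comparison of the kind the study describes.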

Main Results:

Random Forest demonstrated superior performance across all datasets.
SMOTE and ADASYN techniques were found to increase discrimination against the female group.
With Logistic Regression, LightGBM, and XGBoost, unintentional discrimination was higher on the original imbalanced data.
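For readers unfamiliar with SMOTE, named in the results above, its core idea is to create synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below shows only that interpolation step for numeric features. It is a simplified illustration, not the study's implementation, and the `smote_like` function, its parameters, and the sample points are hypothetical.

```python
import math
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points, each lying on the line segment
    between a random minority point and one of its k nearest
    minority neighbours (Euclidean distance)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority-class samples with two numeric features.
minority_points = [(1.0, 2.0), (1.5, 2.5), (2.0, 2.0), (3.0, 4.0)]
synthetic_points = smote_like(minority_points, n_new=5)
for point in synthetic_points:
    print(point)
```

Because every synthetic point is built from existing minority samples, any group skew among those samples carries over into the generated data, which is consistent with the concern that such techniques can affect fairness. Real churn data usually also contains categorical features, which this sketch does not handle.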

Conclusions:

There is a significant gap in systematic research on DSTs' impact on algorithmic fairness in CCP.
This study provides critical insights into how sampling methods can affect fairness in churn prediction models.
Findings emphasize the need to consider algorithmic fairness when applying DSTs in real-world CCP scenarios.