Utilizing data sampling techniques on algorithmic fairness for customer churn prediction with data imbalance

Maw Maw¹, Su-Cheng Haw¹, Chin-Kuan Ho¹

  • ¹ Faculty of Computing and Informatics, Multimedia University, Cyberjaya, Selangor, 63100, Malaysia.

F1000Research | September 9, 2022
PubMed
Summary
This summary is machine-generated.

Data sampling techniques used to correct class imbalance in customer churn prediction can introduce or worsen gender-based discrimination. Random Forest classifiers performed best across the datasets studied, but the SMOTE and ADASYN sampling methods increased discrimination against the female group.

Keywords: Algorithmic fairness; Class imbalance problem; Customer churn prediction; Data sampling techniques

Area of Science:

  • Machine Learning
  • Data Science
  • Algorithmic Fairness

Background:

  • Customer churn prediction (CCP) is crucial for service providers, but often suffers from class imbalance problems (CIP).
  • Data sampling techniques (DSTs) are used to mitigate CIP, but their impact on algorithmic fairness is not well understood.
  • Algorithmic fairness, particularly regarding gender discrimination, is an increasingly important consideration in machine learning applications.
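To make the class imbalance problem and the role of a data sampling technique concrete, here is a minimal sketch of random oversampling, one of the simplest DSTs. It is an illustration only: this summary does not list all four techniques the study used, so random oversampling is an assumed stand-in, and the toy data and the `random_oversample` helper are hypothetical.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until
    every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Pool of original rows belonging to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

# Hypothetical toy data: 1 = churned, 0 = stayed (8 stayers, 2 churners).
rows = [{"id": i} for i in range(10)]
labels = [0] * 8 + [1] * 2

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(labels))           # class counts before sampling
print(Counter(balanced_labels))  # class counts after sampling
```

After sampling, both classes have eight rows. Because the added rows are copies of existing churners, any demographic skew among those churners is copied too, which is one reason the fairness impact of sampling deserves separate measurement.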

Purpose of the Study:

  • To investigate the effect of DSTs on algorithmic fairness in CCP.
  • To compare the performance and fairness of classification models before and after applying DSTs.
  • To identify potential gender-based discrimination introduced or exacerbated by DSTs.

Main Methods:

  • Reviewed four common DSTs applied to three real-world imbalanced datasets.
  • Utilized six popular classification techniques for CCP.
  • Evaluated both classifier performance and algorithmic fairness using established metrics, focusing on gender disparities.
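The kind of group-level fairness evaluation described above can be sketched as follows. This summary does not name the metrics the study used, so the two shown here (statistical parity difference and equal opportunity difference) are assumptions chosen because they are widely used. The `fairness_gaps` helper, the toy labels, and the choice of "M" as the reference group are hypothetical.

```python
def _rate(values):
    """Mean of a list of 0/1 values, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0

def fairness_gaps(y_true, y_pred, group, reference, comparison):
    """Return two group-fairness gaps (comparison minus reference).

    statistical_parity_difference: gap in positive-prediction rates.
    equal_opportunity_difference:  gap in true-positive rates.
    A value of 0 means the two groups are treated alike on that metric.
    """
    def selection_rate(g):
        return _rate([p for p, s in zip(y_pred, group) if s == g])

    def true_positive_rate(g):
        return _rate([p for t, p, s in zip(y_true, y_pred, group)
                      if s == g and t == 1])

    return {
        "statistical_parity_difference":
            selection_rate(comparison) - selection_rate(reference),
        "equal_opportunity_difference":
            true_positive_rate(comparison) - true_positive_rate(reference),
    }

# Hypothetical predictions for eight customers (1 = predicted to churn).
gender = ["F", "F", "F", "F", "M", "M", "M", "M"]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]

gaps = fairness_gaps(y_true, y_pred, gender, reference="M", comparison="F")
print(gaps)
```

Computing these gaps once on a model trained with the original data and once on a model trained with the resampled data gives a before-and-after comparison of the kind the study reports. Which prediction counts as the favorable outcome in churn prediction is a modeling decision and is not specified in this summary.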

Main Results:

  • Random Forest demonstrated superior performance across all datasets.
  • SMOTE and ADASYN techniques were found to increase discrimination against the female group.
  • For Logistic Regression, LightGBM, and XGBoost, unintentional discrimination was higher on the original imbalanced data.
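For readers unfamiliar with SMOTE, which the results above single out, the sketch below shows its core idea: new minority-class points are created by interpolating between an existing minority point and one of its nearest minority neighbours. This is a simplified illustration, not the study's code and not the implementation found in any particular library. The `smote_like` function, its parameters, and the toy points are hypothetical. ADASYN builds on the same interpolation but generates more synthetic points around minority examples that are harder to learn.

```python
import math
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points from a list of minority-class
    feature tuples (at least two points are required)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point,
        # excluding the base point itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic

# Hypothetical minority-class customers described by two numeric features.
churners = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like(churners, n_new=5)
print(len(new_points))
```

Every synthetic point lies on a line segment between two real minority points. If the minority class is itself skewed toward one demographic group, interpolation can reinforce that skew in the training data, so the fairness of the resulting model is worth checking rather than assuming.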

Conclusions:

  • There is a significant gap in systematic research on DSTs' impact on algorithmic fairness in CCP.
  • This study provides critical insights into how sampling methods can affect fairness in churn prediction models.
  • Findings emphasize the need to consider algorithmic fairness when applying DSTs in real-world CCP scenarios.