Handling Data Skew in MapReduce Cluster by Using Partition Tuning

Yufei Gao¹, Yanjie Zhou², Bing Zhou³

¹College of Information Science and Technology, Beijing Normal University, Beijing, China

Journal of Healthcare Engineering

October 27, 2017

Summary

This summary is machine-generated.

Partition Tuning-based Skew Handling (PTSH) is an algorithm for mitigating data skew in MapReduce jobs. It improves MapReduce performance on skewed data and reduces analysis time for healthcare data mining.

Area of Science:

Big data analytics
Healthcare informatics
Distributed computing

Background:

The healthcare industry generates vast datasets, necessitating efficient big data analytics.
The MapReduce programming model is widely used but suffers from data skew, impacting performance.
Existing solutions for data skew in MapReduce have limitations.
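
To make the data skew problem concrete, the following is a minimal sketch, not taken from the paper, of how a plain hash partitioner can overload one reducer when a single key dominates the input. The record keys, the CRC32-based partitioner, and the function names are all assumptions chosen for illustration; Hadoop's own default partitioner hashes keys differently.

```python
import zlib

def hash_partition(key: str, num_reducers: int) -> int:
    """Map a key to a reducer index with a deterministic hash (illustrative only)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def reducer_loads(records, num_reducers):
    """Count how many key-value pairs each reducer would receive."""
    loads = [0] * num_reducers
    for key, _value in records:
        loads[hash_partition(key, num_reducers)] += 1
    return loads

def imbalance(loads):
    """Ratio of the heaviest reducer's load to the mean load (1.0 is perfectly balanced)."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 0.0

# Hypothetical skewed input: one frequent key plus many rare keys.
records = [("diagnosis:flu", 1)] * 900
records += [(f"diagnosis:rare-{i}", 1) for i in range(100)]

loads = reducer_loads(records, num_reducers=4)
print("pairs per reducer:", loads)
print("imbalance ratio:", round(imbalance(loads), 2))
```

Because every pair with the same key goes to the same reducer, the reducer that receives the frequent key holds at least 900 of the 1,000 pairs, and the job cannot finish before that reducer does.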

Purpose of the Study:

To introduce and evaluate the Partition Tuning-based Skew Handling (PTSH) algorithm for mitigating data skew in MapReduce.
To demonstrate the efficiency and robustness of PTSH using simulated and real-world healthcare datasets.
To assess the impact of PTSH on association rule mining (ARM) for healthcare data.

Main Methods:

PTSH employs a two-stage partitioning strategy and partition tuning to disperse and recombine key-value pairs.
The algorithm was tested against native Hadoop, Closer, and LEEN on diverse datasets.
Performance was evaluated based on efficiency in handling data skew and overall job completion time.
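
The summary above does not give the details of PTSH, so the following is only a generic sketch of the broad idea of dispersing pairs into many small partitions and then recombining them into balanced reducer inputs. It assumes a CRC32 hash for the first stage and a greedy largest-first packing for the second; both choices, and all function names, are illustrative assumptions rather than the authors' algorithm.

```python
import heapq
import zlib
from collections import Counter

def virtual_partition(key: str, num_virtual: int) -> int:
    """Stage 1 (assumed): spread keys over many fine-grained virtual partitions."""
    return zlib.crc32(key.encode("utf-8")) % num_virtual

def virtual_loads(records, num_virtual):
    """Count the key-value pairs that land in each virtual partition."""
    loads = Counter()
    for key, _value in records:
        loads[virtual_partition(key, num_virtual)] += 1
    return loads

def tune_assignment(loads, num_reducers):
    """Stage 2 (assumed): give each virtual partition, largest first,
    to the reducer that currently has the least data."""
    heap = [(0, reducer) for reducer in range(num_reducers)]
    heapq.heapify(heap)
    assignment = {}
    for part, size in sorted(loads.items(), key=lambda kv: kv[1], reverse=True):
        current, reducer = heapq.heappop(heap)
        assignment[part] = reducer
        heapq.heappush(heap, (current + size, reducer))
    return assignment

def reducer_totals(loads, assignment, num_reducers):
    """Sum the virtual-partition sizes assigned to each reducer."""
    totals = [0] * num_reducers
    for part, size in loads.items():
        totals[assignment[part]] += size
    return totals

# Hypothetical virtual-partition sizes, packed onto two reducers.
example_loads = Counter({0: 50, 1: 30, 2: 20, 3: 20, 4: 10, 5: 10})
example_assignment = tune_assignment(example_loads, num_reducers=2)
print(reducer_totals(example_loads, example_assignment, num_reducers=2))
```

A packing step like this can only balance at the granularity of a virtual partition: if one key alone outweighs the average reducer load, it still ends up on a single reducer, which is one reason real skew-handling schemes need more than this sketch.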

Main Results:

PTSH effectively handles data skew in MapReduce, outperforming native Hadoop, Closer, and LEEN.
The algorithm significantly improves the performance of MapReduce jobs dealing with skewed data.
Adoption of PTSH led to a substantial reduction in the time required for association rule extraction on healthcare data.
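
For readers unfamiliar with association rule mining, the two standard measures behind a rule, support and confidence, can be sketched as follows. The patient records below are invented purely for illustration and do not come from the study's datasets, and the function names are assumptions.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / base

# Hypothetical records: each set lists the conditions noted for one patient.
transactions = [
    {"hypertension", "diabetes"},
    {"hypertension", "diabetes", "obesity"},
    {"hypertension"},
    {"asthma"},
]

print(support(transactions, {"hypertension", "diabetes"}))
print(confidence(transactions, {"hypertension"}, {"diabetes"}))
```

Counting itemset occurrences across all records is the step that typically gets expressed as MapReduce jobs, and frequent items produce exactly the kind of heavy keys that cause skew.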

Conclusions:

The PTSH algorithm offers an efficient solution for data skew challenges in MapReduce.
PTSH enhances the performance and applicability of MapReduce for big data analytics in healthcare.
PTSH is particularly beneficial for accelerating association rule mining on large healthcare datasets.