Handling Data Skew in MapReduce Cluster by Using Partition Tuning

Yufei Gao (1), Yanjie Zhou (2), Bing Zhou (3)

  • (1) College of Information Science and Technology, Beijing Normal University, Beijing, China

Journal of Healthcare Engineering | October 27, 2017
PubMed
Summary
This summary is machine-generated.

Partition Tuning-based Skew Handling (PTSH) is an algorithm for handling data skew in MapReduce jobs. It improves MapReduce performance on skewed data and reduces the time needed for healthcare data mining tasks such as association rule extraction.

Area of Science:

  • Big data analytics
  • Healthcare informatics
  • Distributed computing

Background:

  • The healthcare industry generates vast datasets, necessitating efficient big data analytics.
  • The MapReduce programming model is widely used, but data skew (an uneven distribution of work across tasks) degrades its performance.
  • Existing solutions for data skew in MapReduce have limitations.
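
To make the data skew problem concrete, the following minimal sketch counts how many key-value pairs each reducer would receive under plain hash partitioning. It is an illustration only, not code from the paper: the function names, the CRC32-based partitioner, the sample keys, and the "max load over mean load" skew ratio are all assumptions chosen for the example.

```python
import zlib
from collections import Counter


def hash_partition(key: str, num_reducers: int) -> int:
    """Deterministic stand-in for a default hash partitioner (assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer_loads(pairs, num_reducers):
    """Count how many key-value pairs each reducer would receive."""
    loads = Counter()
    for key, _value in pairs:
        loads[hash_partition(key, num_reducers)] += 1
    return [loads[r] for r in range(num_reducers)]


def skew_ratio(loads):
    """Max load divided by mean load; 1.0 means perfectly balanced."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 1.0


# Hypothetical skewed map output: one "hot" key dominates.
pairs = [("diabetes", 1)] * 900 + [(f"code_{i}", 1) for i in range(100)]
loads = reducer_loads(pairs, num_reducers=4)
print(loads, round(skew_ratio(loads), 2))
```

Because every pair with the same key goes to the same reducer, the reducer that owns the hot key receives at least 900 of the 1,000 pairs, and the job finishes only when that reducer does.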

Purpose of the Study:

  • To introduce and evaluate the Partition Tuning-based Skew Handling (PTSH) algorithm for mitigating data skew in MapReduce.
  • To demonstrate the efficiency and robustness of PTSH using simulated and real-world healthcare datasets.
  • To assess the impact of PTSH on association rule mining (ARM) for healthcare data.

Main Methods:

  • PTSH employs a two-stage partitioning strategy and partition tuning to disperse and recombine key-value pairs.
  • The algorithm was tested against native Hadoop, Closer, and LEEN on diverse datasets.
  • Performance was evaluated based on efficiency in handling data skew and overall job completion time.
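
The general idea of dispersing key-value pairs into fine-grained partitions and then recombining them can be sketched as follows. This is not the PTSH algorithm as published; the summary does not give its details. The sketch swaps in a simple, well-known heuristic (largest partition first, assigned to the least-loaded reducer), and every name and parameter here (`fine_partition`, `recombine`, `num_fine`) is hypothetical.

```python
import heapq
import zlib
from collections import defaultdict


def fine_partition(pairs, num_fine):
    """Stage 1 (disperse): hash keys into many small partitions."""
    parts = defaultdict(list)
    for key, value in pairs:
        pid = zlib.crc32(key.encode("utf-8")) % num_fine
        parts[pid].append((key, value))
    return parts


def recombine(parts, num_reducers):
    """Stage 2 (recombine): assign each fine partition, largest first,
    to the reducer that currently has the smallest load."""
    heap = [(0, r) for r in range(num_reducers)]  # (load, reducer id)
    heapq.heapify(heap)
    assignment = {}
    for pid, items in sorted(parts.items(), key=lambda kv: -len(kv[1])):
        load, reducer = heapq.heappop(heap)
        assignment[pid] = reducer
        heapq.heappush(heap, (load + len(items), reducer))
    loads = [0] * num_reducers
    for pid, reducer in assignment.items():
        loads[reducer] += len(parts[pid])
    return assignment, loads


# Hypothetical map output with moderately skewed key frequencies.
pairs = [(f"k{i}", 1) for i in range(40) for _ in range(i + 1)]
parts = fine_partition(pairs, num_fine=32)
assignment, loads = recombine(parts, num_reducers=4)
print(loads)
```

Using more fine partitions than reducers gives the recombination step room to even out loads, while all pairs for a given key still end up on one reducer. A single dominant key cannot be split this way, which is one limitation of this simplified sketch.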

Main Results:

  • PTSH effectively handles data skew in MapReduce, outperforming native Hadoop, Closer, and LEEN.
  • The algorithm significantly improves the performance of MapReduce jobs dealing with skewed data.
  • Adoption of PTSH led to a substantial reduction in the time required for association rule extraction on healthcare data.

Conclusions:

  • The PTSH algorithm offers an efficient solution for data skew challenges in MapReduce.
  • PTSH enhances the performance and applicability of MapReduce for big data analytics in healthcare.
  • PTSH is particularly beneficial for accelerating association rule mining on large healthcare datasets.
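
For readers unfamiliar with association rule mining, the two standard measures behind a rule "A implies C" are support and confidence. The sketch below computes both on a tiny in-memory dataset. The patient records and condition names are made up for illustration and do not come from the study, and the code runs on a single machine rather than on MapReduce.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of (antecedent and consequent) divided by support of antecedent."""
    supp_a = support(transactions, antecedent)
    if supp_a == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / supp_a


# Hypothetical patient records, each a set of made-up condition labels.
transactions = [
    frozenset({"hypertension", "diabetes"}),
    frozenset({"hypertension", "diabetes", "obesity"}),
    frozenset({"hypertension"}),
    frozenset({"asthma"}),
]
print(support(transactions, {"hypertension", "diabetes"}))
print(confidence(transactions, {"hypertension"}, {"diabetes"}))
```

Counting itemset occurrences is the expensive step on large datasets. When a few items are far more frequent than the rest, those counts concentrate on a few reducers, which is where skew handling would be expected to help.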