Distributed K-Means algorithm based on a Spark optimization sample.

Yongan Feng¹, Jiapeng Zou¹, Wanjun Liu¹

  • ¹Liaoning Technical University, Huludao, China.

PLOS ONE | December 23, 2024
Summary
This summary is machine-generated.

We developed SOSK-Means, an optimized K-Means algorithm for big data. It significantly boosts computational speed and accuracy for large-scale clustering tasks.

Area of Science:

  • Data Science
  • Machine Learning
  • Big Data Analytics

Background:

  • The classical K-Means algorithm suffers from instability and performance issues on massive datasets.
  • Efficient clustering of large-scale data is crucial for various data mining applications.
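
The instability mentioned above can be illustrated with a minimal sketch of classical K-Means (Lloyd's iteration) in plain Python. This is not the authors' code; the function names, the toy data, and the two sets of starting centers are assumptions chosen only to show that the result depends on the initial centers.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centers, max_iter=100):
    """Classical K-Means (Lloyd's iteration) from the given starting centers.

    Returns the final centers and the sum of squared errors (SSE).
    """
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse


# Toy data (assumed for illustration): three well-separated pairs of points.
points = [(0, 0), (0, 1), (10, 0), (10, 1), (20, 0), (20, 1)]

# One starting center per group versus two starting centers in the same group.
_, good_sse = kmeans(points, [(0, 0), (10, 0), (20, 0)])
_, bad_sse = kmeans(points, [(0, 0), (0, 1), (10, 0)])
print(good_sse, bad_sse)
```

With the second set of starting centers the iteration converges to a clearly worse local optimum, which is the kind of sensitivity to initialization that improved center selection aims to reduce.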

Purpose of the Study:

  • To introduce SOSK-Means, an enhanced K-Means algorithm optimized for Spark to address the limitations of classical K-Means on massive datasets.
  • To improve the computational speed and accuracy of K-Means clustering for large-scale data.

Main Methods:

  • Implemented a weighted jump-bank approach for efficient random sampling and pre-clustering, improving initial center selection.
  • Used a weighted max-min distance that incorporates data weights and variance into the distance calculation.
  • Employed a novel distance comparison method and a Directed Acyclic Graph (DAG) for optimized computation and distributed processing on Spark.
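
As a rough illustration of the sampling and max-min ideas listed above, the sketch below draws a random sample and picks initial centers with the plain, unweighted max-min (farthest-first) rule. It is an assumption-laden simplification: the paper's data weights, variance term, pre-clustering step, and Spark execution are not reproduced, and `max_min_centers`, `sample_size`, and `seed` are hypothetical names.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def max_min_centers(points, k, sample_size, seed=0):
    """Pick k initial centers from a random sample using the max-min rule.

    Unweighted sketch only: each new center is the sampled point whose
    distance to its nearest already-chosen center is largest.
    """
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))

    # Start from an arbitrary sampled point.
    centers = [sample[0]]
    # min_d[i] holds the squared distance from sample[i] to its nearest center.
    min_d = [sq_dist(p, centers[0]) for p in sample]

    while len(centers) < k:
        # Choose the point that is farthest from all current centers.
        idx = max(range(len(sample)), key=min_d.__getitem__)
        centers.append(sample[idx])
        # Refresh nearest-center distances with the newly added center.
        min_d = [min(d, sq_dist(p, sample[idx])) for d, p in zip(min_d, sample)]
    return centers


# Toy data (assumed for illustration): three well-separated pairs of points.
points = [(0, 0), (0, 1), (10, 0), (10, 1), (20, 0), (20, 1)]
centers = max_min_centers(points, k=3, sample_size=len(points))
print(sorted(centers))
```

On this toy data the rule places one starting center in each group regardless of which point is drawn first, because each new center is pushed as far as possible from the ones already chosen.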

Main Results:

  • SOSK-Means demonstrates significant improvements in computational speed compared to classical K-Means.
  • The algorithm maintains high computational accuracy, effectively handling massive datasets.
  • Enhanced initial center selection and distance calculation contribute to improved clustering performance.

Conclusions:

  • SOSK-Means offers a robust and efficient solution for large-scale data clustering using Spark optimization.
  • The proposed modifications effectively address the instability and performance bottlenecks of traditional K-Means.
  • This optimized algorithm is well-suited for big data analytics requiring fast and accurate clustering.