Hypercluster: a flexible tool for parallelized unsupervised clustering optimization.

Lili Blumenberg1,2, Kelly V Ruggles3,4

  • 1Institute of Systems Genetics, New York University Grossman School of Medicine, New York, NY, 10016, USA.

BMC Bioinformatics | September 30, 2020
Summary
This summary is machine-generated.

Hypercluster streamlines unsupervised clustering for large biological datasets by enabling efficient evaluation of multiple models and hyperparameters, enhancing reproducibility and reducing bias in biological data analysis.

Keywords:
Hyperparameter optimization, Machine learning, Python, Scikit-learn, SnakeMake, Unsupervised clustering

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Data Science

Background:

  • Unsupervised clustering is vital for analyzing large biological datasets.
  • Algorithm and hyperparameter choices can introduce bias in clustering results.
  • Evaluating multiple clustering models is time-consuming and complex.

Purpose of the Study:

  • To introduce hypercluster, a Python package and SnakeMake pipeline.
  • To facilitate flexible and parallelized evaluation and selection of clustering models.

Main Methods:

  • Developed a Python package and SnakeMake pipeline for clustering analysis.
  • Enabled efficient evaluation of numerous clustering models and hyperparameters.
  • Facilitated parallelized computation for faster results.
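
The idea of evaluating many clustering models and hyperparameters in parallel can be sketched in a few lines of Python. This is a minimal illustration built directly on scikit-learn and the standard library, not hypercluster's own API. The synthetic data, the `SEARCH_SPACE` dictionary, the helper names `expand` and `evaluate`, and the thread pool (standing in for the SnakeMake-level parallelism) are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data standing in for a biological feature matrix (assumption).
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=0)

# Hypothetical search space: algorithm name -> (estimator class, hyperparameter grid).
SEARCH_SPACE = {
    "kmeans": (
        KMeans,
        {"n_clusters": [2, 3, 4, 5, 6], "n_init": [10], "random_state": [0]},
    ),
    "agglomerative": (
        AgglomerativeClustering,
        {"n_clusters": [2, 3, 4, 5, 6], "linkage": ["ward", "average"]},
    ),
}


def expand(grid):
    """Yield one dict per combination of hyperparameter values."""
    keys = sorted(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def evaluate(task):
    """Fit one model configuration and score its labels with the silhouette."""
    name, cls, params = task
    labels = cls(**params).fit_predict(X)
    return name, params, silhouette_score(X, labels)


# Flatten the search space into independent tasks.
tasks = [
    (name, cls, params)
    for name, (cls, grid) in SEARCH_SPACE.items()
    for params in expand(grid)
]

# Each configuration is independent, so the tasks can run concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(evaluate, tasks))

best_name, best_params, best_score = max(results, key=lambda r: r[2])
print(f"{len(results)} configurations evaluated")
print(f"best: {best_name} {best_params} (silhouette={best_score:.3f})")
```

Because every configuration is fitted and scored independently, the same task list could equally be distributed across processes or cluster jobs; the thread pool is only the simplest way to show that independence.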

Main Results:

  • Users can explore a wide array of clustering outcomes.
  • Identification of optimal clustering models is made more efficient.
  • The package supports flexible and parallelized clustering evaluation.
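
Identifying an "optimal" model depends on which evaluation metric is used, so it can help to compare candidates on several metrics at once. The sketch below does this with three internal metrics from scikit-learn. It is not taken from hypercluster; the data, the `score_labels` helper, and the choice of metrics are assumptions for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

# Synthetic data standing in for a real dataset (assumption).
X, _ = make_blobs(n_samples=200, centers=3, random_state=1)


def score_labels(X, labels):
    """Return several internal clustering metrics for one labeling."""
    return {
        "silhouette": silhouette_score(X, labels),  # higher is better
        "calinski_harabasz": calinski_harabasz_score(X, labels),  # higher is better
        "davies_bouldin": davies_bouldin_score(X, labels),  # lower is better
    }


# Score one k-means model per candidate number of clusters.
table = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    table[k] = score_labels(X, labels)

# Pick the best k separately under each metric; the answers may disagree.
best_k_by_metric = {
    "silhouette": max(table, key=lambda k: table[k]["silhouette"]),
    "calinski_harabasz": max(table, key=lambda k: table[k]["calinski_harabasz"]),
    "davies_bouldin": min(table, key=lambda k: table[k]["davies_bouldin"]),
}

for k, scores in table.items():
    print(k, {name: round(value, 3) for name, value in scores.items()})
print("best k per metric:", best_k_by_metric)
```

When the metrics agree on one configuration, that is reasonable evidence for it; when they disagree, inspecting the full table is more informative than trusting any single score.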

Conclusions:

  • Hypercluster enhances usability, robustness, and reproducibility in high-throughput biological studies.
  • Simplifies the process of unsupervised clustering for complex biological data.
  • Provides accessible installation via pip and bioconda with comprehensive documentation.