Hypercluster: a flexible tool for parallelized unsupervised clustering optimization.

Lili Blumenberg¹,², Kelly V Ruggles³,⁴

¹Institute of Systems Genetics, New York University Grossman School of Medicine, New York, NY, 10016, USA.

BMC Bioinformatics

September 30, 2020

Summary

This summary is machine-generated.

Hypercluster streamlines unsupervised clustering for large biological datasets by enabling efficient evaluation of multiple models and hyperparameters, enhancing reproducibility and reducing bias in biological data analysis.

Keywords:

Hyperparameter optimization, Machine learning, Python, Scikit-learn, SnakeMake, Unsupervised clustering

Area of Science:

Bioinformatics
Computational Biology
Data Science

Background:

Unsupervised clustering is vital for analyzing large biological datasets.
Algorithm and hyperparameter choices can introduce bias in clustering results.
Evaluating multiple clustering models is time-consuming and complex.

Purpose of the Study:

To introduce hypercluster, a Python package and SnakeMake pipeline.
To facilitate flexible and parallelized evaluation and selection of clustering models.

Main Methods:

Developed a Python package and SnakeMake pipeline for clustering analysis.
Enabled efficient evaluation of numerous clustering models and hyperparameters.
Facilitated parallelized computation for faster results.
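The abstract does not say how the models are run internally. As a rough, dependency-free illustration of the general idea, and not hypercluster's own API (which this page does not describe), the sketch below evaluates every combination in a small hyperparameter grid for a single algorithm: k-means with farthest-first initialisation. Each result is scored with the mean silhouette coefficient, and the configurations are fanned out over a thread pool. The function names (`init_centers`, `kmeans`, `silhouette`, `evaluate`, `run_grid`), the choice of metric and the toy data are all assumptions made for illustration. Threads keep the sketch portable. For CPU-bound work in CPython, a `ProcessPoolExecutor`, or one job per configuration in a workflow manager such as the SnakeMake pipeline the authors mention, would be needed for real parallel speed-up.

```python
# Illustrative sketch only; not hypercluster's API. All names here are hypothetical.
import math
import random
from concurrent.futures import ThreadPoolExecutor
from itertools import product

Point = tuple[float, ...]


def init_centers(points: list[Point], k: int, rng: random.Random) -> list[Point]:
    """Farthest-first initialisation: a random first centre, then repeatedly the
    point farthest from its nearest already-chosen centre, until k centres."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points: list[Point], k: int, seed: int, n_iter: int = 50) -> list[int]:
    """Lloyd's k-means; returns one cluster label per point."""
    centers = init_centers(points, k, random.Random(seed))
    labels: list[int] = []
    for _ in range(n_iter):
        # Assignment step: every point joins its nearest centre.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each centre to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centers.append(centers[c])  # leave an empty cluster's centre in place
        if new_centers == centers:
            break  # converged: no centre moved
        centers = new_centers
    return labels


def silhouette(points: list[Point], labels: list[int]) -> float:
    """Mean silhouette coefficient (-1 worst, 1 best); singleton clusters score 0."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0  # undefined for a single cluster; treat as the worst score
    total = 0.0
    for i, (p, lab) in enumerate(zip(points, labels)):
        same = [math.dist(p, q) for j, (q, m) in enumerate(zip(points, labels)) if m == lab and j != i]
        if not same:
            continue  # singleton cluster contributes 0
        a = sum(same) / len(same)  # mean distance to the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, m in zip(points, labels) if m == other) / labels.count(other)
            for other in clusters
            if other != lab
        )
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)


def evaluate(points: list[Point], k: int, seed: int) -> dict:
    """Cluster once with one configuration and score the result."""
    labels = kmeans(points, k, seed)
    return {"k": k, "seed": seed, "silhouette": silhouette(points, labels), "labels": labels}


def run_grid(points: list[Point], ks: list[int], seeds: list[int], max_workers: int = 4) -> list[dict]:
    """Evaluate every (k, seed) pair concurrently; results keep grid order."""
    grid = list(product(ks, seeds))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda cfg: evaluate(points, *cfg), grid))


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data: three well-separated 2-D blobs (synthetic, not from the paper).
    data = [(cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
            for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(30)]
    results = run_grid(data, ks=[2, 3, 4, 5], seeds=[0, 1, 2])
    best = max(results, key=lambda r: r["silhouette"])
    print("best k:", best["k"], "silhouette:", round(best["silhouette"], 3))
```

Every configuration's labels are kept alongside its score, so the full range of clustering outcomes stays available for inspection rather than only the winner.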

Main Results:

Users can explore a wide array of clustering outcomes.
Identification of optimal clustering models is made more efficient.
The package supports flexible and parallelized clustering evaluation.
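Different evaluation metrics can disagree about which configuration is best. One common way to reduce many scored runs to a single choice is to rank every configuration on each metric and keep the one with the best mean rank. This is assumed here for illustration and is not taken from the paper. The metric names, their directions, the `rank_configs` function and the scores are all hypothetical toy values.

```python
# Illustrative sketch only; not hypercluster's selection rule.
from statistics import mean

# Hypothetical metric set; True means a larger value is better.
HIGHER_IS_BETTER = {"silhouette": True, "calinski_harabasz": True, "davies_bouldin": False}


def rank_configs(scores: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Return (configuration, mean rank) pairs, best first; rank 1 is best on a metric.

    Ties on a metric get consecutive ranks in insertion order (sorted() is stable).
    """
    ranks: dict[str, list[int]] = {config: [] for config in scores}
    for metric, higher in HIGHER_IS_BETTER.items():
        ordered = sorted(scores, key=lambda config: scores[config][metric], reverse=higher)
        for position, config in enumerate(ordered, start=1):
            ranks[config].append(position)
    return sorted(((config, mean(r)) for config, r in ranks.items()), key=lambda item: item[1])


if __name__ == "__main__":
    # Invented toy scores: silhouette prefers k=4, the other two metrics prefer k=3.
    toy_scores = {
        "kmeans_k2": {"silhouette": 0.55, "calinski_harabasz": 120.0, "davies_bouldin": 0.90},
        "kmeans_k3": {"silhouette": 0.71, "calinski_harabasz": 310.0, "davies_bouldin": 0.45},
        "kmeans_k4": {"silhouette": 0.74, "calinski_harabasz": 280.0, "davies_bouldin": 0.60},
    }
    for config, mean_rank in rank_configs(toy_scores):
        print(config, round(mean_rank, 2))
```

Ranking instead of averaging raw values sidesteps the fact that these metrics live on very different scales. The trade-off is that ranking discards how large the gaps between configurations are.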

Conclusions:

Hypercluster enhances usability, robustness, and reproducibility in high-throughput biological studies.
Simplifies the process of unsupervised clustering for complex biological data.
Provides accessible installation via pip and bioconda with comprehensive documentation.