PFClust: a novel parameter free clustering algorithm.

Lazaros Mavridis¹, Neetika Nath, John B O Mitchell

¹Biomedical Sciences Research Complex and EaStCHEM School of Chemistry, Purdie Building, University of St Andrews, North Haugh, St Andrews, KY16 9ST, Scotland, UK. lazaros.mavridis.lm@gmail.com

BMC Bioinformatics | July 4, 2013

Summary

This summary is machine-generated.

Parameter Free Clustering (PFClust) automatically identifies the optimal number of clusters in a dataset without user input. On synthetic datasets it performs at least as well as, and on average slightly better than, six other leading methods, and its clusters agree closely with known classifications of protein domain structures.

Area of Science:

Data Science
Bioinformatics
Computational Biology

Background:

Automated data clustering and determination of the optimal number of clusters remain significant challenges.
Existing clustering methodologies often require user-defined parameters, limiting their applicability.
PFClust is introduced as a parameter-free algorithm for automated data clustering.

Purpose of the Study:

To develop and validate an automated clustering algorithm that does not require user-specified parameters.
To assess the performance of PFClust against established clustering methods.
To evaluate PFClust's ability to cluster complex biological data.

Main Methods:

PFClust partitions a dataset into clusters that share common attributes, such as their minimum expectation value and the variance of their intra-cluster similarity.
The algorithm was tested on synthetic 2D vector datasets and real-world protein domain structures from the CATH database.
Performance was compared against six other leading clustering methodologies.
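
The "variance of intra-cluster similarity" mentioned above can be illustrated with a minimal sketch. This is not the PFClust implementation: the similarity measure (1 / (1 + Euclidean distance)), the function names, and the toy data are assumptions chosen only to show how such a per-cluster statistic might be computed.

```python
from itertools import combinations
from math import dist
from statistics import mean, pvariance


def similarity(a, b):
    """Hypothetical similarity: 1 for identical points, approaching 0 with distance."""
    return 1.0 / (1.0 + dist(a, b))


def intra_cluster_similarity_stats(points, labels):
    """Return {label: (mean, variance)} of pairwise similarities within each cluster.

    Clusters with a single member have no pairs and map to None.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    stats = {}
    for label, members in clusters.items():
        sims = [similarity(a, b) for a, b in combinations(members, 2)]
        stats[label] = (mean(sims), pvariance(sims)) if sims else None
    return stats


# Toy 2D data (illustrative only): one cluster of three points, one of two.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
labels = [0, 0, 0, 1, 1]
stats = intra_cluster_similarity_stats(points, labels)
```

A clustering whose clusters all show a similar spread of internal similarities would, under this sketch, yield comparable variance values across labels.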

Main Results:

PFClust demonstrated clustering performance at least equal to, and on average slightly better than, six other leading methods.
Unlike PFClust, five of the six compared methods require the number of clusters to be specified in advance.
PFClust showed excellent agreement with known classifications for protein domain structures.
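
Agreement between a clustering and a known classification can be quantified in several ways, and this summary does not say which measure was used. A minimal sketch of one common choice, the Rand index, is given below; the function name and the toy labels are illustrative assumptions.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two labelings agree.

    A pair counts as an agreement when both labelings put the two items
    in the same group, or both put them in different groups.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same items")

    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0  # fewer than two items: nothing to disagree on

    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)
```

For example, `rand_index([0, 0, 1, 1], [1, 1, 0, 0])` returns `1.0`, because the index depends only on which items are grouped together, not on the label names.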

Conclusions:

PFClust successfully automates data clustering and optimal cluster number identification without parameter input.
The algorithm generates meaningful clusters for synthetic data and accurately classifies protein domain structures.
PFClust offers a robust and user-independent solution for data clustering challenges.