Simulation-derived best practices for clustering clinical data.

Caitlin E Coombes¹, Xin Liu², Zachary B Abrams³

¹The Ohio State University College of Medicine, 370 W 9th Ave, Columbus, OH 43210, USA.

Journal of Biomedical Informatics

April 16, 2021

Summary

This summary is machine-generated.

Choosing the right distance metric is crucial for accurate patient clustering in clinical data. The DAISY metric with hierarchical clustering (HC) effectively identifies distinct patient groups, improving disease understanding and precision medicine.

Keywords:

Clinical informatics; Clinical trial; Clustering; Electronic health record; Unsupervised machine learning

Area of Science:

Clinical informatics
Data science in healthcare
Biostatistics

Background:

Clustering analyses are vital for understanding patient phenotypes and disease trajectories in clinical medicine.
Ensuring rigor, validity, and reproducibility in clinical clustering solutions is an ongoing challenge.
Best practices for dissimilarity matrix calculation and clustering on mixed-type clinical data require evaluation.

Purpose of the Study:

To evaluate best practices for dissimilarity matrix calculation and clustering on mixed-type clinical data.
To compare the performance of various distance metrics and clustering algorithms on simulated and real-world clinical datasets.
To identify optimal methods for enhancing patient subclassification and precision medicine.

Main Methods:

Simulated clinical data (binary, continuous, categorical, and mixtures) were used to test 5 single distance metrics and 3 mixed distance metrics.
Clustering was performed using hierarchical clustering (HC), k-medoids, and self-organizing maps (SOM).
Performance was validated using the Adjusted Rand Index (ARI) and silhouette width (SW) on the simulated data and two real-world datasets (chronic lymphocytic leukemia and intensive care unit admissions).
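
The pipeline described above (a mixed-type dissimilarity matrix followed by hierarchical clustering) can be illustrated with a minimal sketch. It is not the study's code. It assumes a simplified Gower-style dissimilarity, the coefficient that DAISY (the `daisy` function in the R `cluster` package) is generally understood to build on, and it assumes average linkage, which the summary does not specify. The toy records, `PATIENTS`, `KINDS`, and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy records: (age in years, smoker flag, tumor stage label).
PATIENTS = [
    (34.0, True, "I"),
    (36.0, True, "I"),
    (71.0, False, "III"),
    (68.0, False, "III"),
]
KINDS = ("continuous", "binary", "categorical")  # one entry per column


def gower_matrix(rows, kinds):
    """Simplified Gower dissimilarity: mean of per-variable distances in [0, 1]."""
    n, p = len(rows), len(kinds)
    # Range of each continuous column, used to scale absolute differences.
    ranges = {}
    for j, kind in enumerate(kinds):
        if kind == "continuous":
            col = [r[j] for r in rows]
            ranges[j] = (max(col) - min(col)) or 1.0  # avoid dividing by zero
    dist = [[0.0] * n for _ in range(n)]
    for a, b in combinations(range(n), 2):
        total = 0.0
        for j, kind in enumerate(kinds):
            if kind == "continuous":
                total += abs(rows[a][j] - rows[b][j]) / ranges[j]
            else:  # binary and categorical: simple mismatch (0 or 1)
                total += 0.0 if rows[a][j] == rows[b][j] else 1.0
        dist[a][b] = dist[b][a] = total / p
    return dist


def mean_gap(dist, left, right):
    """Average dissimilarity over all cross-cluster pairs (average linkage)."""
    return sum(dist[i][j] for i in left for j in right) / (len(left) * len(right))


def agglomerate(dist, k):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: mean_gap(dist, clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[x] = clusters[x] + clusters.pop(y)  # y > x, so index x is stable
    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


D = gower_matrix(PATIENTS, KINDS)
labels = agglomerate(D, k=2)
print(labels)
```

Scaling continuous differences by the column range puts every variable on a 0 to 1 scale, so binary, categorical, and continuous columns contribute equally to the average. A production analysis would more likely use an established implementation than this quadratic-time loop.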

Main Results:

Hierarchical clustering (HC) demonstrated superior performance over k-medoids and SOM, evidenced by higher ARI across data types.
The DAISY mixed-type distance metric yielded the highest mean ARI for most mixed data types.
DAISY combined with HC identified superior, separable clusters in both real-world clinical datasets.
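
To make the two validation measures behind these results concrete, the following sketch computes ARI (agreement between a clustering and known labels) and mean silhouette width (cluster separation, from a precomputed dissimilarity matrix) in plain Python. The function names, toy labels, and toy matrix are hypothetical and are not data from the study. Two conventions are assumptions of this sketch: ARI returns 1.0 when both labelings are trivially identical, and the silhouette of a point is set to 0 when it is alone in its cluster or when only one cluster exists.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, predicted):
    """Adjusted Rand Index between two labelings of the same items (n >= 2)."""
    n = len(truth)
    # Count item pairs grouped together in both labelings, and in each one alone.
    both = sum(comb(c, 2) for c in Counter(zip(truth, predicted)).values())
    in_truth = sum(comb(c, 2) for c in Counter(truth).values())
    in_pred = sum(comb(c, 2) for c in Counter(predicted).values())
    expected = in_truth * in_pred / comb(n, 2)  # agreement expected by chance
    maximum = (in_truth + in_pred) / 2
    if maximum == expected:  # degenerate case (assumed convention: return 1.0)
        return 1.0
    return (both - expected) / (maximum - expected)


def mean_silhouette_width(dist, labels):
    """Mean silhouette width from a precomputed dissimilarity matrix."""
    n = len(labels)
    widths = []
    for i in range(n):
        by_cluster = {}
        for j in range(n):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(dist[i][j])
        own = by_cluster.pop(labels[i], None)
        if own is None or not by_cluster:
            widths.append(0.0)  # singleton or single-cluster case (assumed convention)
            continue
        a = sum(own) / len(own)  # mean distance to the point's own cluster
        b = min(sum(d) / len(d) for d in by_cluster.values())  # nearest other cluster
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / n


truth = [0, 0, 0, 1, 1, 1]
relabeled = [1, 1, 1, 0, 0, 0]  # same grouping, different label names
noisy = [0, 0, 1, 1, 1, 1]      # one item placed in the wrong group

toy_dist = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
]
sw_good = mean_silhouette_width(toy_dist, [0, 0, 1, 1])
sw_bad = mean_silhouette_width(toy_dist, [0, 1, 0, 1])

print(adjusted_rand_index(truth, relabeled), adjusted_rand_index(truth, noisy))
print(sw_good, sw_bad)
```

ARI needs reference labels, which is why it suits simulated data where the true grouping is known. Silhouette width needs only the dissimilarities and the cluster assignments, so it can still be computed on real-world datasets without ground truth.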

Conclusions:

The selection of appropriate mixed-type distance metrics is essential for optimal patient cluster separation and data utilization.
Advanced metrics capable of handling multiple data types enhance the subclassification of diseases.
Improved disease subclassification facilitates targeted treatments, precision medicine, clinical decision support, and better patient outcomes.