Clustering high-dimensional data via feature selection.

Tianqi Liu¹, Yu Lu², Biqing Zhu³

¹Google Research, New York, New York, USA.

March 26, 2022

Summary

This summary is machine-generated.

We introduce spectral clustering with feature selection (SC-FS), a novel method for high-dimensional data clustering. This approach effectively identifies informative features and improves clustering accuracy for complex datasets.

Keywords:

feature selection, high-dimensional data, spectral clustering

Area of Science:

Statistics
Machine Learning
Bioinformatics

Background:

Clustering high-dimensional data, such as microarray and RNA-seq data, is challenging.
Existing methods may struggle to identify the relevant features when the number of features is large.

Purpose of the Study:

To propose and evaluate a novel clustering procedure, spectral clustering with feature selection (SC-FS), for high-dimensional data.
To demonstrate the method's ability to identify informative features and achieve optimal clustering error rates.

Main Methods:

A first round of spectral clustering produces an initial estimate of the data labels.
Each feature is scored by its R-squared with respect to these estimated labels, and the features with the highest scores are selected.
A second round of clustering is performed using only the selected features.
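
The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the spectral step is assumed here to be a projection onto the top singular vectors followed by plain k-means, R-squared is computed as between-group over total sum of squares, and the names `sc_fs`, `n_keep`, and `make_toy_data` are hypothetical. The summary does not say how many features are kept, so `n_keep` is left to the caller.

```python
import numpy as np


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means with farthest-point initialisation; returns labels."""
    rng = np.random.default_rng(seed)
    centers = [points[rng.integers(len(points))]]
    for _ in range(1, k):
        # Next centre: the point farthest from all centres chosen so far.
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels


def spectral_labels(X, k, seed=0):
    """Assumed spectral step: top-k singular directions of centred data + k-means."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return kmeans(U[:, :k] * s[:k], k, seed=seed)


def r_squared_per_feature(X, labels):
    """Share of each feature's variance explained by the group labels."""
    grand = X.mean(axis=0)
    total = ((X - grand) ** 2).sum(axis=0)
    between = np.zeros(X.shape[1])
    for g in np.unique(labels):
        rows = X[labels == g]
        between += len(rows) * (rows.mean(axis=0) - grand) ** 2
    return between / np.maximum(total, 1e-12)


def sc_fs(X, k, n_keep, seed=0):
    """Cluster, keep the n_keep features with the largest R^2, cluster again."""
    first = spectral_labels(X, k, seed)
    r2 = r_squared_per_feature(X, first)
    selected = np.sort(np.argsort(r2)[-n_keep:])
    second = spectral_labels(X[:, selected], k, seed)
    return second, selected


def make_toy_data(n=100, p=200, s=5, shift=3.0, seed=1):
    """Synthetic two-group data: only the first s features carry signal."""
    rng = np.random.default_rng(seed)
    z = np.repeat([0, 1], n // 2)
    X = rng.normal(size=(n, p))
    X[:, :s] += np.where(z[:, None] == 1, shift, -shift)
    return X, z


if __name__ == "__main__":
    X, z = make_toy_data()
    labels, selected = sc_fs(X, k=2, n_keep=5)
    agreement = np.mean(labels == z)
    print("selected features:", selected)
    print("accuracy (up to label swap):", max(agreement, 1 - agreement))
```

On the toy data, the informative features are planted in the first five columns, so the selected indices and the label agreement give a quick sanity check. Real data would need a principled choice of `n_keep` and of the number of clusters.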

Main Results:

The SC-FS method is theoretically proven to identify informative features with high probability under mild conditions.
The procedure achieves a minimax optimal clustering error rate for the sparse Gaussian mixture model.
Empirical validation on four real-world datasets confirms the method's effectiveness for high-dimensional data.
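
For readers unfamiliar with the setting, one common way to write a sparse Gaussian mixture model and the clustering error is sketched below. The notation ($\mu_k$, $S$, $s$, $\sigma^2$) and the isotropic noise are assumptions made for illustration; the exact formulation and conditions used in the paper may differ.

```latex
% Assumed formulation: n observations in R^p, K groups, isotropic noise.
X_i = \mu_{z_i} + \varepsilon_i, \qquad
\varepsilon_i \sim \mathcal{N}(0, \sigma^2 I_p), \qquad
z_i \in \{1, \dots, K\}, \quad i = 1, \dots, n.

% Sparsity: the group centers differ only on a small set S of informative features.
S = \{\, j : \mu_{kj} \neq \mu_{lj} \text{ for some } k \neq l \,\}, \qquad
|S| = s \ll p.

% Clustering error: misclassification proportion, up to a relabeling \pi of the groups.
\ell(\hat{z}, z) = \min_{\pi} \frac{1}{n} \sum_{i=1}^{n}
\mathbf{1}\{\hat{z}_i \neq \pi(z_i)\}.
```

In this notation, a procedure is minimax optimal when its worst-case expected error $\ell(\hat{z}, z)$ over the model class matches, up to constants, the best worst-case error any procedure can achieve.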

Conclusions:

SC-FS offers a robust and effective approach for clustering high-dimensional data.
The feature selection component enhances clustering performance and interpretability.
This method has broad applicability in fields utilizing large-scale biological data.