
You might also read

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

A unified framework for selecting and evaluating cell-type-specific gene co-expressions in single-cell data.

Briefings in bioinformatics·2026
Same author

MIXPRS enables multi-population and multi-method polygenic risk scores using summary statistics.

Nature genetics·2026
Same author

Identification of multi-omic pleiotropy factors for peripheral artery disease.

Human molecular genetics·2026
Same author

Multi-ancestry transcriptome-wide association studies uncover insights into breast cancer genetics and biology.

Nature communications·2026
Same author

Loss of Cyclin G-Associated Kinase (Gak) Leads to Lysosome Dysfunction and Immune Modulation in Podocytes.

Journal of the American Society of Nephrology : JASN·2026
Same author

Lineage and organ signals sequentially build organ intrinsic nervous systems.

Nature·2026
Same journal

Fast penalized generalized estimating equations for large longitudinal functional datasets.

Biometrics·2026
Same journal

Causally-interpretable random-effects meta-analysis.

Biometrics·2026
Same journal

Statistical inference for mean function of partially observed functional time series.

Biometrics·2026
Same journal

Subgroup identification via Interaction Tree and Mixed Model for Repeated Measures with application to Alzheimer's disease.

Biometrics·2026
Same journal

Finite mixtures of linear quantile regressions with concomitant variables: a solution to endogeneity in longitudinal data modeling.

Biometrics·2026
Same journal

Discussion on "INTACT: a method for integration of longitudinal physical activity data from multiple sources" by Jingru Zhang, Erjia Cui, Hongzhe Li, and Haochang Shou.

Biometrics·2026

Related Experiment Video

Updated: Sep 29, 2025

ExCYT: A Graphical User Interface for Streamlining Analysis of High-Dimensional Cytometry Data
05:12

Published on: January 16, 2019

11.5K

Clustering high-dimensional data via feature selection.

Tianqi Liu¹, Yu Lu², Biqing Zhu³

  • ¹Google Research, New York, New York, USA.

Biometrics | March 26, 2022
Summary
This summary is machine-generated.

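We introduce spectral clustering with feature selection (SC-FS), a method for clustering high-dimensional data. It first clusters the data, then selects the features most associated with the estimated cluster labels, and finally reclusters using only those features. This identifies the informative features and improves clustering accuracy on complex datasets.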

Keywords:
feature selection, high-dimensional data, spectral clustering

More Related Videos

Selecting Multiple Biomarker Subsets with Similarly Effective Binary Classification Performances
07:35

Published on: October 11, 2018

7.7K
Large-scale Reconstructions and Independent, Unbiased Clustering Based on Morphological Metrics to Classify Neurons in Selective Populations
12:27

Published on: February 15, 2017

7.1K


Area of Science:

  • Statistics
  • Machine Learning
  • Bioinformatics

Background:

  • High-dimensional data, such as microarray and RNA-seq data, pose significant challenges for clustering.
  • Existing methods may struggle to identify the relevant features in large datasets.

Purpose of the Study:

  • To propose and evaluate a novel clustering procedure, spectral clustering with feature selection (SC-FS), for high-dimensional data.
  • To demonstrate the method's ability to identify informative features and achieve optimal clustering error rates.

Main Methods:

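  • Spectral clustering on all features first produces initial estimates of the cluster labels.
  • For each feature, an R-squared value is computed against these estimated labels (the share of the feature's variance explained by the clusters), and the features with the highest values are selected.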
  • A second clustering round is performed using only the selected features.
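The three steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the spectral step is assumed to be an SVD of the centered data followed by k-means on the top-k scaled left singular vectors, the number of retained features (`n_features`) is a hypothetical fixed input rather than the paper's selection rule, and the helper names (`kmeans`, `spectral_cluster`, `r_squared`, `sc_fs`) are illustrative.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means with k-means++ seeding; returns an integer label per row."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    centers = [X[rng.integers(n)]]
    for _ in range(1, k):
        # k-means++: next center drawn with probability proportional to squared distance
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        probs = d2 / d2.sum() if d2.sum() > 0 else np.full(n, 1.0 / n)
        centers.append(X[rng.choice(n, p=probs)])
    centers = np.array(centers)
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # Assign each row to its nearest center, then recompute the centers
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_cluster(X, k, seed=0):
    """Assumed spectral step: k-means on the top-k scaled left singular vectors."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return kmeans(U[:, :k] * S[:k], k, seed=seed)


def r_squared(X, labels):
    """Per-feature R^2: 1 - within-cluster SS / total SS (0 for constant features)."""
    total = ((X - X.mean(axis=0)) ** 2).sum(axis=0)
    within = np.zeros(X.shape[1])
    for j in np.unique(labels):
        Xj = X[labels == j]
        within += ((Xj - Xj.mean(axis=0)) ** 2).sum(axis=0)
    safe_total = np.where(total > 0, total, 1.0)
    return np.where(total > 0, 1.0 - within / safe_total, 0.0)


def sc_fs(X, k, n_features, seed=0):
    """Cluster, keep the top-R^2 features, then recluster on them."""
    initial = spectral_cluster(X, k, seed=seed)                     # step 1
    scores = r_squared(X, initial)                                  # step 2
    selected = np.sort(np.argsort(scores)[::-1][:n_features])
    final = spectral_cluster(X[:, selected], k, seed=seed)          # step 3
    return final, selected


# Hypothetical simulation: 2 clusters, 10 informative features out of 500
rng = np.random.default_rng(1)
n, p, s, mu = 200, 500, 10, 2.0
y = rng.integers(0, 2, size=n)
X = rng.standard_normal((n, p))
X[:, :s] += mu * (2 * y - 1)[:, None]

labels, selected = sc_fs(X, k=2, n_features=s)
# Misclassification rate, minimized over the two possible label assignments
error = min(np.mean(labels != y), np.mean(labels != 1 - y))
print(selected, error)
```

The per-feature R² here is the between-cluster share of that feature's total sum of squares, so features whose means differ across the estimated clusters score near 1 and pure-noise features score near 0.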

Main Results:

  • SC-FS is proven to identify the informative features with high probability under mild conditions.
  • The procedure achieves a minimax optimal clustering error rate for the sparse Gaussian mixture model.
  • Experiments on four real-world datasets support the method's effectiveness for high-dimensional data.
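For reference, a common two-cluster form of the sparse Gaussian mixture model and of the clustering error rate is written out below. This is an assumed textbook parameterization, not necessarily the one used in the paper, which may allow more clusters or a different noise covariance.

$$
X_i = \ell_i\,\mu + Z_i, \qquad \ell_i \in \{-1, +1\}, \qquad Z_i \sim \mathcal{N}(0, I_p), \qquad \|\mu\|_0 = s \ll p,
$$

$$
\operatorname{err}(\hat{\ell}, \ell) = \min_{\pi \in \{\pm 1\}} \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{\hat{\ell}_i \neq \pi\,\ell_i\}.
$$

Here the informative features are the s nonzero coordinates of μ, and the minimum over the global sign π accounts for cluster labels being identifiable only up to relabeling. A procedure is minimax optimal when its worst-case expected error over such a class of models matches, in rate, the smallest worst-case error that any clustering procedure can attain.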

Conclusions:

  • SC-FS offers a robust and effective approach for clustering high-dimensional data.
  • The feature selection component enhances clustering performance and interpretability.
  • The method is broadly applicable in fields that work with large-scale biological data.