PCA-based population structure inference with generic clustering algorithms.

Chih Lee (1), Ali Abdool, Chun-Hsi Huang

  • (1) Computer Science and Engineering Department, University of Connecticut, Storrs, CT 06269, USA. chih.lee@uconn.edu

BMC Bioinformatics | February 12, 2009
Summary
This summary is machine-generated.

We developed a fast and scalable principal component analysis (PCA) method for population structure inference from genotype data. This approach rivals established algorithms in accuracy and is significantly more efficient for large datasets.

Area of Science:

  • Genetics
  • Bioinformatics
  • Computational Biology

Background:

  • Population structure inference from large genotype datasets is computationally intensive.
  • Established methods such as STRUCTURE are powerful but slow, because they estimate parameters by Markov chain Monte Carlo (MCMC).
  • Efficiently analyzing genetic variation across numerous loci is crucial for understanding population dynamics.

Purpose of the Study:

  • To introduce a computationally efficient principal component analysis (PCA) based method for inferring population structure.
  • To evaluate the performance of PCA combined with clustering algorithms against established methods.
  • To identify optimal methods for predicting the number of subpopulations.

Main Methods:

  • Applying PCA to high-density genotype data.
  • Selecting significant principal components using the Tracy-Widom distribution.
  • Utilizing clustering algorithms (K-means, soft K-means, spectral clustering) for subpopulation assignment.
  • Comparing performance against the STRUCTURE algorithm.
  • Investigating the Bayesian Information Criterion (BIC) and likelihood as indices for predicting the number of subpopulations.
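
The PCA-then-cluster steps listed above can be illustrated with a minimal sketch. It is not the authors' implementation. The simulated genotypes, the per-SNP scaling by sqrt(p(1-p)), the power-iteration PCA, and all function names are assumptions made for illustration. The sketch also keeps only the top principal component and skips the Tracy-Widom significance test, and it uses plain (hard) K-means rather than the soft K-means or spectral clustering variants.

```python
import math
import random


def simulate_genotypes(n_per_pop, n_snps, freqs, seed=0):
    """Draw 0/1/2 allele counts per individual (toy data, not from the paper)."""
    rng = random.Random(seed)
    rows = []
    for p in freqs:  # one allele frequency per simulated subpopulation
        for _ind in range(n_per_pop):
            rows.append([sum(rng.random() < p for _allele in range(2))
                         for _snp in range(n_snps)])
    return rows


def normalize_genotypes(G):
    """Center each SNP and scale by sqrt(p(1-p)); an assumed, commonly used scaling."""
    n = len(G)
    cols = []
    for j in range(len(G[0])):
        col = [row[j] for row in G]
        mean = sum(col) / n
        p = mean / 2.0  # estimated allele frequency
        if p <= 0.0 or p >= 1.0:
            continue  # drop monomorphic SNPs, which carry no structure signal
        scale = math.sqrt(p * (1.0 - p))
        cols.append([(g - mean) / scale for g in col])
    return [[c[i] for c in cols] for i in range(n)]


def top_pc_scores(X, iters=200, seed=0):
    """Top eigenvector of the n x n Gram matrix X X^T, found by power iteration."""
    n = len(X)
    gram = [[sum(a * b for a, b in zip(X[i], X[k])) for k in range(n)]
            for i in range(n)]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v  # one score per individual along the first principal axis


def kmeans_1d(values, k, iters=50):
    """Plain K-means on scalar scores, initialized at evenly spaced quantiles."""
    s = sorted(values)
    centers = [s[(2 * c + 1) * len(s) // (2 * k)] for c in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: (v - centers[c]) ** 2)
                  for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels


# Two toy subpopulations with clearly different allele frequencies.
genotypes = simulate_genotypes(n_per_pop=10, n_snps=200, freqs=[0.2, 0.8])
scores = top_pc_scores(normalize_genotypes(genotypes))
labels = kmeans_1d(scores, k=2)
print(labels)
```

Because the sign of an eigenvector is arbitrary, which subpopulation receives label 0 is not meaningful; only the grouping is. A real analysis would use an optimized linear-algebra library and retain every principal component judged significant.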

Main Results:

  • The proposed PCA-based approach performs comparably to STRUCTURE in population structure inference.
  • Soft K-means with BIC accurately predicted the number of subpopulations on simulated datasets, matching STRUCTURE's predictions.
  • BIC proved to be a more effective index than likelihood for predicting subpopulation numbers in real datasets.
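
As a rough illustration of how BIC can be used to pick the number of subpopulations, the sketch below scores K-means partitions of one-dimensional principal component scores. It is a simplification under stated assumptions, not the paper's procedure: it uses hard assignments instead of soft K-means, models each cluster as a Gaussian with one shared variance, and defines BIC as -2 ln L + p ln n, so lower values are better. The toy scores and all function names are hypothetical.

```python
import math


def kmeans_1d(values, k, iters=100):
    """Plain K-means on scalar scores, initialized at evenly spaced quantiles."""
    s = sorted(values)
    centers = [s[(2 * c + 1) * len(s) // (2 * k)] for c in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: (v - centers[c]) ** 2)
                  for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels


def bic_hard(values, labels, k):
    """BIC of a hard partition under an assumed shared-variance Gaussian model."""
    n = len(values)
    sse = 0.0          # pooled within-cluster sum of squared errors
    log_weights = 0.0  # sum over points of ln(mixing weight of its cluster)
    for c in range(k):
        members = [v for v, lab in zip(values, labels) if lab == c]
        if not members:
            continue  # an empty cluster contributes nothing
        center = sum(members) / len(members)
        sse += sum((v - center) ** 2 for v in members)
        log_weights += len(members) * math.log(len(members) / n)
    variance = sse / n  # maximum-likelihood estimate of the shared variance
    log_lik = log_weights - 0.5 * n * math.log(2 * math.pi * variance) - 0.5 * n
    n_params = 2 * k    # k means + (k - 1) mixing weights + 1 variance
    return -2.0 * log_lik + n_params * math.log(n)


def choose_k(values, k_max):
    """Return the K in 1..k_max with the lowest BIC, plus all BIC values."""
    bics = {k: bic_hard(values, kmeans_1d(values, k), k)
            for k in range(1, k_max + 1)}
    return min(bics, key=bics.get), bics


# Toy scores: two tight groups centered near -1 and +1 (hypothetical data).
scores = ([-1.0 + 0.01 * i for i in range(-5, 6)]
          + [1.0 + 0.01 * i for i in range(-5, 6)])
best_k, bics = choose_k(scores, k_max=5)
print(best_k)
```

The penalty term p ln n is what separates BIC from raw likelihood as an index: splitting a cluster can only shrink the pooled variance, so the variance term of the likelihood never penalizes a larger K, while the BIC penalty grows with each added cluster.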

Conclusions:

  • The PCA-based method offers a fast and scalable alternative for population structure inference.
  • The choice of algorithm should be guided by specific application requirements and computational constraints.
  • This approach significantly reduces the time required for analyzing large-scale genetic data.