



Apache Spark based kernelized fuzzy clustering framework for single nucleotide polymorphism sequence analysis.

Preeti Jha¹, Aruna Tiwari¹, Neha Bharill²

  • ¹Indian Institute of Technology Indore, 453552, India.

Computational Biology and Chemistry | March 8, 2021
Summary
This summary is machine-generated.

This study introduces kernelized fuzzy clustering algorithms for high-dimensional genomics data, improving clustering of Single Nucleotide Polymorphism (SNP) sequences using Apache Spark for faster and more accurate bioinformatics analysis.

Keywords:
Apache Spark, High-dimensional, Kernelized fuzzy clustering, Non-linear, SNP sequences


Area of Science:

  • Bioinformatics
  • Computational Biology
  • Data Science

Background:

  • High-dimensional genomics data present significant clustering challenges for researchers.
  • Non-linearly separable problems require advanced clustering techniques.

Purpose of the Study:

  • To develop scalable kernelized fuzzy clustering algorithms for high-dimensional genomics data.
  • To improve the analysis of Single Nucleotide Polymorphism (SNP) sequences.

Main Methods:

  • Proposed Kernelized Scalable Random Sampling with Iterative Optimization Fuzzy c-Means (KSRSIO-FCM) and Kernelized Scalable Literal Fuzzy c-Means (KSLFCM) algorithms.
  • Used the Apache Spark framework with Resilient Distributed Datasets (RDDs) for localized sub-clustering.
  • Developed a scalable preprocessing approach for generating numeric feature vectors from SNP sequences.
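
To make the idea of kernelized fuzzy c-means concrete, here is a minimal single-machine sketch in Python with NumPy. It is not the authors' KSRSIO-FCM or KSLFCM implementation and does not use Spark. It assumes a Gaussian kernel (the summary does not say which kernel the paper uses), cluster centers kept in the input space, and a farthest-point initialization chosen only for reproducibility. The names `gaussian_kernel`, `init_centers`, `kernel_fcm`, and the parameters `sigma` and `m` are all illustrative.

```python
import numpy as np


def gaussian_kernel(X, V, sigma):
    """Gaussian kernel K(x, v) between every point in X and every center in V."""
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)  # shape (n, c)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def init_centers(X, c, rng):
    """Farthest-point initialization (an assumption, not taken from the paper)."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(c - 1):
        # Squared distance from each point to its nearest already-chosen center.
        d2 = np.min([((X - v) ** 2).sum(axis=1) for v in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    return np.array(centers, dtype=float)


def kernel_fcm(X, c, m=2.0, sigma=1.0, max_iter=100, tol=1e-5, seed=0):
    """Generic kernel fuzzy c-means; returns memberships U (n, c) and centers V (c, d)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    V = init_centers(X, c, rng)
    for _ in range(max_iter):
        K = gaussian_kernel(X, V, sigma)
        # Feature-space distance is proportional to 1 - K(x, v) for a Gaussian kernel.
        dist = np.maximum(1.0 - K, 1e-12)
        inv = dist ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)  # each row sums to 1
        # Kernel-weighted center update.
        W = (U ** m) * K
        V_new = (W.T @ X) / W.sum(axis=0)[:, None]
        shift = np.abs(V_new - V).max()
        V = V_new
        if shift < tol:
            break
    return U, V
```

Calling `kernel_fcm(X, c=2, sigma=2.0)` on a numeric feature matrix `X` returns a soft membership matrix; taking `U.argmax(axis=1)` gives hard labels if they are needed. A distributed variant along the lines described above would run this kind of update on data partitions and combine the resulting centers, but that part is not sketched here.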

Main Results:

  • Demonstrated significant improvements in time and space complexity.
  • Achieved better Silhouette and Davies-Bouldin index scores compared to existing methods.
  • Validated effectiveness on real-world SNP datasets from soybean and rice.
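
For readers unfamiliar with the two validity indices named above, the following sketch computes both from hard cluster labels using their standard definitions with Euclidean distance. It is illustrative only and is not the evaluation code used in the study; the function names `davies_bouldin` and `silhouette` are hypothetical. A lower Davies-Bouldin index and a higher Silhouette score both indicate more compact, better separated clusters.

```python
import numpy as np


def davies_bouldin(X, labels):
    """Davies-Bouldin index for hard labels (needs at least two clusters)."""
    ids = np.unique(labels)
    cents = np.array([X[labels == k].mean(axis=0) for k in ids])
    # Scatter: mean distance of each cluster's points to its centroid.
    scatter = np.array([
        np.linalg.norm(X[labels == k] - cents[i], axis=1).mean()
        for i, k in enumerate(ids)
    ])
    # Separation: pairwise distance between centroids.
    sep = np.linalg.norm(cents[:, None, :] - cents[None, :, :], axis=2)
    np.fill_diagonal(sep, np.inf)  # exclude i == j from the max below
    ratios = (scatter[:, None] + scatter[None, :]) / sep
    return ratios.max(axis=1).mean()


def silhouette(X, labels):
    """Mean Silhouette score for hard labels (needs at least two clusters)."""
    ids = np.unique(labels)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # singleton clusters score 0 by convention
        a = D[i, own].sum() / (n_own - 1)  # mean distance within own cluster
        b = min(D[i, labels == k].mean() for k in ids if k != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()
```

Both functions expect `X` as a floating-point NumPy array of shape (n, d) and `labels` as an integer array of length n. The pairwise distance matrix in `silhouette` needs O(n²) memory, so this form suits small samples rather than full genomic datasets.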

Conclusions:

  • The proposed scalable kernelized fuzzy clustering algorithms effectively address challenges in high-dimensional genomics data analysis.
  • KSRSIO-FCM and KSLFCM offer efficient and accurate clustering of SNP sequences on the Apache Spark framework.