Apache Spark based kernelized fuzzy clustering framework for single nucleotide polymorphism sequence analysis.

Preeti Jha¹, Aruna Tiwari¹, Neha Bharill²

¹Indian Institute of Technology Indore, 453552, India.

Computational Biology and Chemistry

March 8, 2021

Summary

This summary is machine-generated.

This study introduces kernelized fuzzy clustering algorithms for high-dimensional genomics data, improving clustering of Single Nucleotide Polymorphism (SNP) sequences using Apache Spark for faster and more accurate bioinformatics analysis.

Keywords:

Apache Spark; High-dimensional; Kernelized fuzzy clustering; Non-linear; SNP sequences

Area of Science:

Bioinformatics
Computational Biology
Data Science

Background:

High-dimensional genomics data presents significant clustering challenges for researchers.
Non-linearly separable problems require advanced clustering techniques, such as kernel-based methods.

Purpose of the Study:

To develop scalable kernelized fuzzy clustering algorithms for high-dimensional genomics data.
To improve the analysis of Single Nucleotide Polymorphism (SNP) sequences.

Main Methods:

Proposed Kernelized Scalable Random Sampling with Iterative Optimization Fuzzy c-Means (KSRSIO-FCM) and Kernelized Scalable Literal Fuzzy c-Means (KSLFCM) algorithms.
Utilized Apache Spark framework with Resilient Distributed Dataset (RDD) for localized sub-clustering.
Developed a scalable preprocessing approach for generating numeric feature vectors from SNP sequences.
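
The kernelized fuzzy clustering idea behind these methods can be illustrated with a minimal sketch. It assumes a generic Gaussian-kernel fuzzy c-means (the commonly described KFCM update rules), not the paper's KSRSIO-FCM or KSLFCM algorithms, and it runs on a single machine without Apache Spark or RDDs. The function names, the parameters `m`, `sigma`, `n_iter`, and `init_centers`, and the toy data are hypothetical; the SNP sequences are assumed to have already been converted to numeric feature vectors.

```python
import math
import random


def gaussian_kernel(x, v, sigma):
    """Gaussian (RBF) kernel between two equal-length numeric vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def update_memberships(points, centers, m, sigma, eps=1e-12):
    """Fuzzy memberships from the kernel-induced distance 1 - K(x, v)."""
    memberships = []
    for x in points:
        # eps avoids division by zero when a point coincides with a center.
        weights = [
            (1.0 - gaussian_kernel(x, v, sigma) + eps) ** (-1.0 / (m - 1.0))
            for v in centers
        ]
        total = sum(weights)
        memberships.append([w / total for w in weights])  # each row sums to 1
    return memberships


def update_centers(points, centers, memberships, m, sigma):
    """Each center becomes a mean weighted by u^m * K(x, v)."""
    new_centers = []
    for i, v in enumerate(centers):
        coeffs = [
            (u[i] ** m) * gaussian_kernel(x, v, sigma)
            for x, u in zip(points, memberships)
        ]
        denom = sum(coeffs)
        new_centers.append(
            [sum(c * x[d] for c, x in zip(coeffs, points)) / denom
             for d in range(len(v))]
        )
    return new_centers


def kernel_fcm(points, n_clusters, m=2.0, sigma=1.0, n_iter=50,
               init_centers=None, seed=0):
    """Illustrative kernel fuzzy c-means; returns (centers, memberships)."""
    if init_centers is None:
        # Assumption: initialise centers from randomly chosen data points.
        rng = random.Random(seed)
        centers = [list(p) for p in rng.sample(points, n_clusters)]
    else:
        centers = [list(c) for c in init_centers]
    for _ in range(n_iter):
        memberships = update_memberships(points, centers, m, sigma)
        centers = update_centers(points, centers, memberships, m, sigma)
    # Recompute memberships so they match the final centers.
    return centers, update_memberships(points, centers, m, sigma)


if __name__ == "__main__":
    # Toy numeric vectors standing in for encoded SNP sequences.
    toy = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
           [5.0, 5.0], [5.0, 6.0], [6.0, 5.0], [6.0, 6.0]]
    centers, memberships = kernel_fcm(
        toy, 2, sigma=2.0, init_centers=[[1.0, 1.0], [5.0, 5.0]])
    print(centers)
    print(memberships[0])
```

With a Gaussian kernel, points far from every center receive almost no weight in the center update, so the result depends on the initial centers and on `sigma`. A scalable variant would presumably run updates of this kind on data partitions and merge the partial results, but that step is not shown here.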

Main Results:

Demonstrated significant improvements in time and space complexity.
Achieved better Silhouette and Davies-Bouldin index scores compared to existing methods.
Validated effectiveness on real-world SNP datasets from soybean and rice.
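
The two validity indices named above can be computed from hard cluster labels as in the following sketch. It uses the standard textbook definitions with Euclidean distance; the function names and the toy data are hypothetical, and nothing here reproduces the paper's reported scores. For fuzzy clustering output, each point is assumed to be assigned to the cluster with its highest membership before scoring.

```python
import math
from collections import defaultdict


def _dist(p, q):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def _group(points, labels):
    """Map each cluster label to the list of its points."""
    clusters = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)
    return clusters


def silhouette_score(points, labels):
    """Mean silhouette over all points; ranges from -1 to 1, higher is better."""
    clusters = _group(points, labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(_dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(_dist(p, q) for q in other) / len(other)
            for key, other in clusters.items() if key != lab
        )
        if max(a, b) > 0.0:
            total += (b - a) / max(a, b)
    return total / len(points)


def davies_bouldin_index(points, labels):
    """Davies-Bouldin index; non-negative, lower is better.

    Assumes no two cluster centroids coincide (that would divide by zero).
    """
    clusters = _group(points, labels)
    if len(clusters) < 2:
        raise ValueError("Davies-Bouldin needs at least two clusters")
    keys = list(clusters)
    centroids = {
        k: [sum(col) / len(clusters[k]) for col in zip(*clusters[k])]
        for k in keys
    }
    # Scatter: mean distance of a cluster's points to its centroid.
    scatter = {
        k: sum(_dist(p, centroids[k]) for p in clusters[k]) / len(clusters[k])
        for k in keys
    }
    worst = [
        max((scatter[i] + scatter[j]) / _dist(centroids[i], centroids[j])
            for j in keys if j != i)
        for i in keys
    ]
    return sum(worst) / len(keys)


if __name__ == "__main__":
    toy = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
           [5.0, 5.0], [5.0, 6.0], [6.0, 5.0], [6.0, 6.0]]
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    print(silhouette_score(toy, labels))
    print(davies_bouldin_index(toy, labels))
```

A higher Silhouette score and a lower Davies-Bouldin index both indicate more compact, better separated clusters, which is the sense in which "better" scores are read. The pairwise silhouette computation is quadratic in the number of points, so on large SNP datasets it would typically be evaluated on a sample.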

Conclusions:

The proposed scalable kernelized fuzzy clustering algorithms effectively address challenges in high-dimensional genomics data analysis.
KSRSIO-FCM and KSLFCM offer efficient and accurate clustering of SNP sequences on the Apache Spark framework.