An Improved Similarity-Based Clustering Algorithm for Multi-Database Mining | JoVE Visualize

Area of Science:

Data Mining and Machine Learning
Database Systems
Artificial Intelligence

Background:

Clustering algorithms for multi-database mining (MDM) often struggle with indecisiveness when pairwise database similarities are near the mean.
This indecisiveness leads to trivial clustering results, such as all databases in one cluster or individual singleton clusters.
Existing gradient-based clustering methods can be sensitive to learning rates and require numerous iterations for convergence.
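
The indecisiveness described above can be illustrated with a small sketch. It assumes, purely for illustration, that two databases are linked when their similarity reaches a threshold and that clusters are the resulting connected components; the summary does not state that the studied algorithms work this way. The matrices `fuzzy` and `crisp` and the helper names are hypothetical.

```python
def clusters_at_threshold(sim, threshold):
    """Label databases by connected component, linking pairs with
    similarity >= threshold (an assumed, simplified clustering rule)."""
    n = len(sim)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i][j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def count_clusters(sim, threshold):
    """Number of clusters produced at a given threshold."""
    return len(set(clusters_at_threshold(sim, threshold)))


# Hypothetical matrix: every pairwise similarity sits close to the mean (~0.5).
fuzzy = [
    [1.00, 0.52, 0.49, 0.50],
    [0.52, 1.00, 0.51, 0.48],
    [0.49, 0.51, 1.00, 0.50],
    [0.50, 0.48, 0.50, 1.00],
]

# Hypothetical matrix with two well-separated groups: {0, 1} and {2, 3}.
crisp = [
    [1.00, 0.90, 0.10, 0.15],
    [0.90, 1.00, 0.12, 0.10],
    [0.10, 0.12, 1.00, 0.85],
    [0.15, 0.10, 0.85, 1.00],
]
```

With the `fuzzy` matrix, `count_clusters(fuzzy, 0.47)` gives one cluster and `count_clusters(fuzzy, 0.53)` gives four singletons, so a small change in threshold flips between the two trivial outcomes. With the `crisp` matrix, `count_clusters(crisp, 0.5)` recovers the two groups.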

Purpose of the Study:

To develop a learning algorithm that reduces the fuzziness of the similarity matrix in MDM.
To improve the certainty and accuracy of clustering algorithms in identifying optimal database clusters.
To propose a learning-rate-free algorithm for efficient candidate clustering assessment.

Main Methods:

A learning algorithm minimizes a weighted binary entropy loss function using gradient descent and back-propagation to reduce similarity matrix fuzziness.
A learning-rate-free algorithm utilizing coordinate descent (CD) and back-propagation is proposed for efficient clustering.
A max-heap data structure is employed within the CD algorithm to optimize variable selection, minimizing a convex clustering quality measure L(θ) in fewer than (n^2-n)/2 iterations, i.e. fewer than the number of distinct database pairs if n denotes the number of databases.
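
The first method above, gradient descent on a binary entropy loss, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each similarity is parameterized as s = sigmoid(theta), the weights default to uniform, and the function names, the learning rate `lr`, and the step count are hypothetical.

```python
import math


def binary_entropy(s):
    """Binary entropy (in nats) of a similarity s in (0, 1); maximal at s = 0.5."""
    return -s * math.log(s) - (1.0 - s) * math.log(1.0 - s)


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def reduce_fuzziness(sims, weights=None, lr=1.0, steps=50):
    """Gradient descent on a weighted sum of binary entropies.

    With s = sigmoid(theta), the chain rule through the sigmoid gives
        dH/dtheta = log((1 - s) / s) * s * (1 - s) = -theta * s * (1 - s),
    because log((1 - s) / s) equals -theta.
    """
    if weights is None:
        weights = [1.0] * len(sims)  # assumption: uniform weights
    # Start from the logit of each similarity (all values must lie in (0, 1)).
    thetas = [math.log(s / (1.0 - s)) for s in sims]
    for _ in range(steps):
        for k, theta in enumerate(thetas):
            s = sigmoid(theta)
            grad = -weights[k] * theta * s * (1.0 - s)
            thetas[k] = theta - lr * grad  # moves s away from 0.5
    return [sigmoid(t) for t in thetas]
```

In this sketch, values above 0.5 drift toward 1 and values below 0.5 drift toward 0, so the total entropy falls. A similarity of exactly 0.5 is a stationary point and does not move, and the result depends on the chosen `lr` and `steps`, which is the kind of sensitivity the learning-rate-free variant is meant to avoid.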

Main Results:

The proposed learning algorithm successfully reduces similarity matrix fuzziness, leading to improved clustering certainty and identification of optimal database clusters.
The learning-rate-free CD algorithm converges within an upper-bounded number of iterations, fewer than traditional gradient-based methods require.
Experimental results demonstrate that the novel algorithm outperforms existing clustering algorithms for MDM.
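
The iteration bound can be made concrete with a small sketch. It is not the CD algorithm from the study; it swaps in a simpler single-linkage-style merge with union-find, and only shows how a max-heap over the (n^2-n)/2 pairwise similarities caps the number of iterations. The function name `candidate_clusterings` and the example matrix are hypothetical, and Python's `heapq` is a min-heap, so similarities are negated.

```python
import heapq


def candidate_clusterings(sim):
    """Pop database pairs in descending similarity and record a candidate
    clustering each time a pop merges two clusters.

    Returns (candidates, pops), where candidates is a list of
    (similarity, clusters) tuples and pops counts heap removals.
    """
    n = len(sim)
    # One heap entry per unordered pair: (n^2 - n) / 2 entries in total.
    heap = [(-sim[i][j], i, j) for i in range(n) for j in range(i + 1, n)]
    heapq.heapify(heap)

    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    candidates = []
    pops = 0
    n_clusters = n
    while heap and n_clusters > 1:
        neg_s, i, j = heapq.heappop(heap)
        pops += 1
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
            n_clusters -= 1
            groups = {}
            for d in range(n):
                groups.setdefault(find(d), []).append(d)
            candidates.append((-neg_s, sorted(groups.values())))
    return candidates, pops


# Hypothetical similarity matrix for four databases.
example = [
    [1.00, 0.90, 0.10, 0.15],
    [0.90, 1.00, 0.12, 0.10],
    [0.10, 0.12, 1.00, 0.85],
    [0.15, 0.10, 0.85, 1.00],
]
candidates, pops = candidate_clusterings(example)
```

For this example the loop stops after three pops, below the six pairs available, and the second candidate is the partition `[[0, 1], [2, 3]]`. Each candidate could then be scored with a quality measure of choice to pick the preferred clustering.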

Conclusions:

The developed learning algorithm effectively addresses the indecisiveness issue in MDM clustering by enhancing similarity matrix clarity.
The learning-rate-free approach offers a more efficient and robust method for database clustering, reducing computational complexity.
This research provides a significant advancement in MDM, offering improved accuracy and performance for database partitioning.