Enhanced Data Mining and Visualization of Sensory-Graph-Modeled Datasets through Summarization

Affiliations
  • 1 School of Computing, National University of Computer and Emerging Science, Islamabad 44000, Pakistan.
  • 2 Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia.
  • 3 Department of Computer Science, College of Computer Science and Information System, Najran University, Najran 55461, Saudi Arabia.
  • 4 Department of Computer Sciences, Faculty of Computing and Information Technology, Northern Border University, Rafha 91911, Saudi Arabia.
  • 5 Faculty of Computing and AI, Air University, E-9, Islamabad 44000, Pakistan.
  • 6 Cognitive Systems Lab, University of Bremen, 28359 Bremen, Germany.

Abstract

The acquisition, processing, mining, and visualization of sensory data for knowledge discovery and decision support has recently become a popular area of research. Its importance stems from its continuing role in improving healthcare and related disciplines. As a result, a huge amount of data has been collected and analyzed. These data are made available to the research community in various shapes and formats, and their representation and study in the form of graphs or networks is itself an active area of research. However, the large size of such graph datasets poses challenges for data mining and visualization. For example, knowledge discovery from the Bio-Mouse-Gene dataset, which has over 43 thousand nodes and 14.5 million edges, is a non-trivial task. Summarizing such large graphs is a useful alternative, because graph summarization aims to enable efficient analysis of complex, large-sized data. During summarization, nodes that have similar structural properties are merged together. In doing so, traditional methods often overlook the importance of personalizing the summary, which would help to highlight certain targeted nodes. Personalized or context-specific scenarios require a more tailored approach to accurately capture distinct patterns and trends. Hence, personalized graph summarization aims to produce a concise depiction of the graph that emphasizes connections closer to a given set of target nodes. In this paper, we present a faster algorithm for the personalized graph summarization (PGS) problem, named IPGS, designed to facilitate enhanced and effective data mining and visualization of datasets from various domains, including biosensors.
Our objective is to obtain a compression ratio similar to that of the state-of-the-art PGS algorithm, but in less time. To achieve this, we improve the execution time of the current state-of-the-art approach by using weighted locality-sensitive hashing. Experiments on eight large publicly available datasets demonstrate the effectiveness and scalability of IPGS, which provides a compression ratio similar to that of the state-of-the-art approach. In this way, our research contributes to the study and analysis of sensory datasets from the perspective of graph summarization. We also present a detailed study of the Bio-Mouse-Gene dataset, conducted to investigate the effectiveness of graph summarization in the domain of biosensors.
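
To make the role of weighted locality-sensitive hashing more concrete, the following is a minimal sketch of how nodes with similar neighborhoods might be bucketed as merge candidates. It is an illustration under stated assumptions, not the IPGS algorithm itself: the function names, the toy graph, the `near_weight` parameter, and the rule that neighbors in the target set receive a higher integer weight are all hypothetical. The weighted MinHash used here is the simple variant that expands an element of integer weight w into w copies.

```python
import hashlib
from collections import defaultdict


def _h(seed, item):
    """Deterministic 64-bit hash of a string under a given seed."""
    data = f"{seed}:{item}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def weighted_minhash(weights, num_hashes=16):
    """MinHash signature of a multiset with positive integer weights.

    An element x with weight w is expanded into w copies "x#0".."x#(w-1)",
    so heavier elements have more influence on the signature.
    """
    return tuple(
        min(_h(seed, f"{x}#{i}") for x, w in weights.items() for i in range(w))
        for seed in range(num_hashes)
    )


def candidate_groups(adj, targets, near_weight=3, num_hashes=16):
    """Bucket nodes whose weighted neighborhoods share a full signature."""
    buckets = defaultdict(list)
    for node, nbrs in adj.items():
        if not nbrs:
            continue  # isolated nodes have no neighborhood to hash
        # Assumed personalization rule: neighbors that are target nodes
        # count near_weight times, all other neighbors count once.
        weights = {v: (near_weight if v in targets else 1) for v in nbrs}
        buckets[weighted_minhash(weights, num_hashes)].append(node)
    # Only buckets with at least two nodes yield merge candidates.
    return [sorted(group) for group in buckets.values() if len(group) > 1]


# Toy undirected graph: "a" and "b" have identical neighborhoods.
adj = {
    "a": {"x", "y"},
    "b": {"x", "y"},
    "c": {"y", "z"},
    "x": {"a", "b"},
    "y": {"a", "b", "c"},
    "z": {"c"},
}
groups = candidate_groups(adj, targets={"x"})
print(groups)
```

Nodes with identical weighted neighborhoods always land in the same bucket, while nodes with differing neighborhoods collide on the full signature only with small probability. A practical system would typically split the signature into bands to trade precision for recall; that refinement is omitted here.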
