Enhancing performance of E-Government information systems with SSD-based Hadoop MapReduce
Summary
This summary is machine-generated. This study introduces a novel shuffle mechanism for Hadoop clusters with Solid-State Drives (SSDs), significantly improving e-government data processing speeds. The optimized approach reduces I/O and network delays, enhancing overall system efficiency.
Area Of Science
- Computer Science
- Data Engineering
- E-Government
Background
- E-government systems generate vast, heterogeneous datasets requiring high-throughput, low-latency processing.
- Hadoop MapReduce, while common, faces performance bottlenecks due to disk I/O and network latency in the shuffle phase.
Purpose Of The Study
- To propose and evaluate a data address-based shuffle mechanism for Hadoop clusters utilizing Solid-State Drives (SSDs).
- To enhance data processing performance and efficiency for large-scale e-government applications.
Main Methods
- Implementation of a novel shuffle mechanism featuring address-based sorting, merging, and pre-transmission of intermediate data.
- Performance evaluation using Terasort and Wordcount benchmarks on Hadoop clusters equipped with SSDs.
- Scalability testing on a simulated 50-node cluster and energy consumption profiling.
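The summary does not spell out how the address-based sorting and merging work, so the following is only a minimal sketch of one plausible reading: intermediate records are written once, and sorting and merging operate on small (key, offset, length) address entries rather than on the records themselves, relying on cheap random reads of the kind SSDs provide. All names here (`write_records`, `sort_addresses`, `merge_sorted_addresses`, `read_record`) are hypothetical and are not Hadoop APIs or the authors' implementation; an in-memory byte buffer stands in for an SSD spill file, and pre-transmission is not modeled.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical address entry: (key, offset, length) pointing into a data buffer.
Address = Tuple[bytes, int, int]


def write_records(records: Iterable[Tuple[bytes, bytes]]) -> Tuple[bytes, List[Address]]:
    """Append records to one buffer (standing in for a spill file) and
    return the buffer together with one address entry per record."""
    buf = bytearray()
    addresses: List[Address] = []
    for key, value in records:
        payload = key + b"\t" + value
        addresses.append((key, len(buf), len(payload)))
        buf.extend(payload)
    return bytes(buf), addresses


def sort_addresses(addresses: List[Address]) -> List[Address]:
    """Sort only the address entries by key; the record bytes are not moved."""
    return sorted(addresses, key=lambda entry: entry[0])


def merge_sorted_addresses(
    runs: List[Tuple[int, List[Address]]]
) -> List[Tuple[bytes, int, int, int]]:
    """Merge several key-sorted address lists into one key-ordered list of
    (key, run_id, offset, length) entries."""
    tagged = (
        [(key, run_id, offset, length) for key, offset, length in addrs]
        for run_id, addrs in runs
    )
    return list(heapq.merge(*tagged))


def read_record(buf: bytes, offset: int, length: int) -> bytes:
    """Fetch one record by address (a random read on real storage)."""
    return buf[offset:offset + length]
```

In this sketch a reducer would walk the merged address list and call `read_record` for each entry, so record payloads are read once, in final key order, instead of being rewritten during every merge pass.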
Main Results
- Achieved execution time reductions of 8% (Terasort) and 1% (Wordcount), reported as statistically significant based on 95% confidence intervals.
- Demonstrated improved performance, reduced network congestion, and a 31% decrease in energy consumption compared to Hard Disk Drive (HDD) systems.
- Validated enhanced efficiency and scalability for large-scale data processing.
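The summary does not describe how the percentage reductions and confidence intervals were computed. As a hedged illustration only, the sketch below shows one common way to derive such figures from repeated benchmark runs: a relative reduction per run, and a normal-approximation 95% interval around the mean. The function names are hypothetical, the normal approximation is an assumption (small samples would normally call for a t-distribution), and no values from the study are used.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev
from typing import Sequence, Tuple


def percent_reduction(baseline: float, optimized: float) -> float:
    """Relative reduction of the optimized value versus the baseline, in percent."""
    return 100.0 * (baseline - optimized) / baseline


def mean_ci(samples: Sequence[float], confidence: float = 0.95) -> Tuple[float, float]:
    """Normal-approximation confidence interval for the mean of the samples.

    Assumption: the sample mean is roughly normally distributed.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    center = mean(samples)
    half_width = z * stdev(samples) / sqrt(len(samples))
    return center - half_width, center + half_width
```

Under this approach, a reduction would typically be called significant at the 95% level when the interval for the mean reduction excludes zero.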
Conclusions
- The proposed data address-based shuffle mechanism offers a cost-effective and efficient solution for big data processing in public sector computing.
- Optimizing shuffle mechanisms for SSDs is crucial for advancing e-government data processing capabilities.
- The approach effectively mitigates I/O and network overhead, leading to substantial performance gains.

