Performance evaluation of GPU-based parallel sorting algorithms

  • 0Department of Computer Science, SDU University, Kaskelen, Kazakhstan.

|

|

Summary

This summary is machine-generated.

GPU acceleration significantly speeds up sorting algorithms like Radix, Quick, and Merge Sort, achieving over 100x speedups for large datasets. This research demonstrates substantial performance gains compared to sequential methods.

Area Of Science

  • Computer Science
  • High-Performance Computing

Background

  • Sequential sorting methods struggle with large datasets.
  • Parallel sorting and GPU computing offer significant speedup potential.

Purpose Of The Study

  • Investigate GPU-based parallelization of merge sort, quick sort, bubble sort, radix top-k selection sort, and slow sort using CUDA.
  • Evaluate performance, parallel time complexity, and space complexity on modern GPUs.

Main Methods

  • Implemented and optimized GPU-based parallel sorting algorithms (MS, QS, BS, RS, SS) using CUDA.
  • Conducted experiments on diverse datasets (random, reverse-sorted, sorted, nearly-sorted).
  • Compared GPU-accelerated versions against sequential counterparts.

Main Results

  • Radix Sort achieved ~50x speedup, Quick Sort ~97x, and Merge Sort ~103x on 10 million random elements.
  • Bubble Sort showed ~17x improvement but remained slower overall.
  • Slow Sort demonstrated ~18.6x speedup.
  • New single-GPU implementations achieve 17x to over 100x speedups.

Conclusions

  • GPU-accelerated sorting algorithms offer substantial performance improvements over sequential methods.
  • Modern GPUs and CUDA enable significant acceleration for large-scale data sorting tasks.
  • Radix, Quick, and Merge Sort show the most promising speedups for parallel processing.

Related Concept Videos

Parallel Resonance 01:23

563

The parallel RLC circuit is an arrangement where the resistor (R), inductor (L), and capacitor (C) are all connected to the same nodes and, as a result, share the same voltage across them. The parallel RLC circuit is analyzed in terms of admittance (Y), which reflects the ease with which current can flow. The admittance is given by:

Resonance in a parallel RLC circuit occurs when the net reactance is zero, meaning the capacitive and inductive effects cancel each other out. This condition is...

Parallel Processing 01:20

730

The brain processes sensory information rapidly due to parallel processing, which involves sending data across multiple neural pathways at the same time. This method allows the brain to manage various sensory qualities, such as shapes, colors, movements, and locations, all concurrently. For instance, when observing a forest landscape, the brain simultaneously processes the movement of leaves, the shapes of trees, the depth between them, and the various shades of green. This enables a quick and...

Trial and Error and Algorithm 01:12

424

A problem-solving strategy is a plan of action used to find a solution. Different strategies have distinct action plans. Trial and error involves trying different solutions until one works. For instance, to fix a broken printer, you might check ink levels, ensure the paper tray isn't jammed, and verify the printer's connection to your laptop. This method can be time-consuming but is commonly used. Thomas Edison, for example, used trial and error to find a suitable filament for the light...

Resistors In Parallel 01:23

6.3K

Resistors are in parallel when one end of all the resistors are connected to a continuous wire of negligible resistance and the other end of all the resistors are also connected to one another through a continuous wire of negligible resistance. In the case of a parallel configuration, the potential drop across each resistor is the same. Current through each resistor can be found using Ohm’s law, I = V/R, where the voltage is constant across each resistor. The sum of the individual currents...

Series and Parallel Capacitors 01:14

9.6K

Capacitors, fundamental components in electronic circuits, can be connected in series and/or parallel configurations. Each configuration has different impacts on the overall behavior of the circuit.
First, consider capacitors connected in series to a battery. In this configuration, the plate connected to the battery's positive terminal develops a positive charge, while the plate attached to the negative terminal becomes negatively charged. An equal magnitude of charge is induced on the...

Parallel-axis Theorem 01:06

8.3K

The parallel-axis theorem provides a convenient and quick method of finding the moment of inertia of an object about an axis parallel to the axis passing through its center of mass. Consider a thin rod as an example. There is a striking similarity between the process of finding the moment of inertia of a thin rod about an axis through its middle, where the center of mass lies, and about an axis through its end using the conventional method. In the conventional method, the concept of linear mass...