Abstract
It is well known that Quicksort which is commonly considered as one of the fastest in-place sorting algorithms - suffers in an essential way from branch mispredictions. We present a novel approach to addressing this problem by partially decoupling control from dataflow: in order to perform the partitioning, we split the input into blocks of constant size. Then, all elements in one block are compared with the pivot and the outcomes of the comparisons are stored in a buffer. In a second pass, the respective elements are rearranged. By doing so, we avoid conditional branches based on outcomes of comparisons (except for the fnal Insertionsort). Moreover, we prove that when sorting n elements, the average total number of branch mispredictions is at most flan logn + O(n) for some small depending on the block size. Our experimental results are promising: when sorting random-integer data, we achieve an increase in speed (number of elements sorted per second) of more than 80% over the GCC implementation of Quicksort (C++ std::sort). Also, for many other types of data and non-random inputs, there is still a signifcant speedup over std::sort. Only in a few special cases, such as sorted or almost sorted inputs, can std::sort beat our implementation. Moreover, on random-input permutations, our implementation is even slightly faster than an implementation of the highly tuned Super Scalar Sample Sort, which uses a linear amount of additional space. Finally, we also apply our approach to Quickselect and obtain a speed-up of more than 100% over the GCC implementation (C++ std::nth-element).
Original language | English |
---|---|
Article number | 1.4 |
Pages (from-to) | 1-22 |
Journal | ACM Journal of Experimental Algorithmics |
Volume | 24 |
Issue number | 1 |
Early online date | 31 Jan 2019 |
DOIs | |
Publication status | Published - Dec 2019 |
Keywords
- Branch mispredictions
- In-place sorting
- Quicksort