BlockQuicksort: Avoiding Branch Mispredictions in Quicksort

Stefan Josef Edelkamp; Armin Weiss

doi:10.4230/LIPIcs.ESA.2016.38

Informatics

Stevens Institute of Technology, Hoboken, NJ

Research output: Chapter in Book/Report/Conference proceeding › Conference paper › peer-review

Abstract

Since the work of Kaligosi and Sanders (2006), it is well-known that Quicksort – which is commonly considered as one of the fastest in-place sorting algorithms – suffers in an essential way from branch mispredictions. We present a novel approach to address this problem by partially decoupling control from data flow: in order to perform the partitioning, we split the input in blocks of constant size (we propose 128 data elements); then, all elements in one block are compared with the pivot and the outcomes of the comparisons are stored in a buffer. In a second pass, the respective elements are rearranged. By doing so, we avoid conditional branches based on outcomes of comparisons at all (except for the final Insertionsort). Moreover, we prove that for a static branch predictor the average total number of branch mispredictions is at most εn log n + O(n) for some small ε depending on the block size when sorting n elements. Our experimental results are promising: when sorting random integer data, we achieve an increase in speed (number of elements sorted per second) of more than 80% over the GCC implementation of C++ std::sort. Also for many other types of data and non-random inputs, there is still a significant speedup over std::sort. Only in few special cases like sorted or almost sorted inputs, std::sort can beat our implementation. Moreover, even on random input permutations, our implementation is even slightly faster than an implementation of the highly tuned Super Scalar Sample Sort, which uses a linear amount of additional space.
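
The two-pass idea in the abstract can be illustrated with a minimal C++ sketch. This is not the authors' implementation: it is a simplified one-sided (Lomuto-style) variant, whereas the paper's partitioner is more elaborate. The name `block_partition_sketch` and its parameters are hypothetical. The sketch also assumes the compiler turns the comparison-to-integer conversion into a branch-free instruction, which is typical but not guaranteed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper for illustration only (not the paper's code).
// Rearranges `a` so that all elements < pivot come first and returns
// the number of such elements. Works block by block:
//   pass 1 records which positions hold elements < pivot,
//   pass 2 moves those elements to the front.
template <typename T>
std::size_t block_partition_sketch(std::vector<T>& a, T pivot,
                                   std::size_t block_size = 128) {
    assert(block_size > 0);
    std::vector<std::size_t> offsets(block_size);  // buffer of comparison outcomes
    std::size_t store = 0;                         // a[0..store) holds elements < pivot

    for (std::size_t base = 0; base < a.size(); base += block_size) {
        const std::size_t len = std::min(block_size, a.size() - base);

        // Pass 1: always write the offset, then advance the counter by the
        // comparison result (0 or 1). No branch depends on the comparison.
        std::size_t count = 0;
        for (std::size_t i = 0; i < len; ++i) {
            offsets[count] = i;
            count += static_cast<std::size_t>(a[base + i] < pivot);
        }

        // Pass 2: the loop bound is known before the loop starts, so the
        // only branch here is the loop exit.
        for (std::size_t j = 0; j < count; ++j) {
            std::swap(a[store], a[base + offsets[j]]);
            ++store;
        }
    }
    return store;
}
```

Calling `block_partition_sketch(v, 5, 4)` on a vector holding a permutation of 0..9 returns 5 and leaves the five elements smaller than 5 at the front. A full Quicksort would still need pivot selection, recursion, and a fallback for small subarrays, all of which are omitted here.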

Original language	English
Title of host publication	24th Annual European Symposium on Algorithms (ESA 2016)
Subtitle of host publication	ESA 2016, August 22–24, 2016, Aarhus, Denmark
Editors	Piotr Sankowski, Christos Zaroliagis
Publisher	Schloss Dagstuhl – Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
Pages	38:1–38:16
Number of pages	16
Volume	57
ISSN (Electronic)	1868-8969
ISBN (Print)	9783959770156
DOIs	https://doi.org/10.4230/LIPIcs.ESA.2016.38
Publication status	Published - Aug 2016

Access to Document

10.4230/LIPIcs.ESA.2016.38 (Licence: CC BY)

Final published version (VoR), 622 KB (Licence: CC BY)

Cite this

Edelkamp, S. J., & Weiss, A. (2016). BlockQuicksort: Avoiding Branch Mispredictions in Quicksort. In P. Sankowski, & C. Zaroliagis (Eds.), 24th Annual European Symposium on Algorithms (ESA 2016): ESA 2016, August 22–24, 2016, Aarhus, Denmark (Vol. 57, pp. 38:1–38:16). Article 38. Schloss Dagstuhl – Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing. https://doi.org/10.4230/LIPIcs.ESA.2016.38

@inproceedings{7d3c3682587d420ca17ea44ba86e852c,

title = "BlockQuicksort: Avoiding Branch Mispredictions in Quicksort",

abstract = "Since the work of Kaligosi and Sanders (2006), it is well-known that Quicksort – which is commonly considered as one of the fastest in-place sorting algorithms – suffers in an essential way from branch mispredictions. We present a novel approach to address this problem by partially decoupling control from data flow: in order to perform the partitioning, we split the input in blocks of constant size (we propose 128 data elements); then, all elements in one block are compared with the pivot and the outcomes of the comparisons are stored in a buffer. In a second pass, the respective elements are rearranged. By doing so, we avoid conditional branches based on outcomes of comparisons at all (except for the final Insertionsort). Moreover, we prove that for a static branch predictor the average total number of branch mispredictions is at most εn log n + O(n) for some small ε depending on the block size when sorting n elements. Our experimental results are promising: when sorting random integer data, we achieve an increase in speed (number of elements sorted per second) of more than 80% over the GCC implementation of C++ std::sort. Also for many other types of data and non-random inputs, there is still a significant speedup over std::sort. Only in few special cases like sorted or almost sorted inputs, std::sort can beat our implementation. Moreover, even on random input permutations, our implementation is even slightly faster than an implementation of the highly tuned Super Scalar Sample Sort, which uses a linear amount of additional space.",

author = "Edelkamp, {Stefan Josef} and Armin Weiss",

note = "Recent Journal Upgrade to {"}BlockQuicksort: Avoiding Branch Mispredictions in Quicksort{"} by Stefan Edelkamp and Armin Wei{\ss}, ACM Journal of Experimental Algorithmics ",

year = "2016",

month = aug,

doi = "10.4230/LIPIcs.ESA.2016.38",

language = "English",

isbn = "9783959770156",

volume = "57",

pages = "38:1--38:16",

editor = "Piotr Sankowski and Christos Zaroliagis",

booktitle = "24th Annual European Symposium on Algorithms (ESA 2016)",

publisher = "Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik GmbH, Dagstuhl Publishing",

address = "Germany",

}

Edelkamp, SJ & Weiss, A 2016, BlockQuicksort: Avoiding Branch Mispredictions in Quicksort. in P Sankowski & C Zaroliagis (eds), 24th Annual European Symposium on Algorithms (ESA 2016): ESA 2016, August 22–24, 2016, Aarhus, Denmark. vol. 57, 38, Schloss Dagstuhl – Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing, pp. 38:1–38:16. https://doi.org/10.4230/LIPIcs.ESA.2016.38

BlockQuicksort: Avoiding Branch Mispredictions in Quicksort. / Edelkamp, Stefan Josef; Weiss, Armin.
24th Annual European Symposium on Algorithms (ESA 2016): ESA 2016, August 22–24, 2016, Aarhus, Denmark. ed. / Piotr Sankowski; Christos Zaroliagis. Vol. 57. Schloss Dagstuhl – Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing, 2016. p. 38:1–38:16, Article 38.

TY - CHAP

T1 - BlockQuicksort

T2 - Avoiding Branch Mispredictions in Quicksort

AU - Edelkamp, Stefan Josef

AU - Weiss, Armin

N1 - Recent Journal Upgrade to "BlockQuicksort: Avoiding Branch Mispredictions in Quicksort" by Stefan Edelkamp and Armin Weiß, ACM Journal of Experimental Algorithmics

PY - 2016/8

Y1 - 2016/8

N2 - Since the work of Kaligosi and Sanders (2006), it is well-known that Quicksort – which is commonly considered as one of the fastest in-place sorting algorithms – suffers in an essential way from branch mispredictions. We present a novel approach to address this problem by partially decoupling control from data flow: in order to perform the partitioning, we split the input in blocks of constant size (we propose 128 data elements); then, all elements in one block are compared with the pivot and the outcomes of the comparisons are stored in a buffer. In a second pass, the respective elements are rearranged. By doing so, we avoid conditional branches based on outcomes of comparisons at all (except for the final Insertionsort). Moreover, we prove that for a static branch predictor the average total number of branch mispredictions is at most εn log n + O(n) for some small ε depending on the block size when sorting n elements. Our experimental results are promising: when sorting random integer data, we achieve an increase in speed (number of elements sorted per second) of more than 80% over the GCC implementation of C++ std::sort. Also for many other types of data and non-random inputs, there is still a significant speedup over std::sort. Only in few special cases like sorted or almost sorted inputs, std::sort can beat our implementation. Moreover, even on random input permutations, our implementation is even slightly faster than an implementation of the highly tuned Super Scalar Sample Sort, which uses a linear amount of additional space.

AB - Since the work of Kaligosi and Sanders (2006), it is well-known that Quicksort – which is commonly considered as one of the fastest in-place sorting algorithms – suffers in an essential way from branch mispredictions. We present a novel approach to address this problem by partially decoupling control from data flow: in order to perform the partitioning, we split the input in blocks of constant size (we propose 128 data elements); then, all elements in one block are compared with the pivot and the outcomes of the comparisons are stored in a buffer. In a second pass, the respective elements are rearranged. By doing so, we avoid conditional branches based on outcomes of comparisons at all (except for the final Insertionsort). Moreover, we prove that for a static branch predictor the average total number of branch mispredictions is at most εn log n + O(n) for some small ε depending on the block size when sorting n elements. Our experimental results are promising: when sorting random integer data, we achieve an increase in speed (number of elements sorted per second) of more than 80% over the GCC implementation of C++ std::sort. Also for many other types of data and non-random inputs, there is still a significant speedup over std::sort. Only in few special cases like sorted or almost sorted inputs, std::sort can beat our implementation. Moreover, even on random input permutations, our implementation is even slightly faster than an implementation of the highly tuned Super Scalar Sample Sort, which uses a linear amount of additional space.

U2 - 10.4230/LIPIcs.ESA.2016.38

DO - 10.4230/LIPIcs.ESA.2016.38

M3 - Conference paper

SN - 9783959770156

VL - 57

SP - 38:1–38:16

BT - 24th Annual European Symposium on Algorithms (ESA 2016)

A2 - Sankowski, Piotr

A2 - Zaroliagis, Christos

PB - Schloss Dagstuhl – Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing

ER -

Edelkamp SJ, Weiss A. BlockQuicksort: Avoiding Branch Mispredictions in Quicksort. In Sankowski P, Zaroliagis C, editors, 24th Annual European Symposium on Algorithms (ESA 2016): ESA 2016, August 22–24, 2016, Aarhus, Denmark. Vol. 57. Schloss Dagstuhl – Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing. 2016. p. 38:1–38:16. Article 38. doi: 10.4230/LIPIcs.ESA.2016.38
