TY - JOUR
T1 - Cache-oblivious hashing
AU - Pagh, Rasmus
AU - Wei, Zhewei
AU - Yi, Ke
AU - Zhang, Qin
PY - 2014/8
Y1 - 2014/8
N2 - The hash table, especially its external memory version, is one of the most important index structures in large databases. Assuming a truly random hash function, it is known that in a standard external hash table with block size b, searching for a particular key only takes expected average t q =1+1/2 Ω(b) disk accesses for any load factor α bounded away from 1. However, such near-perfect performance is achieved only when b is known and the hash table is particularly tuned for working with such a blocking. In this paper we study if it is possible to build a cache-oblivious hash table that works well with any blocking. Such a hash table will automatically perform well across all levels of the memory hierarchy and does not need any hardware-specific tuning, an important feature in autonomous databases. We first show that linear probing, a classical collision resolution strategy for hash tables, can be easily made cache-oblivious but it only achieves t q =1+Θ(α/b) even if a truly random hash function is used. Then we demonstrate that the block probing algorithm (Pagh et al. in SIAM Rev. 53(3):547-558, 2011) achieves t q =1+1/2 Ω(b), thus matching the cache-aware bound, if the following two conditions hold: (a) b is a power of 2; and (b) every block starts at a memory address divisible by b. Note that the two conditions hold on a real machine, although they are not stated in the cache-oblivious model. Interestingly, we also show that neither condition is dispensable: if either of them is removed, the best obtainable bound is t q =1+O(α/b), which is exactly what linear probing achieves.
AB - The hash table, especially its external memory version, is one of the most important index structures in large databases. Assuming a truly random hash function, it is known that in a standard external hash table with block size b, searching for a particular key only takes expected average t q =1+1/2 Ω(b) disk accesses for any load factor α bounded away from 1. However, such near-perfect performance is achieved only when b is known and the hash table is particularly tuned for working with such a blocking. In this paper we study if it is possible to build a cache-oblivious hash table that works well with any blocking. Such a hash table will automatically perform well across all levels of the memory hierarchy and does not need any hardware-specific tuning, an important feature in autonomous databases. We first show that linear probing, a classical collision resolution strategy for hash tables, can be easily made cache-oblivious but it only achieves t q =1+Θ(α/b) even if a truly random hash function is used. Then we demonstrate that the block probing algorithm (Pagh et al. in SIAM Rev. 53(3):547-558, 2011) achieves t q =1+1/2 Ω(b), thus matching the cache-aware bound, if the following two conditions hold: (a) b is a power of 2; and (b) every block starts at a memory address divisible by b. Note that the two conditions hold on a real machine, although they are not stated in the cache-oblivious model. Interestingly, we also show that neither condition is dispensable: if either of them is removed, the best obtainable bound is t q =1+O(α/b), which is exactly what linear probing achieves.
KW - Cache-oblivious algorithms
KW - Hashing
UR - https://www.webofscience.com/wos/woscc/full-record/WOS:000336426000004
UR - https://openalex.org/W2048064086
UR - https://www.scopus.com/pages/publications/84901830243
U2 - 10.1007/s00453-013-9763-6
DO - 10.1007/s00453-013-9763-6
M3 - Journal Article
SN - 0178-4617
VL - 69
SP - 864
EP - 883
JO - Algorithmica (New York)
JF - Algorithmica (New York)
IS - 4
ER -