Histograms, as a crucial tool in database management and big data analytics, are used for selectivity estimation to get approximate query results efficiently. Minskew is a state-of-the-art multi-dimensional histogram algorithm that offers decent accuracy in selectivity estimation, but it has difficulty in achieving good performance for queries of different sizes. To solve this problem, we propose Grid Adaptive Minskew and Reinforcement Learning (RL) Histogram. While Minskew generates a set of global grid cells that are used for splitting at all levels, Grid Adaptive Minskew generates grid cells for each bucket independently, resulting in histograms with buckets of more flexible sizes and improved accuracy for various selectivities. RL Histogram shares the same data structure with Grid Adaptive Minskew. It first learns the knowledge from Grid Adaptive Minskew and then explores histograms with policy gradient learning algorithm. The experiments demonstrate that Grid Adaptive Minskew outperforms Minskew in all aspects, and RL Histogram can further improve the performance.
| Date of Award | 2022 |
|---|
| Original language | English |
|---|
| Awarding Institution | - The Hong Kong University of Science and Technology
|
|---|
| Supervisor | Dimitrios PAPADIAS (Supervisor) |
|---|
Multidimensional partitioning for histogram using reinforcement learning
DING, W. (Author). 2022
Student thesis: Master's thesis