Differential privacy (DP) has garnered significant attention from both academia and industry due to its potential in offering robust privacy protection for individual data during analysis. With the increasing volume of sensitive information being collected by organizations and analyzed through SQL queries, the development of a general-purpose query engine that is capable of supporting a broad range of SQLs while maintaining differential privacy has become the holy grail in privacy-preserving query release. Over the past several years, the database and security communities have made numerous efforts to achieve this objective. However, two major challenges still exist: Challenge 1: Privacy guarantee in relational databases Informally speaking, DP requires indistinguishability of the query results whether any particular individual’s data is included or not in the database. While this can be easily defined and well studied over a single flat table, the situation becomes more complex in a relational database with the presence of multiple relations, foreign keys, and the join operator. Challenge 2: Optimal privacy-utility trade-off Noise injection is inherently necessary for privacy protection, but the central scientific question is how to achieve the lowest error (i.e., highest utility) under a given privacy budget. Unfortunately, for the problem of relational query evaluation under DP, traditional notions of optimality, i.e., instance optimality and worst-case optimality, are either unattainable or meaningless. Thus, a new notion of optimality is needed for answering this question. In this thesis, we aim at tackling these challenges. To model the complex situation of relational databases, we study two different DP policies: tuple-DP, which protects the privacy of single tuples in each relation, and user-DP, which protects all data belonging to each user via foreign keys. Under each policy, we have designed DP mechanisms for answering a broad class of queries consisting of the selection, projection, aggregation, and join operators. To formalize their optimality, we introduce the notion of neighborhood optimality, which sits between instance optimality and worst-case optimality. Finally, based on the algorithms and theory developed, we have built a DP-SQL system that significantly outperforms existing systems in terms of both utility and efficiency.
| Date of Award | 2023 |
|---|
| Original language | English |
|---|
| Awarding Institution | - The Hong Kong University of Science and Technology
|
|---|
| Supervisor | Ke YI (Supervisor) |
|---|
Query evaluation under differential privacy
DONG, W. (Author). 2023
Student thesis: Doctoral thesis