Making call graph construction more practical for program analysis in the real world

  • Yuandao CAI

Student thesis: Doctoral thesis

Abstract

The explosively increasing code size and high concurrency make modern software increasingly complex and error-prone. Furthermore, the prevalence and extensive deployment of modern software have exacerbated the hazards of any software vulnerabilities and errors, potentially affecting millions of users and resulting in significant financial costs. Unfortunately, the ever-growing complexity of modern codebases has made many conventional program analysis tasks impractical for safeguarding today’s software. Specifically, as one of the most fundamental tasks of program analysis, call graph construction for large-scale code has become either imprecise or unscalable, which impairs the effectiveness of many downstream program analysis applications. In addition, as one of the most popular tasks of program analysis, bug detection is faced with significant challenges in effectively detecting concurrency bugs. In particular, unlike detecting sequential bugs, detecting concurrent bugs requires additional characterization of exponentionally possible thread interactions. To address these challenges, in this thesis, we contribute to making fundamental call graph construction and popular concurrency bug detection more practical. First, we advocate two new approaches for call graph construction with different precision-scalability trade-offs, catering to different scenarios throughout software development (e.g., the continuous integration and daily build process). Specifically, Kelp significantly outperforms existing type-based call graph construction, which enables the construction of precise call graphs for millions of lines of code in a few minutes. On the other hand, Coral largely outperforms existing pointer-analysis-based call graph construction, which facilitates the construction of precise call graphs for millions of lines of code in a few hours. By evaluating the call graphs through the lens of several various downstream program analysis clients (i.e., thread-sharing analysis, program slicing, value-flow bug detection, and directed grey-box fuzzing), our experimental results suggest that K<small>ELP</small> and C<small>ORAL</small> can dramatically promote their effectiveness for better vulnerability understanding, hunting, and reproduction. Second, we advocate a series of static approaches for concurrency bug detection, including C<small>ANARY</small> for detecting inter-thread value-flow bugs, P<small>EAHEN</small> for detecting deadlocks, and L<small>OCKPICK</small> for detecting lock misuses. It is important to note that the effectiveness of our concurrency bug detection can be also enhanced by our work on practical call graph construction. Our experimental results show that our concurrency bug detectors can beat many popular open-source and commercial static concurrency bug detectors, such as Clang Static Analyzer (CSA) and Facebook/Meta Infer, regarding efficiency and precision. Moreover, our concurrency bug detectors are practical and deployable, checking millions of lines of code in less than five hours, mostly with a false positive rate lower than 30%. Excitingly, our research prototypes for call graph construction and concurrency bug detection have been successfully deployed in two global 500 companies. Moreover, our concurrency bug detectors have uncovered over two hundred bugs with sixteen CVE IDs assigned in numerous award-winning and popular open-source software, such as the Linux kernel, Firefox, PHP, PostgreSQL, and OpenSSL. We believe that the adoption of our techniques at complicated industrial settings and the capability to uncover genuine bugs in the real world are strong evidence of our approaches’ effectiveness and advancement.
Date of Award2023
Original languageEnglish
Awarding Institution
  • The Hong Kong University of Science and Technology
SupervisorCharles Chuan ZHANG (Supervisor)

Cite this

'