In this study, multiple cost-saving options for computing clusters specifically designed for big-data analytic systems are investigated. The ultimate goal is to allow computing clusters to provide more economical software hosting services for big-data analytic applications through hardware multiplexing, economies of scale, and data sharing. We are particularly interested in research problems in the areas of virtual machine (VM) workload consolidation, job scheduling in big-data analytic frameworks, and data deduplication in data storage systems. In particular, we seek to answer the following questions: (1) How should we group and assign virtual machines in order to minimize the cost of the data center? (2) How should we schedule jobs in a big-data analytic system according to their time budgets? (3) How should files be distributed and stored across multiple servers in order to eliminate as much data redundancy as possible? By combining the answers to these questions, we aim to produce advanced management systems that help big-data application users reduce their overall operating cost without jeopardizing the quality of service of their applications. To produce insightful designs, we rely mainly on advanced discrete optimization and graph-theoretic techniques to make decisions that take system dynamics into account. For example, Lagrangian relaxation and M-convex optimization techniques are applied to solve the VM workload consolidation problem, while total unimodularity and robust optimization techniques are applied to perform job scheduling for big-data analytic systems in real time. The design principle that we value most is that the proposed solutions must consist of lightweight distributed algorithms that can be implemented in real-world systems. Our proposals rest on non-trivial theoretical foundations.
To evaluate the practicality of the proposed systems, we built prototypes on representative real-world platforms. For performance evaluations that require a large-scale data center, we developed simulation programs. Our results suggest that the proposed systems are efficient, agile and robust. We believe that, after fine-tuning, the prototype systems can be powerful tools for existing big-data analytic software.
| Date of Award | 2015 |
|---|---|
| Original language | English |
| Awarding Institution | The Hong Kong University of Science and Technology |
Towards a better computing cluster for big-data analytic systems
Huang, Z. (Author). 2015
Student thesis: Doctoral thesis