Effective keyword search on large scale graphs in a distributed system

  • Mengyu Li

Student thesis: Master's thesis

Abstract

Keyword query is more user-friendly than the structured query (SQL, XQuery, SPARQL)as it requires no pre-knowledge about the underlying schema of the databases. In the recent decade there has been much work focusing on the keyword search in both the structured databases and unstructured databases. Particularly, since the labeled graph model can generally present both the structured and unstructured databases, graph keyword search has been a hot topic in the database community. Furthermore, with the rapid development of semantic web, the graph data has grown to a huge scale and has to be stored and processed in a distributed environment. However, no existing approach can be directly applied to answer the keyword queries in the distributed graphs. The challenges exist in two-fold, the poor locality exists in the keyword query answers may lead to high communication cost for graph exploration and the huge traffic load for shifting the data among different machines. To address these two challenges, in this thesis, we first propose to optimize the index in order to minimal the distributed query processing cost within the space budget on each machine; Then, based on this indexing scheme we introduce an advanced distributed query processing algorithm, which includes an optimal scheduling problem defined to reduce traffic and the time cost, and we propose a greedy algorithm with a factor-2 approximation to the optimal schedule problem; At last, we verify the effectiveness and efficiency of our solution with large scale real datasets.
Date of Award2013
Original languageEnglish
Awarding Institution
  • The Hong Kong University of Science and Technology

Cite this

'