TY - JOUR
T1 - Fast parallel path concatenation for graph extraction
AU - Shao, Yingxia
AU - Lei, Kai
AU - Chen, Lei
AU - Huang, Zi
AU - Cui, Bin
AU - Liu, Zhongyi
AU - Tong, Yunhai
AU - Xu, Jin
N1 - Publisher Copyright:
© 1989-2012 IEEE.
PY - 2017/10/1
Y1 - 2017/10/1
N2 - Heterogeneous graph is a popular data model to represent the real-world relations with abundant semantics. To analyze heterogeneous graphs, an important step is extracting homogeneous graphs from the heterogeneous graphs, called homogeneous graph extraction. In an extracted homogeneous graph, the relation is defined by a line pattern on the heterogeneous graph and the new attribute values of the relation are calculated by user-defined aggregate functions. The key challenges of the extraction problem are how to efficiently enumerate paths matched by the line pattern and aggregate values for each pair of vertices from the matched paths. To address above two challenges, we propose a parallel graph extraction framework, where we use vertex-centric model to enumerate paths and compute aggregate functions in parallel. The framework compiles the line pattern into a path concatenation plan, which determines the order of concatenating paths and generates the final paths in a divide-and-conquer manner. We introduce a cost model to estimate the cost of a plan and discuss three plan selection strategies, among which the best plan can enumerate paths in O(log(l)) iterations, where l is the length of a pattern. Furthermore, to improve the performance of evaluating aggregate functions, we classify the aggregate functions into three categories, i.e., distributive aggregation, algebraic aggregation, and holistic aggregation. Since the distributive and algebraic aggregations can be computed from the partial paths, we speed up the aggregation by computing partial aggregate values during the path enumeration.
AB - Heterogeneous graph is a popular data model to represent the real-world relations with abundant semantics. To analyze heterogeneous graphs, an important step is extracting homogeneous graphs from the heterogeneous graphs, called homogeneous graph extraction. In an extracted homogeneous graph, the relation is defined by a line pattern on the heterogeneous graph and the new attribute values of the relation are calculated by user-defined aggregate functions. The key challenges of the extraction problem are how to efficiently enumerate paths matched by the line pattern and aggregate values for each pair of vertices from the matched paths. To address above two challenges, we propose a parallel graph extraction framework, where we use vertex-centric model to enumerate paths and compute aggregate functions in parallel. The framework compiles the line pattern into a path concatenation plan, which determines the order of concatenating paths and generates the final paths in a divide-and-conquer manner. We introduce a cost model to estimate the cost of a plan and discuss three plan selection strategies, among which the best plan can enumerate paths in O(log(l)) iterations, where l is the length of a pattern. Furthermore, to improve the performance of evaluating aggregate functions, we classify the aggregate functions into three categories, i.e., distributive aggregation, algebraic aggregation, and holistic aggregation. Since the distributive and algebraic aggregations can be computed from the partial paths, we speed up the aggregation by computing partial aggregate values during the path enumeration.
KW - Graph extraction
KW - Heterogenegous graph
KW - Parallelisim
KW - Path concatenation
UR - https://www.webofscience.com/wos/woscc/full-record/WOS:000410643300011
UR - https://openalex.org/W2633320030
UR - https://www.scopus.com/pages/publications/85021805254
U2 - 10.1109/TKDE.2017.2716939
DO - 10.1109/TKDE.2017.2716939
M3 - Journal Article
SN - 1041-4347
VL - 29
SP - 2210
EP - 2222
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 10
M1 - 7953521
ER -