TY - JOUR
T1 - LogChain
T2 - Cloud workflow reconstruction & troubleshooting with unstructured logs
AU - Zhou, Pengpeng
AU - Wang, Yang
AU - Li, Zhenyu
AU - Tyson, Gareth
AU - Guan, Hongtao
AU - Xie, Gaogang
N1 - Publisher Copyright: © 2020
PY - 2020/7/5
Y1 - 2020/7/5
N2 - Cloud-based virtualization has become a key part of building distributed applications. One of its many benefits is the ability to dynamically manage system capacity by creating, deleting and migrating virtual machines (VMs) on demand. This management process, however, depends on complex pipelines involving multiple service invocations across distributed nodes. This makes troubleshooting and debugging difficult, as these complex pipelines lack an integrated logging system. Instead, each service generates independent and unstructured log messages without the ability to link logs into a single integrated workflow. We present LogChain, a tool that gathers and processes distributed unstructured logs to diagnose failures in cloud management tasks. It contains three key functions: (i) it infers task workflows from distributed unstructured logs; (ii) it labels these workflows with the tasks that triggered them; and (iii) it diagnoses potential failures in the workflow's execution, to support administrators with troubleshooting. We evaluate LogChain with realistic workloads, and show that it exceeds the state of the art in terms of performance and accuracy.
AB - Cloud-based virtualization has become a key part of building distributed applications. One of its many benefits is the ability to dynamically manage system capacity by creating, deleting and migrating virtual machines (VMs) on demand. This management process, however, depends on complex pipelines involving multiple service invocations across distributed nodes. This makes troubleshooting and debugging difficult, as these complex pipelines lack an integrated logging system. Instead, each service generates independent and unstructured log messages without the ability to link logs into a single integrated workflow. We present LogChain, a tool that gathers and processes distributed unstructured logs to diagnose failures in cloud management tasks. It contains three key functions: (i) it infers task workflows from distributed unstructured logs; (ii) it labels these workflows with the tasks that triggered them; and (iii) it diagnoses potential failures in the workflow's execution, to support administrators with troubleshooting. We evaluate LogChain with realistic workloads, and show that it exceeds the state of the art in terms of performance and accuracy.
UR - https://www.webofscience.com/wos/woscc/full-record/WOS:000535454300010
UR - https://www.scopus.com/pages/publications/85084936604
U2 - 10.1016/j.comnet.2020.107279
DO - 10.1016/j.comnet.2020.107279
M3 - Journal Article
SN - 1389-1286
VL - 175
JO - Computer Networks
JF - Computer Networks
M1 - 107279
ER -