TY - JOUR
T1 - Who will Win the Data Science Competition? Insights from KDD Cup 2019 and Beyond
AU - Liu, Hao
AU - Guo, Qingyu
AU - Zhu, Hengshu
AU - Zhuang, Fuzhen
AU - Yang, Shenwen
AU - Dou, Dejing
AU - Xiong, Hui
N1 - Publisher Copyright: © 2022 Association for Computing Machinery.
PY - 2022/10
Y1 - 2022/10
N2 - Data science competitions are becoming increasingly popular for enterprises collecting advanced innovative solutions and allowing contestants to sharpen their data science skills. Most existing studies about data science competitions focus on improving task-specific data science techniques, such as algorithm design and parameter tuning. However, little effort has been made to understand the data science competition itself. To this end, in this article, we shed light on teams' competition performance and investigate how that performance evolves in the crowd-sourcing competitive innovation context. Specifically, we first acquire and construct multi-sourced datasets of various data science competitions, including the KDD Cup 2019 machine learning competition and beyond. Then, we conduct an empirical analysis to identify and quantify a rich set of features that are significantly correlated with teams' future performance. By leveraging a team's rank as a proxy, we observe a "the stronger, the stronger" rule; that is, top-ranked teams tend to keep their advantages and dominate weaker teams for the rest of the competition. Our results also confirm that teams with diversified backgrounds tend to achieve better performance. After that, we formulate the team's future rank prediction problem and propose the Multi-Task Representation Learning (MTRL) framework to model both static features and dynamic features. Extensive experimental results on four real-world data science competitions demonstrate that a team's future performance can be well predicted by using MTRL. Finally, we envision that our study will not only help competition organizers understand the competition better, but also provide strategic implications to contestants, such as guiding team formation and designing the submission strategy.
KW - Data science competition prediction
KW - deep representation learning
KW - multi-task learning
UR - https://www.webofscience.com/wos/woscc/full-record/WOS:000802146500018
UR - https://openalex.org/W4226196772
UR - https://www.scopus.com/pages/publications/85131169490
U2 - 10.1145/3511896
DO - 10.1145/3511896
M3 - Journal Article
SN - 1556-4681
VL - 16
JO - ACM Transactions on Knowledge Discovery from Data
JF - ACM Transactions on Knowledge Discovery from Data
IS - 5
M1 - 98
ER -