Autocross: Automatic feature crossing for tabular data in real-world applications

Yuanfei Luo, Quanming Yao*, Mengshuo Wang, Wei Wei Tu, Hao Zhou, Yuqiang Chen, Wenyuan Dai, Qiang Yang

*Corresponding author for this work

Research output: Chapter in Book/Conference Proceeding/ReportConference Paper published in a bookpeer-review

81 Citations (Scopus)

Abstract

Feature crossing captures interactions among categorical features and is useful to enhance learning from tabular data in real-world businesses. In this paper, we present AutoCross, an automatic feature crossing tool provided by 4Paradigm to its customers, ranging from banks, hospitals, to Internet corporations. By performing beam search in a tree-structured space, AutoCross enables efficient generation of high-order cross features, which is not yet visited by existing works. Additionally, we propose successive mini-batch gradient descent and multi-granularity discretization to further improve efficiency and effectiveness, while ensuring simplicity so that no machine learning expertise or tedious hyper-parameter tuning is required. Furthermore, the algorithms are designed to reduce the computational, transmitting, and storage costs involved in distributed computing. Experimental results on both benchmark and real-world business datasets demonstrate the effectiveness and efficiency of AutoCross. It is shown that AutoCross can significantly enhance the performance of both linear and deep models.1

Original languageEnglish
Title of host publicationKDD 2019 - Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
PublisherAssociation for Computing Machinery
Pages1936-1945
Number of pages10
ISBN (Electronic)9781450362016
DOIs
Publication statusPublished - 25 Jul 2019
Event25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2019 - Anchorage, United States
Duration: 4 Aug 20198 Aug 2019

Publication series

NameProceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Conference

Conference25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2019
Country/TerritoryUnited States
CityAnchorage
Period4/08/198/08/19

Bibliographical note

Publisher Copyright:
© 2019 Association for Computing Machinery.

Keywords

  • AutoML
  • Feature Crossing
  • Tabular Data

Fingerprint

Dive into the research topics of 'Autocross: Automatic feature crossing for tabular data in real-world applications'. Together they form a unique fingerprint.

Cite this