
Surfacing code in the dark: an instant clone search approach

  • Jin Woo Park
  • Mu Woong Lee
  • Jong Won Roh
  • Seung Won Hwang*
  • Sunghun Kim

*Corresponding author for this work

Research output: Contribution to journal, Journal Article, peer-review

Abstract

In this paper, we study how to “surface” code for instant reference. The traditional way of surfacing code has been to treat it as text and apply keyword search techniques. However, much prior work has observed the limitations of this approach: (1) the semantic description of code is limited to comments, and (2) syntactic keywords are often not selective enough. In contrast, we discuss enabling techniques and scenarios for instant semantic-based surfacing. For example, during a development session, developers may reference existing code with similar semantics, using the code they have written so far as a query. In addition to such semantic-based surfacing, we also enhance keyword-based surfacing with semantics by instantly adding semantic tags to code submitted to the repository. To achieve this goal, we first propose scalable indexing structures on vector abstractions of code. Our experimental results show that our techniques outperform a state-of-the-art tool in efficiency without compromising accuracy. We then deploy our techniques in instant search and tagging scenarios. For the instant code search scenario, we demonstrate an instant clone search tool built on our techniques that supports sub-second search over 54 million LOC. For the instant code tagging scenario, we propose an automatic instant code tagging algorithm that mines meaningful tags from clones.

Original language: English
Pages (from-to): 727-759
Number of pages: 33
Journal: Knowledge and Information Systems
Volume: 41
Issue number: 3
Publication status: Published - 7 Nov 2014

Bibliographical note

Publisher Copyright:
© 2013, Springer-Verlag London.

Keywords

  • Code indexing
  • Instant code search
  • Instant code tagging
  • Software development
