Heterogeneous Defect Prediction

Jaechang Nam*, Wei Fu, Sunghun Kim, Tim Menzies, Lin Tan

*Corresponding author for this work

Research output: Contribution to journal, Journal Article, peer-review

221 Citations (Scopus)

Abstract

Many recent studies have documented the success of cross-project defect prediction (CPDP), which predicts defects in new projects that lack defect data by using prediction models built from other projects. However, most of these studies share the same limitation: they require homogeneous data, i.e., different projects must describe themselves using the same metrics. This paper presents methods for heterogeneous defect prediction (HDP) that match up different metrics in different projects. Metric matching for HDP requires a 'large enough' sample of distributions in the source and target projects, which raises the question of how large is 'large enough' for effective heterogeneous defect prediction. This paper shows, empirically and theoretically, that 'large enough' may be very small indeed. For example, using a mathematical model of defect prediction, we identify categories of data sets where as few as 50 instances are enough to build a defect prediction model. We conclude that, even when projects use different metric sets, it is possible to quickly transfer lessons learned about defect prediction.
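The abstract describes metric matching only at a high level. As an illustration, here is a minimal Python sketch of one way to pair metrics across two projects, assuming that the similarity of a source metric and a target metric is scored as 1 - D, where D is the two-sample Kolmogorov-Smirnov statistic between their value distributions, and that pairs are chosen greedily one-to-one above a cutoff. The function names (`ks_statistic`, `match_metrics`), the cutoff of 0.5, and the greedy pairing are all hypothetical choices for illustration, not the procedure published in the paper.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D: the largest vertical
    gap between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:
        # Empirical CDF at x = fraction of sample values <= x.
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def match_metrics(source, target, cutoff=0.5):
    """Pair source metrics with target metrics one-to-one (illustrative sketch).

    source, target: dicts mapping a metric name to its list of values.
    cutoff: hypothetical threshold; pairs scoring below it are discarded.
    Similarity is 1 - D, so identical distributions score 1.0.
    """
    scored = []
    for s_name, s_vals in source.items():
        for t_name, t_vals in target.items():
            score = 1.0 - ks_statistic(s_vals, t_vals)
            if score >= cutoff:
                scored.append((score, s_name, t_name))
    # Greedy (assumed) strategy: repeatedly take the highest-scoring pair
    # whose source and target metrics are both still unused.
    scored.sort(key=lambda item: (-item[0], item[1], item[2]))
    used_s, used_t, pairs = set(), set(), []
    for score, s_name, t_name in scored:
        if s_name not in used_s and t_name not in used_t:
            pairs.append((s_name, t_name, round(score, 3)))
            used_s.add(s_name)
            used_t.add(t_name)
    return pairs


if __name__ == "__main__":
    # Hypothetical metric values for two projects with different metric names.
    source = {"loc": [10, 20, 30, 40, 50], "cbo": [1, 2, 2, 3, 9]}
    target = {"lines": [12, 22, 29, 41, 55], "fan_out": [1, 2, 3, 3, 8]}
    print(match_metrics(source, target))
```

The greedy pass keeps the sketch short; an optimal one-to-one assignment would instead require a maximum-weight bipartite matching. Note also that the KS statistic becomes noisy on small samples, which is one concrete way the abstract's question of how large is 'large enough' shows up in practice.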

Original language: English
Article number: 7959597
Pages (from-to): 874-896
Number of pages: 23
Journal: IEEE Transactions on Software Engineering
Volume: 44
Issue number: 9
DOIs
Publication status: Published - 1 Sept 2018

Bibliographical note

Publisher Copyright:
© 1976-2012 IEEE.

Keywords

  • Defect prediction
  • heterogeneous metrics
  • quality assurance
  • transfer learning
