Kernel-based multiple-instance learning

  • Pak Ming Cheung

Student thesis: Master's thesis

Abstract

In recent years, the multiple-instance learning (MIL) problem has become increasingly popular in the machine learning community. Each training object (bag) in a MIL problem is a set of patterns (instances). Label information is associated only with the bags, not with their constituent instances. Moreover, a positive bag must contain at least one positive instance but may contain many negative instances. Because only bag-level labels are accessible and a positive bag may contain many negative instances, MIL is more challenging than traditional supervised learning (single-instance learning). On the other hand, MIL is fruitful to study, since many real-world problems, such as drug activity prediction, are inherently multiple-instance (MI) problems that do not generalize well under the traditional single-instance learning model. In addition, the generalization performance on many single-instance learning problems, e.g., content-based image retrieval (CBIR), has been found to improve when they are cast into an appropriate MIL representation. In this thesis, I study MIL algorithms based on kernel methods. In particular, I focus on support vector machines (SVMs), which have been highly successful in many machine learning problems. The thesis first discusses how to reformulate the SVM for the MI setting by using bag and instance information at the same time. I then propose a way to define an MI kernel over bags based on the marginalized kernel. The resulting bag kernel can then be used in a standard SVM. I also extend this marginalized kernel to the real-valued regression setting, which is attracting growing interest in the MIL community. Empirical results show that the proposed methods outperform various traditional methods.
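To make the setting described in the abstract concrete, the following is a minimal Python sketch, not the method proposed in the thesis. It illustrates (a) the standard MIL labeling assumption and (b) a simple averaged set kernel over bags, used here only as a stand-in for a bag-level kernel; the thesis's marginalized kernel is not reproduced. All function names (`bag_label`, `rbf`, `set_kernel`, `gram_matrix`) and the parameter `gamma` are hypothetical and chosen for illustration.

```python
import math

def bag_label(instance_labels):
    """Standard MIL assumption: a bag is positive (+1) iff at least one
    of its instances is positive; otherwise it is negative (-1)."""
    return 1 if any(label == 1 for label in instance_labels) else -1

def rbf(x, z, gamma=1.0):
    """RBF kernel between two instances given as equal-length sequences."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)

def set_kernel(bag_a, bag_b, gamma=1.0):
    """Averaged set kernel (an illustrative stand-in, NOT the thesis's
    marginalized kernel): mean instance kernel over all cross-bag pairs."""
    total = sum(rbf(x, z, gamma) for x in bag_a for z in bag_b)
    return total / (len(bag_a) * len(bag_b))

def gram_matrix(bags, gamma=1.0):
    """Bag-level Gram matrix, one row and one column per bag."""
    return [[set_kernel(a, b, gamma) for b in bags] for a in bags]

# Toy data: each bag is a list of 2-D instances (made up for illustration).
bags = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.9, 1.1)],
    [(5.0, 5.0), (6.0, 5.0), (5.5, 4.5)],
]
K = gram_matrix(bags, gamma=0.5)
```

A Gram matrix of this kind is what a standard SVM implementation that accepts precomputed kernels would consume, which is the sense in which a bag kernel "can then be used in a standard SVM".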
Date of Award: 2006
Original language: English
Awarding Institution
  • The Hong Kong University of Science and Technology
