Abstract
Video object detection is a fundamental tool for many applications. Because applying image-based object detection directly to individual frames cannot exploit the rich temporal information inherent in video data, we advocate detecting long-range object patterns in video. While Long Short-Term Memory (LSTM) has been the de facto choice for such detection, a standard LSTM cannot fundamentally model object association between consecutive frames. In this paper, we propose the association LSTM to address this fundamental association problem. Association LSTM not only regresses and classifies object locations and categories directly but also associates a feature representation with each output object. By minimizing the matching error between these features, we learn how to associate objects in two consecutive frames. Additionally, our method works in an online manner, which is important for most video tasks. Our approach outperforms traditional video object detection methods on standard video datasets.
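To make the idea of a matching error between per-object features concrete, the following is a minimal sketch and not the formulation used in the paper. It assumes each detection carries a feature vector, that similarity is measured with cosine similarity, and that the error is the mean squared difference between the similarity matrix and a 0/1 identity-match matrix. The function names `association_error` and `associate`, and the choice of cosine similarity, are illustrative assumptions.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length (eps avoids division by zero)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def similarity(prev_feats, curr_feats):
    """Cosine similarity between every previous and current detection.

    prev_feats: (N, D) features from frame t-1
    curr_feats: (M, D) features from frame t
    Returns an (N, M) matrix with values in [-1, 1].
    """
    return l2_normalize(prev_feats) @ l2_normalize(curr_feats).T


def association_error(prev_feats, curr_feats, same_object):
    """Hypothetical matching error (an assumption, not the paper's loss).

    same_object: (N, M) matrix, 1 where the two detections are the same
    object and 0 otherwise. The error is small when matched pairs have
    similar features and unmatched pairs have dissimilar ones.
    """
    sim = similarity(prev_feats, curr_feats)
    return float(np.mean((sim - same_object) ** 2))


def associate(prev_feats, curr_feats):
    """For each current detection, index of the most similar previous one."""
    return similarity(prev_feats, curr_feats).argmax(axis=0)


# Toy example: two objects whose order is swapped between frames.
prev_feats = np.array([[1.0, 0.0], [0.0, 1.0]])
curr_feats = np.array([[0.0, 2.0], [3.0, 0.0]])
same_object = np.array([[0.0, 1.0], [1.0, 0.0]])

error = association_error(prev_feats, curr_feats, same_object)
matches = associate(prev_feats, curr_feats)
```

In this toy case the features of matched objects point in the same direction, so the error is zero and `associate` recovers the swapped order. Because the association only needs the features from the previous and current frames, a scheme of this kind is compatible with online processing.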
| Original language | English |
|---|---|
| Title of host publication | Proceedings - 2017 IEEE International Conference on Computer Vision, ICCV 2017 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 2363-2371 |
| Number of pages | 9 |
| ISBN (Electronic) | 9781538610329 |
| Publication status | Published - 22 Dec 2017 |
| Event | 16th IEEE International Conference on Computer Vision, ICCV 2017 - Venice, Italy. Duration: 22 Oct 2017 → 29 Oct 2017 |
Publication series
| Name | Proceedings of the IEEE International Conference on Computer Vision |
|---|---|
| Volume | 2017-October |
| ISSN (Print) | 1550-5499 |
Conference
| Conference | 16th IEEE International Conference on Computer Vision, ICCV 2017 |
|---|---|
| Country/Territory | Italy |
| City | Venice |
| Period | 22/10/17 → 29/10/17 |
Bibliographical note
Publisher Copyright: © 2017 IEEE.