Learning cross-modal deep representations for robust pedestrian detection

Dan Xu, Wanli Ouyang, Elisa Ricci, Xiaogang Wang, Nicu Sebe

Research output: Chapter in Book/Conference Proceeding/Report › Conference Paper published in a book › peer-review

Abstract

This paper presents a novel method for detecting pedestrians under adverse illumination conditions. Our approach relies on a novel cross-modality learning framework and is based on two main phases. First, given a multimodal dataset, a deep convolutional network is employed to learn a non-linear mapping, modeling the relations between RGB and thermal data. Then, the learned feature representations are transferred to a second deep network, which receives as input an RGB image and outputs the detection results. In this way, features which are both discriminative and robust to bad illumination conditions are learned. Importantly, at test time, only the second pipeline is considered and no thermal data are required. Our extensive evaluation demonstrates that the proposed approach outperforms the state-of-the-art on the challenging KAIST multispectral pedestrian dataset and is competitive with previous methods on the popular Caltech dataset.
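The two-phase idea in the abstract (learn an RGB-to-thermal mapping from paired data, then use it so that only RGB is needed at test time) can be illustrated with a toy sketch. Here a linear map learned by ridge regression stands in for the deep convolutional network, and random vectors stand in for image features; all names and dimensions are illustrative assumptions, not the authors' architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: each row is a per-image feature vector. In the paper a
# deep CNN learns the non-linear RGB -> thermal mapping; a linear map
# fitted by ridge regression is only an illustrative proxy (assumption).
n, d_rgb, d_thermal = 200, 16, 8
rgb = rng.normal(size=(n, d_rgb))
true_map = rng.normal(size=(d_rgb, d_thermal))
thermal = rgb @ true_map + 0.01 * rng.normal(size=(n, d_thermal))

# Phase 1: learn the mapping from paired multimodal (RGB + thermal) data.
lam = 1e-3  # ridge regularizer
W = np.linalg.solve(rgb.T @ rgb + lam * np.eye(d_rgb), rgb.T @ thermal)

# Phase 2 (test time): only RGB is available; the learned mapping
# produces thermal-like features, so no thermal sensor is required.
rgb_test = rng.normal(size=(5, d_rgb))
thermal_like = rgb_test @ W  # shape (5, d_thermal)

print(thermal_like.shape)
```

The point of the sketch is the asymmetry between phases: the paired thermal data are consumed only during training, while inference touches nothing but RGB inputs, mirroring how the paper's second network operates at test time.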

Original language: English
Title of host publication: Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 4236-4244
Number of pages: 9
ISBN (Electronic): 9781538604571
DOIs
Publication status: Published - 6 Nov 2017
Externally published: Yes
Event: 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017 - Honolulu, United States
Duration: 21 Jul 2017 - 26 Jul 2017

Publication series

Name: Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
Volume: 2017-January

Conference

Conference: 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
Country/Territory: United States
City: Honolulu
Period: 21/07/17 - 26/07/17

Bibliographical note

Publisher Copyright:
© 2017 IEEE.
