
Class-Disentanglement and Applications in Adversarial Detection and Defense

Research output: Chapter in Book/Conference Proceeding/Report; Conference Paper published in a book; peer-review

Abstract

What is the minimum information a neural network D(·) needs from an image x to predict its class accurately? Extracting this information in the input space can locate the regions of x that D(·) mainly attends to, and it offers new insight into the detection of and defense against adversarial attacks. In this paper, we propose "class-disentanglement", which trains a variational autoencoder G(·) to extract this class-dependent information as x − G(x) through a trade-off between reconstructing x by G(x) and classifying x by D(x − G(x)). The two objectives compete in decomposing x: because G(x) tries to reconstruct as much of x as possible, the residual x − G(x) retains only the information necessary for classification. We apply this decomposition to both clean images and their adversarial counterparts and find that the perturbations generated by adversarial attacks lie mainly in the class-dependent part x − G(x). The decomposition also provides new interpretations of classification and attack models. Motivated by these observations, we propose to perform adversarial detection on x − G(x) and adversarial defense on G(x); both consistently outperform detection and defense performed on the original x. In experiments, this simple approach substantially improves detection of and defense against different types of adversarial attacks. Code is available at https://github.com/kai-wen-yang/CD-VAE.
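The trade-off described in the abstract can be sketched as a single per-example loss. This is a minimal illustration under stated assumptions, not the authors' implementation (the linked repository is the reference for that): it assumes a mean-squared reconstruction term weighted by a hypothetical `gamma`, the standard Gaussian VAE KL term weighted by a hypothetical `beta`, and cross-entropy on D(x − G(x)). Images are flattened to plain Python lists so the example runs without a deep-learning framework, and `toy_vae` and `toy_classifier` are hypothetical stand-ins for G(·) and D(·).

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# Hypothetical G(.): returns (reconstruction G(x), latent mean, latent log-variance).
Encoder = Callable[[Vector], Tuple[Vector, Vector, Vector]]
# Hypothetical D(.): returns one logit per class.
Classifier = Callable[[Vector], Vector]


def softmax_cross_entropy(logits: Sequence[float], label: int) -> float:
    """Cross-entropy for one example, computed stably with log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]


def gaussian_kl(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), the usual VAE prior term."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def decompose(x: Vector, vae: Encoder) -> Tuple[Vector, Vector]:
    """Split x into G(x) and the class-dependent residual x - G(x)."""
    g_x, _, _ = vae(x)
    residual = [xi - gi for xi, gi in zip(x, g_x)]
    return g_x, residual


def class_disentanglement_loss(x: Vector, label: int, vae: Encoder,
                               classify: Classifier,
                               gamma: float = 1.0, beta: float = 1.0) -> float:
    """Assumed objective: gamma * mean((x - G(x))^2) + CE(D(x - G(x)), y) + beta * KL."""
    g_x, mu, logvar = vae(x)
    residual = [xi - gi for xi, gi in zip(x, g_x)]
    # Reconstruction pushes the residual x - G(x) toward zero ...
    recon = sum(r * r for r in residual) / len(x)
    # ... while classification needs the residual to keep class information.
    cls = softmax_cross_entropy(classify(residual), label)
    return gamma * recon + cls + beta * gaussian_kl(mu, logvar)


# Hypothetical toy stand-ins for G(.) and D(.), for illustration only.
def toy_vae(x: Vector) -> Tuple[Vector, Vector, Vector]:
    return [0.5 * v for v in x], [0.0, 0.0], [0.0, 0.0]


def toy_classifier(r: Vector) -> Vector:
    s = sum(r)
    return [s, -s]


if __name__ == "__main__":
    x = [1.0, 2.0]
    g_x, residual = decompose(x, toy_vae)
    print("G(x) =", g_x, " x - G(x) =", residual)
    print("loss =", class_disentanglement_loss(x, 0, toy_vae, toy_classifier))
```

With the toy stand-ins, G(x) = [0.5, 1.0], the residual is [0.5, 1.0], and the loss equals 0.625 + log(1 + e^-3), about 0.674. In the pipeline the abstract describes, the residual returned by `decompose` would feed an adversarial detector, while G(x) would feed the classifier used for defense.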

Original language: English
Title of host publication: Advances in Neural Information Processing Systems 34 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
Editors: Marc'Aurelio Ranzato, Alina Beygelzimer, Yann Dauphin, Percy S. Liang, Jenn Wortman Vaughan
Publisher: Neural information processing systems foundation
Pages: 16051-16063
Number of pages: 13
ISBN (Electronic): 9781713845393
Publication status: Published - Dec 2021
Externally published: Yes
Event: 35th Conference on Neural Information Processing Systems, NeurIPS 2021 - Virtual, Online
Duration: 6 Dec 2021 to 14 Dec 2021

Publication series

Name: Advances in Neural Information Processing Systems
Volume: 19
ISSN (Print): 1049-5258

Conference

Conference: 35th Conference on Neural Information Processing Systems, NeurIPS 2021
City: Virtual, Online
Period: 6/12/21 to 14/12/21

Bibliographical note

Publisher Copyright:
© 2021 Neural information processing systems foundation. All rights reserved.
