scholarly journals Boosting Targeted Black-Box Attacks via Ensemble Substitute Training and Linear Augmentation

2019 ◽  
Vol 9 (11) ◽  
pp. 2286 ◽  
Author(s):  
Xianfeng Gao ◽  
Yu-an Tan ◽  
Hongwei Jiang ◽  
Quanxin Zhang ◽  
Xiaohui Kuang

These years, Deep Neural Networks (DNNs) have shown unprecedented performance in many areas. However, some recent studies revealed their vulnerability to small perturbations added on source inputs. Furthermore, we call the ways to generate these perturbations’ adversarial attacks, which contain two types, black-box and white-box attacks, according to the adversaries’ access to target models. In order to overcome the problem of black-box attackers’ unreachabilities to the internals of target DNN, many researchers put forward a series of strategies. Previous works include a method of training a local substitute model for the target black-box model via Jacobian-based augmentation and then use the substitute model to craft adversarial examples using white-box methods. In this work, we improve the dataset augmentation to make the substitute models better fit the decision boundary of the target model. Unlike the previous work that just performed the non-targeted attack, we make it first to generate targeted adversarial examples via training substitute models. Moreover, to boost the targeted attacks, we apply the idea of ensemble attacks to the substitute training. Experiments on MNIST and GTSRB, two common datasets for image classification, demonstrate our effectiveness and efficiency of boosting a targeted black-box attack, and we finally attack the MNIST and GTSRB classifiers with the success rates of 97.7% and 92.8%.

2021 ◽  
Vol 72 ◽  
pp. 1-37
Author(s):  
Mike Wu ◽  
Sonali Parbhoo ◽  
Michael C. Hughes ◽  
Volker Roth ◽  
Finale Doshi-Velez

Deep models have advanced prediction in many domains, but their lack of interpretability  remains a key barrier to the adoption in many real world applications. There exists a large  body of work aiming to help humans understand these black box functions to varying levels  of granularity – for example, through distillation, gradients, or adversarial examples. These  methods however, all tackle interpretability as a separate process after training. In this  work, we take a different approach and explicitly regularize deep models so that they are  well-approximated by processes that humans can step through in little time. Specifically,  we train several families of deep neural networks to resemble compact, axis-aligned decision  trees without significant compromises in accuracy. The resulting axis-aligned decision  functions uniquely make tree regularized models easy for humans to interpret. Moreover,  for situations in which a single, global tree is a poor estimator, we introduce a regional tree regularizer that encourages the deep model to resemble a compact, axis-aligned decision  tree in predefined, human-interpretable contexts. Using intuitive toy examples, benchmark  image datasets, and medical tasks for patients in critical care and with HIV, we demonstrate  that this new family of tree regularizers yield models that are easier for humans to simulate  than L1 or L2 penalties without sacrificing predictive power. 


Author(s):  
Dmitry M. Voynov ◽  
Vassili A. Kovalev

Recently, the majority of research and development teams working in the field deep learning are concentrated on the improvement of the classification accuracy and related measures of the quality of image classification whereas the problem of adversarial attacks to deep neural networks attracts much less attention. This article is dedicated to an experimental study of the influence of various factors on the stability of convolutional neural networks under the condition of adversarial attacks to biomedical image classification. On a very extensive dataset consisted of more than 1.45 million of radiological as well as histological images we assess the efficiency of attacks performed using the projected gradient descent (PGD), DeepFool and Carlini – Wagner (CW) methods. We analyze the results of both white and black box attacks to the commonly used neural architectures such as InceptionV3, Densenet121, ResNet50, MobileNet and Xception. The basic conclusion of this study is that in the field of biomedical image classification the problem of adversarial attack stays sharp because the methods of attacks being tested are successfully attacking the above-mentioned networks so that depending on the specific task their original classification accuracy falls down from 83–97 % down to the accuracy score of 15 %. Also, it was found that under similar conditions the PGD method is less successful in adversarial attacks comparing to the DeepFool and CW methods. When the original images and adversarial examples are compared using the L2-norm, the DeepFool and CW methods generate the adversarial examples of similar maliciousness. In addition, in three out of four of black-box attacks, the PGD method has demonstrated lower attacking efficiency.


Algorithms ◽  
2020 ◽  
Vol 13 (11) ◽  
pp. 268 ◽  
Author(s):  
Hokuto Hirano ◽  
Kazuhiro Takemoto

Deep neural networks (DNNs) are vulnerable to adversarial attacks. In particular, a single perturbation known as the universal adversarial perturbation (UAP) can foil most classification tasks conducted by DNNs. Thus, different methods for generating UAPs are required to fully evaluate the vulnerability of DNNs. A realistic evaluation would be with cases that consider targeted attacks; wherein the generated UAP causes the DNN to classify an input into a specific class. However, the development of UAPs for targeted attacks has largely fallen behind that of UAPs for non-targeted attacks. Therefore, we propose a simple iterative method to generate UAPs for targeted attacks. Our method combines the simple iterative method for generating non-targeted UAPs and the fast gradient sign method for generating a targeted adversarial perturbation for an input. We applied the proposed method to state-of-the-art DNN models for image classification and proved the existence of almost imperceptible UAPs for targeted attacks; further, we demonstrated that such UAPs can be easily generated.


Symmetry ◽  
2021 ◽  
Vol 13 (3) ◽  
pp. 428
Author(s):  
Hyun Kwon ◽  
Jun Lee

This paper presents research focusing on visualization and pattern recognition based on computer science. Although deep neural networks demonstrate satisfactory performance regarding image and voice recognition, as well as pattern analysis and intrusion detection, they exhibit inferior performance towards adversarial examples. Noise introduction, to some degree, to the original data could lead adversarial examples to be misclassified by deep neural networks, even though they can still be deemed as normal by humans. In this paper, a robust diversity adversarial training method against adversarial attacks was demonstrated. In this approach, the target model is more robust to unknown adversarial examples, as it trains various adversarial samples. During the experiment, Tensorflow was employed as our deep learning framework, while MNIST and Fashion-MNIST were used as experimental datasets. Results revealed that the diversity training method has lowered the attack success rate by an average of 27.2 and 24.3% for various adversarial examples, while maintaining the 98.7 and 91.5% accuracy rates regarding the original data of MNIST and Fashion-MNIST.


Sign in / Sign up

Export Citation Format

Share Document