scholarly journals A Hybrid Adversarial Attack for Different Application Scenarios

2020 ◽  
Vol 10 (10) ◽  
pp. 3559 ◽  
Author(s):  
Xiaohu Du ◽  
Jie Yu ◽  
Zibo Yi ◽  
Shasha Li ◽  
Jun Ma ◽  
...  

Adversarial attack against natural language has been a hot topic in the field of artificial intelligence security in recent years. It is mainly to study the methods and implementation of generating adversarial examples. The purpose is to better deal with the vulnerability and security of deep learning systems. According to whether the attacker understands the deep learning model structure, the adversarial attack is divided into black-box attack and white-box attack. In this paper, we propose a hybrid adversarial attack for different application scenarios. Firstly, we propose a novel black-box attack method of generating adversarial examples to trick the word-level sentiment classifier, which is based on differential evolution (DE) algorithm to generate semantically and syntactically similar adversarial examples. Compared with existing genetic algorithm based adversarial attacks, our algorithm can achieve a higher attack success rate while maintaining a lower word replacement rate. At the 10% word substitution threshold, we have increased the attack success rate from 58.5% to 63%. Secondly, when we understand the model architecture and parameters, etc., we propose a white-box attack with gradient-based perturbation against the same sentiment classifier. In this attack, we use a Euclidean distance and cosine distance combined metric to find the most semantically and syntactically similar substitution, and we introduce the coefficient of variation (CV) factor to control the dispersion of the modified words in the adversarial examples. More dispersed modifications can increase human imperceptibility and text readability. Compared with the existing global attack, our attack can increase the attack success rate and make modification positions in generated examples more dispersed. We’ve increased the global search success rate from 75.8% to 85.8%. Finally, we can deal with different application scenarios by using these two attack methods, that is, whether we understand the internal structure and parameters of the model, we can all generate good adversarial examples.

2021 ◽  
pp. 1-12
Author(s):  
Bo Yang ◽  
Kaiyong Xu ◽  
Hengjun Wang ◽  
Hengwei Zhang

Deep neural networks (DNNs) are vulnerable to adversarial examples, which are crafted by adding small, human-imperceptible perturbations to the original images, but make the model output inaccurate predictions. Before DNNs are deployed, adversarial attacks can thus be an important method to evaluate and select robust models in safety-critical applications. However, under the challenging black-box setting, the attack success rate, i.e., the transferability of adversarial examples, still needs to be improved. Based on image augmentation methods, this paper found that random transformation of image brightness can eliminate overfitting in the generation of adversarial examples and improve their transferability. In light of this phenomenon, this paper proposes an adversarial example generation method, which can be integrated with Fast Gradient Sign Method (FGSM)-related methods to build a more robust gradient-based attack and to generate adversarial examples with better transferability. Extensive experiments on the ImageNet dataset have demonstrated the effectiveness of the aforementioned method. Whether on normally or adversarially trained networks, our method has a higher success rate for black-box attacks than other attack methods based on data augmentation. It is hoped that this method can help evaluate and improve the robustness of models.


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Heng Yin ◽  
Hengwei Zhang ◽  
Jindong Wang ◽  
Ruiyu Dou

Convolutional neural networks have outperformed humans in image recognition tasks, but they remain vulnerable to attacks from adversarial examples. Since these data are crafted by adding imperceptible noise to normal images, their existence poses potential security threats to deep learning systems. Sophisticated adversarial examples with strong attack performance can also be used as a tool to evaluate the robustness of a model. However, the success rate of adversarial attacks can be further improved in black-box environments. Therefore, this study combines a modified Adam gradient descent algorithm with the iterative gradient-based attack method. The proposed Adam iterative fast gradient method is then used to improve the transferability of adversarial examples. Extensive experiments on ImageNet showed that the proposed method offers a higher attack success rate than existing iterative methods. By extending our method, we achieved a state-of-the-art attack success rate of 95.0% on defense models.


2021 ◽  
Vol 7 ◽  
pp. e702
Author(s):  
Gaoming Yang ◽  
Mingwei Li ◽  
Xianjing Fang ◽  
Ji Zhang ◽  
Xingzhu Liang

Adversarial examples are regarded as a security threat to deep learning models, and there are many ways to generate them. However, most existing methods require the query authority of the target during their work. In a more practical situation, the attacker will be easily detected because of too many queries, and this problem is especially obvious under the black-box setting. To solve the problem, we propose the Attack Without a Target Model (AWTM). Our algorithm does not specify any target model in generating adversarial examples, so it does not need to query the target. Experimental results show that it achieved a maximum attack success rate of 81.78% in the MNIST data set and 87.99% in the CIFAR-10 data set. In addition, it has a low time cost because it is a GAN-based method.


2020 ◽  
Vol 34 (04) ◽  
pp. 3405-3413
Author(s):  
Zhaohui Che ◽  
Ali Borji ◽  
Guangtao Zhai ◽  
Suiyi Ling ◽  
Jing Li ◽  
...  

Deep neural networks are vulnerable to adversarial attacks. More importantly, some adversarial examples crafted against an ensemble of pre-trained source models can transfer to other new target models, thus pose a security threat to black-box applications (when the attackers have no access to the target models). Despite adopting diverse architectures and parameters, source and target models often share similar decision boundaries. Therefore, if an adversary is capable of fooling several source models concurrently, it can potentially capture intrinsic transferable adversarial information that may allow it to fool a broad class of other black-box target models. Current ensemble attacks, however, only consider a limited number of source models to craft an adversary, and obtain poor transferability. In this paper, we propose a novel black-box attack, dubbed Serial-Mini-Batch-Ensemble-Attack (SMBEA). SMBEA divides a large number of pre-trained source models into several mini-batches. For each single batch, we design 3 new ensemble strategies to improve the intra-batch transferability. Besides, we propose a new algorithm that recursively accumulates the “long-term” gradient memories of the previous batch to the following batch. This way, the learned adversarial information can be preserved and the inter-batch transferability can be improved. Experiments indicate that our method outperforms state-of-the-art ensemble attacks over multiple pixel-to-pixel vision tasks including image translation and salient region prediction. Our method successfully fools two online black-box saliency prediction systems including DeepGaze-II (Kummerer 2017) and SALICON (Huang et al. 2017). Finally, we also contribute a new repository to promote the research on adversarial attack and defense over pixel-to-pixel tasks: https://github.com/CZHQuality/AAA-Pix2pix.


2020 ◽  
Vol 2020 ◽  
pp. 1-9 ◽  
Author(s):  
Lingyun Jiang ◽  
Kai Qiao ◽  
Ruoxi Qin ◽  
Linyuan Wang ◽  
Wanting Yu ◽  
...  

In image classification of deep learning, adversarial examples where input is intended to add small magnitude perturbations may mislead deep neural networks (DNNs) to incorrect results, which means DNNs are vulnerable to them. Different attack and defense strategies have been proposed to better research the mechanism of deep learning. However, those researches in these networks are only for one aspect, either an attack or a defense. There is in the improvement of offensive and defensive performance, and it is difficult to promote each other in the same framework. In this paper, we propose Cycle-Consistent Adversarial GAN (CycleAdvGAN) to generate adversarial examples, which can learn and approximate the distribution of the original instances and adversarial examples, especially promoting attackers and defenders to confront each other and improve their ability. For CycleAdvGAN, once the GeneratorA and D are trained, GA can generate adversarial perturbations efficiently for any instance, improving the performance of the existing attack methods, and GD can generate recovery adversarial examples to clean instances, defending against existing attack methods. We apply CycleAdvGAN under semiwhite-box and black-box settings on two public datasets MNIST and CIFAR10. Using the extensive experiments, we show that our method has achieved the state-of-the-art adversarial attack method and also has efficiently improved the defense ability, which made the integration of adversarial attack and defense come true. In addition, it has improved the attack effect only trained on the adversarial dataset generated by any kind of adversarial attack.


2021 ◽  
Author(s):  
Yidong Chai ◽  
Ruicheng Liang ◽  
Hongyi Zhu ◽  
Sagar Samtani ◽  
Meng Wang ◽  
...  

Deep learning models have significantly advanced various natural language processing tasks. However, they are strikingly vulnerable to adversarial text attacks, even in the black-box setting where no model knowledge is accessible to hackers. Such attacks are conducted with a two-phase framework: 1) a sensitivity estimation phase to evaluate each element’s sensitivity to the target model’s prediction, and 2) a perturbation execution phase to craft the adversarial examples based on estimated element sensitivity. This study explored the connections between the local post-hoc explainable methods for deep learning and black-box adversarial text attacks and proposed a novel eXplanation-based method for crafting Adversarial Text Attacks (XATA). XATA leverages local post-hoc explainable methods (e.g., LIME or SHAP) to measure input elements’ sensitivity and adopts the word replacement perturbation strategy to craft adversarial examples. We evaluated the attack performance of the proposed XATA on three commonly used text-based datasets: IMDB Movie Review, Yelp Reviews-Polarity, and Amazon Reviews-Polarity. The proposed XATA outperformed existing baselines in various target models, including LSTM, GRU, CNN, and BERT. Moreover, we found that improved local post-hoc explainable methods (e.g., SHAP) lead to more effective adversarial attacks. These findings showed that when researchers constantly advance the explainability of deep learning models with local post-hoc methods, they also provide hackers with weapons to craft more targeted and dangerous adversarial attacks.


Author(s):  
Bangjie Yin ◽  
Wenxuan Wang ◽  
Taiping Yao ◽  
Junfeng Guo ◽  
Zelun Kong ◽  
...  

Deep neural networks, particularly face recognition models, have been shown to be vulnerable to both digital and physical adversarial examples. However, existing adversarial examples against face recognition systems either lack transferability to black-box models, or fail to be implemented in practice. In this paper, we propose a unified adversarial face generation method - Adv-Makeup, which can realize imperceptible and transferable attack under the black-box setting. Adv-Makeup develops a task-driven makeup generation method with the blending module to synthesize imperceptible eye shadow over the orbital region on faces. And to achieve transferability, Adv-Makeup implements a fine-grained meta-learning based adversarial attack strategy to learn more vulnerable or sensitive features from various models. Compared to existing techniques, sufficient visualization results demonstrate that Adv-Makeup is capable to generate much more imperceptible attacks under both digital and physical scenarios. Meanwhile, extensive quantitative experiments show that Adv-Makeup can significantly improve the attack success rate under black-box setting, even attacking commercial systems.


Sensors ◽  
2020 ◽  
Vol 20 (24) ◽  
pp. 7158
Author(s):  
Naufal Suryanto ◽  
Hyoeun Kang ◽  
Yongsu Kim ◽  
Youngyeo Yun ◽  
Harashta Tatimma Larasati ◽  
...  

Adversarial attack techniques in deep learning have been studied extensively due to its stealthiness to human eyes and potentially dangerous consequences when applied to real-life applications. However, current attack methods in black-box settings mainly employ a large number of queries for crafting their adversarial examples, hence making them very likely to be detected and responded by the target system (e.g., artificial intelligence (AI) service provider) due to its high traffic volume. A recent proposal able to address the large query problem utilizes a gradient-free approach based on Particle Swarm Optimization (PSO) algorithm. Unfortunately, this original approach tends to have a low attack success rate, possibly due to the model’s difficulty of escaping local optima. This obstacle can be overcome by employing a multi-group approach for PSO algorithm, by which the PSO particles can be redistributed, preventing them from being trapped in local optima. In this paper, we present a black-box adversarial attack which can significantly increase the success rate of PSO-based attack while maintaining a low number of query by launching the attack in a distributed manner. Attacks are executed from multiple nodes, disseminating queries among the nodes, hence reducing the possibility of being recognized by the target system while also increasing scalability. Furthermore, we utilize Multi-Group PSO with Random Redistribution (MGRR-PSO) for perturbation generation, performing better than the original approach against local optima, thus achieving a higher success rate. Additionally, we propose to efficiently remove excessive perturbation (i.e, perturbation pruning) by utilizing again the MGRR-PSO rather than a standard iterative method as used in the original approach. We perform five different experiments: comparing our attack’s performance with existing algorithms, testing in high-dimensional space in ImageNet dataset, examining our hyperparameters (i.e., particle size, number of clients, search boundary), and testing on real digital attack to Google Cloud Vision. Our attack proves to obtain a 100% success rate on MNIST and CIFAR-10 datasets and able to successfully fool Google Cloud Vision as a proof of the real digital attack by maintaining a lower query and wide applicability.


2021 ◽  
pp. 1-11
Author(s):  
Tianshi Mu ◽  
Kequan Lin ◽  
Huabing Zhang ◽  
Jian Wang

Deep learning is gaining significant traction in a wide range of areas. Whereas, recent studies have demonstrated that deep learning exhibits the fatal weakness on adversarial examples. Due to the black-box nature and un-transparency problem of deep learning, it is difficult to explain the reason for the existence of adversarial examples and also hard to defend against them. This study focuses on improving the adversarial robustness of convolutional neural networks. We first explore how adversarial examples behave inside the network through visualization. We find that adversarial examples produce perturbations in hidden activations, which forms an amplification effect to fool the network. Motivated by this observation, we propose an approach, termed as sanitizing hidden activations, to help the network correctly recognize adversarial examples by eliminating or reducing the perturbations in hidden activations. To demonstrate the effectiveness of our approach, we conduct experiments on three widely used datasets: MNIST, CIFAR-10 and ImageNet, and also compare with state-of-the-art defense techniques. The experimental results show that our sanitizing approach is more generalized to defend against different kinds of attacks and can effectively improve the adversarial robustness of convolutional neural networks.


Author(s):  
Zihan Zhang ◽  
Mingxuan Liu ◽  
Chao Zhang ◽  
Yiming Zhang ◽  
Zhou Li ◽  
...  

Natural language processing (NLP) models are known vulnerable to adversarial examples, similar to image processing models. Studying adversarial texts is an essential step to improve the robustness of NLP models. However, existing studies mainly focus on analyzing English texts and generating adversarial examples for English texts. There is no work studying the possibility and effect of the transformation to another language, e.g, Chinese. In this paper, we analyze the differences between Chinese and English, and explore the methodology to transform the existing English adversarial generation method to Chinese. We propose a novel black-box adversarial Chinese texts generation solution Argot, by utilizing the method for adversarial English samples and several novel methods developed on Chinese characteristics. Argot could effectively and efficiently generate adversarial Chinese texts with good readability. Furthermore, Argot could also automatically generate targeted Chinese adversarial text, achieving a high success rate and ensuring readability of the Chinese.


Sign in / Sign up

Export Citation Format

Share Document