Robust Neural Networks are More Interpretable for Genomics

Mapping Intimacies ◽

10.1101/657437 ◽

2019 ◽

Cited By ~ 5

Author(s):

Peter K. Koo ◽

Sharon Qian ◽

Gal Kaplun ◽

Verena Volf ◽

Dimitris Kalimeris

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

State Of The Art ◽

Random Noise ◽

Genomic Data ◽

Training Methods ◽

Generalization Performance ◽

Regulatory Genomics ◽

Adversarial Training

Deep neural networks (DNNs) have been applied to a variety of regulatory genomics tasks. For interpretability, attribution methods are employed to provide importance scores for each nucleotide in a given sequence. However, even with state-of-the-art DNNs, there is no guarantee that these methods can recover interpretable, biological representations. Here we perform systematic experiments on synthetic genomic data to raise awareness of this issue. We find that deeper networks have better generalization performance, but attribution methods recover less interpretable representations from them. We then show that training methods promoting robustness – including regularization, injecting random noise into the data, and adversarial training – significantly improve the interpretability of DNNs, especially for smaller datasets.
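One of the robustness-promoting methods named above, injecting random noise into the data, can be sketched as follows. This is a minimal illustration only: the abstract does not specify the noise distribution or its scale, so the Gaussian noise, the `std` value, and the helper names `one_hot` and `add_gaussian_noise` are assumptions.

```python
import random

NUCLEOTIDES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows ordered A, C, G, T."""
    return [[1.0 if base == n else 0.0 for n in NUCLEOTIDES] for base in seq]

def add_gaussian_noise(encoded, std=0.1, rng=None):
    """Return a noisy copy of a one-hot matrix.

    std is a hypothetical hyperparameter, not a value taken from the paper.
    """
    rng = rng or random.Random()
    return [[value + rng.gauss(0.0, std) for value in row] for row in encoded]

# Example: perturb one training sequence before it is fed to a network.
x = one_hot("ACGTTGCA")
x_noisy = add_gaussian_noise(x, std=0.1, rng=random.Random(0))
```

In a training loop, fresh noise would typically be drawn for every mini-batch so that the network never sees exactly the same input twice.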

Invariant Representations through Adversarial Forgetting

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5850 ◽

2020 ◽

Vol 34 (04) ◽

pp. 4272-4279

Author(s):

Ayush Jaiswal ◽

Daniel Moyer ◽

Greg Ver Steeg ◽

Wael AbdAlmageed ◽

Premkumar Natarajan

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

State Of The Art ◽

Empirical Results ◽

Information Bottleneck ◽

Novel Approach ◽

Adversarial Training ◽

Invariant Representations ◽

Art Performance ◽

Forgetting Mechanism

We propose a novel approach to achieving invariance for deep neural networks in the form of inducing amnesia to unwanted factors of data through a new adversarial forgetting mechanism. We show that the forgetting mechanism serves as an information-bottleneck, which is manipulated by the adversarial training to learn invariance to unwanted factors. Empirical results show that the proposed framework achieves state-of-the-art performance at learning invariance in both nuisance and bias settings on a diverse collection of datasets and tasks.

Learning robust features by extended generative stochastic networks

International Journal of Modeling Simulation and Scientific Computing ◽

10.1142/s1793962318500046 ◽

2018 ◽

Vol 09 (01) ◽

pp. 1850004

Author(s):

Da Teng ◽

Xiao Song ◽

Guanghong Gong ◽

Junhua Zhou

Keyword(s):

Neural Networks ◽

Object Recognition ◽

Deep Neural Networks ◽

State Of The Art ◽

Random Noise ◽

Stochastic Networks ◽

Experimental Results ◽

Feedforward Networks ◽

Adversarial Examples ◽

Art Performance

Deep neural networks have achieved state-of-the-art performance on many object recognition tasks, but they are vulnerable to small adversarial perturbations. In this paper, several extensions of generative stochastic networks (GSNs) are proposed to improve the robustness of neural networks to random noise and adversarial perturbations. Experimental results show that, compared to the standard GSN method, the extensions using adversarial examples, lateral connections and feedforward networks can improve the performance of GSNs by making the models more resistant to overfitting and noise.

Adversarially Robust Distillation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5816 ◽

2020 ◽

Vol 34 (04) ◽

pp. 3996-4003

Author(s):

Micah Goldblum ◽

Liam Fowl ◽

Soheil Feizi ◽

Tom Goldstein

Keyword(s):

Neural Networks ◽

High Performance ◽

State Of The Art ◽

Test Accuracy ◽

Training Methods ◽

Knowledge Distillation ◽

Adversarial Training ◽

Student Models ◽

Small Models ◽

High Test

Knowledge distillation is effective for producing small, high-performance neural networks for classification, but these small networks are vulnerable to adversarial attacks. This paper studies how adversarial robustness transfers from teacher to student during knowledge distillation. First, we find that a large amount of robustness may be inherited by the student even when distilled on only clean images. Second, we introduce Adversarially Robust Distillation (ARD) for distilling robustness onto student networks. In addition to producing small models with high test accuracy like conventional distillation, ARD also passes the superior robustness of large networks onto the student. In our experiments, we find that ARD student models decisively outperform adversarially trained networks of identical architecture in terms of robust accuracy, surpassing state-of-the-art methods on standard robustness benchmarks. Finally, we adapt recent fast adversarial training methods to ARD for accelerated robust distillation.
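For readers unfamiliar with the conventional distillation that ARD builds on, the sketch below shows the usual temperature-softened distillation term: a KL divergence between teacher and student output distributions. It is not the ARD objective, which the abstract does not spell out; the function names and the temperature value are assumptions made for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Conventional distillation term: T^2 * KL(teacher || student)."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return temperature ** 2 * kl_divergence(teacher_probs, student_probs)

# A student that matches the teacher incurs (near) zero loss.
matched = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mismatched = distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```

A robust variant would evaluate the student on perturbed inputs rather than clean ones; how ARD does this is described in the paper itself, not in this sketch.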

Diversity Adversarial Training against Adversarial Attack on Deep Neural Networks

Symmetry ◽

10.3390/sym13030428 ◽

2021 ◽

Vol 13 (3) ◽

pp. 428

Author(s):

Hyun Kwon ◽

Jun Lee

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Diversity Training ◽

Original Data ◽

Training Method ◽

Learning Framework ◽

Adversarial Examples ◽

Adversarial Training ◽

Adversarial Attack ◽

Accuracy Rates

This paper presents research focusing on visualization and pattern recognition based on computer science. Although deep neural networks demonstrate satisfactory performance in image and voice recognition, as well as pattern analysis and intrusion detection, they perform poorly on adversarial examples. Adding a small amount of noise to the original data can produce adversarial examples that deep neural networks misclassify, even though humans still deem them normal. In this paper, a robust diversity adversarial training method against adversarial attacks is demonstrated. In this approach, the target model is more robust to unknown adversarial examples because it is trained on various adversarial samples. In the experiments, TensorFlow was employed as the deep learning framework, while MNIST and Fashion-MNIST were used as experimental datasets. Results revealed that the diversity training method lowered the attack success rate by an average of 27.2% and 24.3% for various adversarial examples, while maintaining accuracy rates of 98.7% and 91.5% on the original data of MNIST and Fashion-MNIST, respectively.
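To make the notion of an adversarial example concrete, here is a minimal sketch that perturbs the input of a tiny logistic-regression classifier with the fast gradient sign method (FGSM). FGSM and the toy model are used purely for illustration; the abstract does not say which attacks or architectures the authors used, and all names and values below are assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def sign(value):
    return (value > 0) - (value < 0)

def fgsm(weights, bias, x, label, epsilon):
    """One-step sign-gradient perturbation of x.

    For cross-entropy loss on a logistic model, d(loss)/d(x_i) = (p - y) * w_i.
    """
    error = predict(weights, bias, x) - label
    return [xi + epsilon * sign(error * w) for xi, w in zip(x, weights)]

# Hypothetical model and input, chosen so the perturbation flips the decision.
weights, bias = [2.0, -1.0], 0.0
x, label = [0.5, 0.2], 1
x_adv = fgsm(weights, bias, x, label, epsilon=0.3)

# Training on perturbations of several strengths is one simple way to
# expose a model to varied adversarial samples.
augmented = [fgsm(weights, bias, x, label, eps) for eps in (0.1, 0.2, 0.3)]
```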

Solving inverse problems in stochastic models using deep neural networks and adversarial training

Computer Methods in Applied Mechanics and Engineering ◽

10.1016/j.cma.2021.113976 ◽

2021 ◽

Vol 384 ◽

pp. 113976

Author(s):

Kailai Xu ◽

Eric Darve

Keyword(s):

Neural Networks ◽

Inverse Problems ◽

Stochastic Models ◽

Deep Neural Networks ◽

Adversarial Training

Representing Deep Neural Networks Latent Space Geometries with Graphs

Algorithms ◽

10.3390/a14020039 ◽

2021 ◽

Vol 14 (2) ◽

pp. 39

Author(s):

Carlos Lassance ◽

Vincent Gripon ◽

Antonio Ortega

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Deep Learning ◽

Objective Function ◽

Learning Process ◽

Deep Neural Networks ◽

State Of The Art ◽

The Core ◽

Learning Tasks ◽

Latent Space

Deep Learning (DL) has attracted a lot of attention for its ability to reach state-of-the-art performance in many machine learning tasks. The core principle of DL methods consists of training composite architectures in an end-to-end fashion, where inputs are associated with outputs trained to optimize an objective function. Because of their compositional nature, DL architectures naturally exhibit several intermediate representations of the inputs, which belong to so-called latent spaces. When treated individually, these intermediate representations are most of the time unconstrained during the learning process, as it is unclear which properties should be favored. However, when processing a batch of inputs concurrently, the corresponding set of intermediate representations exhibits relations (what we call a geometry) on which desired properties can be sought. In this work, we show that it is possible to introduce constraints on these latent geometries to address various problems. In more detail, we propose to represent geometries by constructing similarity graphs from the intermediate representations obtained when processing a batch of inputs. By constraining these Latent Geometry Graphs (LGGs), we address the following three problems: (i) reproducing the behavior of a teacher architecture is achieved by mimicking its geometry, (ii) designing efficient embeddings for classification is achieved by targeting specific geometries, and (iii) robustness to deviations on inputs is achieved via enforcing smooth variation of geometry between consecutive latent spaces. Using standard vision benchmarks, we demonstrate the ability of the proposed geometry-based methods to solve the considered problems.
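The idea of building a similarity graph from the intermediate representations of a batch can be sketched as below. The abstract does not state which similarity measure or sparsification rule is used, so the cosine similarity, the k-nearest-neighbor pruning, and the function names here are assumptions chosen for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0.0 else 0.0

def similarity_graph(batch, k=1):
    """Weighted adjacency matrix keeping each node's k most similar neighbors.

    The result is directed (not necessarily symmetric) and has a zero diagonal.
    """
    n = len(batch)
    adjacency = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sims = [(cosine_similarity(batch[i], batch[j]), j) for j in range(n) if j != i]
        for sim, j in sorted(sims, reverse=True)[:k]:
            adjacency[i][j] = sim
    return adjacency

# Three hypothetical latent vectors from one batch; the first two are close.
batch = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
graph = similarity_graph(batch, k=1)
```

Computing one such graph per layer gives a sequence of geometries that can then be compared or constrained, for example by penalizing large changes between consecutive layers.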

Reconfigurable Binary Neural Network Accelerator with Adaptive Parallelism Scheme

Electronics ◽

10.3390/electronics10030230 ◽

2021 ◽

Vol 10 (3) ◽

pp. 230

Author(s):

Jaechan Cho ◽

Yongchul Jung ◽

Seongjoo Lee ◽

Yunho Jung

Keyword(s):

Neural Networks ◽

High Throughput ◽

Deep Neural Networks ◽

State Of The Art ◽

Throughput Performance ◽

Adaptive Parallelism ◽

Sensor Applications ◽

Binary Neural Network ◽

Target Layer ◽

Network Topologies

Binary neural networks (BNNs) have attracted significant interest for the implementation of deep neural networks (DNNs) on resource-constrained edge devices, and various BNN accelerator architectures have been proposed to achieve higher efficiency. BNN accelerators can be divided into two categories: streaming and layer accelerators. Although streaming accelerators designed for a specific BNN network topology provide high throughput, they are infeasible for various sensor applications in edge AI because of their complexity and inflexibility. In contrast, layer accelerators with reasonable resources can support various network topologies, but they operate with the same parallelism for all the layers of the BNN, which degrades throughput performance at certain layers. To overcome this problem, we propose a BNN accelerator with adaptive parallelism that offers high throughput performance in all layers. The proposed accelerator analyzes target layer parameters and operates with optimal parallelism using reasonable resources. In addition, this architecture is able to fully compute all types of BNN layers thanks to its reconfigurability, and it can achieve a higher area–speed efficiency than existing accelerators. In performance evaluation using state-of-the-art BNN topologies, the designed BNN accelerator achieved an area–speed efficiency 9.69 times higher than previous FPGA implementations and 24% higher than existing VLSI implementations for BNNs.

Label Distribution for Learning with Noisy Labels

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/356 ◽

2020 ◽

Author(s):

Yun-Peng Liu ◽

Ning Xu ◽

Yu Zhang ◽

Xin Geng

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Learning Algorithm ◽

State Of The Art ◽

Confidence Estimation ◽

Novel Method ◽

Real World Datasets ◽

Label Distribution ◽

Noisy Labels

The performance of deep neural networks (DNNs) crucially relies on the quality of labeling. In some situations, labels are easily corrupted and become noisy. Thus, designing algorithms that deal with noisy labels is of great importance for learning robust DNNs. However, it is difficult to distinguish between clean labels and noisy labels, which is the bottleneck of many methods. To address this problem, this paper proposes a novel method named Label Distribution based Confidence Estimation (LDCE). LDCE estimates the confidence of the observed labels based on label distribution. The boundary between clean labels and noisy labels then becomes clear according to the confidence scores. To verify the effectiveness of the method, LDCE is combined with an existing learning algorithm to train robust DNNs. Experiments on both synthetic and real-world datasets substantiate the superiority of the proposed algorithm over state-of-the-art methods.

Framework for TCAD augmented machine learning on multi-I–V characteristics using convolutional neural network and multiprocessing

Journal of Semiconductors ◽

10.1088/1674-4926/42/12/124101 ◽

2021 ◽

Vol 42 (12) ◽

pp. 124101

Author(s):

Thomas Hirtz ◽

Steyn Huurman ◽

He Tian ◽

Yi Yang ◽

Tian-Ling Ren

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Information Technologies ◽

Deep Neural Networks ◽

State Of The Art ◽

Data Driven ◽

Sufficient Data ◽

Learning Models ◽

Simulation Tools ◽

New Information

In a world where data is increasingly important for making breakthroughs, microelectronics is a field where data is sparse and hard to acquire. Only a few entities have the infrastructure that is required to automate the fabrication and testing of semiconductor devices. This infrastructure is crucial for generating sufficient data for the use of new information technologies. This situation creates a divide between most researchers and industry. To address this issue, this paper introduces a widely applicable approach for creating custom datasets using simulation tools and parallel computing. The multi-I–V curves that we obtained were processed simultaneously using convolutional neural networks, which gave us the ability to predict a full set of device characteristics with a single inference. We demonstrate the potential of this approach through two concrete examples of useful deep learning models that were trained using the generated data. We believe that this work can act as a bridge between state-of-the-art data-driven methods and more classical semiconductor research, such as device engineering, yield engineering or process monitoring. Moreover, this research gives anybody the opportunity to start experimenting with deep neural networks and machine learning in the field of microelectronics, without the need for expensive experimentation infrastructure.

Domain Generalization Using a Mixture of Multiple Latent Domains

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6846 ◽

2020 ◽

Vol 34 (07) ◽

pp. 11749-11756 ◽

Cited By ~ 2

Author(s):

Toshihiko Matsuura ◽

Tatsuya Harada

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Multiple Source ◽

Web Crawling ◽

Not Given ◽

Adversarial Learning ◽

Generalized Model ◽

Generalization Performance ◽

Invariant Feature ◽

Feature Extractor

When domains, which represent underlying data distributions, vary during training and testing processes, deep neural networks suffer a drop in their performance. Domain generalization allows improvements in the generalization performance for unseen target domains by using multiple source domains. Conventional methods assume that the domain to which each sample belongs is known in training. However, many datasets, such as those collected via web crawling, contain a mixture of multiple latent domains, in which the domain of each sample is unknown. This paper introduces domain generalization using a mixture of multiple latent domains as a novel and more realistic scenario, where we try to train a domain-generalized model without using domain labels. To address this scenario, we propose a method that iteratively divides samples into latent domains via clustering, and which trains the domain-invariant feature extractor shared among the divided latent domains via adversarial learning. We assume that the latent domain of images is reflected in their style, and thus, utilize style features for clustering. By using these features, our proposed method successfully discovers latent domains and achieves domain generalization even if the domain labels are not given. Experiments show that our proposed method can train a domain-generalized model without using domain labels. Moreover, it outperforms conventional domain generalization methods, including those that utilize domain labels.
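The clustering step described above, dividing samples into latent domains by their style features, might look roughly like the sketch below. The abstract does not define the style features or the clustering algorithm; channel-wise mean and standard deviation (a common choice of style statistics) and plain k-means with a deterministic initialization are assumptions, as are all function names. The adversarial training of the feature extractor is not shown.

```python
import math

def style_features(feature_map):
    """Concatenate per-channel mean and standard deviation.

    feature_map is a list of channels, each a flat list of activations.
    """
    stats = []
    for channel in feature_map:
        mean = sum(channel) / len(channel)
        var = sum((a - mean) ** 2 for a in channel) / len(channel)
        stats.extend([mean, math.sqrt(var)])
    return stats

def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, centroid)) for centroid in centroids]
    return dists.index(min(dists))

def kmeans(points, k, iterations=10):
    """Plain k-means; the first k points seed the centroids for determinism."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Four hypothetical single-channel feature maps drawn from two "styles".
maps = [[[0.0, 0.0, 0.0, 0.0]], [[10.0, 10.0, 10.0, 10.0]],
        [[0.1, 0.0, 0.1, 0.0]], [[10.0, 9.9, 10.0, 9.9]]]
pseudo_domains = kmeans([style_features(m) for m in maps], k=2)
```

The resulting cluster indices play the role of pseudo domain labels, which can stand in for the missing ground-truth domain labels during training.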
