No Fine-Tuning, No Cry: Robust SVD for Compressing Deep Networks

Sensors ◽  
2021 ◽  
Vol 21 (16) ◽  
pp. 5599
Author(s):  
Murad Tukan ◽  
Alaa Maalouf ◽  
Matan Weksler ◽  
Dan Feldman

A common technique for compressing a neural network is to compute the k-rank ℓ2 approximation Ak of the matrix A∈Rn×d via SVD that corresponds to a fully connected layer (or embedding layer). Here, d is the number of input neurons in the layer, n is the number of neurons in the next one, and Ak is stored in O((n+d)k) memory instead of O(nd). Then, a fine-tuning step is used to improve this initial compression. However, end users may not have the required computational resources, time, or budget to run this fine-tuning stage. Furthermore, the original training set may not be available. In this paper, we provide an algorithm for compressing neural networks with an initial compression time similar to that of common techniques, but without the fine-tuning step. The main idea is to replace the k-rank ℓ2 approximation with an ℓp approximation, for p∈[1,2], which is known to be less sensitive to outliers but much harder to compute. Our main technical result is a practical and provable approximation algorithm for computing it for any p≥1, based on modern techniques in computational geometry. Extensive experimental results on the GLUE benchmark for compressing the networks BERT, DistilBERT, XLNet, and RoBERTa confirm this theoretical advantage.
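For readers unfamiliar with the baseline, here is a minimal NumPy sketch of the standard rank-k ℓ2 (SVD) compression that the paper takes as its starting point. The matrix sizes are arbitrary illustration; this is the generic baseline, not the authors' ℓp algorithm.

```python
import numpy as np

# Rank-k l2 compression of a dense layer's weight matrix A (n x d):
# keep factors U_k (n x k) and V_k (k x d), i.e. O((n+d)k) numbers
# instead of the O(nd) needed for A itself.
def svd_compress(A, k):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k] * s[:k]   # absorb singular values into the left factor
    V_k = Vt[:k, :]
    return U_k, V_k          # A_k = U_k @ V_k is the best rank-k l2 approximation

rng = np.random.default_rng(0)
A = rng.standard_normal((256, 128))   # toy "layer": n=256, d=128
U_k, V_k = svd_compress(A, k=16)
stored = U_k.size + V_k.size          # (256+128)*16 = 6144 vs 256*128 = 32768
```

By the Eckart–Young theorem this Ak minimizes the Frobenius (ℓ2) error over all rank-k matrices; the paper's point is that an ℓp objective with p∈[1,2] is more robust to outliers but admits no such closed-form solution.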

2021 ◽  
Author(s):  
Jiyu Chen ◽  
Yiwen Guo ◽  
Qianjun Zheng ◽  
Hao Chen

Abstract: Research has shown that deep learning models are vulnerable to membership inference attacks, which aim to determine whether an example is in the model's training set. We propose a new framework to defend against this sort of attack. Our key insight is that if we retrain the original classifier on a new dataset that is independent of the original training set but whose elements are sampled from the same distribution, the retrained classifier will leak no information about the original training set that cannot already be inferred from the distribution. Our framework consists of three phases. First, we transferred the original classifier to a Joint Energy-based Model (JEM) to exploit the model's implicit generative power. Then, we sampled from the JEM to create a new dataset. Finally, we used the new dataset to retrain or fine-tune the original classifier. We empirically studied different transfer learning schemes for the JEM and fine-tuning/retraining strategies for the classifier against shadow-model attacks. Our evaluation shows that our framework can suppress the attacker's membership advantage to a negligible level while keeping the classifier's accuracy acceptable. We compared it with other state-of-the-art defenses under adaptive attackers and showed that our defense remains effective even in the worst-case scenario. We also found that combining other defenses with our framework often achieves better robustness. Our code will be made available at https://github.com/ChenJiyu/meminf-defense.git.
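The sampling phase rests on drawing examples from an energy-based model, which is typically done with Langevin dynamics. A toy sketch of that mechanism (my own quadratic energy for illustration, not the authors' JEM or code):

```python
import numpy as np

# Langevin sampling from p(x) ∝ exp(-E(x)) for a toy quadratic energy
# E(x) = 0.5 * ||x - mu||^2, whose samples should concentrate near mu.
def energy_grad(x, mu):
    return x - mu                      # gradient of the toy energy

def langevin_sample(mu, steps=2000, step_size=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(mu.shape)  # random initialization
    for _ in range(steps):
        noise = rng.standard_normal(mu.shape)
        x = x - 0.5 * step_size * energy_grad(x, mu) + np.sqrt(step_size) * noise
    return x

mu = np.array([3.0, -1.0])
samples = np.stack([langevin_sample(mu, seed=s) for s in range(200)])
```

For this energy the stationary distribution is N(mu, I) up to discretization error, so the sample mean approaches mu; in the defense, the same procedure run on a trained JEM yields the fresh, distribution-matched dataset used for retraining.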


2020 ◽  
pp. 1-14
Author(s):  
Siqiang Chen ◽  
Masahiro Toyoura ◽  
Takamasa Terada ◽  
Xiaoyang Mao ◽  
Gang Xu

A textile fabric consists of countless parallel vertical yarns (warps) and horizontal yarns (wefts). While common looms can only weave repetitive patterns, Jacquard looms can weave patterns without repetition restrictions. A pattern in which the warps and wefts cross on a grid is defined by a binary matrix, which specifies which yarn, warp or weft, is on top at each grid point of the Jacquard fabric. This process can be regarded as encoding from pattern to textile. In this work, we propose a decoding method that generates a binary pattern from a textile fabric that has already been woven. A deep neural network could not learn the process from a training set of patterns and observed fabric images alone: the crossing points in the observed images do not lie exactly on the grid points, so it is difficult to establish a direct correspondence between the fabric images and the pattern represented by the matrix within a deep learning framework. We therefore propose a method that applies the deep learning framework via an intermediate representation of patterns and images. We show how to convert a pattern into the intermediate representation and how to convert the output back into a pattern, and confirm the method's effectiveness. In our experiments, 93% of the pattern was correctly recovered by decoding actual fabric images and weaving the result again.
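The binary-matrix encoding can be made concrete with a tiny example (my own toy pattern and rendering, for illustration only): entry (i, j) = 1 means the warp lies on top at grid point (i, j), 0 means the weft does.

```python
import numpy as np

# Toy illustration of the binary weave matrix described in the abstract.
def plain_weave(rows, cols):
    # simplest repeating pattern: warp and weft alternate like a checkerboard
    i, j = np.indices((rows, cols))
    return ((i + j) % 2).astype(np.uint8)

def render(pattern):
    # '|' = warp (vertical yarn) on top, '-' = weft on top, per grid point
    return "\n".join("".join("|" if v else "-" for v in row) for row in pattern)

p = plain_weave(4, 6)
print(render(p))
```

A Jacquard loom is free to realize any such matrix, repetitive or not; the paper's decoder works in the opposite direction, recovering the matrix from an image of the woven result.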


Author(s):  
Macario O. Cordel ◽  
Arnulfo P. Azcarraga

Several time-critical problems relying on large amounts of data, e.g., business trends, disaster response, and disease outbreaks, require cost-effective, timely, and accurate data summarization and visualization to support efficient and effective decision-making. The self-organizing map (SOM) is a very effective data clustering and visualization tool, as it provides an intuitive display of data in a lower-dimensional space. However, with [Formula: see text] complexity, SOM becomes impractical for large datasets. In this paper, we propose a force-directed visualization method that emulates SOM's capability to display data clusters with [Formula: see text] complexity. The main idea is to perform force-directed fine-tuning of the 2D representation of the data. To demonstrate the efficiency and the potential of the proposed method as a fast visualization tool, the methodology is used to produce a 2D projection of the MNIST handwritten digits dataset.
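The general idea of force-directed fine-tuning of a 2D layout can be sketched generically (a textbook spring/repulsion step of my own construction, not the authors' method): points joined by similarity edges attract, while all pairs repel, so clusters tighten and separate.

```python
import numpy as np

# One generic force-directed update on a 2D layout: attraction along
# similarity edges, inverse-distance repulsion between all pairs.
def force_directed_step(pos, neighbors, lr=0.05, repel=0.01):
    forces = np.zeros_like(pos)
    for i, js in neighbors.items():          # spring attraction to neighbors
        for j in js:
            forces[i] += pos[j] - pos[i]
    diff = pos[:, None, :] - pos[None, :, :]  # (n, n, 2) pairwise offsets
    dist2 = (diff ** 2).sum(-1) + 1e-9
    np.fill_diagonal(dist2, np.inf)           # no self-repulsion
    forces += repel * (diff / dist2[..., None]).sum(axis=1)
    return pos + lr * forces

rng = np.random.default_rng(1)
pos = rng.standard_normal((4, 2))
neighbors = {0: [1], 1: [0], 2: [3], 3: [2]}  # two "clusters" of two points
for _ in range(200):
    pos = force_directed_step(pos, neighbors)
```

Note that the naive all-pairs repulsion above is quadratic per step; a fast visualization method in the spirit of the paper would need an approximation (e.g., Barnes–Hut-style aggregation) to avoid that cost.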


Author(s):  
Kun Wei ◽  
Cheng Deng ◽  
Xu Yang

Zero-Shot Learning (ZSL) addresses the problem that some test classes never appear in the training set. Existing ZSL methods are designed for learning from a fixed training set and cannot capture and accumulate knowledge across multiple training sets, making them infeasible for many real-world applications. In this paper, we propose a new ZSL setting, named Lifelong Zero-Shot Learning (LZSL), which aims to accumulate knowledge while learning from multiple datasets and to recognize unseen classes from all trained datasets. We also propose a novel method to realize LZSL that effectively alleviates catastrophic forgetting during continuous training. Specifically, since the datasets contain different semantic embeddings, we utilize a Variational Auto-Encoder to obtain unified semantic representations. We then leverage a selective retraining strategy to preserve the trained weights of previous tasks and avoid negative transfer when fine-tuning the entire model. Finally, knowledge distillation is employed to transfer knowledge from previous training stages to the current stage. We also design the LZSL evaluation protocol and challenging benchmarks. Extensive experiments on these benchmarks indicate that our method tackles the LZSL problem effectively, while existing ZSL methods fail.
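The knowledge-distillation step can be illustrated with the standard temperature-softened loss (generic textbook form with toy logits, not the authors' implementation): the current-stage student is penalized for diverging from the previous-stage teacher's soft predictions.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Standard distillation loss: KL(teacher || student) on logits softened by
# temperature T, scaled by T^2 so gradients stay comparable across T.
def distillation_loss(teacher_logits, student_logits, T=2.0):
    p = softmax(teacher_logits / T)          # soft targets from previous stage
    q = softmax(student_logits / T)
    return (T ** 2) * np.sum(p * (np.log(p) - np.log(q)), axis=-1).mean()

teacher = np.array([[4.0, 1.0, 0.0]])
loss_same = distillation_loss(teacher, teacher)               # identical logits
loss_diff = distillation_loss(teacher, np.array([[0.0, 1.0, 4.0]]))
```

A student that matches the teacher incurs zero loss, while one that reverses the class preference is penalized heavily, which is what keeps earlier-stage knowledge from being overwritten.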


2021 ◽  
Vol 18 (2) ◽  
pp. 56-65
Author(s):  
Marcelo Romero ◽  
◽  
Matheus Gutoski ◽  
Leandro Takeshi Hattori ◽  
Manassés Ribeiro ◽  
...  

Transfer learning is a paradigm that consists of training and testing classifiers with datasets drawn from distinct distributions. This technique allows a particular problem to be solved using a model that was trained for another purpose. In recent years, this practice has become very popular due to the increase in publicly available pre-trained models that can be fine-tuned for different scenarios. However, the relationship between the dataset used for training the model and the test data is usually not addressed, especially when the fine-tuning process is applied only to the fully connected layers of a convolutional neural network with pre-trained weights. This work presents a study of the relationship between the datasets used in a transfer learning process, in terms of the performance achieved by the models and of their complexities and similarities. For this purpose, we fine-tune the final layer of convolutional neural networks with pre-trained weights using diverse soft-biometrics datasets. We present an evaluation of the models' performance when tested with datasets different from the one used for training, and we also use complexity and similarity metrics in the evaluation.
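The "fine-tune only the final layer" setup can be sketched in a few lines (a toy stand-in of my own: a fixed random projection plays the role of the frozen pre-trained backbone, and the shapes, data, and labels are all invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "backbone": a fixed random feature map standing in for pre-trained
# convolutional layers. It never receives gradient updates.
W_frozen = rng.standard_normal((5, 8))

def features(x):
    return np.tanh(x @ W_frozen)

X = rng.standard_normal((400, 5))
y = (X[:, 0] > 0).astype(float)            # toy binary labels

W_head = np.zeros(8)                       # the only trainable parameters
for _ in range(500):
    F = features(X)
    p = 1.0 / (1.0 + np.exp(-(F @ W_head)))    # logistic "final layer"
    W_head -= 0.5 * (F.T @ (p - y)) / len(y)   # cross-entropy gradient step

pred = (1.0 / (1.0 + np.exp(-(features(X) @ W_head)))) > 0.5
acc = (pred == (y > 0.5)).mean()
```

The design choice the paper examines is visible even here: how well the head can do depends entirely on how the frozen features relate to the new task, which is exactly the dataset relationship the study measures.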


2019 ◽  
Vol 11 (2) ◽  
pp. 174 ◽  
Author(s):  
Han Liu ◽  
Jun Li ◽  
Lin He ◽  
Yu Wang

Irregular spatial dependency is one of the major characteristics of remote sensing images, which brings challenges for classification tasks. Deep supervised models such as convolutional neural networks (CNNs) have shown great capacity for remote sensing image classification. However, they generally require a huge labeled training set for fine-tuning a deep neural network. To handle the irregular spatial dependency of remote sensing images and mitigate the conflict between limited labeled samples and training demand, we design a superpixel-guided layer-wise embedding CNN (SLE-CNN) for remote sensing image classification, which can efficiently exploit the information in both labeled and unlabeled samples. With the superpixel-guided sampling strategy for unlabeled samples, we achieve an automatic determination of the neighborhood covering for a spatial dependency system and thus adapt to real scenes of remote sensing images. In the designed network, two types of loss are combined for training the CNN: a supervised cross-entropy cost on labeled samples and an unsupervised reconstruction cost on both labeled and unlabeled samples. Our experiments are conducted on three types of remote sensing data: hyperspectral, multispectral, and synthetic aperture radar (SAR) images. The designed SLE-CNN achieves excellent classification performance in all cases with a limited labeled training set, suggesting its good potential for remote sensing image classification.
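The two-term objective, supervised cross-entropy on the labeled subset plus unsupervised reconstruction on all samples, can be written down generically (toy arrays and a hypothetical weighting factor of my own choosing, not the SLE-CNN's actual loss code):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Generic semi-supervised objective of the kind the abstract describes:
# cross-entropy on labeled samples + reconstruction error on every sample.
def combined_loss(logits_lab, y_lab, x_all, x_recon, weight=0.1):
    p = softmax(logits_lab)
    ce = -np.log(p[np.arange(len(y_lab)), y_lab] + 1e-12).mean()
    recon = ((x_all - x_recon) ** 2).mean()   # autoencoder-style term
    return ce + weight * recon

logits = np.array([[2.0, 0.0], [0.0, 2.0]])   # two labeled samples, 2 classes
y = np.array([0, 1])
x = np.ones((5, 3))                           # five samples, labeled or not
loss_perfect = combined_loss(logits, y, x, x)             # exact reconstruction
loss_bad = combined_loss(logits, y, x, np.zeros_like(x))  # poor reconstruction
```

The reconstruction term is what lets the unlabeled samples shape the learned representation, which is how the network compensates for a limited labeled training set.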


2020 ◽  
Vol 29 (14) ◽  
pp. 2043032
Author(s):  
Arthur E. Fischer

We introduce a methodology for quantitatively measuring at all times in its evolution how close our physical spatially flat ΛCDM universe with cosmological constant Λ is to the de Sitter spacetime [Formula: see text] with de Sitter radius [Formula: see text]. The main idea in this study is to align the respective scale factors [Formula: see text] and [Formula: see text] of these two spacetimes, where de Sitter spacetime is taken with respect to a spatially flat foliation. This goal is accomplished by fine-tuning an adjustable parameter [Formula: see text] that arises naturally in the de Sitter scale factor by requiring that these scale factors be future-asymptotically convergent. Once this parameter is adjusted and the scale factors are aligned, we define a relative error function [Formula: see text] that computes, as a function of time [Formula: see text], how close the scale factors of these two spacetimes are to one another. Our results quantify how close our physical ΛCDM universe is to its corresponding de Sitter spacetime as both spacetimes converge while they expand. As an example of our results, we show that at the present time [Formula: see text][Formula: see text]Gy, to an accuracy of [Formula: see text], and at [Formula: see text][Formula: see text]Gy, to an accuracy of [Formula: see text], we can use de Sitter spacetime to model our own ΛCDM universe. Our results also show by statistical analysis that, with a confidence level of 68.3%, for [Formula: see text][Formula: see text]Gy, the scale factor [Formula: see text] of our ΛCDM universe and the scale factor [Formula: see text] of the corresponding de Sitter spacetime are indistinguishable to within the accuracy of current cosmological measurements.
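The alignment idea can be illustrated with the textbook flat matter-plus-Λ scale factor. The parameter values below are round illustrative numbers, not the paper's fitted ones, and the prefactor C plays the role of the adjustable parameter tuned for future-asymptotic convergence:

```python
import numpy as np

# Flat ΛCDM (matter + Λ) scale factor vs. a de Sitter scale factor whose
# prefactor C is chosen so the two agree as t -> infinity.
Om, OL, H0 = 0.3, 0.7, 0.07     # density parameters; H0 in 1/Gy (illustrative)

def a_lcdm(t):
    return (Om / OL) ** (1 / 3) * np.sinh(1.5 * np.sqrt(OL) * H0 * t) ** (2 / 3)

def a_desitter(t):
    C = (Om / OL) ** (1 / 3) / 2 ** (2 / 3)   # aligns the t -> infinity limits
    return C * np.exp(np.sqrt(OL) * H0 * t)

def rel_err(t):
    # relative error between the aligned scale factors at cosmic time t (Gy)
    return abs(a_lcdm(t) - a_desitter(t)) / a_lcdm(t)

errs = [rel_err(t) for t in (14.0, 30.0, 60.0)]
```

Since sinh(x)^(2/3) approaches (e^x / 2)^(2/3) exponentially fast, the relative error shrinks as the universe expands, which is the quantitative convergence the paper measures.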


2005 ◽  
Vol 284-286 ◽  
pp. 737-740 ◽  
Author(s):  
Drago Skrtic ◽  
S.Y. Lee ◽  
Joseph M. Antonucci ◽  
D.W. Liu

This study explores how (a) the resin grafting potential for amorphous calcium phosphate (ACP) and (b) the particle size of ACP affect the physicochemical properties of composites. Copolymers and composites were evaluated for biaxial flexure strength (BFS), degree of vinyl conversion (DC), mineral ion release, and water sorption (WS). Milled ACP composites were superior to unmilled ACP composites, exhibiting 62% and 77% higher BFS values in the dry and wet state, respectively. The average DC of copolymers 24 h after curing was 80%. The DC of composites decreased by 10.3% for unmilled Zr-ACP and by 4.6% for milled Zr-ACP compared to the corresponding copolymers. WS increased as follows: copolymers < milled Zr-ACP composites < unmilled Zr-ACP composites. The levels of Ca and PO4 released from both types of composites increased with increasing EBPADMA/TEGDMA ratio in the matrix and were significantly above the minimum necessary for the redeposition of HAP to occur. No significant consumption of released calcium by the carboxylic groups of methacryloxyethyl phthalate (MEP) occurred at an MEP mass fraction of 2.6% in the resin. Improvements in the physicochemical properties of ACP composites are achieved by fine-tuning the resin and by improving the dispersion of ACP within the polymer matrix through ball-milling.


2016 ◽  
Vol 2016 ◽  
pp. 1-14 ◽  
Author(s):  
Huiwen Zhang ◽  
Xiaoning Han ◽  
Mingliang Fu ◽  
Weijia Zhou

We briefly survey existing obstacle avoidance algorithms and then propose a new obstacle avoidance learning framework based on learning from demonstration (LfD). The main idea is to imitate the obstacle avoidance mechanism of human beings, who learn to make decisions based on sensor information obtained by interacting with the environment. Firstly, we endow robots with obstacle avoidance experience by teaching them to avoid obstacles in different situations, collecting a large amount of data as a training set; a Gaussian mixture model (GMM) is then used to encode the training data, which is equivalent to extracting the constraints of the task. Secondly, a smooth obstacle-free path is generated by Gaussian mixture regression (GMR). Thirdly, a metric of imitation performance is constructed to derive a proper control policy. The proposed framework shows excellent generalization performance: the robots can fulfill obstacle avoidance tasks efficiently in a dynamic environment. More importantly, the framework allows a wide variety of skills to be learned, such as grasping and manipulation, making it possible to build a robot with versatile functions. Finally, simulation experiments are conducted on a Turtlebot robot to verify the validity of our algorithms.
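The GMR step, conditioning a GMM over (time, position) on time to obtain a smooth path, can be sketched with hand-set component parameters (chosen by me for illustration, not learned from demonstrations as in the paper):

```python
import numpy as np

# Gaussian mixture regression over (t, x): the predicted path at time t is the
# responsibility-weighted sum of each component's conditional mean of x given t.
means  = [np.array([0.0, 0.0]), np.array([1.0, 2.0])]   # (t, x) means
covs   = [np.array([[0.2, 0.1], [0.1, 0.3]]),
          np.array([[0.2, -0.1], [-0.1, 0.3]])]
priors = [0.5, 0.5]

def gmr(t):
    h, cond = [], []
    for pi, mu, S in zip(priors, means, covs):
        var_t = S[0, 0]
        # responsibility of this component for input t (1-D Gaussian in t)
        h.append(pi * np.exp(-0.5 * (t - mu[0]) ** 2 / var_t) / np.sqrt(var_t))
        # conditional mean of x given t for this component
        cond.append(mu[1] + S[1, 0] / var_t * (t - mu[0]))
    h = np.array(h) / np.sum(h)
    return float(h @ np.array(cond))

path = [gmr(t) for t in np.linspace(0.0, 1.0, 5)]
```

Because the responsibilities blend smoothly between components, the regressed path interpolates smoothly between the demonstrated behaviors, which is what makes GMR attractive for generating obstacle-free trajectories.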


2012 ◽  
Vol 157-158 ◽  
pp. 1399-1403
Author(s):  
Jian Wu Long ◽  
Xuan Jing Shen ◽  
Hai Peng Chen

In this work, principal component analysis (PCA) is adopted to construct a background model, and moving objects are detected by background subtraction. First, the matrix of training samples is constructed by converting the video sequence into vectors. Then the covariance matrix C of the training set is calculated, and the eigenvalues and eigenvectors of C are obtained through SVD. Next, the eigenvalues are sorted and the background model is reconstructed using the image vectors with the highest cumulative contribution. Finally, comparison experiments are performed against the detection results of a GMM approach. Experimental results show that the proposed method establishes more accurate background models and achieves better object detection.
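The pipeline can be sketched on synthetic data (tiny 8×8 "frames" of a static scene under varying illumination stand in for a real video; all sizes and thresholds here are my own illustrative choices, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(0)

# Training matrix: each frame becomes a column vector, as in the described
# pipeline; SVD of the centered matrix yields the eigenvectors, and those
# with the highest cumulative contribution span the background subspace.
background = rng.random((8, 8))
cols = []
for _ in range(20):
    illum = 1.0 + 0.2 * rng.standard_normal()        # global brightness change
    cols.append((illum * background + 0.01 * rng.standard_normal((8, 8))).ravel())
frames = np.stack(cols, axis=1)                       # 64 x 20

mean = frames.mean(axis=1, keepdims=True)
U, s, _ = np.linalg.svd(frames - mean, full_matrices=False)
k = int(np.searchsorted(np.cumsum(s ** 2) / np.sum(s ** 2), 0.95)) + 1
B = U[:, :k]                     # eigenvectors with highest cumulative contribution

# A new frame with a bright "object" in the top-left corner: the residual
# after projection onto the background subspace is the detected foreground.
test = background.copy()
test[:2, :2] += 1.0
v = test.ravel()[:, None] - mean
residual = (v - B @ (B.T @ v)).reshape(8, 8)
mask = np.abs(residual) > 0.5
```

Because illumination changes live inside the learned subspace while the object does not, the subtraction suppresses the background and keeps only the moving object's pixels.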

