Semi-Supervised Multi-Temporal Deep Representation Fusion Network for Landslide Mapping from Aerial Orthophotos

Xiaokang Zhang; Man-On Pun; Ming Liu

doi:10.3390/rs13040548


Remote Sensing ◽

10.3390/rs13040548 ◽

2021 ◽

Vol 13 (4) ◽

pp. 548

Author(s):

Xiaokang Zhang ◽

Man-On Pun ◽

Ming Liu

Keyword(s):

Feature Space ◽

Representation Learning ◽

Feature Representation ◽

Generative Adversarial Network ◽

Adversarial Network ◽

Multi Temporal ◽

Landslide Mapping ◽

Spatio Temporal ◽

Reliability And Robustness ◽

High Level

Using remote sensing techniques to monitor landslides and their resultant land cover changes is fundamentally important for risk assessment and hazard prevention. Despite enormous efforts in developing intelligent landslide mapping (LM) approaches, LM remains challenging owing to the high spectral heterogeneity of very-high-resolution (VHR) images and the daunting labeling effort required. To this end, a deep learning model based on a semi-supervised multi-temporal deep representation fusion network, namely SMDRF-Net, is proposed for reliable and efficient LM. In comparison with previous methods, SMDRF-Net possesses three distinct properties. (1) Unsupervised deep representation learning at the pixel and object levels is performed by transfer learning using the Wasserstein generative adversarial network with gradient penalty to learn discriminative deep features and retain precise outlines of landslide objects in the high-level feature space. (2) Attention-based adaptive fusion of multi-temporal and multi-level deep representations is developed to exploit the spatio-temporal dependencies of deep representations and enhance the feature representation capability of the network. (3) The network is optimized using limited samples with pseudo-labels that are automatically generated based on a comprehensive uncertainty index. Experimental results from the analysis of VHR aerial orthophotos demonstrate the reliability and robustness of the proposed approach for LM in comparison with state-of-the-art methods.
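The pseudo-labeling idea in property (3) can be illustrated with a minimal sketch. The abstract does not define its "comprehensive uncertainty index", so the example below substitutes plain binary Shannon entropy as a stand-in; the function names, the threshold value, and the sample probabilities are all hypothetical and are not taken from the paper.

```python
import math


def binary_entropy(p):
    """Shannon entropy (in bits) of a Bernoulli probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def select_pseudo_labels(probs, max_uncertainty=0.5):
    """Keep only confident predictions and turn them into pseudo-labels.

    probs: predicted landslide probability per unlabeled sample.
    max_uncertainty: hypothetical entropy threshold (an assumption,
    not a value from the paper).
    Returns {sample_index: label} with label 1 = landslide, 0 = background.
    """
    selected = {}
    for i, p in enumerate(probs):
        if binary_entropy(p) <= max_uncertainty:
            selected[i] = 1 if p >= 0.5 else 0
    return selected


# Made-up predicted probabilities for five unlabeled samples.
landslide_probs = [0.97, 0.55, 0.03, 0.80, 0.10]
pseudo = select_pseudo_labels(landslide_probs)
print(pseudo)
```

Samples whose predictions sit near 0.5 carry high entropy and are left unlabeled, so only the confident ones feed back into training.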


End-to-end deep image reconstruction from human brain activity

10.1101/272518 ◽

2018 ◽

Cited By ~ 4

Author(s):

Guohua Shen ◽

Kshitij Dwivedi ◽

Kei Majima ◽

Tomoyasu Horikawa ◽

Yukiyasu Kamitani

Keyword(s):

Image Reconstruction ◽

Brain Activity ◽

Critical Role ◽

Feature Space ◽

Training Data ◽

Fmri Data ◽

Generative Adversarial Network ◽

Adversarial Network ◽

End To End ◽

High Level

Deep neural networks (DNNs) have recently been applied successfully to brain decoding and image reconstruction from functional magnetic resonance imaging (fMRI) activity. However, direct training of a DNN with fMRI data is often avoided because the size of available data is thought to be insufficient to train a complex network with numerous parameters. Instead, a pre-trained DNN has served as a proxy for hierarchical visual representations, and fMRI data were used to decode individual DNN features of a stimulus image using a simple linear model, which were then passed to a reconstruction module. Here, we present our attempt to directly train a DNN model with fMRI data and the corresponding stimulus images to build an end-to-end reconstruction model. We trained a generative adversarial network with an additional loss term defined in a high-level feature space (feature loss) using up to 6,000 training data points (natural images and the fMRI responses). The trained deep generator network was tested on an independent dataset, directly producing a reconstructed image given an fMRI pattern as the input. The reconstructions obtained from the proposed method resembled both natural and artificial test stimuli. The accuracy increased as a function of the training data size, though it did not outperform the decoded feature-based method with the available data size. Ablation analyses indicated that the feature loss played a critical role in achieving accurate reconstruction. Our results suggest a potential for the end-to-end framework to learn a direct mapping between brain activity and perception given even larger datasets.


Spatio-Temporal Learning for Video Deblurring based on Two-Stream Generative Adversarial Network

Neural Processing Letters ◽

10.1007/s11063-021-10520-y ◽

2021 ◽

Author(s):

Liyao Song ◽

Quan Wang ◽

Haiwei Li ◽

Jiancun Fan ◽

Bingliang Hu

Keyword(s):

Generative Adversarial Network ◽

Adversarial Network ◽

Spatio Temporal ◽

Temporal Learning


TWIST-GAN: Towards Wavelet Transform and Transferred GAN for Spatio-Temporal Single Image Super Resolution

ACM Transactions on Intelligent Systems and Technology ◽

10.1145/3456726 ◽

2021 ◽

Vol 12 (6) ◽

pp. 1-20

Author(s):

Fayaz Ali Dharejo ◽

Farah Deeba ◽

Yuanchun Zhou ◽

Bhagwan Das ◽

Munsif Ali Jatoi ◽

...

Keyword(s):

Remote Sensing ◽

Super Resolution ◽

Generative Adversarial Networks ◽

Single Image ◽

Generative Adversarial Network ◽

Adversarial Network ◽

Adversarial Networks ◽

Image Super Resolution ◽

Spatio Temporal ◽

Single Image Super Resolution

Single image super-resolution (SISR) produces high-resolution (HR) images with fine spatial resolution from a remotely sensed image with low spatial resolution (LR). Recently, deep learning and generative adversarial networks (GANs) have made breakthroughs in the challenging task of SISR. However, the generated image still suffers from undesirable artifacts such as the absence of texture-feature representation and high-frequency information. We propose a frequency domain-based spatio-temporal remote sensing single image super-resolution technique to reconstruct the HR image combined with GANs on various frequency bands (TWIST-GAN). We have introduced a new method incorporating Wavelet Transform (WT) characteristics and a transferred generative adversarial network. The LR image is split into various frequency bands by using the WT, whereas the transferred generative adversarial network predicts high-frequency components via a proposed architecture. Finally, the inverse wavelet transform produces a reconstructed image with super-resolution. The model is first trained on an external DIV2K dataset and validated with the UC Merced Landsat remote sensing dataset and Set14 with each image size of 256 × 256. Following that, transferred GANs are used to process spatio-temporal remote sensing images in order to minimize computation cost differences and improve texture information. The findings are compared quantitatively and qualitatively with the current state-of-the-art approaches. In addition, we saved about 43% of the GPU memory during training and accelerated the execution of our simplified version by eliminating batch normalization layers.
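The wavelet step described above, splitting an image into frequency bands and later inverting the transform, can be sketched with a one-level 2D Haar transform. The abstract does not name the wavelet family, so Haar (with simple average/difference scaling rather than orthonormal scaling) is an assumption here, as are all function names. The GAN stage that predicts the high-frequency bands is omitted, so the example only shows that the decomposition is exactly invertible.

```python
def haar_1d(signal):
    """One-level Haar step: pairwise averages (low) and half-differences (high)."""
    low = [(signal[i] + signal[i + 1]) / 2.0 for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / 2.0 for i in range(0, len(signal), 2)]
    return low, high


def inverse_haar_1d(low, high):
    """Undo haar_1d: each (average, difference) pair yields two samples."""
    out = []
    for a, d in zip(low, high):
        out.extend([a + d, a - d])
    return out


def transpose(matrix):
    return [list(row) for row in zip(*matrix)]


def haar_2d(image):
    """Split an even-sized image into four sub-bands (LL, LH, HL, HH).

    The first letter refers to the row-direction filter, the second to the
    column-direction filter (L = low-pass, H = high-pass).
    """
    low_rows, high_rows = [], []
    for row in image:
        low, high = haar_1d(row)
        low_rows.append(low)
        high_rows.append(high)

    def split_columns(matrix):
        lows, highs = [], []
        for col in transpose(matrix):
            low, high = haar_1d(col)
            lows.append(low)
            highs.append(high)
        return transpose(lows), transpose(highs)

    ll, lh = split_columns(low_rows)
    hl, hh = split_columns(high_rows)
    return ll, lh, hl, hh


def inverse_haar_2d(ll, lh, hl, hh):
    """Rebuild the image from its four sub-bands."""
    def merge_columns(lows, highs):
        cols = [inverse_haar_1d(a, d)
                for a, d in zip(transpose(lows), transpose(highs))]
        return transpose(cols)

    low_rows = merge_columns(ll, lh)
    high_rows = merge_columns(hl, hh)
    return [inverse_haar_1d(a, d) for a, d in zip(low_rows, high_rows)]


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
ll, lh, hl, hh = haar_2d(image)
restored = inverse_haar_2d(ll, lh, hl, hh)
print(ll)
print(restored == image)
```

The LL band is a half-resolution approximation of the input, while the other three bands hold the detail that a super-resolution model would need to predict.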


IntroVNMT: An Introspective Model for Variational Neural Machine Translation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6411 ◽

2020 ◽

Vol 34 (05) ◽

pp. 8830-8837

Author(s):

Xin Sheng ◽

Linli Xu ◽

Junliang Guo ◽

Jingchang Liu ◽

Ruoyu Zhao ◽

...

Keyword(s):

Machine Translation ◽

Latent Variables ◽

Image Synthesis ◽

Target Language ◽

Generative Adversarial Network ◽

Neural Machine Translation ◽

Adversarial Network ◽

Proposed Model ◽

Model Training ◽

High Level

We propose a novel introspective model for variational neural machine translation (IntroVNMT) in this paper, inspired by the recent successful application of introspective variational autoencoder (IntroVAE) in high quality image synthesis. Different from the vanilla variational NMT model, IntroVNMT is capable of improving itself introspectively by evaluating the quality of the generated target sentences according to the high-level latent variables of the real and generated target sentences. As a consequence of introspective training, the proposed model is able to discriminate between the generated and real sentences of the target language via the latent variables generated by the encoder of the model. In this way, IntroVNMT is able to generate more realistic target sentences in practice. In the meantime, IntroVNMT inherits the advantages of the variational autoencoders (VAEs), and the model training process is more stable than the generative adversarial network (GAN) based models. Experimental results on different translation tasks demonstrate that the proposed model can achieve significant improvements over the vanilla variational NMT model.


Unsupervised classification of snowflake images using a generative adversarial network and K-medoids classification

Atmospheric Measurement Techniques ◽

10.5194/amt-13-2949-2020 ◽

2020 ◽

Vol 13 (6) ◽

pp. 2949-2964

Author(s):

Jussi Leinonen ◽

Alexis Berne

Keyword(s):

Classification Scheme ◽

Feature Space ◽

Unsupervised Classification ◽

Automated Classification ◽

Generative Adversarial Network ◽

Microphysical Properties ◽

Adversarial Network ◽

Computationally Intensive ◽

Comparison Of The Results ◽

Supervised Classification Methods

The increasing availability of sensors imaging cloud and precipitation particles, like the Multi-Angle Snowflake Camera (MASC), has resulted in datasets comprising millions of images of falling snowflakes. Automated classification is required for effective analysis of such large datasets. While supervised classification methods have been developed for this purpose in recent years, their ability to generalize is limited by the representativeness of their labeled training datasets, which are affected by the subjective judgment of the expert and require significant manual effort to derive. An alternative is unsupervised classification, which seeks to divide a dataset into distinct classes without expert-provided labels. In this paper, we introduce an unsupervised classification scheme based on a generative adversarial network (GAN) that learns to extract the key features from the snowflake images. Each image is then associated with a distribution of points in the feature space, and these distributions are used as the basis of K-medoids classification and hierarchical clustering. We found that the classification scheme is able to separate the dataset into distinct classes, each characterized by a particular size, shape and texture of the snowflake image, providing signatures of the microphysical properties of the snowflakes. This finding is supported by a comparison of the results to an existing supervised scheme. Although training the GAN is computationally intensive, the classification process proceeds directly from images to classes with minimal human intervention and therefore can be repeated for other MASC datasets with minor manual effort. As the algorithm is not specific to snowflakes, we also expect this approach to be relevant to other applications.
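As a rough illustration of the K-medoids step, the sketch below clusters a handful of 2D points using the simple alternating (assign, then update) variant of the algorithm. It is a simplification under stated assumptions: the paper clusters distributions of points in a GAN feature space, whereas this example uses single points with Euclidean distance, a deterministic initialization (the first k points), and hypothetical names throughout.

```python
import math


def k_medoids(points, k, max_iter=100):
    """Alternating K-medoids on a list of coordinate tuples.

    Returns (medoid_indices, labels), where labels[i] is the position in
    medoid_indices of the medoid nearest to points[i].
    """
    medoids = list(range(k))  # assumption: first k points as initial medoids
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            nearest = min(medoids, key=lambda m: math.dist(p, points[m]))
            clusters[nearest].append(i)

        # Update step: the new medoid of a cluster is the member with the
        # smallest total distance to all other members.
        new_medoids = []
        for old, members in clusters.items():
            if not members:  # can only happen with duplicate points
                new_medoids.append(old)
                continue
            best = min(
                members,
                key=lambda c: sum(math.dist(points[c], points[j]) for j in members),
            )
            new_medoids.append(best)

        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids

    labels = [
        min(range(len(medoids)), key=lambda c: math.dist(p, points[medoids[c]]))
        for p in points
    ]
    return medoids, labels


# Two well-separated toy groups standing in for feature-space embeddings.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, labels = k_medoids(points, k=2)
print(medoids, labels)
```

Unlike K-means, each cluster center is an actual data point, which is why the method only needs pairwise distances and can in principle work with distances between distributions as well.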


Spatio-temporal silhouette sequence reconstruction for gait recognition against occlusion

IPSJ Transactions on Computer Vision and Applications ◽

10.1186/s41074-019-0061-3 ◽

2019 ◽

Vol 11 (1) ◽

Cited By ~ 1

Author(s):

Md. Zasim Uddin ◽

Daigo Muramatsu ◽

Noriko Takemura ◽

Md. Atiqur Rahman Ahad ◽

Yasushi Yagi

Keyword(s):

Gait Cycle ◽

Gait Recognition ◽

Real Life ◽

Image Sequence ◽

Body Parts ◽

Generative Adversarial Network ◽

Adversarial Network ◽

Hinge Loss ◽

Sequence Reconstruction ◽

Spatio Temporal

Gait-based features provide the potential for a subject to be recognized even from a low-resolution image sequence, and they can be captured at a distance without the subject’s cooperation. Person recognition using gait-based features (gait recognition) is a promising real-life application. However, several body parts of the subjects are often occluded because of beams, pillars, cars and trees, or another walking person. Therefore, gait-based features are not applicable to approaches that require an unoccluded gait image sequence. Occlusion handling is a challenging but important issue for gait recognition. In this paper, we propose silhouette sequence reconstruction from an occluded sequence (sVideo) based on a conditional deep generative adversarial network (GAN). From the reconstructed sequence, we estimate the gait cycle and extract the gait features from a one-gait-cycle image sequence. To regularize the training of the proposed generative network, we use adversarial loss based on triplet hinge loss incorporating Wasserstein GAN (WGAN-hinge). To the best of our knowledge, WGAN-hinge is the first adversarial loss that supervises the generator network during training by incorporating pairwise similarity ranking information. The proposed approach was evaluated on multiple challenging occlusion patterns. The experimental results demonstrate that the proposed approach outperforms the existing state-of-the-art benchmarks.


Data-driven modelling of nonlinear spatio-temporal fluid flows using a deep convolutional generative adversarial network

Computer Methods in Applied Mechanics and Engineering ◽

10.1016/j.cma.2020.113000 ◽

2020 ◽

Vol 365 ◽

pp. 113000 ◽

Cited By ~ 5

Author(s):

M. Cheng ◽

F. Fang ◽

C.C. Pain ◽

I.M. Navon

Keyword(s):

Fluid Flows ◽

Data Driven ◽

Generative Adversarial Network ◽

Adversarial Network ◽

Spatio Temporal


Transferable Adversarial Attacks for Image and Video Object Detection

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/134 ◽

2019 ◽

Cited By ~ 8

Author(s):

Xingxing Wei ◽

Siyuan Liang ◽

Ning Chen ◽

Xiaochun Cao

Keyword(s):

Object Detection ◽

Video Data ◽

Detection Methods ◽

Feature Maps ◽

Generative Adversarial Network ◽

Adversarial Network ◽

Adversarial Examples ◽

Adversarial Example ◽

High Level ◽

Image Object Detection

Identifying adversarial examples is beneficial for understanding deep networks and developing robust models. However, existing attacking methods for image object detection have two limitations: weak transferability (the generated adversarial examples often have a low success rate when attacking other kinds of detection methods) and high computation cost (they need much time to deal with video data, where many frames must be perturbed). To address these issues, we present a generative method to obtain adversarial images and videos, thereby significantly reducing the processing time. To enhance transferability, we manipulate the feature maps extracted by a feature network, which usually constitutes the basis of object detectors. Our method is based on the Generative Adversarial Network (GAN) framework, where we combine a high-level class loss and a low-level feature loss to jointly train the adversarial example generator. Experimental results on PASCAL VOC and ImageNet VID datasets show that our method efficiently generates image and video adversarial examples, and more importantly, these adversarial examples have better transferability, and can therefore simultaneously attack two kinds of representative object detection models: proposal-based models like Faster-RCNN and regression-based models like SSD.


Cross-Individual Affective Detection Using EEG Signals with Audio-Visual Embedding

10.1101/2021.08.06.455362 ◽

2021 ◽

Author(s):

Zhen Liang ◽

Xihao Zhang ◽

Rushuang Zhou ◽

Li Zhang ◽

Linling Li ◽

...

Keyword(s):

Emotion Recognition ◽

Temporal Dynamics ◽

Recognition Performance ◽

Critical Issue ◽

Feature Representation ◽

Fusion Model ◽

Generative Adversarial Network ◽

Adversarial Network ◽

Brain Data ◽

Eeg Features

How to effectively and efficiently extract valid and reliable features from high-dimensional electroencephalography (EEG), particularly how to fuse the spatial and temporal dynamic brain information into a better feature representation, is a critical issue in brain data analysis. Most current EEG studies work in a task-driven manner and explore the valid EEG features with a supervised model, which is limited to a great extent by the given labels. In this paper, we propose a practical hybrid unsupervised deep convolutional recurrent generative adversarial network based EEG feature characterization and fusion model, termed EEGFuseNet. EEGFuseNet is trained in an unsupervised manner, and deep EEG features covering both spatial and temporal dynamics are automatically characterized. Compared with existing features, the characterized deep EEG features can be considered more generic and independent of any specific EEG task. The performance of the deep, low-dimensional features extracted by EEGFuseNet is carefully evaluated in an unsupervised emotion recognition application based on three public emotion databases. The results demonstrate that the proposed EEGFuseNet is a robust and reliable model, which is easy to train and performs efficiently in the representation and fusion of dynamic EEG features. In particular, EEGFuseNet is established as an optimal unsupervised fusion model with promising cross-subject emotion recognition performance. This shows that EEGFuseNet is capable of characterizing and fusing deep features that imply comparative cortical dynamic significance corresponding to changes between emotion states, and also demonstrates the possibility of realizing EEG-based cross-subject emotion recognition in a purely unsupervised manner.


Cross-Modality Person Re-Identification with Generative Adversarial Training

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/94 ◽

2018 ◽

Cited By ~ 31

Author(s):

Pingyang Dai ◽

Rongrong Ji ◽

Haibin Wang ◽

Qiong Wu ◽

Yuyu Huang

Keyword(s):

Large Scale ◽

Characteristic Curve ◽

Metric Learning ◽

Feature Representation ◽

Superior Performance ◽

Generative Adversarial Network ◽

Adversarial Network ◽

Discriminative Feature ◽

Adversarial Training ◽

Rgb Images

Person re-identification (Re-ID) is an important task in video surveillance which automatically searches and identifies people across different cameras. Despite the extensive Re-ID progress in RGB cameras, few works have studied the Re-ID between infrared and RGB images, which is essentially a cross-modality problem and widely encountered in real-world scenarios. The key challenge is twofold: the lack of discriminative information to re-identify the same person between RGB and infrared modalities, and the difficulty of learning a robust metric for such large-scale cross-modality retrieval. In this paper, we tackle these two challenges by proposing a novel cross-modality generative adversarial network (termed cmGAN). To handle the issue of insufficient discriminative information, we leverage generative adversarial training to design our own discriminator to learn discriminative feature representation from different modalities. To handle the issue of large-scale cross-modality metric learning, we integrate both identification loss and cross-modality triplet loss, which minimize inter-class ambiguity while maximizing cross-modality similarity among instances. The entire cmGAN can be trained in an end-to-end manner by using a standard deep neural network framework. We have quantified the performance of our work on the newly released SYSU RGB-IR Re-ID benchmark, and have reported superior performance in terms of the Cumulative Match Characteristic curve (CMC) and Mean Average Precision (MAP) over the state-of-the-art works [Wu et al., 2017].
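The cross-modality triplet loss mentioned above can be illustrated with the standard margin-based triplet formulation. This is a generic sketch, not the cmGAN implementation: the abstract does not give the exact loss, so the Euclidean distance, the margin value, and the toy two-dimensional embeddings are assumptions chosen for illustration.

```python
import math


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard hinge-style triplet loss on embedding vectors.

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin` (a hypothetical value here).
    """
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy embeddings: the anchor comes from an RGB image, while the positive
# (same identity) and negatives (other identities) come from infrared images.
rgb_anchor = (0.0, 0.0)
ir_same_person = (0.3, 0.4)   # distance 0.5 from the anchor
ir_easy_negative = (3.0, 4.0)  # distance 5.0 from the anchor
ir_hard_negative = (0.6, 0.8)  # distance 1.0 from the anchor

easy = triplet_loss(rgb_anchor, ir_same_person, ir_easy_negative)
hard = triplet_loss(rgb_anchor, ir_same_person, ir_hard_negative)
print(easy, hard)
```

The easy negative already satisfies the margin and contributes no loss, while the hard negative is penalized, which pushes same-identity embeddings from the two modalities closer together than different-identity ones.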
