Learning a Multi-Branch Neural Network from Multiple Sources for Knowledge Adaptation in Remote Sensing Imagery

2018 ◽  
Vol 10 (12) ◽  
pp. 1890 ◽  
Author(s):  
Mohamad Al Rahhal ◽  
Yakoub Bazi ◽  
Taghreed Abdullah ◽  
Mohamed Mekhalfi ◽  
Haikel AlHichri ◽  
...  

In this paper we propose a multi-branch neural network, called MB-Net, for solving the problem of knowledge adaptation from multiple remote sensing scene datasets acquired with different sensors over diverse locations and manually labeled by different experts. Our aim is to learn invariant feature representations from multiple source domains with labeled images and one target domain with unlabeled images. To this end, we define for MB-Net an objective function that mitigates the multiple domain shifts at both the feature representation and decision levels, while retaining the ability to discriminate between different land-cover classes. The complete architecture is trainable end-to-end via the backpropagation algorithm. In the experiments, we demonstrate the effectiveness of the proposed method on a new multiple-domain dataset created from four heterogeneous scene datasets well known to the remote sensing community, namely, the University of California, Merced (UC-Merced) dataset, the Aerial Image Dataset (AID), the PatternNet dataset, and the Northwestern Polytechnical University (NWPU) dataset. In particular, this method boosts the average accuracy over all transfer scenarios up to 89.05%, compared to a standard architecture based only on the cross-entropy loss, which yields an average accuracy of 78.53%.
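The objective described above combines per-source classification losses with domain-alignment penalties. The NumPy sketch below is a minimal, hypothetical illustration of that structure; the function names and the mean-feature gap are my simplifications, not MB-Net's actual losses.

```python
import numpy as np

def cross_entropy(logits, labels):
    # softmax cross-entropy averaged over a batch
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def mean_feature_gap(src, tgt):
    # squared distance between mean feature vectors: a crude
    # stand-in for a feature-level domain-shift penalty
    return float(((src.mean(axis=0) - tgt.mean(axis=0)) ** 2).sum())

def multi_source_loss(branch_logits, branch_labels, branch_feats, tgt_feats, lam=0.1):
    # per-source classification loss plus alignment of every
    # labeled source branch toward the unlabeled target features
    ce = sum(cross_entropy(l, y) for l, y in zip(branch_logits, branch_labels))
    align = sum(mean_feature_gap(f, tgt_feats) for f in branch_feats)
    return ce + lam * align
```

In the actual network each source branch would be a CNN, and the paper mitigates shift at both the feature and decision levels; here a single feature-level term stands in for both.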

Author(s):  
Guanbin Li ◽  
Xin Zhu ◽  
Yirui Zeng ◽  
Qing Wang ◽  
Liang Lin

Facial action unit (AU) recognition is a crucial task for facial expression analysis and has attracted extensive attention in the fields of artificial intelligence and computer vision. Existing works have either focused on designing or learning complex regional feature representations, or delved into various types of AU relationship modeling. Albeit with varying degrees of progress, it is still arduous for existing methods to handle complex situations. In this paper, we investigate how to integrate semantic relationship propagation between AUs into a deep neural network framework to enhance the feature representation of facial regions, and propose an AU semantic relationship embedded representation learning (SRERL) framework. Specifically, by analyzing the symbiosis and mutual exclusion of AUs in various facial expressions, we organize the facial AUs in the form of a structured knowledge graph and integrate a Gated Graph Neural Network (GGNN) into a multi-scale CNN framework to propagate node information through the graph and generate enhanced AU representations. As the learned features involve both appearance characteristics and AU relationship reasoning, the proposed model is more robust and can cope with more challenging cases, e.g., illumination change and partial occlusion. Extensive experiments on two public benchmarks demonstrate that our method outperforms previous work and achieves state-of-the-art performance.
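As a rough illustration of the graph-propagation idea, the hedged NumPy sketch below runs simplified message-passing steps over an AU relation graph; the real SRERL uses GRU-gated updates inside a GGNN, which this toy version omits.

```python
import numpy as np

def propagate_au_graph(node_states, adjacency, weight, steps=2):
    # Simplified message passing: each AU node aggregates neighbour
    # states weighted by the (row-normalised) relation graph, then
    # mixes the message into its own state. A real GGNN would apply
    # GRU-style gating here; this sketch keeps only the aggregation.
    a = adjacency / np.maximum(adjacency.sum(axis=1, keepdims=True), 1e-8)
    h = node_states
    for _ in range(steps):
        msg = a @ h @ weight          # gather relational evidence
        h = np.tanh(h + msg)          # update node representations
    return h
```

The adjacency matrix plays the role of the symbiosis/mutual-exclusion knowledge graph described above; its entries would come from AU co-occurrence statistics.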


Sensors ◽  
2020 ◽  
Vol 20 (7) ◽  
pp. 2030 ◽  
Author(s):  
Byeongkeun Kang ◽  
Yeejin Lee

Driving is a task that puts heavy demands on visual information, so the human visual system plays a critical role in making proper decisions for safe driving. Understanding a driver’s visual attention and the relevant behavioral information is a challenging but essential task for advanced driver-assistance systems (ADAS) and efficient autonomous vehicles (AV). Specifically, robust prediction of a driver’s attention from images could be crucial for intelligent vehicle systems in which a self-driving car must move safely while interacting with its surrounding environment. Thus, in this paper, we investigate a human driver’s visual behavior from a computer vision perspective to estimate the driver’s attention locations in images. First, we show that high-resolution feature representations improve visual attention prediction accuracy and localization performance when fused with low-resolution features. To demonstrate this, we employ a deep convolutional neural network framework that learns and extracts feature representations at multiple resolutions. In particular, the network maintains the feature representation with the highest resolution at the original image resolution. Second, attention prediction tends to be biased toward the centers of images when neural networks are trained on typical visual attention datasets. To avoid overfitting to the center-biased solution, the network is trained using diverse regions of images. Finally, the experimental results verify that our proposed framework improves the prediction accuracy of a driver’s attention locations.
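The first finding, fusing high- and low-resolution feature maps, can be illustrated with a minimal NumPy sketch; nearest-neighbour upsampling and elementwise summation are illustrative assumptions (the paper's network learns its own multi-resolution fusion).

```python
import numpy as np

def fuse_multires(high_res, low_res):
    # Upsample the coarse map by nearest-neighbour repetition so it
    # matches the full-resolution map, then fuse by elementwise sum.
    # Assumes the fine shape is an integer multiple of the coarse one.
    fh, fw = high_res.shape
    ch, cw = low_res.shape
    up = low_res.repeat(fh // ch, axis=0).repeat(fw // cw, axis=1)
    return high_res + up
```

The fused map would then feed the attention-prediction head, combining fine localization from the high-resolution branch with coarse context from the low-resolution branch.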


Author(s):  
Hatem Keshk ◽  
Xu-Cheng Yin

Background: Deep Learning (DL) neural network methods have become a hot research topic in the remote sensing field. Classification of aerial satellite images depends on spectral content, which is a challenging topic in remote sensing. Objective: Aiming at high-performance, accurate classification of Egyptsat-1 satellite images, this paper adopts the Convolutional Neural Network (CNN), a leading deep learning method. The CNN is developed to classify aerial photographs into land-cover classes such as urban, vegetation, desert, water bodies, soil, and roads. In our work, a comparison is conducted between Maximum Likelihood (ML), representing traditional supervised classification methods, and the CNN method. Conclusion: This research finds that the CNN outperforms ML by 9%. The convolutional neural network achieves the better classification result, reaching an average accuracy of 92.25%. The experiments also showed that the convolutional neural network is the most satisfactory and effective method for classifying Egyptsat-1 satellite images.
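The Maximum Likelihood baseline mentioned above is a standard technique: each class is modeled as a multivariate Gaussian over the spectral bands, and each pixel is assigned to the class with the highest log-likelihood. A minimal NumPy sketch (a generic formulation, not the authors' code):

```python
import numpy as np

def ml_classify(pixels, class_means, class_covs):
    # Classical Maximum Likelihood classification: score each pixel's
    # spectral vector under a per-class Gaussian and pick the argmax.
    n_classes = len(class_means)
    scores = np.empty((len(pixels), n_classes))
    for c in range(n_classes):
        diff = pixels - class_means[c]
        inv = np.linalg.inv(class_covs[c])
        _, logdet = np.linalg.slogdet(class_covs[c])
        maha = np.einsum('ij,jk,ik->i', diff, inv, diff)
        scores[:, c] = -0.5 * (logdet + maha)  # log-likelihood up to a constant
    return scores.argmax(axis=1)
```

Class means and covariances would be estimated from training pixels of each land-cover class; the CNN replaces this fixed Gaussian assumption with learned features, which is one plausible source of its reported 9% advantage.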


Author(s):  
M. Chen ◽  
X. Wang ◽  
A. Dou ◽  
X. Wu

The seismic damage information of buildings extracted from remote sensing (RS) imagery is valuable for supporting relief efforts and effectively reducing losses caused by earthquakes. Both traditional pixel-based and object-oriented methods have shortcomings in extracting object information. The pixel-based method cannot make full use of the contextual information of objects, while the object-oriented method suffers from imperfect image segmentation and the difficulty of choosing a feature space. In this paper, a new strategy is proposed that combines a Convolutional Neural Network (CNN) with image segmentation to extract building damage information from remote sensing imagery. The key idea of this method comprises two steps: first, use the CNN to predict the probability of each pixel; then, integrate the probabilities within each segmentation spot. The method is tested by extracting collapsed and uncollapsed buildings from aerial imagery acquired over Longtoushan Town after the Ms 6.5 Ludian earthquake in Yunnan Province. The results show that the proposed method is effective in extracting building damage information after an earthquake.
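The second step, integrating per-pixel probabilities within each segmentation spot, can be sketched as follows; the 0.5 threshold and the mean-pooling rule are illustrative assumptions, not necessarily the paper's exact decision rule.

```python
import numpy as np

def segment_damage_labels(prob_map, segment_map, threshold=0.5):
    # Integrate per-pixel collapse probabilities inside each
    # segmentation spot: a segment is flagged as collapsed when its
    # mean probability exceeds the threshold.
    labels = {}
    for seg_id in np.unique(segment_map):
        mean_p = float(prob_map[segment_map == seg_id].mean())
        labels[seg_id] = mean_p > threshold
    return labels
```

Averaging CNN scores over a segment is what lets the hybrid approach smooth out noisy per-pixel predictions while still respecting object boundaries from the segmentation.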


2019 ◽  
Vol 11 (12) ◽  
pp. 168781401989721 ◽  
Author(s):  
Changchang Che ◽  
Huawei Wang ◽  
Qiang Fu ◽  
Xiaomei Ni

Rolling bearings are vital components of rotary machines. The collected rolling bearing data have strong noise interference, massive unlabeled samples, and varying fault features. Thus, a deep transfer learning method is proposed for rolling bearing fault diagnosis under variable operating conditions. To obtain a robust feature representation, a denoising autoencoder is used to denoise and reduce the dimension of unlabeled rolling bearing signals. For unlabeled target-domain signals, a feature matching method based on multi-kernel maximum mean discrepancy between the source and target domains is adopted to obtain enough labeled target-domain samples. These rolling bearing signals are then converted into multi-dimensional graph samples and fed into a convolutional neural network model for fault diagnosis. To improve the generalization of the convolutional neural network under variable operating conditions, we combine model-based transfer learning with feature-based transfer learning to initialize and optimize the network parameters. The effectiveness of the proposed method is validated through several comparative experiments on the Case Western Reserve University dataset. The results demonstrate that the proposed method can learn features adaptively from noisy data and increases the accuracy rate by 2%–8% compared with other models.
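The multi-kernel maximum mean discrepancy used for feature matching has a standard empirical form; below is a minimal NumPy sketch with an assumed bank of RBF bandwidths (the paper's kernel choices may differ).

```python
import numpy as np

def mk_mmd(src, tgt, bandwidths=(0.5, 1.0, 2.0)):
    # Multi-kernel maximum mean discrepancy with a bank of RBF kernels:
    # MMD^2 = E[k(s, s')] + E[k(t, t')] - 2 E[k(s, t)]
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return sum(np.exp(-d2 / (2 * bw ** 2)) for bw in bandwidths) / len(bandwidths)
    return float(k(src, src).mean() + k(tgt, tgt).mean() - 2 * k(src, tgt).mean())
```

Source samples whose features sit close to a target cluster under this discrepancy are natural candidates for transferring labels, which is the spirit of the feature-matching step described above.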


Sensors ◽  
2019 ◽  
Vol 19 (17) ◽  
pp. 3703 ◽  
Author(s):  
Yang Tao ◽  
Chunyan Li ◽  
Zhifang Liang ◽  
Haocheng Yang ◽  
Juan Xu

An electronic nose (E-nose) is an instrument that combines gas sensors with corresponding pattern recognition algorithms to detect the type and concentration of gases. However, sensor drift occurs in realistic application scenarios of the E-nose, which changes the data distribution in feature space and decreases prediction accuracy. Therefore, studies on drift compensation algorithms are receiving increasing attention in the E-nose field. In this paper, a novel method, Wasserstein Distance Learned Feature Representations (WDLFR), is put forward for drift compensation, based on domain-invariant feature representation learning. It regards a neural network as a domain discriminator to measure the empirical Wasserstein distance between the source domain (data without drift) and the target domain (drift data). WDLFR minimizes the Wasserstein distance by optimizing the feature extractor in an adversarial manner. The Wasserstein distance for domain adaptation has good gradients and a good generalization bound. Finally, experiments are conducted on a real E-nose dataset from the University of California, San Diego (UCSD). The experimental results demonstrate that the proposed method outperforms all compared drift compensation methods, and WDLFR succeeds in significantly reducing the sensor drift.
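WDLFR estimates the Wasserstein distance with a learned critic network; for one-dimensional features the same quantity has a simple closed form, which shows what the feature extractor is being trained to shrink. A minimal NumPy sketch:

```python
import numpy as np

def wasserstein_1d(src, tgt):
    # Empirical 1-D Wasserstein-1 distance between equally sized
    # samples: sort both samples and average the absolute differences
    # of matched quantiles. The adversarial critic in WDLFR generalizes
    # this estimate to high-dimensional feature spaces.
    return float(np.abs(np.sort(src) - np.sort(tgt)).mean())
```

Driving this distance toward zero between drift-free and drifted features is exactly the domain-invariance objective described in the abstract.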


Author(s):  
Feiwu Yu ◽  
Xinxiao Wu ◽  
Yuchao Sun ◽  
Lixin Duan

Existing deep learning methods for video recognition usually require a large number of labeled videos for training. For a new task, however, videos are often unlabeled, and annotating them is time-consuming and labor-intensive. Instead of relying on human annotation, we try to make use of existing fully labeled images to help recognize those videos. However, due to domain shifts and heterogeneous feature representations, the performance of classifiers trained on images may be dramatically degraded on video recognition tasks. In this paper, we propose a novel method, called Hierarchical Generative Adversarial Networks (HiGAN), to enhance recognition in videos (i.e., the target domain) by transferring knowledge from images (i.e., the source domain). The HiGAN model consists of a low-level conditional GAN and a high-level conditional GAN. By taking advantage of this two-level adversarial learning, our method is capable of learning a domain-invariant feature representation of source images and target videos. Comprehensive experiments on two challenging video recognition datasets (i.e., UCF101 and HMDB51) demonstrate the effectiveness of the proposed method compared with existing state-of-the-art domain adaptation methods.
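Each level of the hierarchy trains a conditional GAN with the usual minimax objectives; the hedged NumPy sketch below computes those objectives from discriminator logits. This is the generic GAN formulation, not HiGAN's exact conditional losses.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gan_losses(d_real_logits, d_fake_logits, eps=1e-8):
    # Standard GAN objectives at one level of the hierarchy: the
    # discriminator separates real target features from translated
    # source features, while the generator tries to fool it.
    d_loss = -np.mean(np.log(sigmoid(d_real_logits) + eps)
                      + np.log(1.0 - sigmoid(d_fake_logits) + eps))
    g_loss = -np.mean(np.log(sigmoid(d_fake_logits) + eps))
    return float(d_loss), float(g_loss)
```

In HiGAN this adversarial game is played twice, at a low feature level and a high feature level, which is what couples the image and video representations.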


2021 ◽  
Vol 13 (24) ◽  
pp. 5015
Author(s):  
Libo Wang ◽  
Ce Zhang ◽  
Rui Li ◽  
Chenxi Duan ◽  
Xiaoliang Meng ◽  
...  

Assigning geospatial objects to specific categories at the pixel level is a fundamental task in remote sensing image analysis. Along with the rapid development of sensor technologies, remotely sensed images can be captured at multiple spatial resolutions (MSR), with information content manifested at different scales. Extracting information from these MSR images represents a huge opportunity for enhanced feature representation and characterisation. However, MSR images suffer from two critical issues: (1) increased scale variation of geo-objects and (2) loss of detailed information at coarse spatial resolutions. To bridge these gaps, in this paper we propose a novel scale-aware neural network (SaNet) for the semantic segmentation of MSR remotely sensed imagery. SaNet deploys a densely connected feature fusion module (DCFFM) to capture high-quality multi-scale context, so that scale variation is handled properly and segmentation quality is increased for both large and small objects. A spatial feature recalibration module (SFRM) is further incorporated into the network to learn intact semantic content with enhanced spatial relationships, removing the negative effects of information loss. The combination of DCFFM and SFRM allows SaNet to learn scale-aware feature representations, which outperform existing multi-scale feature representations. Extensive experiments on three semantic segmentation datasets demonstrated the effectiveness of the proposed SaNet in cross-resolution segmentation.
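Spatial feature recalibration of the kind described typically reweights feature maps by a learned attention map; the sketch below is a deliberately minimal NumPy stand-in. The channel-squeeze weights and sigmoid gating are my assumptions for illustration, not SaNet's exact SFRM design.

```python
import numpy as np

def spatial_recalibration(feats, weight):
    # Squeeze the channel dimension to a single spatial map with
    # learned per-channel weights, squash it to (0, 1), and reweight
    # every channel by the resulting attention map.
    attn = 1.0 / (1.0 + np.exp(-(weight[:, None, None] * feats).sum(axis=0)))
    return feats * attn[None, :, :]
```

The attention map lets the network emphasize spatial locations whose semantic content survived the resolution change, which matches the stated goal of countering information loss at coarse resolutions.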


2021 ◽  
Vol 14 (1-2) ◽  
pp. 38-46
Author(s):  
Balázs Jakab ◽  
Boudewijn van Leeuwen ◽  
Zalán Tobak

Agricultural production in greenhouses shows rapid growth in many parts of the world. This form of intensive farming requires large amounts of water and fertilizer, and can have a severe impact on the environment. The number of greenhouses and their locations are important for applications such as spatial planning, environmental protection, agricultural statistics and taxation. Therefore, with this study we aim to develop a methodology to detect plastic greenhouses in remote sensing data using machine learning algorithms. This research presents the results of using a convolutional neural network for automatic object detection of plastic greenhouses in high-resolution remotely sensed data within a GIS environment that offers a graphical interface to advanced algorithms. The convolutional neural network is trained with manually digitized greenhouses and RGB images downloaded from Google Earth. The ArcGIS Pro geographic information system provides access to many of the most advanced Python-based machine learning environments, such as Keras – TensorFlow, PyTorch, fastai and Scikit-learn. These libraries can be accessed via a graphical interface within the GIS environment. Our research evaluated the training and inference results of three different convolutional neural networks. Experiments were executed with many settings for the backbone models and hyperparameters, and the three models were compared in terms of detection accuracy and training time. The model based on the VGG_11 backbone (with dropout) achieved an average accuracy of 79.2% with a relatively short training time of 90 minutes; the much more complex DenseNet121 model was trained in 16.5 hours and reached 79.1%; and the ResNet18-based model achieved an average accuracy of 83.1% with a training time of 3.5 hours.


2020 ◽  
Vol 38 (4A) ◽  
pp. 510-514
Author(s):  
Tay H. Shihab ◽  
Amjed N. Al-Hameedawi ◽  
Ammar M. Hamza

In this paper, to make use of complementary potential in LULC mapping, spatial data were acquired from Landsat 8 OLI sensor images taken in 2019. The images were rectified, enhanced and then classified using the Random Forest (RF) and artificial neural network (ANN) methods. Optical remote sensing images were used to obtain information on the status of LULC classification and to extract details. The classified satellite images are used to extract features and to analyse the LULC of the study area. The classification results showed that the artificial neural network method outperforms the random forest method. The required image processing for the optical remote sensing data, including geometric correction and image enhancement, was carried out before LULC mapping. The overall accuracy when using the ANN method was 0.91 and the kappa accuracy was 0.89 for the training dataset, while the overall accuracy and kappa accuracy for the test dataset were 0.89 and 0.87, respectively.
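The overall accuracy and kappa accuracy reported above are computed from a confusion matrix in the standard way; a small NumPy sketch:

```python
import numpy as np

def accuracy_and_kappa(confusion):
    # Overall accuracy and Cohen's kappa from a square confusion
    # matrix (rows: reference classes, columns: predicted classes).
    cm = np.asarray(confusion, dtype=float)
    total = cm.sum()
    observed = np.trace(cm) / total                          # overall accuracy
    expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total ** 2
    kappa = (observed - expected) / (1 - expected)           # chance-corrected
    return float(observed), float(kappa)
```

Kappa discounts agreement expected by chance, which is why it sits slightly below the overall accuracy in the figures quoted (0.89 vs. 0.91 on training, 0.87 vs. 0.89 on test).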

