Multi-Scale Remote Sensing Semantic Analysis Based on a Global Perspective

2019 ◽  
Vol 8 (9) ◽  
pp. 417 ◽  
Author(s):  
Wei Cui ◽  
Dongyou Zhang ◽  
Xin He ◽  
Meng Yao ◽  
Ziwei Wang ◽  
...  

Remote sensing image captioning involves remote sensing objects and their spatial relationships. However, it is still difficult to determine the spatial extent of a remote sensing object and the size of a sample patch. If the patch size is too large, it will include too many remote sensing objects and their complex spatial relationships, which increases the computational burden of the image captioning network and reduces its precision. If the patch size is too small, it often fails to provide enough environmental and contextual information, which makes the remote sensing object difficult to describe. To address this problem, we propose a multi-scale semantic long short-term memory network (MS-LSTM). The remote sensing images are paired into image patches with different spatial scales. First, the large-scale patches have larger sizes. We use a Visual Geometry Group (VGG) network to extract features from the large-scale patches and input them into the improved MS-LSTM network as semantic information. This provides a larger receptive field and more contextual semantic information for small-scale image captioning, playing the role of a global perspective and thereby enabling the accurate identification of small-scale samples with the same features. Second, a small-scale patch is used to highlight remote sensing objects and simplify their spatial relations. In addition, the multi-receptive field provides perspectives from local to global. The experimental results demonstrate that, compared with the original long short-term memory network (LSTM), the MS-LSTM’s Bilingual Evaluation Understudy (BLEU) score increased by 5.6% to 0.859, reflecting that the MS-LSTM has a more comprehensive receptive field, which provides more abundant semantic information and enhances the remote sensing image captions.
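The pairing of patches at two spatial scales can be illustrated with a minimal sketch. It assumes that a small-scale and a large-scale patch are cropped around the same center pixel; the abstract does not state how the pairs are formed, so the function names, the patch sizes, and the clamping at image borders are all hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # single-band image stored as rows of pixel values


def crop_centered(image: Image, center: Tuple[int, int], size: int) -> Image:
    """Crop a size x size window around (row, col), shifted if needed to stay inside the image."""
    rows, cols = len(image), len(image[0])
    if size > rows or size > cols:
        raise ValueError("patch size exceeds image size")
    half = size // 2
    # Clamp the top-left corner so the window never leaves the image (an assumption).
    top = min(max(center[0] - half, 0), rows - size)
    left = min(max(center[1] - half, 0), cols - size)
    return [row[left:left + size] for row in image[top:top + size]]


def paired_patches(image: Image, center: Tuple[int, int],
                   small_size: int, large_size: int) -> Tuple[Image, Image]:
    """Return (small, large) patches around the same center (hypothetical pairing rule)."""
    small = crop_centered(image, center, small_size)
    large = crop_centered(image, center, large_size)
    return small, large
```

In such a setup the large patch would be passed to the feature extractor for context while the small patch is the one being captioned.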

2019 ◽  
Vol 2019 ◽  
pp. 1-12 ◽  
Author(s):  
Haifeng Sang ◽  
Chuanzheng Wang ◽  
Dakuo He ◽  
Qing Liu

This paper presents a multi-information flow convolutional neural network (MiF-CNN) model for person reidentification (re-id). It contains several specific multilayer convolutional structures, in which the input and output of a convolutional layer are concatenated along the channel dimension. With this design, the model can go deeper and feature maps can be reused by each subsequent layer. Inspired by image captioning, a person attribute recognition network is proposed based on a long short-term memory network and an attention mechanism. By fusing the identification results of MiF-CNN and attribute recognition, this paper introduces an attribute-aided reranking algorithm to further improve the accuracy of person re-id. Experiments on the VIPeR, CUHK01, and Market1501 datasets verify that the proposed MiF-CNN can be trained sufficiently with small-scale datasets and obtains outstanding person re-id accuracy. Contrast experiments also confirm the effectiveness of the attribute-aided reranking algorithm.
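The channel-wise concatenation of a layer's input and output can be sketched as follows. This is only an illustration of the feature-reuse idea: `toy_layer` is a hypothetical stand-in for a real convolution, and the number of layers and channels are assumptions, not values from the paper.

```python
from typing import List

FeatureMap = List[List[float]]  # one channel, H x W
Tensor = List[FeatureMap]       # C x H x W


def toy_layer(x: Tensor, out_channels: int) -> Tensor:
    """Stand-in for a convolution: scaled copies of the per-pixel mean over input channels."""
    h, w = len(x[0]), len(x[0][0])
    mean = [[sum(ch[i][j] for ch in x) / len(x) for j in range(w)] for i in range(h)]
    return [[[v * (k + 1) for v in row] for row in mean] for k in range(out_channels)]


def concat_block(x: Tensor, num_layers: int, growth: int) -> Tensor:
    """Apply layers in sequence, concatenating each layer's input and output on the channel axis."""
    for _ in range(num_layers):
        y = toy_layer(x, growth)
        x = x + y  # list concatenation along the channel dimension
    return x
```

Each layer sees every earlier feature map, so the channel count grows by `growth` per layer while the original input channels remain available at the output.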


Energies ◽  
2018 ◽  
Vol 11 (12) ◽  
pp. 3433 ◽  
Author(s):  
Seon Kim ◽  
Gyul Lee ◽  
Gu-Young Kwon ◽  
Do-In Kim ◽  
Yong-June Shin

Load forecasting is a key issue for efficient real-time energy management in smart grids. To control the load accurately using demand side management, the load should be forecast in the short term. With the advent of advanced measuring infrastructure, it is possible to measure energy consumption at sampling intervals as short as 5 min and to analyze the load profile of small-scale energy groups, such as individual buildings. This paper presents applications of deep learning using feature decomposition for improving the accuracy of load forecasting. The load profile is decomposed into a weekly load profile and then decomposed into intrinsic mode functions by variational mode decomposition to capture periodic features. Then, a long short-term memory network model is trained on three-dimensional input data with three-step regularization. Finally, the prediction results of all intrinsic mode functions are combined with advanced measuring infrastructure data measured in the previous steps to determine an aggregated output for load forecasting. The results are validated by applications to real-world data from smart buildings, and the performance of the proposed approach is assessed by comparing the predicted results with those of conventional methods, nonlinear autoregressive networks with exogenous inputs, and long short-term memory network-based feature decomposition.
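As a rough illustration of the first decomposition step, the sketch below splits a 5-minute load series into whole weeks and averages them into one weekly profile. This is one plausible reading of "weekly load profile" and is an assumption; the function names are hypothetical, and the variational mode decomposition and LSTM stages are not shown.

```python
from typing import List

SAMPLES_PER_DAY = 24 * 60 // 5          # 288 samples per day at a 5-minute interval
SAMPLES_PER_WEEK = 7 * SAMPLES_PER_DAY  # 2016 samples per week


def split_into_weeks(load: List[float]) -> List[List[float]]:
    """Cut the series into consecutive whole weeks, dropping any trailing partial week."""
    n_weeks = len(load) // SAMPLES_PER_WEEK
    return [load[i * SAMPLES_PER_WEEK:(i + 1) * SAMPLES_PER_WEEK] for i in range(n_weeks)]


def mean_weekly_profile(weeks: List[List[float]]) -> List[float]:
    """Average the weeks sample by sample to obtain a single weekly load profile."""
    if not weeks:
        raise ValueError("need at least one full week of data")
    return [sum(week[t] for week in weeks) / len(weeks) for t in range(SAMPLES_PER_WEEK)]
```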


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Yesol Park ◽  
Joohong Lee ◽  
Heesang Moon ◽  
Yong Suk Choi ◽  
Mina Rho

With recent advances in biotechnology and sequencing technology, the microbial community has been intensively studied and discovered to be associated with many chronic as well as acute diseases. Even though a tremendous number of studies describing the association between microbes and diseases have been published, text mining methods that focus on such associations have rarely been studied. We propose a framework that combines machine learning and natural language processing methods to analyze the association between microbes and diseases. A hierarchical long short-term memory network was used to detect sentences that describe the association. For the sentences detected, two different parse tree-based search methods were combined to find the relation-describing word. The ensemble model of constituency parsing for structural pattern matching and dependency-based relation extraction improved the prediction accuracy. By combining deep learning and parse tree-based extractions, our proposed framework could extract the microbe-disease association with higher accuracy. The evaluation results showed that our system achieved F-scores of 0.8764 and 0.8524 in binary decisions and in extracting relation words, respectively. As a case study, we performed a large-scale analysis of the association between microbes and diseases. Additionally, a set of common microbes shared by multiple diseases was also identified in this study. This study could provide valuable information on the major microbes that were studied for a specific disease. The code and data are available at https://github.com/DMnBI/mdi_predictor.
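For readers unfamiliar with the metric, the sketch below computes an F-score from raw counts. It assumes the balanced F1 variant (harmonic mean of precision and recall); the abstract does not say which variant was used, and the function name and the counts in the test are illustrative only.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```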


2020 ◽  
Vol 12 (3) ◽  
pp. 405 ◽  
Author(s):  
Taghreed Abdullah ◽  
Yakoub Bazi ◽  
Mohamad M. Al Rahhal ◽  
Mohamed L. Mekhalfi ◽  
Lalitha Rangarajan ◽  
...  

Exploring the relevance between images and their respective natural language descriptions, due to its paramount importance, is regarded as the next frontier in the general computer vision literature. Thus, several recent works have attempted to map visual attributes onto their corresponding textual tenor with certain success. However, this line of research has not been widespread in the remote sensing community. On this point, our contribution is three-pronged. First, we construct a new dataset for text-image matching tasks, termed TextRS, by collecting images from four different well-known scene datasets, namely the AID, Merced, PatternNet, and NWPU datasets. Each image is annotated with five different sentences, written by five different people to ensure diversity. Second, we put forth a novel Deep Bidirectional Triplet Network (DBTN) for text-to-image matching. Unlike traditional remote sensing image-to-image retrieval, our paradigm seeks to carry out the retrieval by matching text to image representations. To achieve that, we propose to learn a bidirectional triplet network, which is composed of a Long Short Term Memory network (LSTM) and pre-trained Convolutional Neural Networks (CNNs) based on EfficientNet-B2, ResNet-50, Inception-v3, and VGG16. Third, we top the proposed architecture with an average fusion strategy to fuse the features pertaining to the five image sentences, which enables learning of a more robust embedding. The performance of the method, expressed in terms of Recall@K (the presence of the relevant image among the top K images retrieved for the query text), is promising: it yields 17.20%, 51.39%, and 73.02% for K = 1, 5, and 10, respectively.
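Recall@K as described here can be computed with a few lines. The sketch assumes one relevant image per query text and a ranked list of retrieved image identifiers per query; the function name and the data layout are hypothetical.

```python
from typing import List, Sequence


def recall_at_k(rankings: Sequence[Sequence[str]], relevant: Sequence[str], k: int) -> float:
    """Fraction of queries whose relevant image appears among the top-k retrieved images."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not rankings or len(rankings) != len(relevant):
        raise ValueError("need one non-empty ranking per relevant item")
    hits = sum(1 for ranked, target in zip(rankings, relevant) if target in ranked[:k])
    return hits / len(rankings)
```

By construction the value cannot decrease as K grows, which matches the rising figures reported for K = 1, 5, and 10.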


2021 ◽  
Vol 263 (2) ◽  
pp. 4355-4360
Author(s):  
Mitsunori Mizumachi ◽  
Ryotarou Oka

Acoustic beamforming with a microphone array enables spatial filtering in a wide frequency range. It is challenging to sharpen the main-lobe in the lower frequency region with a small-scale microphone array, in which both the number and the spacing of microphones are small. A neural network-based non-linear beamformer achieves a breakthrough in sharpening the main-lobe. Non-linear beamforming works well for narrowband signals but is weak in wideband beamforming. Non-linear beamforming with long short-term memory is proposed to deal with wideband speech signals. The long short-term memory network is trained in the recurrent neural network architecture with sequences of audio data such as speech signals. The performance of the proposed beamformer is confirmed in a real environment using a small-scale 8-ch MEMS microphone array, in which eight microphones are linearly arranged with a neighboring spacing of 10 mm. The beam-pattern of the proposed non-linear beamformer succeeds in sharpening the main-lobe, although the linear delay-and-sum beamformer could not achieve frequency selectivity. The feasibility of the proposed beamformer is also confirmed in speech enhancement.
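The limitation of the linear delay-and-sum baseline on such a small array can be seen from its textbook beam pattern. The sketch below assumes a far-field plane wave, free-field propagation, and a speed of sound of 343 m/s; only the eight microphones and 10 mm spacing come from the abstract, and the function name is hypothetical. It does not model the proposed non-linear beamformer.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def das_response(freq_hz: float, look_deg: float, arrival_deg: float,
                 num_mics: int = 8, spacing_m: float = 0.010) -> float:
    """Normalized magnitude response of a uniform linear delay-and-sum beamformer.

    Angles are measured from broadside; look_deg is the steering direction.
    """
    k = 2 * math.pi * freq_hz / SPEED_OF_SOUND  # wavenumber in rad/m
    delta = math.sin(math.radians(arrival_deg)) - math.sin(math.radians(look_deg))
    # Sum the residual phase of each microphone after steering delays are applied.
    total = sum(cmath.exp(1j * k * spacing_m * n * delta) for n in range(num_mics))
    return abs(total) / num_mics
```

With a total aperture of only 70 mm, a 500 Hz source arriving 90 degrees away from the look direction is barely attenuated, whereas at 8 kHz the same off-axis source is strongly suppressed, which is the low-frequency main-lobe problem the abstract refers to.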

