Multi-scale Hierarchical Residual Network for Dense Captioning

2019 · Vol 64 · pp. 181-196
Author(s): Yan Tian, Xun Wang, Jiachen Wu, Ruili Wang, Bailin Yang

Recent research on dense captioning based on recurrent and convolutional neural networks has made great progress. However, mapping from an image feature space to a description space is a nonlinear and multimodal task, which makes it difficult for current methods to produce accurate results. In this paper, we put forward a novel approach to dense captioning based on hourglass-structured residual learning. Discriminative feature maps are obtained by incorporating densely connected networks and residual learning into our model. Finally, the performance of the approach is demonstrated on the Visual Genome V1.0 dataset and the region-labelled MS-COCO (Microsoft Common Objects in Context) dataset. The experimental results show that our approach outperforms most current methods.
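The two connectivity patterns the abstract combines, residual learning and dense connections, can be contrasted on plain vectors. This is a generic illustration only, assuming layers are arbitrary functions from a vector to a vector; `residual_block` and `dense_block` are hypothetical names, and the sketch does not reproduce the paper's hourglass architecture.

```python
from typing import Callable, List, Sequence

Vec = List[float]
Layer = Callable[[Vec], Vec]


def residual_block(x: Vec, f: Layer) -> Vec:
    """Residual learning: output x + F(x), so F only has to model the residual."""
    fx = f(x)
    if len(fx) != len(x):
        raise ValueError("F(x) must have the same size as x for the identity shortcut")
    return [a + b for a, b in zip(x, fx)]


def dense_block(x: Vec, layers: Sequence[Layer]) -> Vec:
    """Dense connectivity: each layer receives the concatenation of the block input
    and every earlier layer's output; its own output is then appended in turn."""
    features = list(x)
    for layer in layers:
        features = features + layer(features)
    return features


# Toy layers (hypothetical): halve every value; sum all incoming features.
print(residual_block([1.0, 2.0], lambda v: [0.5 * a for a in v]))  # [1.5, 3.0]
print(dense_block([1.0], [lambda v: [sum(v)], lambda v: [sum(v)]]))  # [1.0, 1.0, 2.0]
```

The residual form adds, so input and output widths must match; the dense form concatenates, so the feature vector grows by each layer's output size.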

Author(s): Dov Danon, Moab Arar, Daniel Cohen-Or, Ariel Shamir

Traditional image resizing methods usually work in pixel space and use various saliency measures. The challenge is to adjust the image shape while trying to preserve important content. In this paper we perform image resizing in feature space, using the deep layers of a neural network, which contain rich semantic information. We directly adjust the image feature maps, extracted from a pre-trained classification network, and reconstruct the resized image using neural-network-based optimization. This novel approach leverages the hierarchical encoding of the network and, in particular, the high-level discriminative power of its deeper layers, which can recognize semantic regions and objects, thereby allowing their aspect ratios to be maintained. Reconstructing from deep features results in less noticeable artifacts than using image-space resizing operators. We evaluate our method on benchmarks, compare it to alternative approaches, and demonstrate its strengths on challenging images.


2021 · Vol 11 (10) · pp. 2618-2625
Author(s): R. T. Subhalakshmi, S. Appavu Alias Balamurugan, S. Sasikala

In recent times, the COVID-19 epidemic has spread rapidly, aggravated by the limited availability of rapid testing kits. Consequently, it is essential to develop automated techniques for COVID-19 detection that recognize the presence of the disease in radiological images. The most common symptoms of COVID-19 are sore throat, fever, and dry cough. Symptoms can progress to a severe form of pneumonia with serious complications. Although medical imaging is not currently recommended in Canada for COVID-19 diagnosis, computer-aided diagnosis systems might aid in the early detection of COVID-19 abnormalities and help monitor disease progression, potentially reducing mortality rates. In this paper, a deep-learning-based design for feature extraction and classification is employed for automatic COVID-19 diagnosis from computed tomography (CT) images. The proposed model involves three main processes: pre-processing, feature extraction, and classification. The proposed design fuses deep features from GoogLeNet models. Finally, a multi-scale recurrent neural network (RNN)-based classifier is applied to classify the test CT images into distinct class labels. The proposed model is validated on the open-source COVID-CT dataset, which comprises a total of 760 CT images. The experimental results demonstrate superior performance, with the highest sensitivity, specificity, and accuracy.


2019 · Vol 11 (14) · pp. 1678
Author(s): Yongyong Fu, Ziran Ye, Jinsong Deng, Xinyu Zheng, Yibo Huang, ...

Marine aquaculture plays an important role in seafood supply, economic development, and the provision of coastal ecosystem services. The precise delineation of marine aquaculture areas from high spatial resolution (HSR) imagery is vital for the sustainable development and management of coastal marine resources. However, the varying sizes and detailed structures of marine objects make accurate mapping from HSR images difficult for conventional methods. Therefore, this study attempts to extract marine aquaculture areas using an automatic labeling method based on a convolutional neural network (CNN), namely an end-to-end hierarchical cascade network (HCNet). Specifically, for marine objects of various sizes, we propose to improve classification performance by utilizing multi-scale contextual information. Technically, based on the output of a CNN encoder, we employ atrous convolutions to capture multi-scale contextual information and aggregate it in a hierarchical cascade manner. Meanwhile, for marine objects with detailed structures, we propose to refine the detailed information gradually using a series of long-span connections with fine-resolution features from the shallow layers. In addition, to reduce the semantic gaps between features at different levels, we propose to refine the feature space (i.e., the channel and spatial dimensions) using an attention-based module. Experimental results show that the proposed HCNet can effectively identify and distinguish different kinds of marine aquaculture, with an overall accuracy of 98%. It also achieves better classification performance than an object-based support vector machine and state-of-the-art CNN-based methods such as FCN-32s, U-Net, and DeeplabV2. Our method lays a solid foundation for the intelligent monitoring and management of coastal marine resources.
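The building block HCNet relies on for multi-scale context, atrous (dilated) convolution, can be sketched in one dimension. The sketch assumes zero padding, stride 1, a 3-tap kernel, and illustrative rates (1, 2, 4); `atrous_conv1d` and `multiscale_context` are hypothetical names, and the cascade below (each rate applied to the previous rate's output, all outputs summed) is one plausible reading of "hierarchical cascade", not the authors' exact design. As in most CNN libraries, the "convolution" is computed as a cross-correlation.

```python
from typing import List, Sequence


def atrous_conv1d(x: Sequence[float], kernel: Sequence[float], rate: int) -> List[float]:
    """Dilated 1D convolution with zero padding that preserves the input length.

    Kernel taps are spaced `rate` samples apart, so a k-tap kernel spans
    (k - 1) * rate + 1 input samples without adding any weights."""
    half = (len(kernel) - 1) * rate // 2  # offset that centres the dilated kernel
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i - half + j * rate
            if 0 <= idx < len(x):  # positions outside the signal count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def multiscale_context(x: Sequence[float], kernel: Sequence[float],
                       rates: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Hypothetical cascade: each rate filters the previous branch's output,
    and the branch outputs are summed element-wise."""
    branches = []
    current = list(x)
    for r in rates:
        current = atrous_conv1d(current, kernel, r)
        branches.append(current)
    return [sum(vals) for vals in zip(*branches)]


impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
print(atrous_conv1d(impulse, [1, 1, 1], rate=2))
# [0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0]
```

Because the taps are spaced `rate` samples apart, the same three-tap kernel covers 3, 5, and 9 samples at rates 1, 2, and 4.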


Author(s): Yunsheng Bai, Hao Ding, Yang Qiao, Agustin Marinovic, Ken Gu, ...

We introduce a novel approach to graph-level representation learning, which embeds an entire graph into a vector space such that the embeddings of two graphs preserve their graph-graph proximity. Our approach, UGraphEmb, is a general framework that provides a novel means of performing graph-level embedding in a completely unsupervised and inductive manner. The learned neural network can be considered a function that receives any graph as input, whether seen or unseen in the training set, and transforms it into an embedding. A novel graph-level embedding generation mechanism, called Multi-Scale Node Attention (MSNA), is proposed. Experiments on five real graph datasets show that UGraphEmb achieves competitive accuracy in graph classification, similarity ranking, and graph visualization.
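The abstract names Multi-Scale Node Attention without defining it, so the following is only a generic sketch of an attention-based graph readout: node embeddings from several layers (treated here as "scales") are softmax-weighted within each layer, and the pooled vectors are concatenated. The scoring vectors in `weights` and the function names are hypothetical; this is not the paper's MSNA formulation. It does illustrate why such a readout is inductive: it accepts a graph with any number of nodes and its result does not depend on node order.

```python
import math
from typing import List, Sequence

Vec = List[float]


def attention_pool(nodes: Sequence[Sequence[float]], w: Sequence[float]) -> Vec:
    """Softmax-weighted sum of node embeddings with score_i = <h_i, w>. Assumes >= 1 node."""
    scores = [sum(a * b for a, b in zip(h, w)) for h in nodes]
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(nodes[0])
    return [sum(a * h[d] for a, h in zip(alphas, nodes)) for d in range(dim)]


def multi_scale_graph_embedding(layers: Sequence[Sequence[Sequence[float]]],
                                weights: Sequence[Sequence[float]]) -> Vec:
    """Pool each layer's node embeddings separately, then concatenate the results."""
    embedding: Vec = []
    for nodes, w in zip(layers, weights):
        embedding.extend(attention_pool(nodes, w))
    return embedding


print(attention_pool([[1.0, 0.0], [3.0, 2.0]], [0.0, 0.0]))  # [2.0, 1.0]: equal scores give the mean
```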


2021 · Vol 13 (23) · pp. 4743
Author(s): Wei Yuan, Wenbo Xu

Deep-learning-based segmentation is the main method for remote sensing image interpretation. However, segmentation models based on convolutional neural networks cannot capture global features well. A transformer, whose self-attention mechanism can supply each pixel with global features, makes up for this deficiency of convolutional neural networks. Therefore, this paper proposes a multi-scale adaptive segmentation network model (MSST-Net) based on a Swin Transformer. First, a Swin Transformer is used as the backbone to encode the input image. Second, the feature maps of different levels are decoded separately. Third, a convolution is used for fusion, so that the network can automatically learn the weight of the decoding results of each level. Finally, a convolution with a 1 × 1 kernel adjusts the channels to obtain the final prediction map. Compared with other segmentation network models on the WHU building dataset, MSST-Net improves all of the evaluation metrics: mIoU, F1-score, and accuracy. The proposed model is a multi-scale adaptive network that pays more attention to global features for remote sensing segmentation.
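The fusion and channel-adjustment steps both come down to convolutions that mix channels at each pixel. The sketch below assumes the per-level decoder outputs have already been upsampled to a common resolution and are fused by channel concatenation followed by a 1 × 1 convolution; the paper's fusion convolution may use a larger kernel, and `conv1x1` and `fuse_levels` are hypothetical names. Feature maps are nested lists indexed `[channel][row][col]`.

```python
from typing import List, Sequence

FeatureMap = List[List[List[float]]]  # indexed [channel][row][col]


def conv1x1(fmap: FeatureMap, weight: Sequence[Sequence[float]],
            bias: Sequence[float]) -> FeatureMap:
    """1 x 1 convolution: each output pixel is a weighted sum of the input
    channels at that same pixel, plus a per-output-channel bias."""
    c_in, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    out: FeatureMap = []
    for o, row_w in enumerate(weight):
        if len(row_w) != c_in:
            raise ValueError("each weight row needs one entry per input channel")
        out.append([[bias[o] + sum(row_w[c] * fmap[c][i][j] for c in range(c_in))
                     for j in range(w)] for i in range(h)])
    return out


def fuse_levels(level_maps: Sequence[FeatureMap], weight: Sequence[Sequence[float]],
                bias: Sequence[float]) -> FeatureMap:
    """Concatenate decoded levels along the channel axis, then mix them with a 1 x 1 convolution."""
    stacked = [channel for fmap in level_maps for channel in fmap]
    return conv1x1(stacked, weight, bias)


level_a = [[[1.0, 2.0], [3.0, 4.0]]]      # one decoded level: 1 channel, 2 x 2
level_b = [[[10.0, 20.0], [30.0, 40.0]]]  # another decoded level, same size
print(fuse_levels([level_a, level_b], weight=[[0.5, 0.25]], bias=[0.0]))
# [[[3.0, 6.0], [9.0, 12.0]]]
```

With a single output channel, the 1 × 1 weights act as learned per-level weights; more output channels give one such weighting per class in the prediction map.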


2021 · Vol 1 (1) · pp. 29-31
Author(s): Mahmood Haithami, Amr Ahmed, Iman Yi Liao, Hamid Jalab

In this paper, we aim to enhance the segmentation capabilities of DeeplabV3 by employing a gated recurrent unit (GRU). The 1-by-1 convolution that follows the Atrous Spatial Pyramid Pooling (ASPP) layer in DeeplabV3 was replaced by a GRU to combine the input feature maps. Both the convolution and the GRU have shareable parameters; however, the GRU additionally has gates that enable or disable the contribution of each input feature map. Experiments on unseen test sets demonstrate that employing a GRU instead of the convolution produces better segmentation results. The datasets used are public datasets provided by the MedAI competition.
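One way to picture the change: at each pixel, treat the ASPP branch outputs as a short sequence and let a GRU fold them into one vector, instead of a 1 × 1 convolution taking a fixed weighted sum. The sketch assumes a hidden size equal to the channel count and uses the update h' = (1 − z) ⊙ h + z ⊙ n (PyTorch's `GRUCell` swaps the roles of z and 1 − z). The abstract does not say how the maps are ordered or how the GRU is wired, and the function names and parameter keys are hypothetical.

```python
import math
from typing import List, Sequence

Vec = List[float]


def matvec(m: Sequence[Sequence[float]], v: Sequence[float]) -> Vec:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gru_step(x: Sequence[float], h: Sequence[float], p: dict) -> Vec:
    """One GRU update; p holds W* (input) and U* (recurrent) matrices and b* biases."""
    z = [sigmoid(a + b + c) for a, b, c in zip(matvec(p["Wz"], x), matvec(p["Uz"], h), p["bz"])]  # update gate
    r = [sigmoid(a + b + c) for a, b, c in zip(matvec(p["Wr"], x), matvec(p["Ur"], h), p["br"])]  # reset gate
    rh = [ri * hi for ri, hi in zip(r, h)]
    n = [math.tanh(a + b + c) for a, b, c in zip(matvec(p["Wn"], x), matvec(p["Un"], rh), p["bn"])]  # candidate
    # z near 0 keeps the previous state (input ignored); z near 1 adopts the candidate.
    return [(1.0 - zi) * hi + zi * ni for zi, hi, ni in zip(z, h, n)]


def gru_fuse(branch_features: Sequence[Sequence[float]], p: dict) -> Vec:
    """Fuse one pixel's ASPP branch vectors by feeding them through the GRU in order."""
    h = [0.0] * len(p["bz"])
    for x in branch_features:
        h = gru_step(x, h, p)
    return h
```

Driving the update gate towards 0 for an input makes the GRU ignore that input's features, which is the gating behaviour the abstract contrasts with a plain convolution.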


2020 · Vol 12 (5) · pp. 789
Author(s): Kun Li, Xiangyun Hu, Huiwei Jiang, Zhen Shu, Mi Zhang

Automatic extraction of region objects from high-resolution satellite imagery presents a great challenge, because the objects may vary greatly in size, texture, shape, and contextual complexity. To handle these issues, we present a novel deep-learning-based approach to interactively extract non-artificial region objects, such as water bodies, woodland, and farmland, from high-resolution satellite imagery. First, our algorithm transforms user-provided positive and negative clicks or scribbles into guidance maps, which consist of a relevance map modified from Euclidean distance maps, two geodesic distance maps (one for positive and one for negative input), and a sampling map. Then, feature maps are extracted by applying a VGG convolutional neural network pre-trained on the ImageNet dataset to the image X, and they are upsampled to the resolution of X. Image X, the guidance maps, and the feature maps are integrated into the input tensor. We feed this input tensor to the proposed attention-guided, multi-scale segmentation neural network (AGMSSeg-Net) to obtain a mask that assigns a binary label to each pixel. After a post-processing operation based on a fully connected conditional random field (CRF), we extract the selected object boundary from the segmentation result. Experiments were conducted on two typical datasets with diverse region object types from complex scenes. The results demonstrate the effectiveness of the proposed method, and our approach outperforms existing methods for interactive image segmentation.
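The Euclidean part of the guidance maps can be sketched as a per-pixel distance to the nearest click, computed separately for positive and negative clicks. Truncating distances at 255 follows a convention common in interactive segmentation and is an assumption here, as is the name `euclidean_click_map`; the paper's relevance-map modification, the geodesic maps, and the sampling map are not reproduced.

```python
import math
from typing import List, Sequence, Tuple


def euclidean_click_map(height: int, width: int, clicks: Sequence[Tuple[int, int]],
                        cap: float = 255.0) -> List[List[float]]:
    """Distance from every pixel (row, col) to its nearest click, truncated at `cap`.
    With no clicks of this polarity, every pixel gets `cap`."""
    out = []
    for i in range(height):
        row = []
        for j in range(width):
            d = min((math.hypot(i - ci, j - cj) for ci, cj in clicks), default=cap)
            row.append(min(d, cap))
        out.append(row)
    return out


pos = euclidean_click_map(2, 3, [(0, 0)])  # one positive click at the top-left pixel
neg = euclidean_click_map(2, 3, [])        # no negative clicks yet
print(pos[0])  # [0.0, 1.0, 2.0]
```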


Author(s): K. Chen, M. Weinmann, X. Sun, M. Yan, S. Hinz, ...

In this paper, we address the semantic segmentation of aerial imagery based on multi-modal data given in the form of true orthophotos and the corresponding Digital Surface Models (DSMs). We present the Deeply-supervised Shuffling Convolutional Neural Network (DSCNN), a multi-scale extension of the Shuffling Convolutional Neural Network (SCNN) with deep supervision. We take advantage of the SCNN's shuffling operator to effectively upsample feature maps and then fuse multi-scale features derived from the intermediate layers of the SCNN, which results in the Multi-scale Shuffling Convolutional Neural Network (MSCNN). Based on the MSCNN, we derive the DSCNN by introducing additional losses into the intermediate layers of the MSCNN. In addition, we investigate the impact of using different sets of hand-crafted radiometric and geometric features derived from the true orthophotos and the DSMs on the semantic segmentation task. For performance evaluation, we use a commonly used benchmark dataset. The results reveal that both multi-scale fusion and deep supervision improve performance. Furthermore, using a diverse set of hand-crafted radiometric and geometric features as input to the DSCNN does not provide the best numerical results, but it yields smoother and improved detections for several objects.
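The abstract does not define the shuffling operator. Assuming it is the periodic shuffling used for sub-pixel upsampling (often called pixel shuffle or depth-to-space), it rearranges C·r² low-resolution channels into C channels at r times the height and width, with no interpolation. The function below is a hypothetical pure-Python sketch that uses the same channel ordering as PyTorch's `PixelShuffle`.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed [channel][row][col]


def pixel_shuffle(fmap: FeatureMap, r: int) -> FeatureMap:
    """Depth-to-space: out[c][y*r + i][x*r + j] = in[c*r*r + i*r + j][y][x]."""
    c_in = len(fmap)
    if c_in % (r * r):
        raise ValueError("channel count must be divisible by r**2")
    c_out = c_in // (r * r)
    h, w = len(fmap[0]), len(fmap[0][0])
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for y in range(h):
            for x in range(w):
                # Each group of r*r channels fills one r x r block of the output.
                for i in range(r):
                    for j in range(r):
                        out[c][y * r + i][x * r + j] = fmap[c * r * r + i * r + j][y][x]
    return out


print(pixel_shuffle([[[1.0]], [[2.0]], [[3.0]], [[4.0]]], r=2))  # [[[1.0, 2.0], [3.0, 4.0]]]
```

Because the upsampled values come from learned channels rather than interpolation, the preceding convolution decides what each sub-pixel position receives.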

