Multi-scale Hierarchical Residual Network for Dense Captioning

Journal of Artificial Intelligence Research ◽

10.1613/jair.1.11338 ◽

2019 ◽

Vol 64 ◽

pp. 181-196 ◽

Cited By ~ 4

Author(s):

Yan Tian ◽

Xun Wang ◽

Jiachen Wu ◽

Ruili Wang ◽

Bailin Yang

Keyword(s):

Neural Network ◽

Recurrent Neural Network ◽

Feature Space ◽

Image Feature ◽

Feature Maps ◽

Residual Network ◽

Multi Scale ◽

Residual Learning ◽

Novel Approach ◽

Great Progress

Recent research on dense captioning based on recurrent neural networks and convolutional neural networks has made great progress. However, mapping from an image feature space to a description space is a nonlinear and multimodal task, which makes it difficult for current methods to obtain accurate results. In this paper, we put forward a novel approach for dense captioning based on hourglass-structured residual learning. Discriminative feature maps are obtained by incorporating densely connected networks and residual learning in our model. Finally, the performance of the approach on the Visual Genome V1.0 dataset and the region-labelled MS-COCO (Microsoft Common Objects in Context) dataset is demonstrated. The experimental results show that our approach outperforms most current methods.

Image resizing by reconstruction from deep features

Computational Visual Media ◽

10.1007/s41095-021-0216-x ◽

2021 ◽

Author(s):

Dov Danon ◽

Moab Arar ◽

Daniel Cohen-Or ◽

Ariel Shamir

Keyword(s):

Neural Network ◽

Feature Space ◽

Image Feature ◽

Feature Maps ◽

Image Resizing ◽

Aspect Ratios ◽

Novel Approach ◽

Deep Layers ◽

Image Shape ◽

High Level

Traditional image resizing methods usually work in pixel space and use various saliency measures. The challenge is to adjust the image shape while trying to preserve important content. In this paper we perform image resizing in feature space, using the deep layers of a neural network, which contain rich and important semantic information. We directly adjust the image feature maps, extracted from a pre-trained classification network, and reconstruct the resized image using neural-network-based optimization. This novel approach leverages the hierarchical encoding of the network, and in particular the high-level discriminative power of its deeper layers, which can recognize semantic regions and objects and thereby allow their aspect ratios to be maintained. Our use of reconstruction from deep features results in less noticeable artifacts than the use of image-space resizing operators. We evaluate our method on benchmarks, compare it to alternative approaches, and demonstrate its strengths on challenging images.

Automatic Segmentation and Classification of COVID-19 CT Image Using Deep Learning and Multi-Scale Recurrent Neural Network Based Classifier

Journal of Medical Imaging and Health Informatics ◽

10.1166/jmihi.2021.3850 ◽

2021 ◽

Vol 11 (10) ◽

pp. 2618-2625

Author(s):

R. T. Subhalakshmi ◽

S. Appavu Alias Balamurugan ◽

S. Sasikala

Keyword(s):

Neural Network ◽

Feature Extraction ◽

Deep Learning ◽

Recurrent Neural Network ◽

Automatic Segmentation ◽

Ct Images ◽

Superior Performance ◽

Multi Scale ◽

Proposed Model ◽

Class Labels

In recent times, the COVID-19 epidemic has grown at an extreme rate, aggravated by the limited availability of rapid testing kits. Consequently, it is essential to develop automated techniques for COVID-19 detection that recognize the presence of the disease from radiological images. The most common symptoms of COVID-19 are sore throat, fever, and dry cough. Symptoms can progress to a severe form of pneumonia with serious complications. Since medical imaging is not currently recommended in Canada for crucial COVID-19 diagnosis, computer-aided diagnosis systems might aid the early detection of COVID-19 abnormalities, help monitor disease progression, and potentially reduce mortality rates. In this approach, a deep-learning-based design for feature extraction and classification is employed for automatic COVID-19 diagnosis from computed tomography (CT) images. The proposed model operates in three main stages: pre-processing, feature extraction, and classification. The proposed design incorporates the fusion of deep features using GoogLeNet models. Finally, a multi-scale recurrent neural network (RNN) based classifier is applied to classify the test CT images into distinct class labels. The experimental validation of the proposed model uses the open-source COVID-CT dataset, which comprises a total of 760 CT images. The experimental results indicate superior performance, with maximum sensitivity, specificity, and accuracy.
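The three reported measures are standard functions of a binary confusion matrix. The sketch below shows how they are usually computed; the function name and the example counts are hypothetical and are not taken from the paper.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # share of diseased cases detected
    specificity = tn / (tn + fp)                 # share of healthy cases cleared
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # share of all cases labelled correctly
    return sensitivity, specificity, accuracy


# Hypothetical counts for illustration only (not results from the paper).
sens, spec, acc = diagnostic_metrics(tp=90, fp=5, tn=95, fn=10)
print(sens, spec, acc)
```

The function assumes each denominator is non-zero, that is, the test set contains at least one positive and one negative case.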

A Multi-Scale Neural Network for Traffic Sign Detection Based on Pyramid Feature Maps

2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS) ◽

10.1109/hpcc/smartcity/dss.2019.00255 ◽

2019 ◽

Author(s):

Jia Liu ◽

Chongyang Zhang

Keyword(s):

Neural Network ◽

Feature Maps ◽

Traffic Sign ◽

Multi Scale ◽

Sign Detection ◽

Traffic Sign Detection

Finer Resolution Mapping of Marine Aquaculture Areas Using WorldView-2 Imagery and a Hierarchical Cascade Convolutional Neural Network

Remote Sensing ◽

10.3390/rs11141678 ◽

2019 ◽

Vol 11 (14) ◽

pp. 1678 ◽

Cited By ~ 3

Author(s):

Yongyong Fu ◽

Ziran Ye ◽

Jinsong Deng ◽

Xinyu Zheng ◽

Yibo Huang ◽

...

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Contextual Information ◽

Feature Space ◽

Classification Performance ◽

Support Vector ◽

Marine Resources ◽

Marine Aquaculture ◽

Multi Scale ◽

Coastal Marine

Marine aquaculture plays an important role in seafood supply, economic development, and coastal ecosystem service provision. The precise delineation of marine aquaculture areas from high spatial resolution (HSR) imagery is vital for the sustainable development and management of coastal marine resources. However, the varied sizes and detailed structures of marine objects make accurate mapping from HSR images difficult with conventional methods. Therefore, this study attempts to extract marine aquaculture areas by using an automatic labeling method based on the convolutional neural network (CNN), i.e., an end-to-end hierarchical cascade network (HCNet). Specifically, for marine objects of various sizes, we propose to improve the classification performance by utilizing multi-scale contextual information. Technically, based on the output of a CNN encoder, we employ atrous convolutions to capture multi-scale contextual information and aggregate it in a hierarchical cascade. Meanwhile, for marine objects with detailed structures, we propose to refine the detailed information gradually by using a series of long-span connections with fine-resolution features from the shallow layers. In addition, to decrease the semantic gaps between features at different levels, we propose to refine the feature space (i.e., channel and spatial dimensions) using an attention-based module. Experimental results show that our proposed HCNet can effectively identify and distinguish different kinds of marine aquaculture, with an overall accuracy of 98%. It also achieves better classification performance than an object-based support vector machine and state-of-the-art CNN-based methods, such as FCN-32s, U-Net, and DeeplabV2. Our developed method lays a solid foundation for the intelligent monitoring and management of coastal marine resources.
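To illustrate how atrous (dilated) convolution captures context at several scales, here is a minimal 1-D sketch in pure Python. It is not HCNet's implementation: the function name, the toy signal, and the chosen rates are assumptions, and the hierarchical cascade aggregation is not reproduced.

```python
def atrous_conv1d(signal, kernel, rate):
    """Valid-mode 1-D atrous convolution (cross-correlation form, as used in CNNs).

    Kernel taps are spaced `rate` samples apart, so the receptive field
    grows with `rate` while the number of weights stays the same.
    """
    span = (len(kernel) - 1) * rate  # distance between first and last tap
    return [
        sum(k * signal[start + i * rate] for i, k in enumerate(kernel))
        for start in range(len(signal) - span)
    ]


# Toy input and kernel, for illustration only.
signal = [0, 1, 2, 3, 4, 5, 6, 7]
kernel = [1, 1, 1]
for rate in (1, 2, 3):
    receptive_field = (len(kernel) - 1) * rate + 1
    print(rate, receptive_field, atrous_conv1d(signal, kernel, rate))
```

The same three weights cover 3, 5, and 7 input samples at rates 1, 2, and 3, which is why responses at several rates can be combined to describe objects of different sizes.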

Unsupervised Inductive Graph-Level Representation Learning via Graph-Graph Proximity

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/275 ◽

2019 ◽

Cited By ~ 5

Author(s):

Yunsheng Bai ◽

Hao Ding ◽

Yang Qiao ◽

Agustin Marinovic ◽

Ken Gu ◽

...

Keyword(s):

Neural Network ◽

Vector Space ◽

General Framework ◽

Representation Learning ◽

Generation Mechanism ◽

Graph Visualization ◽

Graph Classification ◽

Training Set ◽

Multi Scale ◽

Novel Approach

We introduce a novel approach to graph-level representation learning, which is to embed an entire graph into a vector space where the embeddings of two graphs preserve their graph-graph proximity. Our approach, UGraphEmb, is a general framework that provides a novel means of performing graph-level embedding in a completely unsupervised and inductive manner. The learned neural network can be considered as a function that receives any graph as input, either seen or unseen in the training set, and transforms it into an embedding. A novel graph-level embedding generation mechanism, called Multi-Scale Node Attention (MSNA), is proposed. Experiments on five real graph datasets show that UGraphEmb achieves competitive accuracy in the tasks of graph classification, similarity ranking, and graph visualization.

MSST-Net: A Multi-Scale Adaptive Network for Building Extraction from Remote Sensing Images Based on Swin Transformer

Remote Sensing ◽

10.3390/rs13234743 ◽

2021 ◽

Vol 13 (23) ◽

pp. 4743

Author(s):

Wei Yuan ◽

Wenbo Xu

Keyword(s):

Neural Network ◽

Remote Sensing ◽

Convolutional Neural Network ◽

Network Model ◽

Remote Sensing Images ◽

Feature Maps ◽

Global Features ◽

Adaptive Network ◽

Data Set ◽

Multi Scale

The segmentation of remote sensing images by deep learning technology is the main method for remote sensing image interpretation. However, segmentation models based on a convolutional neural network cannot capture global features very well. A transformer, whose self-attention mechanism can supply each pixel with a global feature, makes up for this deficiency of the convolutional neural network. Therefore, a multi-scale adaptive segmentation network model (MSST-Net) based on a Swin Transformer is proposed in this paper. Firstly, a Swin Transformer is used as the backbone to encode the input image. Secondly, the feature maps of different levels are decoded separately. Thirdly, convolution is used for fusion, so that the network can automatically learn the weight of the decoding results of each level. Finally, we adjust the channels to obtain the final prediction map by using a convolution with a 1 × 1 kernel. Compared with other segmentation network models on the WHU building data set, the evaluation metrics (mIoU, F1-score, and accuracy) are all improved. The network model proposed in this paper is a multi-scale adaptive network model that pays more attention to the global features for remote sensing segmentation.
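A 1 × 1 convolution amounts to a per-pixel weighted sum across channels, which is why it can both fuse per-level outputs and adjust the channel count. The sketch below shows this for a single output channel in pure Python; the function name, the weights, and the toy maps are assumptions, not MSST-Net's learned parameters.

```python
def conv1x1(feature_maps, weights, bias=0.0):
    """Single-output 1x1 convolution: a per-pixel weighted sum over channels.

    feature_maps: list of C maps, each a list of rows with identical shape.
    weights:      list of C scalars (learned during training in a real network).
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    return [
        [bias + sum(w * fm[y][x] for w, fm in zip(weights, feature_maps))
         for x in range(width)]
        for y in range(height)
    ]


# Two hypothetical decoder outputs from different levels, same resolution.
level_a = [[1, 2], [3, 4]]
level_b = [[5, 6], [7, 8]]
fused = conv1x1([level_a, level_b], weights=[0.25, 0.75])
print(fused)  # [[4.0, 5.0], [6.0, 7.0]]
```

Because the weights are trainable, a network using this operation can learn how much each level contributes to the fused map.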

Employing GRU to combine feature maps in DeeplabV3 for a better segmentation model

Nordic Machine Intelligence ◽

10.5617/nmi.9131 ◽

2021 ◽

Vol 1 (1) ◽

pp. 29-31

Author(s):

Mahmood Haithami ◽

Amr Ahmed ◽

Iman Yi Liao ◽

Hamid Jalab

Keyword(s):

Neural Network ◽

Recurrent Neural Network ◽

Feature Maps ◽

Feature Map ◽

Input Feature ◽

Spatial Pyramid Pooling ◽

Test Sets ◽

Public Datasets ◽

Spatial Pyramid

In this paper, we aim to enhance the segmentation capabilities of DeeplabV3 by employing a Gated Recurrent Unit (GRU). A 1-by-1 convolution in DeeplabV3 was replaced by a GRU after the Atrous Spatial Pyramid Pooling (ASPP) layer to combine the input feature maps. Both the convolution and the GRU have shareable parameters, but the latter has gates that enable or disable the contribution of each input feature map. The experiments on unseen test sets demonstrate that employing a GRU instead of the convolution produces better segmentation results. The datasets used are public datasets provided by the MedAI competition.
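The gating idea can be sketched with a scalar GRU cell that consumes one feature-map value per step. This is only an assumption about how such a combination might look at a single pixel, using the standard GRU update equations; the parameter names and values are hypothetical and the authors' actual wiring is not given in the abstract.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gru_step(x, h, p):
    """One scalar GRU update; p holds hypothetical weights and biases."""
    z = sigmoid(p["wz"] * x + p["uz"] * h + p["bz"])             # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h + p["br"])             # reset gate
    cand = math.tanh(p["wh"] * x + p["uh"] * (r * h) + p["bh"])  # candidate state
    return z * h + (1.0 - z) * cand  # z near 1 keeps h, z near 0 takes the input


def combine(values, p, h0=0.0):
    """Fold the values of several feature maps at one pixel into one state."""
    h = h0
    for x in values:
        h = gru_step(x, h, p)
    return h


# Illustrative parameters only: a strongly negative update-gate bias lets
# every input through, a strongly positive one blocks all inputs.
base = {"wz": 0.0, "uz": 0.0, "wr": 0.0, "ur": 0.0, "br": 0.0,
        "wh": 1.0, "uh": 0.0, "bh": 0.0}
open_gate = dict(base, bz=-100.0)
closed_gate = dict(base, bz=100.0)
print(combine([1.0, 2.0, 3.0], open_gate))            # close to tanh(3.0)
print(combine([1.0, 2.0, 3.0], closed_gate, h0=0.5))  # stays at 0.5
```

A plain 1-by-1 convolution has no counterpart to `z`: every input map always contributes in proportion to its weight.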

Multi-Scale Convolutional Recurrent Neural Network with Ensemble Method for Weakly Labeled Sound Event Detection

2019 8th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW) ◽

10.1109/aciiw.2019.8925176 ◽

2019 ◽

Cited By ~ 1

Author(s):

Yingmei Guo ◽

Mingxing Xu ◽

Zhiyong Wu ◽

Jianming Wu ◽

Bin Su

Keyword(s):

Neural Network ◽

Recurrent Neural Network ◽

Event Detection ◽

Ensemble Method ◽

Multi Scale ◽

Sound Event ◽

Sound Event Detection

Attention-Guided Multi-Scale Segmentation Neural Network for Interactive Extraction of Region Objects from High-Resolution Satellite Imagery

Remote Sensing ◽

10.3390/rs12050789 ◽

2020 ◽

Vol 12 (5) ◽

pp. 789 ◽

Cited By ~ 1

Author(s):

Kun Li ◽

Xiangyun Hu ◽

Huiwei Jiang ◽

Zhen Shu ◽

Mi Zhang

Keyword(s):

Neural Network ◽

High Resolution ◽

Satellite Imagery ◽

Conditional Random Field ◽

Geodesic Distance ◽

Feature Maps ◽

Object Boundary ◽

Multi Scale ◽

High Resolution Satellite Imagery ◽

Fully Connected

Automatic extraction of region objects from high-resolution satellite imagery presents a great challenge, because there may be very large variations of the objects in terms of their size, texture, shape, and contextual complexity in the image. To handle these issues, we present a novel, deep-learning-based approach to interactively extract non-artificial region objects, such as water bodies, woodland, farmland, etc., from high-resolution satellite imagery. First, our algorithm transforms user-provided positive and negative clicks or scribbles into guidance maps, which consist of a relevance map modified from Euclidean distance maps, two geodesic distance maps (for positive and negative, respectively), and a sampling map. Then, feature maps are extracted by applying a VGG convolutional neural network pre-trained on the ImageNet dataset to the image X, and they are then upsampled to the resolution of X. Image X, guidance maps, and feature maps are integrated as the input tensor. We feed the proposed attention-guided, multi-scale segmentation neural network (AGMSSeg-Net) with the input tensor above to obtain the mask that assigns a binary label to each pixel. After a post-processing operation based on a fully connected Conditional Random Field (CRF), we extract the selected object boundary from the segmentation result. Experiments were conducted on two typical datasets with diverse region object types from complex scenes. The results demonstrate the effectiveness of the proposed method, and our approach outperforms existing methods for interactive image segmentation.
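The guidance maps start from distance transforms of the user's clicks. A minimal sketch of the plain Euclidean distance map is given below; the function name and the click coordinates are hypothetical, and the paper's modified relevance map, geodesic distance maps, and sampling map are not reproduced here.

```python
import math


def euclidean_distance_map(height, width, clicks):
    """Distance from every pixel to its nearest click.

    clicks: non-empty list of (row, col) pairs, e.g. the positive clicks.
    """
    return [
        [min(math.hypot(r - cr, c - cc) for cr, cc in clicks)
         for c in range(width)]
        for r in range(height)
    ]


# Hypothetical 3x3 image with a single positive click in the top-left corner.
positive_map = euclidean_distance_map(3, 3, [(0, 0)])
for row in positive_map:
    print([round(v, 2) for v in row])
```

In an interactive setting, one such map would typically be built for the positive clicks and another for the negative clicks, then stacked with the image as extra input channels.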

SEMANTIC SEGMENTATION OF AERIAL IMAGERY VIA MULTI-SCALE SHUFFLING CONVOLUTIONAL NEURAL NETWORKS WITH DEEP SUPERVISION

ISPRS Annals of Photogrammetry Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-annals-iv-1-29-2018 ◽

2018 ◽

Vol IV-1 ◽

pp. 29-36 ◽

Cited By ~ 4

Author(s):

K. Chen ◽

M. Weinmann ◽

X. Sun ◽

M. Yan ◽

S. Hinz ◽

...

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Semantic Segmentation ◽

Aerial Imagery ◽

Geometric Features ◽

Feature Maps ◽

Multi Scale ◽

Intermediate Layers ◽

Segmentation Task ◽

The Impact

In this paper, we address the semantic segmentation of aerial imagery based on the use of multi-modal data given in the form of true orthophotos and the corresponding Digital Surface Models (DSMs). We present the Deeply-supervised Shuffling Convolutional Neural Network (DSCNN), a multi-scale extension of the Shuffling Convolutional Neural Network (SCNN) with deep supervision. We take advantage of the SCNN, which uses the shuffling operator to effectively upsample feature maps, and then fuse multi-scale features derived from the intermediate layers of the SCNN, which results in the Multi-scale Shuffling Convolutional Neural Network (MSCNN). Based on the MSCNN, we derive the DSCNN by introducing additional losses into the intermediate layers of the MSCNN. In addition, we investigate the impact of using different sets of hand-crafted radiometric and geometric features derived from the true orthophotos and the DSMs on the semantic segmentation task. For performance evaluation, we use a commonly used benchmark dataset. The achieved results reveal that both multi-scale fusion and deep supervision contribute to an improvement in performance. Furthermore, the use of a diversity of hand-crafted radiometric and geometric features as input for the DSCNN does not provide the best numerical results, but yields smoother and improved detections for several objects.
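A shuffling operator of this kind is commonly implemented as a periodic (pixel) shuffle that trades channels for spatial resolution. The sketch below assumes that convention and the channel ordering used by common deep learning libraries; the function name is hypothetical and the exact operator in the SCNN may differ.

```python
def pixel_shuffle(channels, r):
    """Rearrange C*r*r maps of size HxW into C maps of size (H*r)x(W*r).

    channels: list of 2-D maps (lists of rows), all with the same shape.
    r:        integer upsampling factor.
    """
    if len(channels) % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    h, w = len(channels[0]), len(channels[0][0])
    out_c = len(channels) // (r * r)
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(out_c)]
    for c in range(out_c):
        for y in range(h * r):
            for x in range(w * r):
                # The sub-pixel offset (y % r, x % r) selects the source channel.
                src = c * r * r + (y % r) * r + (x % r)
                out[c][y][x] = channels[src][y // r][x // r]
    return out


# Four 1x1 maps become one 2x2 map when r = 2.
print(pixel_shuffle([[[1]], [[2]], [[3]], [[4]]], 2))  # [[[1, 2], [3, 4]]]
```

No values are interpolated: upsampling is a pure rearrangement, so the preceding convolution learns what each sub-pixel position should contain.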
