wUUNet: Advanced Fully Convolutional Neural Network for Multiclass Fire Segmentation

Vladimir Sergeevich Bochkov; Liliya Yurievna Kataeva

doi:10.3390/sym13010098

wUUNet: Advanced Fully Convolutional Neural Network for Multiclass Fire Segmentation

Symmetry ◽

10.3390/sym13010098 ◽

2021 ◽

Vol 13 (1) ◽

pp. 98

Author(s):

Vladimir Sergeevich Bochkov ◽

Liliya Yurievna Kataeva

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Input Data ◽

Final Section ◽

State Of The Art ◽

Sliding Window ◽

Input Image ◽

Gaussian Mixtures ◽

Flame Suppression ◽

Segmentation Task

This article describes an AI-based solution to multiclass fire segmentation. The flame contours are divided into red, yellow, and orange areas. This separation is necessary to identify the hottest regions for flame suppression. Flame objects can have a wide variety of shapes (convex and non-convex). In that case, the segmentation task is more applicable than object detection because the center of the fire is much more accurate and reliable information than the center of the bounding box and, therefore, can be used by robotics systems for aiming. The UNet model is used as a baseline for the initial solution because it is the best open-source convolutional neural network. There is no available open dataset for multiclass fire segmentation. Hence, a custom dataset was developed and used in the current study, including 6250 samples from 36 videos. We compared the trained UNet models with several configurations of input data. The first comparison is shown between the calculation schemes of fitting the frame to one window and obtaining non-intersected areas of sliding window over the input image. Secondarily, we chose the best main metric of the loss function (soft Dice and Jaccard). We addressed the problem of detecting flame regions at the boundaries of non-intersected regions, and introduced new combinational methods of obtaining output signal based on weighted summarization and Gaussian mixtures of half-intersected areas as a solution. In the final section, we present UUNet-concatenative and wUUNet models that demonstrate significant improvements in accuracy and are considered to be state-of-the-art. All models use the original UNet-backbone at the encoder layers (i.e., VGG16) to demonstrate the superiority of the proposed architectures. The results can be applied to many robotic firefighting systems.

Download Full-text

Depth Estimation and Semantic Segmentation from a Single RGB Image Using a Hybrid Convolutional Neural Network

Sensors ◽

10.3390/s19081795 ◽

2019 ◽

Vol 19 (8) ◽

pp. 1795 ◽

Cited By ~ 5

Author(s):

Xiao Lin ◽

Dalila Sánchez-Escobedo ◽

Josep R. Casas ◽

Montse Pardàs

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

State Of The Art ◽

Semantic Segmentation ◽

Depth Estimation ◽

Input Image ◽

Estimation Accuracy ◽

Single Task ◽

Qualitative And Quantitative ◽

Highly Correlated

Semantic segmentation and depth estimation are two important tasks in computer vision, and many methods have been developed to tackle them. Commonly these two tasks are addressed independently, but recently the idea of merging these two problems into a sole framework has been studied under the assumption that integrating two highly correlated tasks may benefit each other to improve the estimation accuracy. In this paper, depth estimation and semantic segmentation are jointly addressed using a single RGB input image under a unified convolutional neural network. We analyze two different architectures to evaluate which features are more relevant when shared by the two tasks and which features should be kept separated to achieve a mutual improvement. Likewise, our approaches are evaluated under two different scenarios designed to review our results versus single-task and multi-task methods. Qualitative and quantitative experiments demonstrate that the performance of our methodology outperforms the state of the art on single-task approaches, while obtaining competitive results compared with other multi-task methods.

Download Full-text

XCM: An Explainable Convolutional Neural Network for Multivariate Time Series Classification

Mathematics ◽

10.3390/math9233137 ◽

2021 ◽

Vol 9 (23) ◽

pp. 3137

Author(s):

Kevin Fauvel ◽

Tao Lin ◽

Véronique Masson ◽

Élisa Fromont ◽

Alexandre Termier

Keyword(s):

Neural Network ◽

Time Series ◽

Deep Learning ◽

Convolutional Neural Network ◽

Input Data ◽

State Of The Art ◽

Multivariate Time Series ◽

Learning Approach ◽

Multiple Domains ◽

Post Hoc

Multivariate Time Series (MTS) classification has gained importance over the past decade with the increase in the number of temporal datasets in multiple domains. The current state-of-the-art MTS classifier is a heavyweight deep learning approach, which outperforms the second-best MTS classifier only on large datasets. Moreover, this deep learning approach cannot provide faithful explanations as it relies on post hoc model-agnostic explainability methods, which could prevent its use in numerous applications. In this paper, we present XCM, an eXplainable Convolutional neural network for MTS classification. XCM is a new compact convolutional neural network which extracts information relative to the observed variables and time directly from the input data. Thus, XCM architecture enables a good generalization ability on both large and small datasets, while allowing the full exploitation of a faithful post hoc model-specific explainability method (Gradient-weighted Class Activation Mapping) by precisely identifying the observed variables and timestamps of the input data that are important for predictions. We first show that XCM outperforms the state-of-the-art MTS classifiers on both the large and small public UEA datasets. Then, we illustrate how XCM reconciles performance and explainability on a synthetic dataset and show that XCM enables a more precise identification of the regions of the input data that are important for predictions compared to the current deep learning MTS classifier also providing faithful explainability. Finally, we present how XCM can outperform the current most accurate state-of-the-art algorithm on a real-world application while enhancing explainability by providing faithful and more informative explanations.

Download Full-text

Bag of tricks for fabric defect detection based on Cascade R-CNN

Textile Research Journal ◽

10.1177/0040517520955229 ◽

2020 ◽

pp. 004051752095522

Author(s):

Feng Li ◽

Feng Li

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Defect Detection ◽

State Of The Art ◽

Detection Algorithm ◽

Input Image ◽

Defect Size ◽

Fabric Defect Detection ◽

Fabric Defect ◽

Single Input

In this paper, a bag of tricks is proposed to improve the precision of fabric defect detection. Although the general state-of-the-art convolutional neural network detection algorithm can achieve a better detection effect, in fact, the detection precision still has enough room to improve on fabric defect detection. Therefore, we propose three tricks to further improve the precision. Firstly, we use multiscale training, which scales the single input image into a number of images of different resolutions for training, so as to be able to adapt to the box distribution of different scales. Secondly, we use the dimension clusters method. By observing the distribution of the width and the height of the defect size in the fabric dataset, we find that the distribution of the defect size in the dataset is extremely unbalanced and the size span is large. We believe that the training results of the default prior boxes setting might not be optimal, so we conduct dimensional clustering for the width and height of the defect size of the dataset, so as to make the network model easier to learn. Thirdly, we use soft non-maximum suppression instead of traditional non-maximum suppression to avoid the situation that the same kinds of defect category in the dataset are overlapped and eliminated as repeated detection. With this bag of tricks, we effectively improve the precision of fabric defect detection by 8.9% mAP on the basis of the baseline of state-of-the-art convolutional neural network detection algorithm.

Download Full-text

A Modularized Architecture of Multi-Branch Convolutional Neural Network for Image Captioning

Electronics ◽

10.3390/electronics8121417 ◽

2019 ◽

Vol 8 (12) ◽

pp. 1417 ◽

Cited By ~ 1

Author(s):

Shan He ◽

Yuanyao Lu

Keyword(s):

Neural Network ◽

Computer Vision ◽

Convolutional Neural Network ◽

Language Processing ◽

State Of The Art ◽

Input Image ◽

Simple Design ◽

Complete Conversion ◽

Practical Application ◽

Image Captioning

Image captioning is a comprehensive task in computer vision (CV) and natural language processing (NLP). It can complete conversion from image to text, that is, the algorithm automatically generates corresponding descriptive text according to the input image. In this paper, we present an end-to-end model that takes deep convolutional neural network (CNN) as the encoder and recurrent neural network (RNN) as the decoder. In order to get better image captioning extraction, we propose a highly modularized multi-branch CNN, which could increase accuracy while maintaining the number of hyper-parameters unchanged. This strategy provides a simply designed network consists of parallel sub-modules of the same structure. While traditional CNN goes deeper and wider to increase accuracy, our proposed method is more effective with a simple design, which is easier to optimize for practical application. Experiments are conducted on Flickr8k, Flickr30k and MSCOCO entities. Results demonstrate that our method achieves state of the art performances in terms of caption quality.

Download Full-text

CoCoX: Generating Conceptual and Counterfactual Explanations via Fault-Lines

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i03.5643 ◽

2020 ◽

Vol 34 (03) ◽

pp. 2594-2601

Author(s):

Arjun Akula ◽

Shuai Wang ◽

Song-Chun Zhu

Keyword(s):

Neural Network ◽

State Of The Art ◽

Input Image ◽

Classification Model ◽

Learning Models ◽

Fault Line ◽

Semantic Level ◽

Explainable Ai ◽

Fault Lines ◽

Classification Category

We present CoCoX (short for Conceptual and Counterfactual Explanations), a model for explaining decisions made by a deep convolutional neural network (CNN). In Cognitive Psychology, the factors (or semantic-level features) that humans zoom in on when they imagine an alternative to a model prediction are often referred to as fault-lines. Motivated by this, our CoCoX model explains decisions made by a CNN using fault-lines. Specifically, given an input image I for which a CNN classification model M predicts class cpred, our fault-line based explanation identifies the minimal semantic-level features (e.g., stripes on zebra, pointed ears of dog), referred to as explainable concepts, that need to be added to or deleted from I in order to alter the classification category of I by M to another specified class calt. We argue that, due to the conceptual and counterfactual nature of fault-lines, our CoCoX explanations are practical and more natural for both expert and non-expert users to understand the internal workings of complex deep learning models. Extensive quantitative and qualitative experiments verify our hypotheses, showing that CoCoX significantly outperforms the state-of-the-art explainable AI models. Our implementation is available at https://github.com/arjunakula/CoCoX

Download Full-text

Sparse-FCM and Deep Convolutional Neural Network for the segmentation and classification of acute lymphoblastic leukaemia

Biomedical Engineering / Biomedizinische Technik ◽

10.1515/bmt-2018-0213 ◽

2020 ◽

Vol 65 (6) ◽

pp. 759-773

Author(s):

Segu Praveena ◽

Sohan Pal Singh

Keyword(s):

Neural Network ◽

Acute Lymphoblastic Leukaemia ◽

Convolutional Neural Network ◽

Optimization Algorithm ◽

Lymphoblastic Leukaemia ◽

Input Image ◽

Deep Convolutional Neural Network ◽

Grey Wolf ◽

Deep Cnn

AbstractLeukaemia detection and diagnosis in advance is the trending topic in the medical applications for reducing the death toll of patients with acute lymphoblastic leukaemia (ALL). For the detection of ALL, it is essential to analyse the white blood cells (WBCs) for which the blood smear images are employed. This paper proposes a new technique for the segmentation and classification of the acute lymphoblastic leukaemia. The proposed method of automatic leukaemia detection is based on the Deep Convolutional Neural Network (Deep CNN) that is trained using an optimization algorithm, named Grey wolf-based Jaya Optimization Algorithm (GreyJOA), which is developed using the Grey Wolf Optimizer (GWO) and Jaya Optimization Algorithm (JOA) that improves the global convergence. Initially, the input image is applied to pre-processing and the segmentation is performed using the Sparse Fuzzy C-Means (Sparse FCM) clustering algorithm. Then, the features, such as Local Directional Patterns (LDP) and colour histogram-based features, are extracted from the segments of the pre-processed input image. Finally, the extracted features are applied to the Deep CNN for the classification. The experimentation evaluation of the method using the images of the ALL IDB2 database reveals that the proposed method acquired a maximal accuracy, sensitivity, and specificity of 0.9350, 0.9528, and 0.9389, respectively.

Download Full-text

Low-Rank and Sparse Based Deep-Fusion Convolutional Neural Network for Crowd Counting

Mathematical Problems in Engineering ◽

10.1155/2017/5046727 ◽

2017 ◽

Vol 2017 ◽

pp. 1-11 ◽

Cited By ~ 2

Author(s):

Siqi Tang ◽

Zhisong Pan ◽

Xingyu Zhou

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

State Of The Art ◽

Regression Method ◽

Low Rank ◽

Counting Method ◽

Direct Integral ◽

Crowd Counting ◽

Counting Methods ◽

Density Map

This paper proposes an accurate crowd counting method based on convolutional neural network and low-rank and sparse structure. To this end, we firstly propose an effective deep-fusion convolutional neural network to promote the density map regression accuracy. Furthermore, we figure out that most of the existing CNN based crowd counting methods obtain overall counting by direct integral of estimated density map, which limits the accuracy of counting. Instead of direct integral, we adopt a regression method based on low-rank and sparse penalty to promote accuracy of the projection from density map to global counting. Experiments demonstrate the importance of such regression process on promoting the crowd counting performance. The proposed low-rank and sparse based deep-fusion convolutional neural network (LFCNN) outperforms existing crowd counting methods and achieves the state-of-the-art performance.

Download Full-text

Performance Analysis of State of the Art Convolutional Neural Network Architectures in Bangla Handwritten Character Recognition

Pattern Recognition and Image Analysis ◽

10.1134/s1054661821010089 ◽

2021 ◽

Vol 31 (1) ◽

pp. 60-71

Author(s):

Tapotosh Ghosh ◽

Min-Ha-Zul Abedin ◽

Hasan Al Banna ◽

Nasirul Mumenin ◽

Mohammad Abu Yousuf

Keyword(s):

Neural Network ◽

Performance Analysis ◽

Convolutional Neural Network ◽

Character Recognition ◽

State Of The Art ◽

Network Architectures ◽

Handwritten Character Recognition ◽

Handwritten Character ◽

Neural Network Architectures

Download Full-text

PedNet: A Spatio-Temporal Deep Convolutional Neural Network for Pedestrian Segmentation

Journal of Imaging ◽

10.3390/jimaging4090107 ◽

2018 ◽

Vol 4 (9) ◽

pp. 107 ◽

Cited By ~ 5

Author(s):

Mohib Ullah ◽

Ahmed Mohammed ◽

Faouzi Alaya Cheikh

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Performance Metrics ◽

State Of The Art ◽

Temporal Information ◽

Feature Maps ◽

Current Frame ◽

Low Level ◽

Art Methods ◽

Spatio Temporal

Articulation modeling, feature extraction, and classification are the important components of pedestrian segmentation. Usually, these components are modeled independently from each other and then combined in a sequential way. However, this approach is prone to poor segmentation if any individual component is weakly designed. To cope with this problem, we proposed a spatio-temporal convolutional neural network named PedNet which exploits temporal information for spatial segmentation. The backbone of the PedNet consists of an encoder–decoder network for downsampling and upsampling the feature maps, respectively. The input to the network is a set of three frames and the output is a binary mask of the segmented regions in the middle frame. Irrespective of classical deep models where the convolution layers are followed by a fully connected layer for classification, PedNet is a Fully Convolutional Network (FCN). It is trained end-to-end and the segmentation is achieved without the need of any pre- or post-processing. The main characteristic of PedNet is its unique design where it performs segmentation on a frame-by-frame basis but it uses the temporal information from the previous and the future frame for segmenting the pedestrian in the current frame. Moreover, to combine the low-level features with the high-level semantic information learned by the deeper layers, we used long-skip connections from the encoder to decoder network and concatenate the output of low-level layers with the higher level layers. This approach helps to get segmentation map with sharp boundaries. To show the potential benefits of temporal information, we also visualized different layers of the network. The visualization showed that the network learned different information from the consecutive frames and then combined the information optimally to segment the middle frame. We evaluated our approach on eight challenging datasets where humans are involved in different activities with severe articulation (football, road crossing, surveillance). The most common CamVid dataset which is used for calculating the performance of the segmentation algorithm is evaluated against seven state-of-the-art methods. The performance is shown on precision/recall, F 1 , F 2 , and mIoU. The qualitative and quantitative results show that PedNet achieves promising results against state-of-the-art methods with substantial improvement in terms of all the performance metrics.

Download Full-text

Spam Detection on Social Media Using Semantic Convolutional Neural Network

International Journal of Knowledge Discovery in Bioinformatics ◽

10.4018/ijkdb.2018010102 ◽

2018 ◽

Vol 8 (1) ◽

pp. 12-26 ◽

Cited By ~ 16

Author(s):

Gauri Jain ◽

Manisha Sharma ◽

Basant Agarwal

Keyword(s):

Neural Network ◽

Social Media ◽

Convolutional Neural Network ◽

State Of The Art ◽

Spam Detection ◽

Learning Technology ◽

The Social ◽

Social Media Text ◽

Current Article ◽

Semantic Layer

This article describes how spam detection in the social media text is becoming increasing important because of the exponential increase in the spam volume over the network. It is challenging, especially in case of text within the limited number of characters. Effective spam detection requires more number of efficient features to be learned. In the current article, the use of a deep learning technology known as a convolutional neural network (CNN) is proposed for spam detection with an added semantic layer on the top of it. The resultant model is known as a semantic convolutional neural network (SCNN). A semantic layer is composed of training the random word vectors with the help of Word2vec to get the semantically enriched word embedding. WordNet and ConceptNet are used to find the word similar to a given word, in case it is missing in the word2vec. The architecture is evaluated on two corpora: SMS Spam dataset (UCI repository) and Twitter dataset (Tweets scrapped from public live tweets). The authors' approach outperforms the-state-of-the-art results with 98.65% accuracy on SMS spam dataset and 94.40% accuracy on Twitter dataset.

Download Full-text