Real-Time Person Segmentation – Based on Body Pix

International Journal for Modern Trends in Science and Technology - RTT2020 ◽ 10.46501/ijmtst061201 ◽ 2020 ◽ Vol 6 (12) ◽ pp. 1-7

Author(s): Anish Mankotia and Meenu Garg

Keyword(s): Neural Networks ◽ Real Time ◽ Convolutional Neural Networks ◽ State Of The Art ◽ Semantic Segmentation ◽ The State ◽ The Body ◽ Surface Information ◽ Semantic Class ◽ Additional Information

In this paper, we propose a novel semantic segmentation approach based on the BodyPix module of TensorFlow.js, which can keep up with the accuracy of state-of-the-art approaches while running in real time. The solution builds on convolutional neural networks, with each step in the workflow enhanced by additional information from semantic segmentation. We therefore introduce several improvements to computation, aggregation, and optimization by adapting existing techniques to integrate the additional surface information given by each semantic class. The BodyPix model is trained using a CNN, ResNet50; this network can work with more than 150 layers, removing the problem of vanishing gradients. Using this network, our BodyPix module creates a more accurate and better-defined segmentation and also supports multi-person segmentation.
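The vanishing-gradient remark above can be illustrated with a toy scalar model. This is only a sketch under stated assumptions: each "layer" is a single multiplication, the per-layer weight of 0.01 is invented for illustration, and the depth of 150 is borrowed from the abstract's figure. It is not the BodyPix or ResNet50 architecture.

```python
def plain_gradient(weight: float, depth: int) -> float:
    """Gradient of output w.r.t. input for `depth` stacked layers y = w * x.

    By the chain rule, the per-layer derivative w is multiplied `depth` times.
    """
    return weight ** depth


def residual_gradient(weight: float, depth: int) -> float:
    """Same quantity when each layer is y = x + w * x (identity skip connection).

    The skip path contributes 1 to every per-layer derivative, giving (1 + w).
    """
    return (1.0 + weight) ** depth


# Hypothetical values: a small per-layer weight and a 150-layer stack.
WEIGHT = 0.01
DEPTH = 150

print("plain stack gradient:   ", plain_gradient(WEIGHT, DEPTH))
print("residual stack gradient:", residual_gradient(WEIGHT, DEPTH))
```

In the plain stack the gradient shrinks toward zero as depth grows, while the identity path in the residual stack keeps it at a usable magnitude.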

Handwritten Bangla Character Recognition Using the State-of-the-Art Deep Convolutional Neural Networks

Computational Intelligence and Neuroscience ◽ 10.1155/2018/6747098 ◽ 2018 ◽ Vol 2018 ◽ pp. 1-13 ◽ Cited By ~ 18

Author(s): Md Zahangir Alom ◽ Paheding Sidike ◽ Mahmudul Hasan ◽ Tarek M. Taha ◽ Vijayan K. Asari

Keyword(s): Neural Networks ◽ Object Recognition ◽ Convolutional Neural Networks ◽ Character Recognition ◽ State Of The Art ◽ The State ◽ Superior Performance ◽ Deep Convolutional Neural Networks ◽ Practical Applications ◽ High Degree

In spite of advances in object recognition technology, handwritten Bangla character recognition (HBCR) remains largely unsolved due to the presence of many ambiguous handwritten characters and excessively cursive Bangla handwriting. Even many advanced existing methods do not deliver satisfactory HBCR performance in practice. In this paper, a set of state-of-the-art deep convolutional neural networks (DCNNs) is discussed and their performance on HBCR is systematically evaluated. The main advantage of DCNN approaches is that they can extract discriminative features from raw data and represent them with a high degree of invariance to object distortions. The experimental results show the superior performance of DCNN models compared with other popular object recognition approaches, which implies that DCNNs can be a good candidate for building an automatic HBCR system for practical applications.

Using 3D Convolutional Neural Networks for Real-time Detection of Soccer Events

International Journal of Semantic Computing ◽ 10.1142/s1793351x2140002x ◽ 2021 ◽ Vol 15 (02) ◽ pp. 161-187

Author(s): Olav A. Nergård Rongved ◽ Steven A. Hicks ◽ Vajira Thambawita ◽ Håkon K. Stensland ◽ Evi Zouganeli ◽ ...

Keyword(s): Neural Networks ◽ Real Time ◽ Convolutional Neural Networks ◽ Event Detection ◽ State Of The Art ◽ Time Estimation ◽ High Recall ◽ Current State ◽ Ablation Study ◽ Different Parts

Developing systems for the automatic detection of events in video is a task which has gained attention in many areas including sports. More specifically, event detection for soccer videos has been studied widely in the literature. However, there are still a number of shortcomings in the state-of-the-art such as high latency, making it challenging to operate at the live edge. In this paper, we present an algorithm to detect events in soccer videos in real time, using 3D convolutional neural networks. We test our algorithm on three different datasets from SoccerNet, the Swedish Allsvenskan, and the Norwegian Eliteserien. Overall, the results show that we can detect events with high recall, low latency, and accurate time estimation. The trade-off is a slightly lower precision compared to the current state-of-the-art, which has higher latency and performs better when a less accurate time estimation can be accepted. In addition to the presented algorithm, we perform an extensive ablation study on how the different parts of the training pipeline affect the final results.

Charting the State-of-the-Art in the Application of Convolutional Neural Networks to Quality Control in Industry 4.0 and Smart Manufacturing

2019 10th IEEE International Conference on Cognitive Infocommunications (CogInfoCom) ◽ 10.1109/coginfocom47531.2019.9089932 ◽ 2019 ◽ Cited By ~ 1

Author(s): Cristina Monsone ◽ Adam B. Csapo

Keyword(s): Neural Networks ◽ Quality Control ◽ Convolutional Neural Networks ◽ Industry 4.0 ◽ State Of The Art ◽ The State ◽ Smart Manufacturing

LdsConv: Learned Depthwise Separable Convolutions by Group Pruning

Sensors ◽ 10.3390/s20154349 ◽ 2020 ◽ Vol 20 (15) ◽ pp. 4349

Author(s): Wenxiang Lin ◽ Yan Ding ◽ Hua-Liang Wei ◽ Xinglin Pan ◽ Yutong Zhang

Keyword(s): Neural Networks ◽ Convolutional Neural Networks ◽ State Of The Art ◽ Computational Cost ◽ The State ◽ Direct Replacement ◽ Improved Accuracy ◽ Pruning Technique ◽ Strong Capacity

Standard convolutional filters usually capture unnecessary overlap of features, resulting in wasted computational cost. In this paper, we aim to solve this problem by proposing a novel Learned Depthwise Separable Convolution (LdsConv) operation that is smart but has a strong capacity for learning. It integrates the pruning technique into the design of convolutional filters, formulated as a generic convolutional unit that can be used as a direct replacement for convolutions without any adjustment of the architecture. To show the effectiveness of the proposed method, experiments are carried out using state-of-the-art convolutional neural networks (CNNs), including ResNet, DenseNet, SE-ResNet and MobileNet. The results show that simply replacing the original convolution with LdsConv in these CNNs achieves significantly improved accuracy while reducing computational cost. For ResNet50, the FLOPs can be reduced by 40.9% while the accuracy on ImageNet increases.
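To make the cost argument concrete, the sketch below compares multiply-add counts for a standard convolution and an ordinary depthwise separable convolution (a depthwise pass followed by a 1x1 pointwise pass). This is the textbook operation, not LdsConv itself, and the layer dimensions are hypothetical values chosen for illustration.

```python
def standard_conv_madds(height: int, width: int, c_in: int, c_out: int, k: int) -> int:
    """Multiply-adds for a k x k standard convolution (stride 1, 'same' padding)."""
    return height * width * c_in * c_out * k * k


def depthwise_separable_madds(height: int, width: int, c_in: int, c_out: int, k: int) -> int:
    """Multiply-adds for a k x k depthwise pass plus a 1 x 1 pointwise pass."""
    depthwise = height * width * c_in * k * k   # one k x k filter per input channel
    pointwise = height * width * c_in * c_out   # 1 x 1 convolution mixes channels
    return depthwise + pointwise


# Hypothetical layer: 56 x 56 feature map, 64 -> 128 channels, 3 x 3 kernel.
std = standard_conv_madds(56, 56, 64, 128, 3)
sep = depthwise_separable_madds(56, 56, 64, 128, 3)
print(f"standard:  {std:,} multiply-adds")
print(f"separable: {sep:,} multiply-adds")
print(f"ratio:     {sep / std:.4f}  (equals 1/c_out + 1/k^2)")
```

The ratio depends only on the output channel count and the kernel size, which is why 3x3 layers with many output channels benefit most from this factorization.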

DGA CapsNet: 1D Application of Capsule Networks to DGA Detection

Information ◽ 10.3390/info10050157 ◽ 2019 ◽ Vol 10 (5) ◽ pp. 157 ◽ Cited By ~ 5

Author(s): Daniel S. Berman

Keyword(s): Neural Networks ◽ Deep Learning ◽ Real Time ◽ Convolutional Neural Networks ◽ Recurrent Neural Networks ◽ State Of The Art ◽ One Dimensional ◽ Domain Names ◽ Large Numbers ◽ And Control

Domain generation algorithms (DGAs) represent a class of malware used to generate large numbers of new domain names to achieve command-and-control (C2) communication between the malware program and its C2 server to avoid detection by cybersecurity measures. Deep learning has proven successful in serving as a mechanism to implement real-time DGA detection, specifically through the use of recurrent neural networks (RNNs) and convolutional neural networks (CNNs). This paper compares several state-of-the-art deep-learning implementations of DGA detection found in the literature with two novel models: a deeper CNN model and a one-dimensional (1D) Capsule Networks (CapsNet) model. The comparison shows that the 1D CapsNet model performs as well as the best-performing model from the literature.

Optimizing 3D Convolution Kernels on Stereo Matching for Resource Efficient Computations

Sensors ◽ 10.3390/s21206808 ◽ 2021 ◽ Vol 21 (20) ◽ pp. 6808

Author(s): Jianqiang Xiao ◽ Dianbo Ma ◽ Satoshi Yamane

Keyword(s): Neural Networks ◽ Computational Complexity ◽ Convolutional Neural Networks ◽ Stereo Matching ◽ State Of The Art ◽ Computational Cost ◽ The State ◽ Matching Network ◽ Convolution Kernels ◽ Low Computational Cost

Although recent stereo matching algorithms achieve significant results on public benchmarks, the problem of heavy computation remains unsolved. Most works focus on designing an architecture to reduce the computational complexity, whereas we aim to solve the problem by optimizing the 3D convolution kernels of the Pyramid Stereo Matching Network (PSMNet). In this paper, we design a series of comparative experiments exploring the performance of well-known convolution kernels on PSMNet. Our model reduces the computational complexity from 256.66G MAdd (Multiply-Add operations) to 69.03G MAdd (from 198.47G MAdd to 10.84G MAdd when only the 3D convolutional neural networks are considered) without losing accuracy. On the Scene Flow and KITTI 2015 datasets, our model achieves results comparable to the state-of-the-art at a low computational cost.
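The MAdd figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the four numbers from the abstract; the function name is a hypothetical helper introduced for illustration.

```python
def reduction(before: float, after: float) -> tuple[float, float]:
    """Return (absolute saving, percentage saving) between two costs."""
    saved = before - after
    return saved, 100.0 * saved / before


# Figures from the abstract, in G MAdd.
total_saved, total_pct = reduction(256.66, 69.03)    # whole model
conv3d_saved, conv3d_pct = reduction(198.47, 10.84)  # 3D convolutions only

print(f"whole model:     {total_saved:.2f}G saved ({total_pct:.1f}%)")
print(f"3D convolutions: {conv3d_saved:.2f}G saved ({conv3d_pct:.1f}%)")
```

Both rows give the same absolute saving of 187.63G MAdd, which is consistent with the whole reduction coming from the 3D convolutions: roughly 73% of the total cost and roughly 95% of the 3D convolution cost.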

Towards Interpretable Semantic Segmentation via Gradient-Weighted Class Activation Mapping (Student Abstract)

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v34i10.7244 ◽ 2020 ◽ Vol 34 (10) ◽ pp. 13943-13944

Author(s): Kira Vinogradova ◽ Alexandr Dibrov ◽ Gene Myers

Keyword(s): Neural Networks ◽ Image Segmentation ◽ Image Classification ◽ Convolutional Neural Networks ◽ Image Recognition ◽ State Of The Art ◽ Semantic Segmentation ◽ Wide Range ◽ Gradient Based ◽ Activation Mapping

Convolutional neural networks have become state-of-the-art in a wide range of image recognition tasks. The interpretation of their predictions, however, is an active area of research. Whereas various interpretation methods have been suggested for image classification, the interpretation of image segmentation remains largely unexplored. To that end, we propose Seg-Grad-CAM, a gradient-based method for interpreting semantic segmentation. Our method is an extension of the widely used Grad-CAM method, applied locally to produce heatmaps showing the relevance of individual pixels for semantic segmentation.
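The Grad-CAM weighting step that this method extends can be sketched in plain Python. The sketch assumes the feature-map activations and the gradients of the chosen score with respect to them have already been computed by some framework. For the segmentation case, that score is assumed to be the class logit summed over the pixels of interest. The tiny arrays are invented for illustration.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heatmap from per-channel activations and gradients.

    activations, gradients: lists of channels, each a 2D list of equal shape.
    Channel weight = spatial mean of that channel's gradient.
    Heatmap = ReLU of the weighted sum of activation channels.
    """
    rows, cols = len(activations[0]), len(activations[0][0])
    weights = [sum(sum(row) for row in g) / (rows * cols) for g in gradients]
    return [
        [
            max(0.0, sum(w * a[i][j] for w, a in zip(weights, activations)))
            for j in range(cols)
        ]
        for i in range(rows)
    ]


# Hypothetical 2-channel, 2 x 2 feature map and matching gradients.
acts = [[[1, 2], [3, 4]], [[4, 3], [2, 1]]]
grads = [[[1, 1], [1, 1]], [[-1, -1], [-1, -1]]]
print(grad_cam(acts, grads))
```

Channels with a positive mean gradient raise the heatmap where they are active, channels with a negative mean gradient lower it, and the ReLU keeps only the positive evidence.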

Common Kernels and Convolutions in Binary- and Ternary-Weight Neural Networks

Journal of Circuits, Systems and Computers ◽ 10.1142/s0218126621501589 ◽ 2020 ◽ pp. 2150158

Author(s): Byungmin Ahn ◽ Taewhan Kim

Keyword(s): Neural Networks ◽ Convolutional Neural Networks ◽ Optimization Algorithm ◽ State Of The Art ◽ The State ◽ Memory Access ◽ Experimental Results ◽ Hardware Platform ◽ Access Latency ◽ The Common

A new algorithm is presented for extracting common kernels and convolutions to maximally eliminate the redundant operations among the convolutions in binary- and ternary-weight convolutional neural networks. Specifically, we propose (1) a new algorithm of common kernel extraction to overcome the local and limited exploration of common kernel candidates by the existing method, and subsequently apply (2) a new concept of common convolution extraction to maximally eliminate the redundancy in the convolution operations. In addition, our algorithm can (3) be tuned to minimize the number of resulting kernels for convolutions, thereby reducing the total memory access latency for kernels. Experimental results on ternary-weight VGG-16 demonstrate that our convolution optimization algorithm is very effective, reducing the total number of operations for all convolutions by [Formula: see text], thereby reducing the total number of execution cycles on a hardware platform by 22.4% while using [Formula: see text] fewer kernels than the convolution utilizing the common kernels extracted by the state-of-the-art algorithm.

Recurrent convolutional neural networks for poet identification

Digital Scholarship in the Humanities ◽ 10.1093/llc/fqz096 ◽ 2020 ◽ Cited By ~ 1

Author(s): Dariush Salami ◽ Saeedeh Momtazi

Keyword(s): Neural Networks ◽ Convolutional Neural Networks ◽ Language Processing ◽ Short Term Memory ◽ State Of The Art ◽ The State ◽ Support Vector ◽ Neural Network Models ◽ Spatial Features ◽ Input Text

Deep neural networks have been widely used in various language processing tasks. Recurrent neural networks (RNNs) and convolutional neural networks (CNNs) are two common types of neural networks that have a successful history in capturing temporal and spatial features of texts. By using an RNN, we can encode input text into a lower-dimensional space of semantic features while considering the sequential behavior of words. By using a CNN, we can transfer the representation of input text to a flat structure to be used for classifying text. In this article, we propose a novel recurrent CNN model to capture not only the temporal but also the spatial features of the input poem/verse to be used for poet identification. Considering the shortcomings of normal RNNs, we try both long short-term memory and gated recurrent units in the proposed architecture and apply them to the poet identification task. There are a large number of poems in the history of literature whose poets are unknown. Considering the importance of the task in the information processing field, a great variety of methods, from traditional learning models such as support vector machines and logistic regression to deep neural network models such as CNNs, have been proposed to address this problem. Our experiments show that the proposed model significantly outperforms the state-of-the-art models for poet identification when receiving either a poem or a single verse as input. In comparison to the state-of-the-art CNN model, we achieved 9% and 4% improvements in f-measure for poem- and verse-based tasks, respectively.

Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽ 10.24963/ijcai.2018/309 ◽ 2018 ◽ Cited By ~ 79

Author(s): Yang He ◽ Guoliang Kang ◽ Xuanyi Dong ◽ Yanwei Fu ◽ Yi Yang

Keyword(s): Neural Networks ◽ Convolutional Neural Networks ◽ State Of The Art ◽ The State ◽ Training Data ◽ Deep Convolutional Neural Networks ◽ Inference Procedure ◽ Accuracy Improvement ◽ Large Capacity ◽ Pruning Methods

This paper proposes a Soft Filter Pruning (SFP) method to accelerate the inference procedure of deep Convolutional Neural Networks (CNNs). Specifically, the proposed SFP enables the pruned filters to be updated when training the model after pruning. SFP has two advantages over previous works: (1) Larger model capacity. Updating previously pruned filters provides our approach with a larger optimization space than fixing the filters to zero. Therefore, the network trained by our method has a larger model capacity to learn from the training data. (2) Less dependence on the pre-trained model. Large capacity enables SFP to train from scratch and prune the model simultaneously. In contrast, previous filter pruning methods must be conducted on the basis of a pre-trained model to guarantee their performance. Empirically, SFP from scratch outperforms the previous filter pruning methods. Moreover, our approach has been demonstrated to be effective for many advanced CNN architectures. Notably, on ILSVRC-2012, SFP reduces FLOPs on ResNet-101 by more than 42% with even a 0.2% top-5 accuracy improvement, which advances the state-of-the-art. Code is publicly available on GitHub: https://github.com/he-y/softfilter-pruning
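The "soft" part of the idea, where pruned filters are zeroed but remain trainable, can be sketched as follows. This is a minimal illustration under stated assumptions: filters are flat lists of weights, the selection criterion is the L2 norm (assumed here, since the abstract does not name one), and the gradient values are invented. It is not the authors' released code.

```python
import math


def l2_norm(filt):
    """Euclidean norm of one filter's weights."""
    return math.sqrt(sum(w * w for w in filt))


def soft_prune(filters, prune_rate):
    """Zero the lowest-norm fraction of filters; nothing is removed or frozen."""
    n_prune = int(len(filters) * prune_rate)
    order = sorted(range(len(filters)), key=lambda i: l2_norm(filters[i]))
    pruned = set(order[:n_prune])
    return [[0.0] * len(f) if i in pruned else list(f) for i, f in enumerate(filters)]


def sgd_step(filters, grads, lr=0.1):
    """Plain gradient step applied to every filter, including zeroed ones."""
    return [[w - lr * g for w, g in zip(f, gf)] for f, gf in zip(filters, grads)]


# Hypothetical layer with four 2-weight filters.
filters = [[3.0, 4.0], [0.1, 0.1], [1.0, 0.0], [0.2, 0.0]]
after_prune = soft_prune(filters, prune_rate=0.5)
print("after pruning:", after_prune)

# In the next training step the zeroed filters receive updates again.
grads = [[-1.0, -1.0]] * 4  # invented gradient values
print("after update: ", sgd_step(after_prune, grads))
```

A hard-pruning variant would instead keep a fixed mask and skip the update for zeroed filters, which is the restriction the abstract describes SFP as avoiding.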
