Foreground Objects Detection by U-Net with Multiple Difference Images

Jae-Yeul Kim; Jong-Eun Ha

doi:10.3390/app11041807

Applied Sciences ◽

2021 ◽

Vol 11 (4) ◽

pp. 1807

Keyword(s):

Deep Learning ◽

Video Surveillance ◽

Dramatic Improvement ◽

Robust Detection ◽

Training Conditions ◽

Difference Image ◽

Training Samples ◽

Traditional Approaches ◽

Difference Images ◽

Multiple Difference

In video surveillance, robust detection of foreground objects is usually done by subtracting a background model from the current image. Most traditional approaches use a statistical method to model the background image. Recently, deep learning has also been widely used to detect foreground objects in video surveillance, and it shows dramatic improvement over traditional approaches. However, it is trained through supervised learning, which requires training samples with pixel-level labels. Producing such samples takes a huge amount of time and is costly, whereas traditional algorithms operate unsupervised and do not require training samples. Additionally, deep learning-based algorithms lack generalization power: they operate well on scenes similar to the training conditions but poorly on scenes that deviate from them. In this paper, we present a new method to detect foreground objects in video surveillance that uses multiple difference images as the input of a convolutional neural network, which improves generalization power compared to current deep learning-based methods. First, we adapt U-Net to use multiple difference images as input. Second, we show that training using all scenes in the CDnet 2014 dataset can improve the generalization power. Hyper-parameters such as the number of difference images and the interval between images in difference image computation are chosen by analyzing experimental results. We demonstrate that the proposed algorithm achieves improved performance on scenes not used in training compared to state-of-the-art deep learning and traditional unsupervised algorithms. Diverse experiments using various open datasets and real images show the feasibility of the proposed method.
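A minimal sketch of how several difference images might be stacked into one multi-channel input for a network such as U-Net. It assumes, hypothetically, that channel k is the absolute difference between the current frame and the frame k * `interval` steps earlier; `num_diffs` and `interval` stand in for the two hyper-parameters named in the abstract, and the paper's actual frame pairing may differ. Frames are plain nested lists of grayscale intensities to keep the example dependency-free.

```python
from typing import List

Frame = List[List[float]]  # one grayscale frame: rows of pixel intensities


def difference_images(frames: List[Frame], t: int, num_diffs: int, interval: int) -> List[Frame]:
    """Stack num_diffs absolute-difference images for frame t.

    Hypothetical pairing: channel k (k = 1..num_diffs) is |frame[t] - frame[t - k*interval]|.
    """
    if t - num_diffs * interval < 0:
        raise ValueError("not enough past frames for this num_diffs/interval")
    current = frames[t]
    channels: List[Frame] = []
    for k in range(1, num_diffs + 1):
        past = frames[t - k * interval]
        diff = [[abs(c - p) for c, p in zip(row_now, row_past)]
                for row_now, row_past in zip(current, past)]
        channels.append(diff)
    return channels  # num_diffs x H x W: one channel per difference image


# Toy video: five 2x2 frames whose pixels all equal the frame index.
frames = [[[float(i), float(i)], [float(i), float(i)]] for i in range(5)]
stack = difference_images(frames, t=4, num_diffs=2, interval=2)
print(stack)  # [[[2.0, 2.0], [2.0, 2.0]], [[4.0, 4.0], [4.0, 4.0]]]
```

In this sketch, the only change the stacked input would imply for U-Net is that its first convolution takes `num_diffs` input channels.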

Siamese Reconstruction Network: Accurate Image Reconstruction from Human Brain Activity by Learning to Compare

Applied Sciences ◽

10.3390/app9224749 ◽

2019 ◽

Vol 9 (22) ◽

pp. 4749

Author(s):

Lingyun Jiang ◽

Kai Qiao ◽

Linyuan Wang ◽

Chi Zhang ◽

Jian Chen ◽

...

Keyword(s):

Deep Learning ◽

Human Brain ◽

Brain Activity ◽

Feature Space ◽

Training Data ◽

Reconstruction Method ◽

Learning Method ◽

Training Samples ◽

Visual Reconstruction ◽

Relationship Of

Decoding human brain activity, especially reconstructing human visual stimuli from functional magnetic resonance imaging (fMRI), has gained increasing attention in recent years. However, the high dimensionality and small quantity of fMRI data restrict satisfactory reconstruction, especially for deep learning reconstruction methods, which require huge amounts of labelled samples. Compared with deep learning methods, humans can recognize a new image because the human visual system is naturally capable of extracting features from any object and comparing them. Inspired by this visual mechanism, we introduced the mechanism of comparison into deep learning to achieve better visual reconstruction, making full use of each sample and of the relationship within each sample pair by learning to compare. On this basis, we proposed the Siamese reconstruction network (SRN) method. Using the SRN, we achieved satisfying results on two fMRI recording datasets, with 72.5% accuracy on the digit dataset and 44.6% accuracy on the character dataset. Essentially, this approach increases the training data from about n samples to 2n sample pairs, which takes full advantage of the limited quantity of training samples. The SRN learns to draw sample pairs of the same class together and push sample pairs of different classes apart in feature space.
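The pair-based training idea can be illustrated with a standard margin-based contrastive loss, shown here only as a stand-in: the abstract does not give the SRN's actual loss, and `contrastive_loss`, `make_pairs`, and `margin` are hypothetical names. Note that enumerating every pair of n samples yields n(n-1)/2 pairs; the abstract's figure of about 2n pairs implies a sparser sampling scheme that this sketch does not reproduce.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(f1: Sequence[float], f2: Sequence[float],
                     same_class: bool, margin: float = 1.0) -> float:
    """Margin-based contrastive loss for one pair of embeddings.

    Same-class pairs are penalized by their squared distance (pulled together);
    different-class pairs are penalized only while closer than `margin` (pushed apart).
    """
    d = euclidean(f1, f2)
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2


def make_pairs(labels: List[int]) -> List[Tuple[int, int, bool]]:
    """All unordered index pairs (i, j) with a flag telling whether the classes match."""
    return [(i, j, labels[i] == labels[j])
            for i, j in combinations(range(len(labels)), 2)]


print(make_pairs([0, 0, 1]))  # [(0, 1, True), (0, 2, False), (1, 2, False)]
print(contrastive_loss([0.0, 0.0], [0.5, 0.0], same_class=False))  # 0.25
```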

Deep-Emotion: Facial Expression Recognition Using Attentional Convolutional Network

Sensors ◽

10.3390/s21093046 ◽

2021 ◽

Vol 21 (9) ◽

pp. 3046

Author(s):

Shervin Minaee ◽

Mehdi Minaei ◽

Amirali Abdolrashidi

Keyword(s):

Deep Learning ◽

Facial Expression ◽

Facial Expression Recognition ◽

Expression Recognition ◽

Visualization Technique ◽

Convolutional Network ◽

The Past ◽

Multiple Datasets ◽

The Face ◽

Traditional Approaches

Facial expression recognition has been an active area of research over the past few decades, and it is still challenging due to high intra-class variation. Traditional approaches to this problem rely on hand-crafted features such as SIFT, HOG, and LBP, followed by a classifier trained on a database of images or videos. Most of these works perform reasonably well on datasets of images captured under controlled conditions but fail to perform as well on more challenging datasets with more image variation and partial faces. In recent years, several works have proposed end-to-end frameworks for facial expression recognition using deep learning models. Despite the better performance of these works, there is still much room for improvement. In this work, we propose a deep learning approach based on an attentional convolutional network that is able to focus on important parts of the face and achieves significant improvement over previous models on multiple datasets, including FER-2013, CK+, FERG, and JAFFE. We also use a visualization technique that can find the facial regions important for detecting different emotions, based on the classifier’s output. Through experimental results, we show that different emotions are sensitive to different parts of the face.

Automatic Handgun Detection with Deep Learning in Video Surveillance Images

Applied Sciences ◽

10.3390/app11136085 ◽

2021 ◽

Vol 11 (13) ◽

pp. 6085

Author(s):

Jesus Salido ◽

Vanesa Lomas ◽

Jesus Ruiz-Santaquiteria ◽

Oscar Deniz

Keyword(s):

Neural Network ◽

Deep Learning ◽

Convolutional Neural Network ◽

Video Surveillance ◽

Automatic Detection ◽

Public Spaces ◽

Detection Methods ◽

Training Dataset ◽

Average Precision ◽

Terrorist Acts

There is a great need to implement preventive mechanisms against shootings and terrorist acts in public spaces with a large influx of people. While surveillance cameras have become common, the need for 24/7 monitoring and real-time response requires automatic detection methods. This paper presents a study based on three convolutional neural network (CNN) models applied to the automatic detection of handguns in video surveillance images. It investigates the reduction of false positives by including, in the training dataset, pose information associated with the way the handguns are held. The best average precision (96.36%) and recall (97.23%) were obtained by RetinaNet fine-tuned with an unfrozen ResNet-50 backbone, while the best precision (96.23%) and F1 score (93.36%) were obtained by YOLOv3 trained on the dataset including pose information. YOLOv3 was also the only architecture that showed a consistent improvement (around 2%) when pose information was explicitly considered during training.
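Because the abstract reports precision, recall, and F1 separately, it may help to recall how they relate. The sketch below computes them from true-positive, false-positive, and false-negative counts; `detection_metrics` is a hypothetical helper, the counts are made up for illustration, and average precision (a summary of the whole precision-recall curve, which also depends on confidence ranking) is not computed here.

```python
from typing import Dict


def detection_metrics(tp: int, fp: int, fn: int) -> Dict[str, float]:
    """Precision, recall, and F1 from detection counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0  # harmonic mean of P and R
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: 90 correct detections, 10 false alarms, 30 missed handguns.
print(detection_metrics(tp=90, fp=10, fn=30))
```

Since F1 always lies between precision and recall, a YOLOv3 F1 of 93.36% alongside a precision of 96.23% implies, assuming both figures come from the same run, a recall below 93.36% for that model.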

An Explainable Deep Learning Ensemble Model for Robust Diagnosis of Diabetic Retinopathy Grading

ACM Transactions on Multimedia Computing Communications and Applications ◽

10.1145/3469841 ◽

2021 ◽

Vol 17 (3s) ◽

pp. 1-24

Author(s):

Mohammad Shorfuzzaman ◽

M. Shamim Hossain ◽

Abdulmotaleb El Saddik

Keyword(s):

Diabetic Retinopathy ◽

Deep Learning ◽

Vision Loss ◽

Final Diagnosis ◽

Learning Rate ◽

Ensemble Model ◽

Fundus Images ◽

Robust Detection ◽

Retinal Fundus Images ◽

Retinal Fundus

Diabetic retinopathy (DR) is one of the most common causes of vision loss in people who have had diabetes for a prolonged period. Convolutional neural networks (CNNs) have become increasingly popular for computer-aided DR diagnosis using retinal fundus images. While these CNNs are highly reliable, their lack of sufficient explainability prevents them from being widely used in medical practice. In this article, we propose a novel explainable deep learning ensemble model in which weights from different models are fused into a single model to extract salient features from various retinal lesions found on fundus images. The extracted features are then fed to a custom classifier for the final diagnosis of DR severity level. The model is trained on the APTOS dataset, which contains retinal fundus images of various DR grades, using a cyclical learning rate strategy with an automatic learning rate finder for decaying the learning rate to improve model accuracy. We develop an explainability approach by leveraging gradient-weighted class activation mapping (Grad-CAM) and Shapley additive explanations (SHAP) to highlight the areas of fundus images that are most indicative of different DR stages. This allows ophthalmologists to view the model's decisions in a way they can understand. Evaluation results on three different datasets (APTOS, MESSIDOR, IDRiD) show the effectiveness of our model, which achieves superior classification rates with high precision (0.970), sensitivity (0.980), and AUC (0.978). We believe that the proposed model, which jointly offers state-of-the-art diagnosis performance and explainability, will address the black-box nature of deep CNN models in robust detection of DR grading.
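The abstract mentions a cyclical learning rate strategy with an automatic learning rate finder. Below is a minimal sketch of the widely used triangular cyclical policy, plus its "triangular2" variant, which halves the peak each cycle so the rate decays over training. Whether the authors used this exact policy is an assumption, the learning rate finder itself is not shown, and `triangular_clr` and the parameter values are hypothetical.

```python
import math


def triangular_clr(iteration: int, base_lr: float, max_lr: float,
                   step_size: int, halve_each_cycle: bool = False) -> float:
    """Cyclical learning rate with the triangular policy.

    The rate rises linearly from base_lr to max_lr over `step_size` iterations,
    falls back over the next `step_size`, and repeats. With halve_each_cycle=True
    the peak height is halved every cycle (the "triangular2" variant).
    """
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)  # 1 at cycle edges, 0 at the peak
    amplitude = (max_lr - base_lr) * max(0.0, 1.0 - x)
    if halve_each_cycle:
        amplitude /= 2 ** (cycle - 1)
    return base_lr + amplitude


# Hypothetical bounds; an LR-finder step would normally choose base_lr and max_lr.
schedule = [triangular_clr(i, 1e-4, 1e-3, step_size=4, halve_each_cycle=True)
            for i in range(17)]
```

With these settings the first cycle peaks at 1e-3 (iteration 4) and the second at 5.5e-4 (iteration 12), returning to 1e-4 at every cycle boundary.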

Astrid

Proceedings of the VLDB Endowment ◽

10.14778/3436905.3436907 ◽

2020 ◽

Vol 14 (4) ◽

pp. 471-484

Author(s):

Suraj Shetiya ◽

Saravanan Thirumuruganathan ◽

Nick Koudas ◽

Gautam Das

Keyword(s):

Deep Learning ◽

Objective Function ◽

Pattern Matching ◽

Language Processing ◽

Language Model ◽

Language Models ◽

Selectivity Estimation ◽

Statistical Correlations ◽

Benchmark Datasets ◽

Traditional Approaches

Accurate selectivity estimation for string predicates is a long-standing research challenge in databases. Supporting pattern matching on strings (such as prefix, substring, and suffix matching) makes this problem much more challenging, thereby necessitating a dedicated study. Traditional approaches often build pruned summary data structures such as tries, followed by selectivity estimation using statistical correlations. However, this produces insufficiently accurate cardinality estimates, resulting in the selection of sub-optimal plans by the query optimizer. Recently proposed deep learning based approaches leverage techniques from natural language processing, such as embeddings, to encode the strings and use them to train a model. While this is an improvement over traditional approaches, there is still considerable room for improvement. We propose Astrid, a framework for string selectivity estimation that synthesizes ideas from traditional and deep learning based approaches. We make two complementary contributions. First, we propose an embedding algorithm that is query-type (prefix, substring, and suffix) and selectivity aware. Consider three strings 'ab', 'abc' and 'abd' whose prefix frequencies are 1000, 800 and 100 respectively. Our approach would ensure that the embedding for 'ab' is closer to 'abc' than to 'abd'. Second, we describe how neural language models can be used for selectivity estimation. While they work well for prefix queries, their performance for substring queries is sub-optimal. We modify the objective function of the neural language model so that it can be used for estimating selectivities of pattern matching queries. We also propose a novel and efficient algorithm for optimizing the new objective function. We conduct extensive experiments over benchmark datasets and show that our proposed approaches achieve state-of-the-art results.
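To make the estimation target concrete, the sketch below computes the exact selectivity of prefix, substring, and suffix predicates by scanning a string column; an estimator such as Astrid aims to approximate these fractions without a full scan. The function name, the `MATCHERS` table, and the toy column are hypothetical, and the SQL `LIKE` forms in the comments are given only as the usual equivalents.

```python
from typing import Callable, Dict, List

# Predicate for each query type: does string s match pattern p?
MATCHERS: Dict[str, Callable[[str, str], bool]] = {
    "prefix": lambda s, p: s.startswith(p),  # LIKE 'p%'
    "suffix": lambda s, p: s.endswith(p),    # LIKE '%p'
    "substring": lambda s, p: p in s,        # LIKE '%p%'
}


def exact_selectivity(column: List[str], pattern: str, kind: str) -> float:
    """Fraction of rows matching the pattern, found by a full scan.

    This is the ground truth a selectivity estimator tries to approximate
    without scanning the data.
    """
    if not column:
        return 0.0
    matches = MATCHERS[kind]
    return sum(1 for s in column if matches(s, pattern)) / len(column)


column = ["abc", "abd", "xab", "ab", "b"]
for kind in ("prefix", "substring", "suffix"):
    print(kind, exact_selectivity(column, "ab", kind))
# prints:
# prefix 0.6
# substring 0.8
# suffix 0.4
```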

Deep learning for robust detection of interictal epileptiform discharges

Journal of Neural Engineering ◽

10.1088/1741-2552/abf28e ◽

2021 ◽

Author(s):

David Geng ◽

Ayham Alkhachroum ◽

Manuel Melo Bicchi ◽

Jonathan Jagid ◽

Iahn Cajigas ◽

...

Keyword(s):

Deep Learning ◽

Robust Detection ◽

Epileptiform Discharges ◽

Interictal Epileptiform Discharges

Detecting and preventing criminal activities in shopping malls using massive video surveillance based on deep learning models

Annals of Operations Research ◽

10.1007/s10479-021-04264-0 ◽

2021 ◽

Author(s):

Zhihong Qin ◽

Huixia Liu ◽

Bing Song ◽

Mamoun Alazab ◽

Priyan Malarvizhi Kumar

Keyword(s):

Deep Learning ◽

Video Surveillance ◽

Shopping Malls ◽

Learning Models

Deep Learning in Smart Video Surveillance for Crowd Management: A Systematic Literature Review

10.1145/3473141.3473240 ◽

2021 ◽

Author(s):

Andrea Camille Garcia ◽

Jealine Eleanor Gorre ◽

Joshua Angelo Karl Perez ◽

Mary Jane Samonte

Keyword(s):

Deep Learning ◽

Literature Review ◽

Video Surveillance ◽

Systematic Literature Review ◽

Crowd Management

A Novel Parallel Auto-Encoder Framework for Multi-Scale Data in Civil Structural Health Monitoring

Algorithms ◽

10.3390/a11080112 ◽

2018 ◽

Vol 11 (8) ◽

pp. 112 ◽

Cited By ~ 4

Author(s):

Ruhua Wang ◽

Ling Li ◽

Jun Li

Keyword(s):

Deep Learning ◽

Dimension Reduction ◽

Structural Damage ◽

Natural Frequencies ◽

Steel Structure ◽

Mode Shapes ◽

Multi Scale ◽

Training Samples ◽

Sparsity Constraint ◽

Deep Learning Network

In this paper, damage detection/identification for a seven-storey steel structure is investigated using vibration signals and deep learning techniques. Vibration characteristics, such as natural frequencies and mode shapes, are captured and used as input to a deep learning network, while the output vector represents the structural damage associated with each location. A deep auto-encoder with a sparsity constraint is used for effective feature extraction from different types of signals, and another deep auto-encoder is used to learn the relationship between the different signals for the final regression. The existing SAF model from a recent study of the same problem processed all signals in one serial auto-encoder model. Such models have the following difficulties: (1) the natural frequencies and mode shapes are on different magnitude scales, and it is not logical to normalize them to the same scale when building the models with training samples; (2) some frequencies and mode shapes may not be related to each other, and it is not appropriate to use them together for dimension reduction. To tackle these problems for the multi-scale dataset in structural health monitoring (SHM), a novel parallel auto-encoder framework (Para-AF) is proposed in this paper. It processes the frequency signals and mode shapes separately for feature selection via dimension reduction and then combines these features in relationship learning for regression. Furthermore, we introduce a sparsity constraint in the model reduction stage to improve performance. Two experiments are conducted for performance evaluation, and the results show significant advantages of the proposed model over existing approaches.
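Point (1) above, that natural frequencies and mode shapes live on different magnitude scales, can be illustrated by scaling each signal group with its own range before it enters its own encoder branch. This is a minimal sketch under that assumption only: min-max scaling is a hypothetical choice, `scale_per_group` and the toy numbers are invented, and the auto-encoders themselves are not shown.

```python
from typing import Dict, List

Matrix = List[List[float]]  # rows = training samples, columns = features


def minmax_scale(samples: Matrix, lo: float, hi: float) -> Matrix:
    """Linearly map values from [lo, hi] to [0, 1]."""
    span = hi - lo
    if span == 0:
        return [[0.0 for _ in row] for row in samples]
    return [[(v - lo) / span for v in row] for row in samples]


def scale_per_group(groups: Dict[str, Matrix]) -> Dict[str, Matrix]:
    """Scale each signal group with its own range (one branch per group)."""
    scaled = {}
    for name, samples in groups.items():
        values = [v for row in samples for v in row]
        scaled[name] = minmax_scale(samples, min(values), max(values))
    return scaled


# Toy data: natural frequencies (Hz) and mode-shape ordinates differ greatly in magnitude.
groups = {
    "frequencies": [[2.0, 10.0], [4.0, 12.0]],
    "mode_shapes": [[-1.0, 0.5], [0.0, 1.0]],
}
print(scale_per_group(groups))
```

Had both groups shared one range ([-1, 12] here), every mode-shape value would have landed in [0, 2/13], roughly [0, 0.15], which shows the scale mismatch the abstract describes.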

Change Detection in Hyperspectral Images Using Recurrent 3D Fully Convolutional Networks

Remote Sensing ◽

10.3390/rs10111827 ◽

2018 ◽

Vol 10 (11) ◽

pp. 1827 ◽

Cited By ~ 24

Author(s):

Ahram Song ◽

Jaewan Choi ◽

Youkyung Han ◽

Yongil Kim

Keyword(s):

Deep Learning ◽

Change Detection ◽

Spatial Information ◽

Short Term Memory ◽

Hyperspectral Images ◽

Convolutional Network ◽

Ground Truth Data ◽

Fully Convolutional Network ◽

Training Samples ◽

Multi Temporal

Hyperspectral change detection (CD) can be performed effectively using deep-learning networks. Although these approaches require qualified training samples, ground-truth data are difficult to obtain in the real world, and preserving spatial information during training is difficult due to structural limitations. To solve these problems, our study proposed a novel CD method for hyperspectral images (HSIs), comprising sample generation and a deep-learning network called the recurrent three-dimensional (3D) fully convolutional network (Re3FCN), which merges the advantages of a 3D fully convolutional network (FCN) and a convolutional long short-term memory (ConvLSTM). Principal component analysis (PCA) and the spectral correlation angle (SCA) were used to generate training samples with high probabilities of being changed or unchanged. This strategy allowed the network to be trained with fewer but more representative samples. The Re3FCN mainly consists of spectral–spatial and temporal modules: a spectral–spatial module with a 3D convolutional layer extracts spectral–spatial features from the HSIs simultaneously, whilst a temporal module with ConvLSTM records and analyzes the multi-temporal HSI change information. The study first proposed a simple and effective method to generate samples for network training, which can be applied effectively to cases with no training samples. Re3FCN can perform end-to-end detection of binary and multiple changes. Moreover, Re3FCN can receive multi-temporal HSIs directly as input without learning the characteristics of multiple changes. Finally, the network extracts joint spectral–spatial–temporal features and preserves the spatial structure during learning through its fully convolutional structure. This study was the first to use a 3D FCN and a ConvLSTM for remote-sensing CD. To demonstrate the effectiveness of the proposed CD method, we performed binary and multi-class CD experiments. Results revealed that the Re3FCN outperformed other conventional methods, such as change vector analysis, iteratively reweighted multivariate alteration detection, PCA-SCA, FCN, and the combination of 2D convolutional layers and fully connected LSTM.
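The abstract pairs PCA with the spectral correlation angle (SCA) to pick pixels that are very likely changed or unchanged. Below is a minimal sketch of one common SCA definition: the arccosine of the Pearson correlation between two pixel spectra, rescaled to [0, 1]. How the authors combined it with PCA, and any thresholds they used, are not given in the abstract and are not reproduced; the function names and toy spectra are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two spectra (same number of bands)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant spectrum")
    return cov / (sx * sy)


def spectral_correlation_angle(x: Sequence[float], y: Sequence[float]) -> float:
    """SCA = arccos((r + 1) / 2), in radians: 0 for perfectly correlated spectra,
    pi/2 for perfectly anti-correlated ones."""
    r = max(-1.0, min(1.0, pearson(x, y)))  # clamp rounding error into [-1, 1]
    return math.acos((r + 1) / 2)


before = [0.10, 0.20, 0.30, 0.40]  # toy reflectance spectrum of one pixel at time 1
after = [0.20, 0.40, 0.60, 0.80]   # same shape, brighter: no change in spectral shape
print(spectral_correlation_angle(before, after))  # close to 0.0
```

Because SCA depends only on correlation, a uniform brightening of a spectrum yields an angle near zero, while a large angle flags a change in spectral shape, the kind of pixel a sample-generation step might label as changed.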
