scholarly journals Foreground Objects Detection by U-Net with Multiple Difference Images

2021 ◽  
Vol 11 (4) ◽  
pp. 1807
Author(s):  
Jae-Yeul Kim ◽  
Jong-Eun Ha

In video surveillance, robust detection of foreground objects is usually done by subtracting a background model from the current image. Most traditional approaches use a statistical method to model the background image. Recently, deep learning has also been widely used to detect foreground objects in video surveillance. It shows dramatic improvement compared to the traditional approaches. It is trained through supervised learning, which requires training samples with pixel-level assignment. It requires a huge amount of time and is high cost, while traditional algorithms operate unsupervised and do not require training samples. Additionally, deep learning-based algorithms lack generalization power. They operate well on scenes that are similar to the training conditions, but they do not operate well on scenes that deviate from the training conditions. In this paper, we present a new method to detect foreground objects in video surveillance using multiple difference images as the input of convolutional neural networks, which guarantees improved generalization power compared to current deep learning-based methods. First, we adjust U-Net to use multiple difference images as input. Second, we show that training using all scenes in the CDnet 2014 dataset can improve the generalization power. Hyper-parameters such as the number of difference images and the interval between images in difference image computation are chosen by analyzing experimental results. We demonstrate that the proposed algorithm achieves improved performance in scenes that are not used in training compared to state-of-the-art deep learning and traditional unsupervised algorithms. Diverse experiments using various open datasets and real images show the feasibility of the proposed method.

2019 ◽  
Vol 9 (22) ◽  
pp. 4749
Author(s):  
Lingyun Jiang ◽  
Kai Qiao ◽  
Linyuan Wang ◽  
Chi Zhang ◽  
Jian Chen ◽  
...  

Decoding human brain activities, especially reconstructing human visual stimuli via functional magnetic resonance imaging (fMRI), has gained increasing attention in recent years. However, the high dimensionality and small quantity of fMRI data impose restrictions on satisfactory reconstruction, especially for the reconstruction method with deep learning requiring huge amounts of labelled samples. When compared with the deep learning method, humans can recognize a new image because our human visual system is naturally capable of extracting features from any object and comparing them. Inspired by this visual mechanism, we introduced the mechanism of comparison into deep learning method to realize better visual reconstruction by making full use of each sample and the relationship of the sample pair by learning to compare. In this way, we proposed a Siamese reconstruction network (SRN) method. By using the SRN, we improved upon the satisfying results on two fMRI recording datasets, providing 72.5% accuracy on the digit dataset and 44.6% accuracy on the character dataset. Essentially, this manner can increase the training data about from n samples to 2n sample pairs, which takes full advantage of the limited quantity of training samples. The SRN learns to converge sample pairs of the same class or disperse sample pairs of different class in feature space.


Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 3046
Author(s):  
Shervin Minaee ◽  
Mehdi Minaei ◽  
Amirali Abdolrashidi

Facial expression recognition has been an active area of research over the past few decades, and it is still challenging due to the high intra-class variation. Traditional approaches for this problem rely on hand-crafted features such as SIFT, HOG, and LBP, followed by a classifier trained on a database of images or videos. Most of these works perform reasonably well on datasets of images captured in a controlled condition but fail to perform as well on more challenging datasets with more image variation and partial faces. In recent years, several works proposed an end-to-end framework for facial expression recognition using deep learning models. Despite the better performance of these works, there are still much room for improvement. In this work, we propose a deep learning approach based on attentional convolutional network that is able to focus on important parts of the face and achieves significant improvement over previous models on multiple datasets, including FER-2013, CK+, FERG, and JAFFE. We also use a visualization technique that is able to find important facial regions to detect different emotions based on the classifier’s output. Through experimental results, we show that different emotions are sensitive to different parts of the face.


2021 ◽  
Vol 11 (13) ◽  
pp. 6085
Author(s):  
Jesus Salido ◽  
Vanesa Lomas ◽  
Jesus Ruiz-Santaquiteria ◽  
Oscar Deniz

There is a great need to implement preventive mechanisms against shootings and terrorist acts in public spaces with a large influx of people. While surveillance cameras have become common, the need for monitoring 24/7 and real-time response requires automatic detection methods. This paper presents a study based on three convolutional neural network (CNN) models applied to the automatic detection of handguns in video surveillance images. It aims to investigate the reduction of false positives by including pose information associated with the way the handguns are held in the images belonging to the training dataset. The results highlighted the best average precision (96.36%) and recall (97.23%) obtained by RetinaNet fine-tuned with the unfrozen ResNet-50 backbone and the best precision (96.23%) and F1 score values (93.36%) obtained by YOLOv3 when it was trained on the dataset including pose information. This last architecture was the only one that showed a consistent improvement—around 2%—when pose information was expressly considered during training.


Author(s):  
Mohammad Shorfuzzaman ◽  
M. Shamim Hossain ◽  
Abdulmotaleb El Saddik

Diabetic retinopathy (DR) is one of the most common causes of vision loss in people who have diabetes for a prolonged period. Convolutional neural networks (CNNs) have become increasingly popular for computer-aided DR diagnosis using retinal fundus images. While these CNNs are highly reliable, their lack of sufficient explainability prevents them from being widely used in medical practice. In this article, we propose a novel explainable deep learning ensemble model where weights from different models are fused into a single model to extract salient features from various retinal lesions found on fundus images. The extracted features are then fed to a custom classifier for the final diagnosis of DR severity level. The model is trained on an APTOS dataset containing retinal fundus images of various DR grades using a cyclical learning rates strategy with an automatic learning rate finder for decaying the learning rate to improve model accuracy. We develop an explainability approach by leveraging gradient-weighted class activation mapping and shapely adaptive explanations to highlight the areas of fundus images that are most indicative of different DR stages. This allows ophthalmologists to view our model's decision in a way that they can understand. Evaluation results using three different datasets (APTOS, MESSIDOR, IDRiD) show the effectiveness of our model, achieving superior classification rates with a high degree of precision (0.970), sensitivity (0.980), and AUC (0.978). We believe that the proposed model, which jointly offers state-of-the-art diagnosis performance and explainability, will address the black-box nature of deep CNN models in robust detection of DR grading.


2020 ◽  
Vol 14 (4) ◽  
pp. 471-484
Author(s):  
Suraj Shetiya ◽  
Saravanan Thirumuruganathan ◽  
Nick Koudas ◽  
Gautam Das

Accurate selectivity estimation for string predicates is a long-standing research challenge in databases. Supporting pattern matching on strings (such as prefix, substring, and suffix) makes this problem much more challenging, thereby necessitating a dedicated study. Traditional approaches often build pruned summary data structures such as tries followed by selectivity estimation using statistical correlations. However, this produces insufficiently accurate cardinality estimates resulting in the selection of sub-optimal plans by the query optimizer. Recently proposed deep learning based approaches leverage techniques from natural language processing such as embeddings to encode the strings and use it to train a model. While this is an improvement over traditional approaches, there is a large scope for improvement. We propose Astrid, a framework for string selectivity estimation that synthesizes ideas from traditional and deep learning based approaches. We make two complementary contributions. First, we propose an embedding algorithm that is query-type (prefix, substring, and suffix) and selectivity aware. Consider three strings 'ab', 'abc' and 'abd' whose prefix frequencies are 1000, 800 and 100 respectively. Our approach would ensure that the embedding for 'ab' is closer to 'abc' than 'abd'. Second, we describe how neural language models could be used for selectivity estimation. While they work well for prefix queries, their performance for substring queries is sub-optimal. We modify the objective function of the neural language model so that it could be used for estimating selectivities of pattern matching queries. We also propose a novel and efficient algorithm for optimizing the new objective function. We conduct extensive experiments over benchmark datasets and show that our proposed approaches achieve state-of-the-art results.


Author(s):  
David Geng ◽  
Ayham Alkhachroum ◽  
Manuel Melo Bicchi ◽  
Jonathan Jagid ◽  
Iahn Cajigas ◽  
...  

2021 ◽  
Author(s):  
Andrea Camille Garcia ◽  
Jealine Eleanor Gorre ◽  
Joshua Angelo Karl Perez ◽  
Mary Jane Samonte

Algorithms ◽  
2018 ◽  
Vol 11 (8) ◽  
pp. 112 ◽  
Author(s):  
Ruhua Wang ◽  
Ling Li ◽  
Jun Li

In this paper, damage detection/identification for a seven-storey steel structure is investigated via using the vibration signals and deep learning techniques. Vibration characteristics, such as natural frequencies and mode shapes are captured and utilized as input for a deep learning network while the output vector represents the structural damage associated with locations. The deep auto-encoder with sparsity constraint is used for effective feature extraction for different types of signals and another deep auto-encoder is used to learn the relationship of different signals for final regression. The existing SAF model in a recent research study for the same problem processed all signals in one serial auto-encoder model. That kind of models have the following difficulties: (1) the natural frequencies and mode shapes are in different magnitude scales and it is not logical to normalize them in the same scale in building the models with training samples; (2) some frequencies and mode shapes may not be related to each other and it is not fair to use them for dimension reduction together. To tackle the above-mentioned problems for the multi-scale dataset in SHM, a novel parallel auto-encoder framework (Para-AF) is proposed in this paper. It processes the frequency signals and mode shapes separately for feature selection via dimension reduction and then combine these features together in relationship learning for regression. Furthermore, we introduce sparsity constraint in model reduction stage for performance improvement. Two experiments are conducted on performance evaluation and our results show the significant advantages of the proposed model in comparison with the existing approaches.


2018 ◽  
Vol 10 (11) ◽  
pp. 1827 ◽  
Author(s):  
Ahram Song ◽  
Jaewan Choi ◽  
Youkyung Han ◽  
Yongil Kim

Hyperspectral change detection (CD) can be effectively performed using deep-learning networks. Although these approaches require qualified training samples, it is difficult to obtain ground-truth data in the real world. Preserving spatial information during training is difficult due to structural limitations. To solve such problems, our study proposed a novel CD method for hyperspectral images (HSIs), including sample generation and a deep-learning network, called the recurrent three-dimensional (3D) fully convolutional network (Re3FCN), which merged the advantages of a 3D fully convolutional network (FCN) and a convolutional long short-term memory (ConvLSTM). Principal component analysis (PCA) and the spectral correlation angle (SCA) were used to generate training samples with high probabilities of being changed or unchanged. The strategy assisted in training fewer samples of representative feature expression. The Re3FCN was mainly comprised of spectral–spatial and temporal modules. Particularly, a spectral–spatial module with a 3D convolutional layer extracts the spectral–spatial features from the HSIs simultaneously, whilst a temporal module with ConvLSTM records and analyzes the multi-temporal HSI change information. The study first proposed a simple and effective method to generate samples for network training. This method can be applied effectively to cases with no training samples. Re3FCN can perform end-to-end detection for binary and multiple changes. Moreover, Re3FCN can receive multi-temporal HSIs directly as input without learning the characteristics of multiple changes. Finally, the network could extract joint spectral–spatial–temporal features and it preserved the spatial structure during the learning process through the fully convolutional structure. This study was the first to use a 3D FCN and a ConvLSTM for the remote-sensing CD. To demonstrate the effectiveness of the proposed CD method, we performed binary and multi-class CD experiments. Results revealed that the Re3FCN outperformed the other conventional methods, such as change vector analysis, iteratively reweighted multivariate alteration detection, PCA-SCA, FCN, and the combination of 2D convolutional layers-fully connected LSTM.


Sign in / Sign up

Export Citation Format

Share Document