Towards Visible and Thermal Drone Monitoring with Convolutional Neural Networks

Author(s): Ye Wang, Yueru Chen, Jongmoo Choi, C.-C. Jay Kuo

This paper reports a visible and thermal drone monitoring system that integrates deep-learning-based detection and tracking modules. The biggest challenge in adopting deep learning methods for drone detection is the paucity of training drone images, especially thermal ones. To address this issue, we develop two data augmentation techniques. The first is a model-based drone augmentation technique that automatically generates visible drone images with a bounding-box label on the drone's location. The second exploits an adversarial data augmentation methodology to create thermal drone images. To track a small flying drone, we utilize the residual information between consecutive image frames. Finally, we present an integrated detection and tracking system that outperforms each individual detection-only or tracking-only module. The experiments show that, even when trained on synthetic data, the proposed system performs well on real-world drone images with complex backgrounds. The USC drone detection and tracking dataset, with user-labeled bounding boxes, is available to the public.
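As a rough illustration of the frame-residual idea described above, the sketch below uses OpenCV to difference consecutive grayscale frames and propose boxes around small moving regions. It is not the authors' implementation; the function name, threshold, and area cutoff are illustrative assumptions.

```python
import cv2

def residual_motion_boxes(prev_gray, curr_gray, min_area=9):
    """Propose boxes around regions that changed between consecutive frames."""
    residual = cv2.absdiff(curr_gray, prev_gray)      # inter-frame residual
    residual = cv2.GaussianBlur(residual, (5, 5), 0)  # suppress sensor noise
    _, mask = cv2.threshold(residual, 25, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    # Keep even very small movers, since a distant drone may span few pixels.
    return [cv2.boundingRect(c) for c in contours
            if cv2.contourArea(c) >= min_area]
```

In a full system, boxes proposed this way would be fused with the detector's output rather than used alone.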

2020
Author(s): Douglas Pinto Sampaio Gomes, Lihong Zheng

Plant phenotyping concerns the study of plant traits that result from their interaction with the environment. Computer vision (CV) techniques represent promising, non-invasive approaches for related tasks such as leaf counting, measuring leaf area, and tracking plant growth. Among potential CV techniques, deep learning has been prevalent in the last couple of years. This surge in interest happened mainly due to the release of a dataset of rosette plants that defined objective metrics for benchmarking solutions. This paper discusses a notable aspect of the recent best-performing works in this field: their main contribution comes from novel data augmentation techniques rather than model improvements. Moreover, experiments are set up to highlight the significance of data augmentation practices for limited datasets with narrow distributions. This paper reviews these ingenious techniques for generating synthetic data to augment training and presents evidence of their potential importance.
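For readers unfamiliar with the practices the review surveys, a minimal torchvision augmentation pipeline of the kind commonly applied to small rosette datasets is sketched below; the specific transforms and parameters are illustrative, not drawn from any single reviewed work.

```python
import torchvision.transforms as T

# Rosettes have no canonical orientation, so aggressive rotation is safe.
leaf_augment = T.Compose([
    T.RandomRotation(degrees=180),
    T.RandomHorizontalFlip(p=0.5),
    T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    T.RandomResizedCrop(size=224, scale=(0.8, 1.0)),
    T.ToTensor(),
])
```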


2020 · Vol 71 (7) · pp. 868-880
Author(s): Nguyen Hong-Quan, Nguyen Thuy-Binh, Tran Duc-Long, Le Thi-Lan

Along with the rapid development of camera networks, video analysis systems have become increasingly popular and have been applied in various practical applications. In this paper, we focus on person re-identification (person ReID), a crucial step in video analysis systems. The purpose of person ReID is to associate multiple images of a given person moving through a non-overlapping camera network. Many efforts have been devoted to person ReID. However, most studies deal only with well-aligned bounding boxes that are detected manually and considered perfect inputs for person ReID. In fact, when building a fully automated person ReID system, the quality of the two preceding steps, person detection and tracking, may strongly affect ReID performance. The contributions of this paper are twofold. First, a unified framework for person ReID based on deep learning models is proposed. In this framework, a deep neural network for person detection is coupled with a deep-learning-based tracking method. In addition, features extracted from an improved ResNet architecture are proposed for person representation to achieve higher ReID accuracy. Second, our self-built dataset is introduced and employed for evaluating all three steps of the fully automated person ReID framework.
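The improved ResNet used for person representation is not detailed in this summary; the sketch below substitutes a stock torchvision ResNet-50 as a stand-in feature extractor and matches a query crop against a gallery by cosine similarity, which conveys the mechanics of the ReID step.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()   # expose the 2048-d pooled feature
backbone.eval()

@torch.no_grad()
def reid_features(crops):
    """crops: (N, 3, H, W) batch of detected person crops."""
    feats = backbone(crops)
    return F.normalize(feats, dim=1)   # L2-normalize for cosine matching

def best_gallery_match(query_feat, gallery_feats):
    sims = gallery_feats @ query_feat  # cosine similarities to each gallery entry
    return torch.argmax(sims).item()
```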


2021
Author(s): Loay Hassan, Mohamed Abedl-Nasser, Adel Saleh, Domenec Puig

Digital breast tomosynthesis (DBT) is one of the most powerful breast cancer screening technologies. DBT can improve the ability of radiologists to detect breast cancer, especially in the case of dense breasts, where it outperforms mammography. Although many automated methods have been proposed to detect breast lesions in mammographic images, very few have been proposed for DBT, due to the lack of sufficient annotated DBT images for training object detectors. In this paper, we present fully automated deep-learning breast lesion detection methods. Specifically, we study the effectiveness of two data augmentation techniques (channel replication and channel concatenation) with five state-of-the-art deep learning detection models. Our preliminary results on a challenging publicly available DBT dataset show that the channel-concatenation data augmentation technique can significantly improve breast lesion detection results for deep-learning-based detectors.
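The abstract names the two input-construction schemes without giving their exact recipes; one plausible reading is sketched below with OpenCV, where a grayscale DBT slice is either replicated across the three input channels or concatenated with contrast-enhanced variants. The choice of enhancements and their parameters is an assumption, not the authors' published recipe.

```python
import cv2
import numpy as np

def channel_replication(slice_u8):
    """Stack the same grayscale slice into a 3-channel detector input."""
    return np.stack([slice_u8, slice_u8, slice_u8], axis=-1)

def channel_concatenation(slice_u8):
    """Concatenate the slice with two enhanced views as the three channels."""
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(slice_u8)         # local contrast enhancement
    equalized = cv2.equalizeHist(slice_u8)   # global histogram equalization
    return np.stack([slice_u8, enhanced, equalized], axis=-1)
```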


2019 · Vol 5 (1) · pp. 239-244
Author(s): Jingrui Yu, Roman Seidel, Gangolf Hirtz

We propose a one-step person detector for top-view omnidirectional indoor scenes based on convolutional neural networks (CNNs). While state-of-the-art person detectors reach competitive results on perspective images, the lack of both CNN architectures and training data that follow the distortion of omnidirectional images makes current approaches inapplicable to our data. The method predicts bounding boxes of multiple persons directly in omnidirectional images without perspective transformation, which reduces pre- and post-processing overhead and enables real-time performance. The basic idea is to use transfer learning to fine-tune CNNs trained on perspective images, with data augmentation techniques, for detection in omnidirectional images. We fine-tune two variants of Single Shot MultiBox Detectors (SSDs). The first uses MobileNet v1 FPN as the feature extractor (moSSD); the second uses ResNet50 v1 FPN (resSSD). Both models are pre-trained on the Microsoft Common Objects in Context (COCO) dataset. We fine-tune both models on the PASCAL VOC07 and VOC12 datasets, specifically on the person class. Random 90-degree rotation and random vertical flipping are used for data augmentation, in addition to the methods proposed for the original SSD. We reach an average precision (AP) of 67.3% with moSSD and 74.9% with resSSD on the evaluation dataset. To enhance the fine-tuning process, we add a subset of the HDA Person dataset and a subset of the PIROPO database, and reduce the perspective images to PASCAL VOC07 only. The AP rises to 83.2% for moSSD and 86.3% for resSSD, respectively. The average inference speed is 28 ms per image for moSSD and 38 ms per image for resSSD on an Nvidia Quadro P6000. Our method is applicable to other CNN-based object detectors and can potentially generalize to detecting other objects in omnidirectional images.
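The two extra augmentations added on top of the standard SSD ones are straightforward to reproduce; a minimal NumPy sketch is given below. Bounding-box handling is omitted for brevity; during training, box coordinates must be rotated and flipped together with the image.

```python
import random
import numpy as np

def omni_augment(img):
    """img: HxWxC top-view fisheye frame (H == W). Returns an augmented copy."""
    k = random.choice([0, 1, 2, 3])   # number of 90-degree rotations
    img = np.rot90(img, k)            # rotation preserves top-view geometry
    if random.random() < 0.5:
        img = np.flipud(img)          # random vertical flip
    return np.ascontiguousarray(img)
```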


Author(s): Kottilingam Kottursamy

The role of facial expression recognition in social science and human-computer interaction has received a lot of attention. Deep learning advancements have produced progress in this field that goes beyond human-level accuracy. This article discusses several common deep learning algorithms for emotion recognition, all while utilizing the eXnet library to achieve improved accuracy. Memory and computation costs, however, have yet to be overcome, and overfitting remains an issue with large models; one way to address this challenge is to reduce the generalization error. We employ a novel convolutional neural network (CNN) named eXnet to construct a new CNN model utilizing parallel feature extraction. The most recent eXnet (Expression Net) model reduces the previous model's error while having far fewer parameters. Data augmentation techniques that have been in use for decades are utilized with the generalized eXnet. It employs effective ways to reduce overfitting while keeping the overall model size under control.
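eXnet's exact topology is not given here; the block below is a generic parallel feature-extraction module (Inception-style branches with different receptive fields, concatenated channel-wise) that illustrates the idea rather than the published architecture.

```python
import torch
import torch.nn as nn

class ParallelBlock(nn.Module):
    """Process the same input through parallel conv branches and concatenate."""
    def __init__(self, in_ch, branch_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, branch_ch, kernel_size=1)
        self.b3 = nn.Conv2d(in_ch, branch_ch, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(in_ch, branch_ch, kernel_size=5, padding=2)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1))

# ParallelBlock(64, 32)(torch.randn(1, 64, 48, 48)).shape -> (1, 96, 48, 48)
```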


Electronics · 2021 · Vol 11 (1) · pp. 73
Author(s): Kuldoshbay Avazov, Mukhriddin Mukhiddinov, Fazliddin Makhmudov, Young Im Cho

In the construction of new smart cities, traditional fire-detection systems can be replaced with vision-based systems to establish fire safety in society using emerging technologies such as digital cameras, computer vision, artificial intelligence, and deep learning. In this study, we developed a fire detector that accurately detects even small sparks and sounds an alarm within 8 s of a fire outbreak. A novel convolutional neural network was developed to detect fire regions using an enhanced You Only Look Once (YOLO) v4 network. Based on the improved YOLOv4 algorithm, we adapted the network to operate on the Banana Pi M3 board using only three layers. Initially, we examined the original YOLOv4 approach to determine the accuracy of its candidate fire-region predictions. However, the anticipated results were not observed after several experiments with this approach for detecting fire accidents. We improved the traditional YOLOv4 network by increasing the size of the training dataset through data augmentation techniques for the real-time monitoring of fire disasters. By modifying the network structure through automatic color augmentation, reducing parameters, and other changes, the proposed method successfully detected and reported disastrous fires with high speed and accuracy in different weather environments, sunny or cloudy, day or night. Experimental results revealed that the proposed method can be used successfully to protect smart cities and to monitor fires in urban areas. Finally, we compared the performance of our method with that of recently reported fire-detection approaches, employing widely used performance metrics to test the fire classification results.
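The "automatic color augmentation" step is not specified in detail; a common implementation of color augmentation is random HSV jitter, sketched below with OpenCV under assumed jitter ranges.

```python
import cv2
import numpy as np

def color_augment(img_bgr, hue=5, sat=0.3, val=0.3):
    """Randomly jitter hue, saturation, and brightness of a BGR frame."""
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[..., 0] = (hsv[..., 0] + np.random.uniform(-hue, hue)) % 180
    hsv[..., 1] *= 1.0 + np.random.uniform(-sat, sat)
    hsv[..., 2] *= 1.0 + np.random.uniform(-val, val)
    hsv = np.clip(hsv, 0, 255).astype(np.uint8)
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
```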


2021 · Vol 11 (9) · pp. 842
Author(s): Shruti Atul Mali, Abdalla Ibrahim, Henry C. Woodruff, Vincent Andrearczyk, Henning Müller, ...

Radiomics converts medical images into mineable data via high-throughput extraction of quantitative features used for clinical decision support. However, these radiomic features are susceptible to variation across scanners, acquisition protocols, and reconstruction settings. Various investigations have assessed the reproducibility and validation of radiomic features across these discrepancies. In this narrative review, we combine systematic keyword searches with prior domain knowledge to discuss various harmonization solutions that make radiomic features more reproducible across scanners and protocol settings. The harmonization solutions are divided into two main categories: image domain and feature domain. The image-domain category comprises methods such as standardization of image acquisition, post-processing of raw sensor-level image data, data augmentation techniques, and style transfer. The feature-domain category consists of methods such as identifying reproducible features and normalization techniques, including statistical normalization, intensity harmonization, ComBat and its derivatives, and normalization using deep learning. We also reflect on the importance of deep learning solutions for addressing variability across multi-centric radiomic studies, especially those using generative adversarial networks (GANs), neural style transfer (NST) techniques, or a combination of both. We cover a broader range of methods, particularly GAN and NST methods, in more detail than previous reviews.
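As a concrete instance of the simplest feature-domain option listed above, the sketch below applies per-scanner z-score normalization to a radiomic feature table; ComBat goes further by pooling scanner effects with empirical Bayes, which is not shown here. Column names are hypothetical.

```python
import pandas as pd

def per_scanner_zscore(df, feature_cols, scanner_col="scanner"):
    """Z-score each radiomic feature within its scanner (batch) group."""
    out = df.copy()
    grouped = out.groupby(scanner_col)[feature_cols]
    out[feature_cols] = (out[feature_cols] - grouped.transform("mean")) \
                        / grouped.transform("std")
    return out
```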


Author(s): Du Chunqi, Shinobu Hasegawa

In computer vision and computer graphics, 3D reconstruction is the process of capturing the shapes and appearances of real objects. 3D models can be constructed either by active methods, which use high-quality scanner equipment, or by passive methods, which learn from datasets. However, both of these methods aim only to construct the 3D models, without showing which elements affect their generation. Therefore, the goal of this research is to apply deep learning to automatically generate 3D models and to find the latent variables that affect the reconstruction process. Existing research shows that GANs can be trained on little data using two networks, called the Generator and the Discriminator: the Generator produces synthetic data, and the Discriminator discriminates between the Generator's output and real data. Existing research also shows that InFoGAN can maximize the mutual information between latent variables and the observation. In our approach, we generate 3D models based on InFoGAN and design two constraints: a shape constraint and a parameters constraint. The shape constraint utilizes data augmentation to keep the generated synthetic data within the models' profiles, while the parameters constraint is employed to find the relationship between the 3D models and the latent variables. Furthermore, our approach tackles the challenge of building a 3D-model-generation architecture on InFoGAN. Finally, in the process of generation, we may discover the contribution of the latent variables influencing the 3D models to the whole network.
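For context, the mutual-information term that InFoGAN adds to the GAN objective can be written as a cross-entropy between the sampled latent code and an auxiliary head's prediction on the generated sample; the sketch below shows that term in PyTorch, with all networks, shapes, and weights as placeholders.

```python
import torch
import torch.nn.functional as F

def info_loss(q_logits, c_index):
    """Variational lower bound on I(c; G(z, c)) for a categorical code.

    q_logits: output of the auxiliary Q-head on the generated sample.
    c_index:  the categorical code that was fed to the generator.
    """
    return F.cross_entropy(q_logits, c_index)

# During training (hypothetical tensors and networks):
# c = torch.randint(0, 10, (batch,))        # sampled categorical latent code
# fake = G(z, c); q_logits = Q(fake)
# g_total = g_adv_loss + lambda_info * info_loss(q_logits, c)
```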

