Towards Visible and Thermal Drone Monitoring with Convolutional Neural Networks

Author(s): Ye Wang, Yueru Chen, Jongmoo Choi, C.-C. Jay Kuo

This paper reports a visible and thermal drone monitoring system that integrates deep-learning-based detection and tracking modules. The biggest challenge in adopting deep learning methods for drone detection is the paucity of training drone images, especially thermal ones. To address this issue, we develop two data augmentation techniques. The first is a model-based drone augmentation technique that automatically generates visible drone images with a bounding-box label on the drone's location. The second exploits an adversarial data augmentation methodology to create thermal drone images. To track a small flying drone, we utilize the residual information between consecutive image frames. Finally, we present an integrated detection and tracking system that outperforms each individual detection-only or tracking-only module. The experiments show that, even when trained on synthetic data, the proposed system performs well on real-world drone images with complex backgrounds. The USC drone detection and tracking dataset, with user-labeled bounding boxes, is available to the public.
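As a rough illustration of the frame-residual idea described above, the sketch below uses OpenCV to difference consecutive grayscale frames and propose boxes around small moving regions. It is not the authors' implementation; the function name, threshold, and area cutoff are illustrative assumptions.

```python
import cv2

def residual_motion_boxes(prev_gray, curr_gray, min_area=9):
    """Propose boxes around regions that changed between consecutive frames."""
    residual = cv2.absdiff(curr_gray, prev_gray)      # inter-frame residual
    residual = cv2.GaussianBlur(residual, (5, 5), 0)  # suppress sensor noise
    _, mask = cv2.threshold(residual, 25, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    # Keep even very small movers, since a distant drone may span few pixels.
    return [cv2.boundingRect(c) for c in contours
            if cv2.contourArea(c) >= min_area]
```

In a full system, boxes proposed this way would be fused with the detector's output rather than used alone.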

2020
Author(s): Douglas Pinto Sampaio Gomes, Lihong Zheng

Plant phenotyping concerns the study of plant traits that result from their interaction with the environment. Computer vision (CV) techniques represent promising, non-invasive approaches for related tasks such as leaf counting, measuring leaf area, and tracking plant growth. Among potential CV techniques, deep learning has been prevalent in the last couple of years. This surge in interest happened mainly due to the release of a dataset of rosette plants that defined objective metrics for benchmarking solutions. This paper discusses a notable aspect of the recent best-performing works in this field: their main contribution comes from novel data augmentation techniques rather than model improvements. Moreover, experiments are set up to highlight the significance of data augmentation practices for limited datasets with narrow distributions. This paper reviews these ingenious techniques for generating synthetic data to augment training and presents evidence of their potential importance.
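For readers unfamiliar with the practices the review surveys, a minimal torchvision augmentation pipeline of the kind commonly applied to small rosette datasets is sketched below; the specific transforms and parameters are illustrative, not drawn from any single reviewed work.

```python
import torchvision.transforms as T

# Rosettes have no canonical orientation, so aggressive rotation is safe.
leaf_augment = T.Compose([
    T.RandomRotation(degrees=180),
    T.RandomHorizontalFlip(p=0.5),
    T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    T.RandomResizedCrop(size=224, scale=(0.8, 1.0)),
    T.ToTensor(),
])
```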


2020 · Vol 71 (7) · pp. 868-880
Author(s): Nguyen Hong-Quan, Nguyen Thuy-Binh, Tran Duc-Long, Le Thi-Lan

Along with the rapid development of camera networks, video analysis systems have become increasingly popular and have been applied in various practical applications. In this paper, we focus on person re-identification (person ReID), a crucial step in video analysis systems. The purpose of person ReID is to associate multiple images of a given person moving through a non-overlapping camera network. Many efforts have been devoted to person ReID. However, most studies deal only with well-aligned bounding boxes that are detected manually and considered perfect inputs for person ReID. In fact, when building a fully automated person ReID system, the quality of the two preceding steps, person detection and tracking, may strongly affect ReID performance. The contributions of this paper are twofold. First, a unified framework for person ReID based on deep learning models is proposed. In this framework, a deep neural network for person detection is coupled with a deep-learning-based tracking method. In addition, features extracted from an improved ResNet architecture are proposed for person representation to achieve higher ReID accuracy. Second, our self-built dataset is introduced and employed for evaluating all three steps of the fully automated person ReID framework.
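The improved ResNet used for person representation is not detailed in this summary; the sketch below substitutes a stock torchvision ResNet-50 as a stand-in feature extractor and matches a query crop against a gallery by cosine similarity, which conveys the mechanics of the ReID step.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()   # expose the 2048-d pooled feature
backbone.eval()

@torch.no_grad()
def reid_features(crops):
    """crops: (N, 3, H, W) batch of detected person crops."""
    feats = backbone(crops)
    return F.normalize(feats, dim=1)   # L2-normalize for cosine matching

def best_gallery_match(query_feat, gallery_feats):
    sims = gallery_feats @ query_feat  # cosine similarities to each gallery entry
    return torch.argmax(sims).item()
```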


2021
Author(s): Loay Hassan, Mohamed Abedl-Nasser, Adel Saleh, Domenec Puig

Digital breast tomosynthesis (DBT) is one of the most powerful breast cancer screening technologies. DBT can improve the ability of radiologists to detect breast cancer, especially in the case of dense breasts, where it outperforms mammography. Although many automated methods have been proposed to detect breast lesions in mammographic images, very few have been proposed for DBT, due to the lack of sufficient annotated DBT images for training object detectors. In this paper, we present fully automated deep-learning breast lesion detection methods. Specifically, we study the effectiveness of two data augmentation techniques (channel replication and channel concatenation) with five state-of-the-art deep learning detection models. Our preliminary results on a challenging publicly available DBT dataset show that the channel-concatenation data augmentation technique can significantly improve breast lesion detection results for deep-learning-based detectors.
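The abstract names the two input-construction schemes without giving their exact recipes; one plausible reading is sketched below with OpenCV, where a grayscale DBT slice is either replicated across the three input channels or concatenated with contrast-enhanced variants. The choice of enhancements and their parameters is an assumption, not the authors' published recipe.

```python
import cv2
import numpy as np

def channel_replication(slice_u8):
    """Stack the same grayscale slice into a 3-channel detector input."""
    return np.stack([slice_u8, slice_u8, slice_u8], axis=-1)

def channel_concatenation(slice_u8):
    """Concatenate the slice with two enhanced views as the three channels."""
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(slice_u8)         # local contrast enhancement
    equalized = cv2.equalizeHist(slice_u8)   # global histogram equalization
    return np.stack([slice_u8, enhanced, equalized], axis=-1)
```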


2019 · Vol 5 (1) · pp. 239-244
Author(s): Jingrui Yu, Roman Seidel, Gangolf Hirtz

We propose a one-step person detector for top-view omnidirectional indoor scenes based on convolutional neural networks (CNNs). While state-of-the-art person detectors reach competitive results on perspective images, the lack of both CNN architectures and training data that follow the distortion of omnidirectional images makes current approaches inapplicable to our data. The method predicts bounding boxes of multiple persons directly in omnidirectional images without perspective transformation, which reduces pre- and post-processing overhead and enables real-time performance. The basic idea is to use transfer learning to fine-tune CNNs trained on perspective images, with data augmentation techniques, for detection in omnidirectional images. We fine-tune two variants of Single Shot MultiBox Detectors (SSDs). The first uses MobileNet v1 FPN as the feature extractor (moSSD); the second uses ResNet50 v1 FPN (resSSD). Both models are pre-trained on the Microsoft Common Objects in Context (COCO) dataset. We fine-tune both models on the PASCAL VOC07 and VOC12 datasets, specifically on the person class. Random 90-degree rotation and random vertical flipping are used for data augmentation, in addition to the methods proposed for the original SSD. We reach an average precision (AP) of 67.3% with moSSD and 74.9% with resSSD on the evaluation dataset. To enhance the fine-tuning process, we add a subset of the HDA Person dataset and a subset of the PIROPO database, and reduce the perspective images to PASCAL VOC07 only. The AP rises to 83.2% for moSSD and 86.3% for resSSD, respectively. The average inference speed is 28 ms per image for moSSD and 38 ms per image for resSSD on an Nvidia Quadro P6000. Our method is applicable to other CNN-based object detectors and can potentially generalize to detecting other objects in omnidirectional images.
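The two extra augmentations added on top of the standard SSD ones are straightforward to reproduce; a minimal NumPy sketch is given below. Bounding-box handling is omitted for brevity; during training, box coordinates must be rotated and flipped together with the image.

```python
import random
import numpy as np

def omni_augment(img):
    """img: HxWxC top-view fisheye frame (H == W). Returns an augmented copy."""
    k = random.choice([0, 1, 2, 3])   # number of 90-degree rotations
    img = np.rot90(img, k)            # rotation preserves top-view geometry
    if random.random() < 0.5:
        img = np.flipud(img)          # random vertical flip
    return np.ascontiguousarray(img)
```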


Author(s): Kottilingam Kottursamy

The role of facial expression recognition in social science and human-computer interaction has received a lot of attention. Deep learning advancements have produced progress in this field that goes beyond human-level accuracy. This article discusses several common deep learning algorithms for emotion recognition, all while utilizing the eXnet library to achieve improved accuracy. Memory and computation costs, however, have yet to be overcome, and overfitting remains an issue with large models; one way to address this challenge is to reduce the generalization error. We employ a novel convolutional neural network (CNN) named eXnet to construct a new CNN model utilizing parallel feature extraction. The most recent eXnet (Expression Net) model reduces the previous model's error while having far fewer parameters. Data augmentation techniques that have been in use for decades are utilized with the generalized eXnet. It employs effective ways to reduce overfitting while keeping the overall model size under control.
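eXnet's exact topology is not given here; the block below is a generic parallel feature-extraction module (Inception-style branches with different receptive fields, concatenated channel-wise) that illustrates the idea rather than the published architecture.

```python
import torch
import torch.nn as nn

class ParallelBlock(nn.Module):
    """Process the same input through parallel conv branches and concatenate."""
    def __init__(self, in_ch, branch_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, branch_ch, kernel_size=1)
        self.b3 = nn.Conv2d(in_ch, branch_ch, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(in_ch, branch_ch, kernel_size=5, padding=2)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1))

# ParallelBlock(64, 32)(torch.randn(1, 64, 48, 48)).shape -> (1, 96, 48, 48)
```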


Electronics · 2021 · Vol 11 (1) · pp. 73
Author(s): Kuldoshbay Avazov, Mukhriddin Mukhiddinov, Fazliddin Makhmudov, Young Im Cho

In the construction of new smart cities, traditional fire-detection systems can be replaced with vision-based systems to establish fire safety in society using emerging technologies such as digital cameras, computer vision, artificial intelligence, and deep learning. In this study, we developed a fire detector that accurately detects even small sparks and sounds an alarm within 8 s of a fire outbreak. A novel convolutional neural network was developed to detect fire regions using an enhanced You Only Look Once (YOLO) v4 network. Based on the improved YOLOv4 algorithm, we adapted the network to operate on the Banana Pi M3 board using only three layers. Initially, we examined the original YOLOv4 approach to determine the accuracy of its candidate fire-region predictions. However, the anticipated results were not observed after several experiments with this approach for detecting fire accidents. We improved the traditional YOLOv4 network by increasing the size of the training dataset through data augmentation techniques for the real-time monitoring of fire disasters. By modifying the network structure through automatic color augmentation, reducing parameters, and other changes, the proposed method successfully detected and reported disastrous fires with high speed and accuracy in different weather environments, sunny or cloudy, day or night. Experimental results revealed that the proposed method can be used successfully to protect smart cities and to monitor fires in urban areas. Finally, we compared the performance of our method with that of recently reported fire-detection approaches, employing widely used performance metrics to test the fire classification results.
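The "automatic color augmentation" step is not specified in detail; a common implementation of color augmentation is random HSV jitter, sketched below with OpenCV under assumed jitter ranges.

```python
import cv2
import numpy as np

def color_augment(img_bgr, hue=5, sat=0.3, val=0.3):
    """Randomly jitter hue, saturation, and brightness of a BGR frame."""
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[..., 0] = (hsv[..., 0] + np.random.uniform(-hue, hue)) % 180
    hsv[..., 1] *= 1.0 + np.random.uniform(-sat, sat)
    hsv[..., 2] *= 1.0 + np.random.uniform(-val, val)
    hsv = np.clip(hsv, 0, 255).astype(np.uint8)
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
```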


2021 · Vol 11 (9) · pp. 842
Author(s): Shruti Atul Mali, Abdalla Ibrahim, Henry C. Woodruff, Vincent Andrearczyk, Henning Müller, ...

Radiomics converts medical images into mineable data via high-throughput extraction of quantitative features used for clinical decision support. However, these radiomic features are susceptible to variation across scanners, acquisition protocols, and reconstruction settings. Various investigations have assessed the reproducibility and validation of radiomic features across these discrepancies. In this narrative review, we combine systematic keyword searches with prior domain knowledge to discuss various harmonization solutions that make radiomic features more reproducible across scanners and protocol settings. The harmonization solutions are divided into two main categories: image domain and feature domain. The image-domain category comprises methods such as standardization of image acquisition, post-processing of raw sensor-level image data, data augmentation techniques, and style transfer. The feature-domain category consists of methods such as identifying reproducible features and normalization techniques, including statistical normalization, intensity harmonization, ComBat and its derivatives, and normalization using deep learning. We also reflect on the importance of deep learning solutions for addressing variability across multi-centric radiomic studies, especially those using generative adversarial networks (GANs), neural style transfer (NST) techniques, or a combination of both. We cover a broader range of methods, particularly GAN and NST methods, in more detail than previous reviews.
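As a concrete instance of the simplest feature-domain option listed above, the sketch below applies per-scanner z-score normalization to a radiomic feature table; ComBat goes further by pooling scanner effects with empirical Bayes, which is not shown here. Column names are hypothetical.

```python
import pandas as pd

def per_scanner_zscore(df, feature_cols, scanner_col="scanner"):
    """Z-score each radiomic feature within its scanner (batch) group."""
    out = df.copy()
    grouped = out.groupby(scanner_col)[feature_cols]
    out[feature_cols] = (out[feature_cols] - grouped.transform("mean")) \
                        / grouped.transform("std")
    return out
```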


Author(s): Du Chunqi, Shinobu Hasegawa

In computer vision and computer graphics, 3D reconstruction is the process of capturing the shapes and appearances of real objects. 3D models can be constructed either by active methods, which use high-quality scanner equipment, or by passive methods, which learn from datasets. However, both of these methods aim only to construct the 3D models, without showing which elements affect their generation. Therefore, the goal of this research is to apply deep learning to automatically generate 3D models and to find the latent variables that affect the reconstruction process. Existing research shows that GANs can be trained on little data using two networks, called the Generator and the Discriminator: the Generator produces synthetic data, and the Discriminator discriminates between the Generator's output and real data. Existing research also shows that InFoGAN can maximize the mutual information between latent variables and the observation. In our approach, we generate 3D models based on InFoGAN and design two constraints: a shape constraint and a parameters constraint. The shape constraint utilizes data augmentation to keep the generated synthetic data within the models' profiles, while the parameters constraint is employed to find the relationship between the 3D models and the latent variables. Furthermore, our approach tackles the challenge of building a 3D-model-generation architecture on InFoGAN. Finally, in the process of generation, we may discover the contribution of the latent variables influencing the 3D models to the whole network.
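For context, the mutual-information term that InFoGAN adds to the GAN objective can be written as a cross-entropy between the sampled latent code and an auxiliary head's prediction on the generated sample; the sketch below shows that term in PyTorch, with all networks, shapes, and weights as placeholders.

```python
import torch
import torch.nn.functional as F

def info_loss(q_logits, c_index):
    """Variational lower bound on I(c; G(z, c)) for a categorical code.

    q_logits: output of the auxiliary Q-head on the generated sample.
    c_index:  the categorical code that was fed to the generator.
    """
    return F.cross_entropy(q_logits, c_index)

# During training (hypothetical tensors and networks):
# c = torch.randint(0, 10, (batch,))        # sampled categorical latent code
# fake = G(z, c); q_logits = Q(fake)
# g_total = g_adv_loss + lambda_info * info_loss(q_logits, c)
```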

