scholarly journals T-RexNet—A Hardware-Aware Neural Network for Real-Time Detection of Small Moving Objects

Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1252
Author(s):  
Alessio Canepa ◽  
Edoardo Ragusa ◽  
Rodolfo Zunino ◽  
Paolo Gastaldo

This paper presents the T-RexNet approach to detect small moving objects in videos by using a deep neural network. T-RexNet combines the advantages of Single-Shot-Detectors with a specific feature-extraction network, thus overcoming the known shortcomings of Single-Shot-Detectors in detecting small objects. The deep convolutional neural network includes two parallel paths: the first path processes both the original picture, in gray-scale format, and differences between consecutive frames; in the second path, differences between a set of three consecutive frames is only handled. As compared with generic object detectors, the method limits the depth of the convolutional network to make it less sensible to high-level features and easier to train on small objects. The simple, Hardware-efficient architecture attains its highest accuracy in the presence of videos with static framing. Deploying our architecture on the NVIDIA Jetson Nano edge-device shows its suitability to embedded systems. To prove the effectiveness and general applicability of the approach, real-world tests assessed the method performances in different scenarios, namely, aerial surveillance with the WPAFB 2009 dataset, civilian surveillance using the Chinese University of Hong Kong (CUHK) Square dataset, and fast tennis-ball tracking, involving a custom dataset. Experimental results prove that T-RexNet is a valid, general solution to detect small moving objects, which outperforms in this task generic existing object-detection approaches. The method also compares favourably with application-specific approaches in terms of the accuracy vs. speed trade-off.

The various hurdles in machine learning are beaten by deep learning techniques and then the deep learning has gradually become preeminent in artificial intelligence. Deep learning uses neural networks to kindle decisions like humans. Deep learning flourished as an energetic approach and clarity marked its success in various domains. The study includes some dominant deep learning algorithms such as convolution neural network, fully convolutional network, autoencoder, and deep belief network to analyze the medical image and to detect and diagnose of cancer at an early stage. As early as the detection of cancer than to treat the disease is uncomplicated. Early diagnosis was particularly relevant for some cancers such as breast, skin, colon, and rectum, which prohibit the chance to grow and spread. Deep learning contributes to enhanced performance and better prediction in detection of cancer with medical images. The paper presents the study of a few deep learning software frameworks such as tensor flow, theano, caffe, torch, and keras. Tensor Flow provides excellent functionality for deep learning. Keras is a high-level neural network API that operates above on tensor flow or theano. The survey winds up by presenting several future avenues and open challenges that should be addressed by the researcher in the future.


Author(s):  
Gregory Malyshev ◽  
Vyacheslav Andreev ◽  
Olga Andreeva ◽  
Oleg Chistyakov ◽  
Dmitriy Sveshnikov

This article explores the capabilities of pretrained convolutional neural networks in relation to the problem of recognizing defects for which it is impossible to identify any abstract features. The results of training the convolutional neural network AlexNet and the fully connected classifier of the VGG16 network are compared. The efficiency of using a pretrained neural network in the problem of defect recognition is demonstrated. A graph of the change in the proportion of correctly recognized images in the process of training a fully connected classifier is presented. The article attempts to explain the efficiency of a fully connected neural network classifier trained on a critically small training dataset with images of defects. The work of a convolutional neural network with a fully connected classifier is investigated. The classifier allows for classification into five categories: «crack» type defects, «chip» type defects, «hole» type defects, «multi hole» type defects and «defect-free surface». The article provides examples of convolutional network activation channels, visualized for each of the five categories. The signs of defects on which the activation of the network channels takes place are formulated. The classification errors made by the network are analyzed. The article provides predictive probabilities, below which the result of the network operation can be considered doubtful. Practical recommendations for using the trained network are given.


2021 ◽  
Vol 11 (15) ◽  
pp. 6975
Author(s):  
Tao Zhang ◽  
Lun He ◽  
Xudong Li ◽  
Guoqing Feng

Lipreading aims to recognize sentences being spoken by a talking face. In recent years, the lipreading method has achieved a high level of accuracy on large datasets and made breakthrough progress. However, lipreading is still far from being solved, and existing methods tend to have high error rates on the wild data and have the defects of disappearing training gradient and slow convergence. To overcome these problems, we proposed an efficient end-to-end sentence-level lipreading model, using an encoder based on a 3D convolutional network, ResNet50, Temporal Convolutional Network (TCN), and a CTC objective function as the decoder. More importantly, the proposed architecture incorporates TCN as a feature learner to decode feature. It can partly eliminate the defects of RNN (LSTM, GRU) gradient disappearance and insufficient performance, and this yields notable performance improvement as well as faster convergence. Experiments show that the training and convergence speed are 50% faster than the state-of-the-art method, and improved accuracy by 2.4% on the GRID dataset.


2021 ◽  
Vol 10 (7) ◽  
pp. 488
Author(s):  
Peng Li ◽  
Dezheng Zhang ◽  
Aziguli Wulamu ◽  
Xin Liu ◽  
Peng Chen

A deep understanding of our visual world is more than an isolated perception on a series of objects, and the relationships between them also contain rich semantic information. Especially for those satellite remote sensing images, the span is so large that the various objects are always of different sizes and complex spatial compositions. Therefore, the recognition of semantic relations is conducive to strengthen the understanding of remote sensing scenes. In this paper, we propose a novel multi-scale semantic fusion network (MSFN). In this framework, dilated convolution is introduced into a graph convolutional network (GCN) based on an attentional mechanism to fuse and refine multi-scale semantic context, which is crucial to strengthen the cognitive ability of our model Besides, based on the mapping between visual features and semantic embeddings, we design a sparse relationship extraction module to remove meaningless connections among entities and improve the efficiency of scene graph generation. Meanwhile, to further promote the research of scene understanding in remote sensing field, this paper also proposes a remote sensing scene graph dataset (RSSGD). We carry out extensive experiments and the results show that our model significantly outperforms previous methods on scene graph generation. In addition, RSSGD effectively bridges the huge semantic gap between low-level perception and high-level cognition of remote sensing images.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Ha Min Son ◽  
Wooho Jeon ◽  
Jinhyun Kim ◽  
Chan Yeong Heo ◽  
Hye Jin Yoon ◽  
...  

AbstractAlthough computer-aided diagnosis (CAD) is used to improve the quality of diagnosis in various medical fields such as mammography and colonography, it is not used in dermatology, where noninvasive screening tests are performed only with the naked eye, and avoidable inaccuracies may exist. This study shows that CAD may also be a viable option in dermatology by presenting a novel method to sequentially combine accurate segmentation and classification models. Given an image of the skin, we decompose the image to normalize and extract high-level features. Using a neural network-based segmentation model to create a segmented map of the image, we then cluster sections of abnormal skin and pass this information to a classification model. We classify each cluster into different common skin diseases using another neural network model. Our segmentation model achieves better performance compared to previous studies, and also achieves a near-perfect sensitivity score in unfavorable conditions. Our classification model is more accurate than a baseline model trained without segmentation, while also being able to classify multiple diseases within a single image. This improved performance may be sufficient to use CAD in the field of dermatology.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Tom Struck ◽  
Javed Lindner ◽  
Arne Hollmann ◽  
Floyd Schauer ◽  
Andreas Schmidbauer ◽  
...  

AbstractEstablishing low-error and fast detection methods for qubit readout is crucial for efficient quantum error correction. Here, we test neural networks to classify a collection of single-shot spin detection events, which are the readout signal of our qubit measurements. This readout signal contains a stochastic peak, for which a Bayesian inference filter including Gaussian noise is theoretically optimal. Hence, we benchmark our neural networks trained by various strategies versus this latter algorithm. Training of the network with 106 experimentally recorded single-shot readout traces does not improve the post-processing performance. A network trained by synthetically generated measurement traces performs similar in terms of the detection error and the post-processing speed compared to the Bayesian inference filter. This neural network turns out to be more robust to fluctuations in the signal offset, length and delay as well as in the signal-to-noise ratio. Notably, we find an increase of 7% in the visibility of the Rabi oscillation when we employ a network trained by synthetic readout traces combined with measured signal noise of our setup. Our contribution thus represents an example of the beneficial role which software and hardware implementation of neural networks may play in scalable spin qubit processor architectures.


2021 ◽  
Vol 11 (3) ◽  
pp. 1223
Author(s):  
Ilshat Khasanshin

This work aimed to study the automation of measuring the speed of punches of boxers during shadow boxing using inertial measurement units (IMUs) based on an artificial neural network (ANN). In boxing, for the effective development of an athlete, constant control of the punch speed is required. However, even when using modern means of measuring kinematic parameters, it is necessary to record the circumstances under which the punch was performed: The type of punch (jab, cross, hook, or uppercut) and the type of activity (shadow boxing, single punch, or series of punches). Therefore, to eliminate errors and accelerate the process, that is, automate measurements, the use of an ANN in the form of a multilayer perceptron (MLP) is proposed. During the experiments, IMUs were installed on the boxers’ wrists. The input parameters of the ANN were the absolute acceleration and angular velocity. The experiment was conducted for three groups of boxers with different levels of training. The developed model showed a high level of punch recognition for all groups, and it can be concluded that the use of the ANN significantly accelerates the collection of data on the kinetic characteristics of boxers’ punches and allows this process to be automated.


Sensors ◽  
2021 ◽  
Vol 21 (14) ◽  
pp. 4819
Author(s):  
Yikang Li ◽  
Zhenzhou Wang

Single-shot 3D reconstruction technique is very important for measuring moving and deforming objects. After many decades of study, a great number of interesting single-shot techniques have been proposed, yet the problem remains open. In this paper, a new approach is proposed to reconstruct deforming and moving objects with the structured light RGB line pattern. The structured light RGB line pattern is coded using parallel red, green, and blue lines with equal intervals to facilitate line segmentation and line indexing. A slope difference distribution (SDD)-based image segmentation method is proposed to segment the lines robustly in the HSV color space. A method of exclusion is proposed to index the red lines, the green lines, and the blue lines respectively and robustly. The indexed lines in different colors are fused to obtain a phase map for 3D depth calculation. The quantitative accuracies of measuring a calibration grid and a ball achieved by the proposed approach are 0.46 and 0.24 mm, respectively, which are significantly lower than those achieved by the compared state-of-the-art single-shot techniques.


Electronics ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 319
Author(s):  
Yi Wang ◽  
Xiao Song ◽  
Guanghong Gong ◽  
Ni Li

Due to the rapid development of deep learning and artificial intelligence techniques, denoising via neural networks has drawn great attention due to their flexibility and excellent performances. However, for most convolutional network denoising methods, the convolution kernel is only one layer deep, and features of distinct scales are neglected. Moreover, in the convolution operation, all channels are treated equally; the relationships of channels are not considered. In this paper, we propose a multi-scale feature extraction-based normalized attention neural network (MFENANN) for image denoising. In MFENANN, we define a multi-scale feature extraction block to extract and combine features at distinct scales of the noisy image. In addition, we propose a normalized attention network (NAN) to learn the relationships between channels, which smooths the optimization landscape and speeds up the convergence process for training an attention model. Moreover, we introduce the NAN to convolutional network denoising, in which each channel gets gain; channels can play different roles in the subsequent convolution. To testify the effectiveness of the proposed MFENANN, we used both grayscale and color image sets whose noise levels ranged from 0 to 75 to do the experiments. The experimental results show that compared with some state-of-the-art denoising methods, the restored images of MFENANN have larger peak signal-to-noise ratios (PSNR) and structural similarity index measure (SSIM) values and get better overall appearance.


Sign in / Sign up

Export Citation Format

Share Document