Parallel Multi-Branch Convolution Block Net for Fast and Accurate Object Detection

Electronics ◽  
2019 ◽  
Vol 9 (1) ◽  
pp. 15 ◽  
Author(s):  
Lei Fu ◽  
Wenbin Gu ◽  
Lei He ◽  
Ting Rui ◽  
Liang Chen ◽  
...  

In order to maintain the high-speed advantage of single-stage object detectors while improving their detection accuracy, in this paper we propose a parallel multi-branch convolution block, called PMCB, which efficiently extracts multi-scale object information at a specific layer to form a discriminative feature layer and boosts detection performance with little computational burden. Based on the PMCB module, we build PMCB Net on top of the single shot multibox detector (SSD) network by replacing the conventional convolution at a specific layer with PMCB. The performance of the proposed algorithm is compared with that of other state-of-the-art methods on the PASCAL VOC2007 and MS COCO test datasets. The experimental results show that the proposed algorithm greatly improves detection accuracy while adding only a negligible computational burden, which is very important for practical engineering applications.
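The abstract does not give PMCB's internal configuration. As a hedged illustration of why parallel branches capture multi-scale information, the sketch below assumes a hypothetical block whose branches are convolutions with different kernel sizes and dilation rates (a common way to build such blocks, not necessarily the one used in PMCB). It computes each branch's effective receptive field and the channel count after the branch outputs are concatenated. The `Branch` class, `describe_block` function, and all numbers are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Branch:
    # Hypothetical branch description: one conv layer with kernel size, dilation, width.
    kernel: int
    dilation: int
    out_channels: int

    def receptive_field(self) -> int:
        # Side length covered by a dilated kernel: (k - 1) * d + 1 pixels.
        return (self.kernel - 1) * self.dilation + 1

def describe_block(branches):
    """Return per-branch receptive fields and the channel count after concatenation."""
    fields = [b.receptive_field() for b in branches]
    total_channels = sum(b.out_channels for b in branches)
    return fields, total_channels

# Illustrative configuration only (assumed, not taken from the paper).
block = [Branch(1, 1, 64), Branch(3, 1, 64), Branch(3, 3, 64), Branch(3, 5, 64)]
fields, channels = describe_block(block)
print(fields, channels)  # [1, 3, 7, 11] 256
```

Because all branches read the same input layer, the block sees objects at several scales (here 1 to 11 pixels per kernel application) at roughly the cost of four small convolutions.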

2021 ◽  
Vol 7 (11) ◽  
pp. 223
Author(s):  
Gabriele Antonio De Vitis ◽  
Antonio Di Tecco ◽  
Pierfrancesco Foglia ◽  
Cosimo Antonio Prete

During the production of pharmaceutical glass tubes, a machine-vision-based inspection system can be used to perform the high-quality check required by the process. The need to improve detection accuracy and increase production speed calls for fast defect-detection solutions. Solutions proposed in the literature cannot be exploited efficiently because of specific factors that characterize the production process. In this work, we derive an algorithm that preserves the detection quality of state-of-the-art proposals while drastically reducing the processing time. The algorithm uses an adaptive threshold based on the Sigma Rule to detect blobs, and applies a threshold to the variation of luminous intensity along a row to detect air lines. These solutions limit the effects on detection of the tube’s curvature, rotation, and vibration, which characterize glass tube production. The algorithm has been compared with state-of-the-art solutions. The results demonstrate that the proposed algorithm reduces the processing time of the detection phase by 86% and increases throughput by 268%, while achieving greater detection accuracy. Performance is further improved by adopting Region of Interest reduction techniques. Moreover, we have developed a tuning procedure to determine the algorithm’s parameters when the production batch changes. We assessed the performance of the algorithm in a real environment using the machine’s “certification” functionality. Furthermore, we observed that, out of 1000 discarded tubes, nine should not have been discarded, and a further seven should have been discarded.
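The abstract names the two thresholding rules but not their exact formulation. The sketch below is one plausible reading, written under stated assumptions: the Sigma Rule flags a pixel whose intensity deviates from its row's mean by more than `k` standard deviations (so the threshold adapts to each row's own statistics), and air-line candidates are positions where the intensity jumps by more than a hypothetical `max_step` between neighbouring pixels of a row. Function names, `k = 3`, and the sample row are illustrative, not the authors' values.

```python
import statistics

def sigma_rule_mask(row_values, k=3.0):
    """Flag pixels whose intensity lies more than k standard deviations from the row mean.
    The threshold adapts because mean and deviation are recomputed for every row."""
    mean = statistics.fmean(row_values)
    std = statistics.pstdev(row_values)
    return [abs(v - mean) > k * std for v in row_values]

def air_line_candidates(row_values, max_step):
    """Indices where intensity changes by more than max_step relative to the previous pixel."""
    return [i for i in range(1, len(row_values))
            if abs(row_values[i] - row_values[i - 1]) > max_step]

# Hypothetical row: uniform glass with one dark defect pixel at index 10.
row = [100] * 10 + [30] + [100] * 10
print([i for i, flagged in enumerate(sigma_rule_mask(row)) if flagged])  # [10]
print(air_line_candidates(row, max_step=20))                             # [10, 11]
```

Both rules need only one pass over each row, which is consistent with the abstract's emphasis on reducing processing time; how the authors handle tube curvature and vibration is not captured here.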


Author(s):  
Qijie Zhao ◽  
Tao Sheng ◽  
Yongtao Wang ◽  
Zhi Tang ◽  
Ying Chen ◽  
...  

Feature pyramids are widely exploited by both state-of-the-art one-stage object detectors (e.g., DSSD, RetinaNet, RefineDet) and two-stage object detectors (e.g., Mask RCNN, DetNet) to alleviate the problem arising from scale variation across object instances. Although these object detectors with feature pyramids achieve encouraging results, they have some limitations because they simply construct the feature pyramid according to the inherent multiscale, pyramidal architecture of backbones that were originally designed for the object classification task. In this work, we present the Multi-Level Feature Pyramid Network (MLFPN) to construct more effective feature pyramids for detecting objects of different scales. First, we fuse multi-level features (i.e., multiple layers) extracted by the backbone as the base feature. Second, we feed the base feature into a block of alternating joint Thinned U-shape Modules and Feature Fusion Modules, and exploit the decoder layers of each U-shape module as the features for detecting objects. Finally, we gather the decoder layers with equivalent scales (sizes) to construct a feature pyramid for object detection, in which every feature map consists of layers (features) from multiple levels. To evaluate the effectiveness of the proposed MLFPN, we design and train a powerful end-to-end one-stage object detector, which we call M2Det, by integrating MLFPN into the architecture of SSD, and achieve better detection performance than state-of-the-art one-stage detectors. Specifically, on the MS COCO benchmark, M2Det achieves an AP of 41.0 at 11.8 FPS with a single-scale inference strategy and an AP of 44.2 with a multi-scale inference strategy, which are new state-of-the-art results among one-stage detectors. The code will be made available at https://github.com/qijiezhao/M2Det.
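The final step above (gathering decoder layers of equal scale from every U-shape module) is easiest to see as shape bookkeeping. The sketch below tracks only `(channels, height, width)` tuples; the number of U-shape modules, the per-level channel count, and the list of scales are illustrative assumptions rather than values stated in the abstract, and `build_multilevel_pyramid` is a hypothetical helper.

```python
def build_multilevel_pyramid(num_modules, scales, channels_per_level):
    """
    Shape-only sketch: each U-shape module emits one decoder map per scale, and maps of
    the same scale from all modules are concatenated along the channel axis.
    Returns one (channels, height, width) tuple per pyramid level.
    """
    # module_outputs[m][s] is the shape of module m's decoder output at scale s
    module_outputs = [[(channels_per_level, h, w) for (h, w) in scales]
                      for _ in range(num_modules)]
    pyramid = []
    for s, (h, w) in enumerate(scales):
        same_scale = [module_outputs[m][s] for m in range(num_modules)]
        # Concatenation requires matching spatial sizes; channel counts add up.
        assert all(shape[1:] == (h, w) for shape in same_scale)
        pyramid.append((sum(shape[0] for shape in same_scale), h, w))
    return pyramid

# Illustrative numbers only: 8 modules, 6 scales, 128 channels per module output.
scales = [(40, 40), (20, 20), (10, 10), (5, 5), (3, 3), (1, 1)]
print(build_multilevel_pyramid(8, scales, 128))
```

Every pyramid level therefore mixes features from shallow and deep modules, which is the property the abstract contrasts with backbone-only pyramids.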


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Bin Chen ◽  
Jili Yan ◽  
Ke Wang

The accuracy of Fresh Tea Sprouts Detection (FTSD) is not yet high enough, which has become a major bottleneck for vision-based automatic tea picking. To improve detection performance, we rethink the FTSD process. Motivated by multispectral image processing, we find that more input information can lead to better detection results. With this in mind, this paper proposes a novel Fresh Tea Sprouts Detection method via Image Enhancement and Fusion Single-Shot Detector (FTSD-IEFSSD). First, we obtain an enhanced image from the original fresh tea sprouts color image using an RGB-channel-transform-based image enhancement algorithm. The enhanced image provides more input information: contrast is increased in the fresh tea sprout regions and decreased in the background. Then, the enhanced image and the color image are each fed into a detection subnetwork with a ResNet50 backbone. We also use multilayer semantic fusion and score fusion to further improve detection accuracy. Default boxes based on the shape of tea sprouts are also used during training. The experimental results show that the proposed method outperforms state-of-the-art methods on FTSD.
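The abstract does not specify the RGB channel transform or the score-fusion rule. As a stand-in, the sketch below uses the well-known excess-green index (2G - R - B), which brightens green vegetation and suppresses grey or brown background, and fuses the two branches' confidences for the same box with a weighted average. Both choices, the function names, and `weight = 0.5` are assumptions for illustration, not the authors' method.

```python
def excess_green(pixel):
    """Excess-green index 2G - R - B for one (R, G, B) pixel, clipped to [0, 255].
    Swapped-in example of an RGB channel transform, not the paper's algorithm."""
    r, g, b = pixel
    return max(0, min(255, 2 * g - r - b))

def enhance(image):
    """Apply the per-pixel transform to an image given as rows of (R, G, B) tuples."""
    return [[excess_green(p) for p in row] for row in image]

def fuse_scores(score_color, score_enhanced, weight=0.5):
    """Assumed fusion rule: weighted average of the two branches' scores for one box."""
    return weight * score_color + (1 - weight) * score_enhanced

# A bright-green sprout pixel next to a brownish background pixel (hypothetical values).
print(enhance([[(60, 180, 50), (120, 110, 100)]]))  # [[250, 0]]
```

The enhanced map gives the second subnetwork an input in which sprout regions stand out, which is the kind of extra information the abstract argues helps detection.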


2019 ◽  
Vol 11 (18) ◽  
pp. 2095 ◽  
Author(s):  
Kun Fu ◽  
Zhuo Chen ◽  
Yue Zhang ◽  
Xian Sun

In recent years, deep learning has led to a remarkable breakthrough in object detection in remote sensing images. In practice, two-stage detectors achieve good detection accuracy but are slow. One-stage detectors, on the other hand, integrate the two-stage detection pipeline into a single stage to simplify detection; they are faster, but less accurate. Enhancing the capability of feature representation may be one way to improve the detection accuracy of one-stage detectors. To this end, this paper proposes a novel one-stage detector with enhanced feature representation capability. The enhancement comes from two proposed structures: a dual top-down module and a densely connected inception module. The former efficiently uses multi-scale features from multiple layers of the backbone network. The latter both widens and deepens the network to enhance feature representation with limited extra computational cost. To evaluate the effectiveness of the proposed structures, we conducted experiments on the horizontal bounding box detection task of the challenging DOTA dataset and obtained a mean Average Precision (mAP) of 73.49%, achieving state-of-the-art performance. Furthermore, our method ran significantly faster than the best public two-stage detector on the DOTA dataset.
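The claim that the densely connected inception module "both widens and deepens" the network can be made concrete with channel bookkeeping. The sketch assumes DenseNet-style connectivity (each stage receives the concatenation of the block input and all earlier stage outputs, inferred from the module's name) and inception-style parallel branches within each stage; the branch widths and number of stages are hypothetical, since the abstract gives neither.

```python
def dense_inception_channels(in_channels, branch_channels, num_stages):
    """
    Channel bookkeeping for a densely connected stack of inception-style stages.
    Widening: each stage runs parallel branches and concatenates their outputs.
    Deepening with dense links: each stage's input is the concatenation of the block
    input and every earlier stage output (assumed DenseNet-style connectivity).
    Returns the input channel count of each stage and the block's output channels.
    """
    stage_out = sum(branch_channels)   # width of one inception stage
    stage_inputs = []
    current = in_channels
    for _ in range(num_stages):
        stage_inputs.append(current)
        current += stage_out           # dense concatenation grows the feature map
    return stage_inputs, current

# Hypothetical configuration: 256 input channels, four 32-channel branches, 3 stages.
print(dense_inception_channels(256, [32, 32, 32, 32], 3))  # ([256, 384, 512], 640)
```

Keeping each branch narrow is what limits the extra computation: the block's output grows by only 128 channels per stage while every stage can reuse all earlier features.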


2020 ◽  
Vol 2020 (4) ◽  
pp. 76-1-76-7
Author(s):  
Swaroop Shankar Prasad ◽  
Ofer Hadar ◽  
Ilia Polian

Image steganography can have legitimate uses, for example, augmenting an image with a watermark for copyright reasons, but it can also be used for malicious purposes. We investigate the detection of malicious steganography using neural-network-based classification when images are transmitted through a noisy channel. Noise makes detection harder because the classifier must not only detect perturbations in the image but also decide whether they are due to malicious steganographic modifications or to natural noise. Our results show that reliable detection is possible even for state-of-the-art steganographic algorithms that insert stego bits without affecting an image’s visual quality. The detection accuracy is high (above 85%) if the payload, or the amount of steganographic content in an image, exceeds a certain threshold. At the same time, noise critically affects the steganographic information being transmitted, both through desynchronization (destruction of the information about which bits of the image contain steganographic content) and by flipping these bits themselves. This forces the adversary to use a redundant encoding with a substantial number of error-correction bits for reliable transmission, making detection feasible even for small payloads.
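The closing argument (noise forces redundancy, and redundancy enlarges the footprint a detector can find) can be illustrated with the simplest error-correcting code, a repetition code decoded by majority vote. This is a swapped-in teaching example, not the encoding studied in the paper, and it models only bit flips, not desynchronization. The function names and flip positions are hypothetical.

```python
def encode_repetition(bits, r):
    """Repeat every payload bit r times (r odd) before embedding."""
    return [b for b in bits for _ in range(r)]

def flip(bits, positions):
    """Model channel noise by flipping the embedded bits at the given positions."""
    noisy = list(bits)
    for p in positions:
        noisy[p] ^= 1
    return noisy

def decode_repetition(bits, r):
    """Majority vote over each consecutive group of r received bits."""
    return [int(sum(bits[i:i + r]) > r // 2) for i in range(0, len(bits), r)]

payload = [1, 0, 1, 1]
sent = encode_repetition(payload, 3)   # 12 embedded bits carry a 4-bit payload
received = flip(sent, [0, 4, 11])      # one flip in three of the four groups
print(decode_repetition(received, 3))  # [1, 0, 1, 1]
```

Surviving the noise here triples the number of modified image bits, which is the trade-off the abstract describes: reliable transmission makes even a small payload leave a larger, more detectable trace.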


2020 ◽  
Vol 17 (3) ◽  
pp. 172988142092566
Author(s):  
Dahan Wang ◽  
Sheng Luo ◽  
Li Zhao ◽  
Xiaoming Pan ◽  
Muchou Wang ◽  
...  

Fire is a fierce disaster, and smoke is an early signal of fire. Because smoke has distinctive chrominance, texture, and shape, many methods based on these features have been developed. However, these static characteristics vary widely, so exceptions lead to low detection accuracy. The motion of smoke, on the other hand, is much more discriminative than these features, so a time-domain neural network is proposed to extract its dynamic characteristics. This smoke recognition network has the following advantages: (1) it extracts spatiotemporal features with 3D filters that operate on dynamic and static characteristics simultaneously; (2) high accuracy: 87.31% of samples are classified correctly, which is state of the art even in chaotic environments, and objects that confuse other methods, such as haze, fog, and climbing cars, are clearly distinguished; (3) high sensitivity: smoke is detected on average at the 23rd frame, which is also state of the art and important for raising a fire alarm as early as possible; and (4) it is not based on any hypothesis, which keeps the method broadly applicable. Finally, a new metric, the difference between the first frame in which smoke is detected and the first frame in which smoke appears, is proposed to compare the sensitivity of algorithms on videos. The experiments confirm that the dynamic characteristics are more discriminative than the static characteristics, and that the smoke recognition network is a good tool for extracting compound features.
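The proposed sensitivity metric is simple frame arithmetic. The sketch below computes it per video and averages it over a set of videos; how the paper treats videos in which smoke is never detected is not stated, so skipping them in the average is an assumption, as are the function names and the sample frame numbers.

```python
def detection_delay(first_smoke_frame, first_detected_frame):
    """Frames between smoke onset and the first frame the detector flags smoke.
    Returns None if the detector never fires (first_detected_frame is None).
    A negative value would mean the detector fired before smoke appeared."""
    if first_detected_frame is None:
        return None
    return first_detected_frame - first_smoke_frame

def mean_delay(videos):
    """Average delay over videos where smoke was eventually detected (assumed rule)."""
    delays = [d for d in (detection_delay(s, f) for s, f in videos) if d is not None]
    return sum(delays) / len(delays) if delays else None

# Hypothetical (onset frame, first detection frame) pairs for three videos.
videos = [(10, 30), (0, 25), (5, None)]
print(mean_delay(videos))  # 22.5
```

Lower values mean earlier alarms, so this single number lets different detectors be ranked on sensitivity independently of their per-frame accuracy.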


2021 ◽  
Vol 11 (9) ◽  
pp. 4232
Author(s):  
Krishan Harkhoe ◽  
Guy Verschaffelt ◽  
Guy Van der Sande

Delay-based reservoir computing (RC), a neuromorphic computing technique, has attracted considerable interest because it promises compact, high-speed RC implementations. To further boost computing speed, we introduce and study an RC setup based on spin-VCSELs, exploiting the high polarization modulation speed inherent to these lasers. Using numerical simulations, we benchmarked this setup against state-of-the-art delay-based RC systems and analyzed its parameter space for optimal performance. The high modulation speed allowed us to fit more virtual nodes into a shorter time interval. However, we found that at these short time scales, the delay time and feedback rate strongly influence the nonlinear dynamics. Therefore, contrary to other laser-based RC systems, the delay time has to be optimized to obtain good RC performance. We achieved state-of-the-art performance on a benchmark time-series prediction task. This spin-VCSEL-based RC system shows a ten-fold improvement in processing speed, which can be further enhanced straightforwardly by increasing the birefringence of the VCSEL chip.
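The link between modulation speed and "more virtual nodes in a shorter time interval" is the relation N = tau / theta between the delay time and the virtual-node spacing. The sketch below shows that arithmetic together with a generic time-multiplexed reservoir update. It is a simplified discrete-time model with a `tanh` nonlinearity, not the spin-VCSEL rate equations; the 1 ns delay, 25 ps spacing, gain parameters, and function names are hypothetical, and the trained linear readout is omitted.

```python
import math
import random

def virtual_node_count(delay_time_ps, node_separation_ps):
    """Number of virtual nodes that fit in one delay loop: N = tau / theta."""
    return round(delay_time_ps / node_separation_ps)

def run_delay_reservoir(inputs, n_nodes, feedback=0.8, coupling=0.2, input_scale=0.5, seed=0):
    """
    Generic time-multiplexed reservoir (tanh nonlinearity, not a spin-VCSEL model).
    For every input sample, each virtual node combines:
      - its own state one delay round trip earlier (delayed feedback),
      - the state of the preceding virtual node (finite response time of the node),
      - the input sample multiplied by a fixed random +/-1 mask value.
    Returns one list of n_nodes states per input sample.
    """
    rng = random.Random(seed)
    mask = [rng.choice((-1.0, 1.0)) for _ in range(n_nodes)]
    previous = [0.0] * n_nodes
    states = []
    for u in inputs:
        current = []
        for i in range(n_nodes):
            neighbour = current[i - 1] if i > 0 else previous[-1]
            drive = feedback * previous[i] + coupling * neighbour + input_scale * mask[i] * u
            current.append(math.tanh(drive))
        states.append(current)
        previous = current
    return states

print(virtual_node_count(1000, 25))  # 40 nodes in a 1 ns loop with 25 ps spacing
print(virtual_node_count(1000, 50))  # 20 nodes if the node spacing doubles
```

In this toy model, `feedback` plays the role of the feedback rate; the abstract's finding that delay time and feedback rate shape the dynamics at short time scales is a property of the physical laser model, which this sketch does not reproduce.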


2021 ◽  
Vol 13 (7) ◽  
pp. 1243
Author(s):  
Wenxin Yin ◽  
Wenhui Diao ◽  
Peijin Wang ◽  
Xin Gao ◽  
Ya Li ◽  
...  

The detection of Thermal Power Plants (TPPs) is a meaningful task for remote sensing image interpretation. It is challenging because TPPs, as facility objects, are composed of various distinctive and irregular components. In this paper, we propose a novel end-to-end detection framework for TPPs based on deep convolutional neural networks. Specifically, building on the RetinaNet one-stage detector, we propose a context attention multi-scale feature extraction network that fuses global spatial attention to strengthen the representation of irregular objects. In addition, we design a part-based attention module to adapt to TPPs containing distinctive components. Experiments show that the proposed method outperforms state-of-the-art methods, achieving a mean average precision of 68.15%.
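The abstract does not describe how global spatial attention is computed. As a generic illustration only, the sketch scores each spatial position by its channel-averaged activation, converts the scores into weights with a softmax over all positions (so every location competes globally), and rescales every channel by those weights. The function name, the scoring rule, and the rescaling by H*W are assumptions, not the paper's module.

```python
import math

def spatial_attention(feature_map):
    """
    feature_map: list of C channels, each an H x W list of lists of floats.
    Returns (reweighted_feature_map, weights), where weights is an H x W softmax map.
    """
    channels = len(feature_map)
    h, w = len(feature_map[0]), len(feature_map[0][0])
    # Score each location by its mean activation across channels.
    scores = [[sum(feature_map[c][i][j] for c in range(channels)) / channels
               for j in range(w)] for i in range(h)]
    peak = max(max(row) for row in scores)
    exps = [[math.exp(s - peak) for s in row] for row in scores]  # subtract max for stability
    total = sum(sum(row) for row in exps)
    weights = [[e / total for e in row] for row in exps]
    # Multiply by h * w so that uniform attention leaves the features unchanged.
    out = [[[feature_map[c][i][j] * weights[i][j] * h * w for j in range(w)]
            for i in range(h)] for c in range(channels)]
    return out, weights

_, weights = spatial_attention([[[0.0, 2.0], [0.0, 0.0]]])
print(max(max(row) for row in weights) == weights[0][1])  # True: the active location dominates
```

A real detector would learn the scoring function with convolutions; the fixed channel mean here only shows how a global spatial weighting redistributes emphasis across an irregular object's parts.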


2021 ◽  
Vol 11 (11) ◽  
pp. 4894
Author(s):  
Anna Scius-Bertrand ◽  
Michael Jungo ◽  
Beat Wolf ◽  
Andreas Fischer ◽  
Marc Bui

The current state of the art for automatic transcription of historical manuscripts is typically limited by the need for human-annotated training samples, which are required to train specific machine learning models for specific languages and scripts. Transcription alignment is a simpler task that aims to find a correspondence between the text in a scanned image and its existing Unicode counterpart; this correspondence can then be used as training data. The alignment task can be approached with heuristic methods dedicated to certain types of manuscripts, or with weakly trained systems that reduce the amount of annotation required. In this article, we propose a novel learning-based alignment method based on fully convolutional object detection that requires no human annotation at all. Instead, the object detection system is first trained on synthetic printed pages rendered with a font and then adapted to the real manuscripts by means of self-training. On a dataset of historical Vietnamese handwriting, we demonstrate the feasibility of annotation-free alignment as well as the positive impact of self-training on character detection accuracy, reaching a detection accuracy of 96.4% with a YOLOv5m model without using any human annotation.
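Self-training hinges on choosing which of the detector's own outputs become training labels for the next round. The sketch below shows one hypothetical selection step: keep detections above a confidence threshold and accept a page only if the number of kept boxes matches the number of characters in its existing transcription, pairing boxes and characters in order. The count check, the assumption that detections arrive already in reading order, the `Detection` class, and the 0.5 threshold are all illustrative assumptions, not the authors' procedure.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    box: tuple    # (x0, y0, x1, y1) in page pixels
    score: float  # detector confidence in [0, 1]

def select_pseudo_labels(detections, transcription, threshold=0.5):
    """
    One hypothetical self-training selection step for a single page.
    detections: assumed to be sorted in reading order already.
    Returns (box, character) pairs to use as training data, or None to skip the page.
    """
    kept = [d for d in detections if d.score >= threshold]
    if len(kept) != len(transcription):
        return None  # counts disagree: skip this page in the current round
    return list(zip((d.box for d in kept), transcription))

page = [Detection((0, 0, 10, 10), 0.9), Detection((0, 12, 10, 22), 0.3),
        Detection((0, 24, 10, 34), 0.8)]
print(select_pseudo_labels(page, "AB"))  # [((0, 0, 10, 10), 'A'), ((0, 24, 10, 34), 'B')]
```

Each accepted page adds aligned character samples without human annotation, and retraining on them is what lets a detector trained on synthetic printed pages adapt to real handwriting.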

