AVILNet: A New Pliable Network with a Novel Metric for Small-Object Segmentation and Detection in Infrared Images

2021 ◽  
Vol 13 (4) ◽  
pp. 555
Author(s):  
Ikhwan Song ◽  
Sungho Kim

Infrared small-object segmentation (ISOS) has a persistent trade-off problem: which should come first, recall or precision? Striking a fine balance between them is fundamentally of vital importance for obtaining the best performance in real applications, such as surveillance, tracking, and many fields related to infrared searching and tracking. The F1-score may be a good evaluation metric for this problem. However, since the F1-score depends only on a specific threshold value, it cannot reflect the user’s requirements across varying application environments. Therefore, several metrics are commonly used together. We introduce F-area, a novel metric for a panoptic evaluation of average precision (AP) and F1-score. It simultaneously considers performance in real applications and the potential capability of a model. Furthermore, we propose a new network, called the Amorphous Variable Inter-located Network (AVILNet), which has a pliable structure based on GridNet and is also an ensemble network consisting of a main network and its sub-network. Compared with the state-of-the-art ISOS methods, our model achieved an AP of 51.69%, an F1-score of 63.03%, and an F-area of 32.58% on the International Conference on Computer Vision 2019 ISOS Single dataset using one generator. Using dual generators, it achieved an AP of 53.6%, an F1-score of 60.99%, and an F-area of 32.69%, beating the existing best record (AP, 51.42%; F1-score, 57.04%; and F-area, 29.33%).
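The abstract does not spell out how F-area is computed. The reported figures are, however, consistent with F-area being the product of AP and the F1-score; the sketch below checks that arithmetic. The product form is an inference from the numbers quoted above, not a definition taken from the abstract, and the function names are illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Standard F1: harmonic mean of precision and recall at one threshold."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f_area(ap: float, f1: float) -> float:
    """ASSUMPTION: F-area = AP * F1, inferred from the reported figures."""
    return ap * f1


# (AP, F1, F-area in %) exactly as quoted in the abstract.
reported = {
    "single generator": (0.5169, 0.6303, 32.58),
    "dual generators": (0.5360, 0.6099, 32.69),
    "previous best": (0.5142, 0.5704, 29.33),
}

for name, (ap, f1, stated) in reported.items():
    computed = round(100 * f_area(ap, f1), 2)
    print(f"{name}: computed {computed}%, stated {stated}%")
```

Under this reading, a model can only score a high F-area if it does well both at its single best threshold (F1) and across all thresholds (AP).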

2021 ◽  
Vol 11 (9) ◽  
pp. 4248
Author(s):  
Hong Hai Hoang ◽  
Bao Long Tran

With the rapid development of cameras and deep learning technologies, computer vision tasks such as object detection, object segmentation and object tracking are being widely applied in many fields of life. For robot grasping tasks, object segmentation aims to classify and localize objects, which helps robots pick objects accurately. The state-of-the-art instance segmentation framework, Mask Region-Convolution Neural Network (Mask R-CNN), does not always segment accurately at the edges or borders of objects. An approach using a 3D camera, by contrast, can extract entire (foreground) objects easily, but classifying them can be difficult or require a large amount of computation. We propose a novel approach in which we combine Mask R-CNN with 3D algorithms by adding a 3D processing branch for instance segmentation. The outcomes of the two branches are used together to classify the pixels at object edges by exploiting the spatial relationship between the edge region and the mask region. We analyze the effectiveness of the method by testing on harsh cases of object positions, for example objects that are close together, overlapping or obscured by each other, to focus on edge and border segmentation. Our proposed method is about 4 to 7% higher and more stable in IoU (intersection over union). It reaches a mAP (mean Average Precision) of 46%, a higher accuracy than its counterpart. The feasibility experiment shows that our method could notably advance research on grasping robots.
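For readers unfamiliar with the IoU figure quoted above, the following is a minimal sketch of intersection over union for two binary masks. The masks and the function name are made up for illustration and are not taken from the paper.

```python
def mask_iou(pred, target):
    """Intersection over union of two equally sized binary masks (lists of rows)."""
    inter = 0
    union = 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks are treated as a perfect match by convention here.
    return inter / union if union else 1.0


# Hypothetical 2x3 masks: 2 pixels overlap, 4 pixels are covered in total.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(mask_iou(pred, target))  # 0.5
```

Errors along an object's border reduce the intersection and enlarge the union at the same time, which is why edge quality shows up directly in this score.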


Electronics ◽  
2020 ◽  
Vol 9 (12) ◽  
pp. 2075
Author(s):  
Hao Chen ◽  
Hong Zheng

Anchor-based detectors are widely adopted in object detection. To improve the accuracy of object detection, multiple anchor boxes are densely placed on the input image, yet most of them are invalid. Although anchor-free methods can reduce the number of useless anchor boxes, the invalid ones still occupy a high proportion. On this basis, this paper proposes an object-detection method based on center point proposals to reduce the number of useless anchor boxes while improving the quality of anchor boxes, balancing the proportion of positive and negative samples. By introducing the differentiation module in the shallow layer, the new method can alleviate the problem of missed detections caused by overlapping center points. When trained and tested on the COCO (Common Objects in Context) dataset, this algorithm records an increase of about 2% in APS (Average Precision of Small Object), reaching 27.8%. The detector designed in this study outperforms most of the state-of-the-art real-time detectors in the speed and accuracy trade-off, achieving an AP of 43.2 in 137 ms.


2021 ◽  
Vol 20 (3) ◽  
pp. 1-25
Author(s):  
Elham Shamsa ◽  
Alma Pröbstl ◽  
Nima TaheriNejad ◽  
Anil Kanduri ◽  
Samarjit Chakraborty ◽  
...  

Smartphone users require high Battery Cycle Life (BCL) and high Quality of Experience (QoE) during their usage. These two objectives can be conflicting based on the user preference at run-time. Finding the best trade-off between QoE and BCL requires an intelligent resource management approach that considers and learns user preference at run-time. Current approaches focus on one of these two objectives and neglect the other, limiting their efficiency in meeting users’ needs. In this article, we present UBAR, User- and Battery-aware Resource management, which considers dynamic workload, user preference, and user plug-in/out pattern at run-time to provide a suitable trade-off between BCL and QoE. UBAR personalizes this trade-off by learning the user’s habits and using that to satisfy QoE, while considering battery temperature and State of Charge (SOC) pattern to maximize BCL. The evaluation results show that UBAR achieves 10% to 40% improvement compared to the existing state-of-the-art approaches.


Author(s):  
Alexandru-Lucian Georgescu ◽  
Alessandro Pappalardo ◽  
Horia Cucu ◽  
Michaela Blott

The last decade brought significant advances in automatic speech recognition (ASR) thanks to the evolution of deep learning methods. ASR systems evolved from pipeline-based systems, which modeled hand-crafted speech features with probabilistic frameworks and generated phone posteriors, to end-to-end (E2E) systems, which translate the raw waveform directly into words using one deep neural network (DNN). The transcription accuracy greatly increased, leading to ASR technology being integrated into many commercial applications. However, few of the existing ASR technologies are suitable for integration in embedded applications, due to their hard constraints on computing power and memory usage. This overview paper serves as a guided tour through the recent literature on speech recognition and compares the most popular ASR implementations. The comparison emphasizes the trade-off between ASR performance and hardware requirements, to help decision makers choose the system that best fits their embedded application. To the best of our knowledge, this is the first study to provide this kind of trade-off analysis for state-of-the-art ASR systems.


2021 ◽  
Vol 13 (7) ◽  
pp. 1243
Author(s):  
Wenxin Yin ◽  
Wenhui Diao ◽  
Peijin Wang ◽  
Xin Gao ◽  
Ya Li ◽  
...  

The detection of Thermal Power Plants (TPPs) is a meaningful task for remote sensing image interpretation. It is a challenging task, because as facility objects TPPs are composed of various distinctive and irregular components. In this paper, we propose a novel end-to-end detection framework for TPPs based on deep convolutional neural networks. Specifically, based on the RetinaNet one-stage detector, a context attention multi-scale feature extraction network is proposed to fuse global spatial attention to strengthen the ability in representing irregular objects. In addition, we design a part-based attention module to adapt to TPPs containing distinctive components. Experiments show that the proposed method outperforms the state-of-the-art methods and can achieve 68.15% mean average precision.


2020 ◽  
Vol 15 (1) ◽  
pp. 4-17
Author(s):  
Jean-François Biasse ◽  
Xavier Bonnetain ◽  
Benjamin Pring ◽  
André Schrottenloher ◽  
William Youmans

We propose a heuristic algorithm to solve the underlying hard problem of the CSIDH cryptosystem (and other isogeny-based cryptosystems using elliptic curves with endomorphism ring isomorphic to an imaginary quadratic order 𝒪). Let Δ = Disc(𝒪) (in CSIDH, Δ = −4p for p the security parameter). Let 0 < α < 1/2; our algorithm requires:
A classical circuit of size $2^{\tilde{O}\left(\log(|\Delta|)^{1-\alpha}\right)}$.
A quantum circuit of size $2^{\tilde{O}\left(\log(|\Delta|)^{\alpha}\right)}$.
Polynomial classical and quantum memory.
Essentially, we propose to reduce the size of the quantum circuit below the state-of-the-art complexity $2^{\tilde{O}\left(\log(|\Delta|)^{1/2}\right)}$ at the cost of increasing the classical circuit-size required. The required classical circuit remains subexponential, which is a superpolynomial improvement over the classical state-of-the-art exponential solutions to these problems. Our method requires polynomial memory, both classical and quantum.
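As an illustrative substitution (the value of α is chosen here only as an example and does not come from the abstract), taking α = 1/3 in the stated bounds gives:

```latex
\text{classical circuit: } 2^{\tilde{O}\left(\log(|\Delta|)^{2/3}\right)},
\qquad
\text{quantum circuit: } 2^{\tilde{O}\left(\log(|\Delta|)^{1/3}\right)}
```

The quantum exponent 1/3 lies below the 1/2 of the stated state-of-the-art complexity, while the classical exponent rises to 2/3 but stays below 1, so the classical circuit is still subexponential in log(|Δ|).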


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Fetulhak Abdurahman ◽  
Kinde Anlay Fante ◽  
Mohammed Aliy

Background: Manual microscopic examination of Leishman/Giemsa stained thin and thick blood smears is still the “gold standard” for malaria diagnosis. One of the drawbacks of this method is that its accuracy, consistency, and diagnosis speed depend on microscopists’ diagnostic and technical skills. It is difficult to find highly skilled microscopists in remote areas of developing countries. To alleviate this problem, in this paper, we investigate state-of-the-art one-stage and two-stage object detection algorithms for automated malaria parasite screening from microscopic images of thick blood slides.
Results: YOLOV3 and YOLOV4 models, which are state-of-the-art object detectors in accuracy and speed, are not optimized for detecting small objects such as malaria parasites in microscopic images. We modify these models by increasing feature scale and adding more detection layers to enhance their capability of detecting small objects without notably decreasing detection speed. We propose one modified YOLOV4 model, called YOLOV4-MOD, and two modified YOLOV3 models, called YOLOV3-MOD1 and YOLOV3-MOD2. In addition, new anchor box sizes are generated using the K-means clustering algorithm to exploit the potential of these models in small object detection. The performance of the modified YOLOV3 and YOLOV4 models was evaluated on a publicly available malaria dataset. These models achieved state-of-the-art accuracy, exceeding the performance of their original versions, Faster R-CNN, and SSD in terms of mean average precision (mAP), recall, precision, F1 score, and average IOU. YOLOV4-MOD achieved the best detection accuracy among all the models, with a mAP of 96.32%. YOLOV3-MOD2 and YOLOV3-MOD1 achieved mAPs of 96.14% and 95.46%, respectively.
Conclusions: The experimental results of this study demonstrate that the performance of the modified YOLOV3 and YOLOV4 models is highly promising for detecting malaria parasites from images captured by a smartphone camera over the microscope eyepiece. The proposed system is suitable for deployment in low-resource settings.
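The anchor-generation step mentioned in the Results can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract only says that K-means is used, so the 1 − IoU distance (a common choice for YOLO-style anchors), the deterministic initialisation, and the box sizes are all assumptions made for this example.

```python
def wh_iou(a, b):
    """IoU of two (width, height) boxes aligned at a shared corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(boxes, k, iters=100):
    """Cluster (width, height) pairs into k anchors.

    ASSUMPTION: nearest centroid = highest IoU (distance 1 - IoU).
    """
    centroids = [boxes[i] for i in range(k)]  # deterministic init: first k boxes
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda j: wh_iou(box, centroids[j]))
            clusters[best].append(box)
        new_centroids = []
        for j, members in enumerate(clusters):
            if members:
                w = sum(m[0] for m in members) / len(members)
                h = sum(m[1] for m in members) / len(members)
                new_centroids.append((w, h))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments have stabilised
        centroids = new_centroids
    return sorted(centroids, key=lambda c: c[0] * c[1])  # smallest anchor first


# Hypothetical ground-truth box sizes in pixels: three small, three large.
boxes = [(10, 12), (50, 60), (12, 10), (48, 62), (11, 11), (52, 58)]
print(kmeans_anchors(boxes, k=2))  # [(11.0, 11.0), (50.0, 60.0)]
```

Anchors fitted this way follow the size distribution of the dataset's own objects, which matters when the targets are much smaller than the objects a detector's default anchors were tuned for.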


Algorithms ◽  
2019 ◽  
Vol 12 (5) ◽  
pp. 99
Author(s):  
Kleopatra Pirpinia ◽  
Peter A. N. Bosman ◽  
Jan-Jakob Sonke ◽  
Marcel van Herk ◽  
Tanja Alderliesten

Current state-of-the-art medical deformable image registration (DIR) methods optimize a weighted sum of key objectives of interest. Having a pre-determined weight combination that leads to high-quality results for any instance of a specific DIR problem (i.e., a class solution) would facilitate clinical application of DIR. However, such a combination can vary widely for each instance and is currently often manually determined. A multi-objective optimization approach for DIR removes the need for manual tuning, providing a set of high-quality trade-off solutions. Here, we investigate machine learning for a multi-objective class solution, i.e., not a single weight combination, but a set thereof, that, when used on any instance of a specific DIR problem, approximates such a set of trade-off solutions. To this end, we employed a multi-objective evolutionary algorithm to learn sets of weight combinations for three breast DIR problems of increasing difficulty: 10 prone-prone cases, 4 prone-supine cases with limited deformations and 6 prone-supine cases with larger deformations and image artefacts. Clinically-acceptable results were obtained for the first two problems. Therefore, for DIR problems with limited deformations, a multi-objective class solution can be machine learned and used to compute straightforwardly multiple high-quality DIR outcomes, potentially leading to more efficient use of DIR in clinical practice.
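To illustrate what a "set of high-quality trade-off solutions" means, here is a minimal sketch of a non-dominated (Pareto) filter over two objectives that are both minimised. The objective values are invented and the abstract does not name the objectives, so this shows only the general concept, not the evolutionary algorithm used in the paper.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep only the points that no other point dominates (minimisation)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (objective 1, objective 2) values for five candidate registrations.
candidates = [(0.2, 0.9), (0.4, 0.4), (0.9, 0.1), (0.5, 0.5), (0.9, 0.9)]
print(pareto_front(candidates))  # [(0.2, 0.9), (0.4, 0.4), (0.9, 0.1)]
```

A single weighted sum would select just one of the surviving points, depending on the weights; the multi-objective view keeps all of them and leaves the choice to the user.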


2017 ◽  
Vol 44 (4) ◽  
pp. 464-490
Author(s):  
Luis Omar Colombo-Mendoza ◽  
Rafael Valencia-García ◽  
Alejandro Rodríguez-González ◽  
Ricardo Colomo-Palacios ◽  
Giner Alor-Hernández

In this article, we propose (1) a knowledge-based probabilistic collaborative filtering (CF) recommendation approach using both an ontology-based semantic similarity metric and a latent Dirichlet allocation (LDA) model-based recommendation technique and (2) a context-aware software architecture and system with the objective of validating the recommendation approach in the eating domain (foodservice places). The ontology on which the similarity metric is based is additionally leveraged to model and reason about users’ contexts; the proposed LDA model also guides the users’ context modelling to some extent. An evaluation method in the form of a comparative analysis based on traditional information retrieval (IR) metrics and a reference ranking-based evaluation metric (correctly ranked places) is presented towards the end of this article to reliably assess the efficacy and effectiveness of our recommendation approach, along with its utility from the user’s perspective. Our recommendation approach achieves higher average precision and recall values (8% and 7.40%, respectively) in the best-case scenario when compared with a CF approach that employs a baseline similarity metric. In addition, when compared with a partial implementation that does not consider users’ preferences for topics, the comprehensive implementation of our recommendation approach achieves higher average values of correctly ranked places (2.5 of 5 versus 1.5 of 5).

