scholarly journals Pano-RSOD: A Dataset and Benchmark for Panoramic Road Scene Object Detection

Electronics ◽  
2019 ◽  
Vol 8 (3) ◽  
pp. 329 ◽  
Author(s):  
Yong Li ◽  
Guofeng Tong ◽  
Huashuai Gao ◽  
Yuebin Wang ◽  
Liqiang Zhang ◽  
...  

Panoramic images have a wide range of applications in many fields with their ability to perceive all-round information. Object detection based on panoramic images has certain advantages in terms of environment perception due to the characteristics of panoramic images, e.g., lager perspective. In recent years, deep learning methods have achieved remarkable results in image classification and object detection. Their performance depends on the large amount of training data. Therefore, a good training dataset is a prerequisite for the methods to achieve better recognition results. Then, we construct a benchmark named Pano-RSOD for panoramic road scene object detection. Pano-RSOD contains vehicles, pedestrians, traffic signs and guiding arrows. The objects of Pano-RSOD are labelled by bounding boxes in the images. Different from traditional object detection datasets, Pano-RSOD contains more objects in a panoramic image, and the high-resolution images have 360-degree environmental perception, more annotations, more small objects and diverse road scenes. The state-of-the-art deep learning algorithms are trained on Pano-RSOD for object detection, which demonstrates that Pano-RSOD is a useful benchmark, and it provides a better panoramic image training dataset for object detection tasks, especially for small and deformed objects.

2021 ◽  
Author(s):  
Prashanth Pillai ◽  
Purnaprajna Mangsuli

Abstract In the O&G (Oil & Gas) industry, unstructured data sources such as technical reports on hydrocarbon production, daily drilling, well construction, etc. contain valuable information. This information however is conveyed through various formats such as tables, forms, text, figures, etc. Detecting these different entities in documents is essential for building a structured representation of the information within and for automated processing of documents at scale. Our work presents a document layout analysis workflow to detect/localize different entities based on a deep learning-based framework. The workflow comprises of a deep learning-based object-detection framework based on transformers to identify the spatial location of entities in a document page. The key elements of the object-detection pipeline include a residual network backbone for feature extraction and an encoder-decoder transformer based on the latest detection transformers (DETR) to predict object-bounding boxes and category labels. The object detection is formulated as a direct set prediction task using bipartite matching while also eliminating conventional operations like anchor box generation and non-maximal suppression. The availability of sufficient publicly available document layout data sets that incorporate the artifacts observed in historical O&G technical reports is often a major challenge. We attempt to address this challenge by using a novel training data augmentation methodology. The dense occurrence of elements in a page can often introduce uncertainties resulting in bounding boxes cutting through text content. We adopt a bounding box post-processing methodology to refine the bounding box coordinates to minimize undercuts. The proposed document layout analysis pipeline was trained to detect entity types such as headings, text blocks, tables, forms, and images/charts in a document page. A wide range of pages from lithology, stratigraphy, drilling, and field development reports were used for model training. The reports also included a considerable number of historical scanned reports. The trained object-detection model was evaluated on a test data set prepared from the O&G reports. DETR demonstrated superior performance when compared with the Mask R-CNN on our dataset.


Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2611
Author(s):  
Andrew Shepley ◽  
Greg Falzon ◽  
Christopher Lawson ◽  
Paul Meek ◽  
Paul Kwan

Image data is one of the primary sources of ecological data used in biodiversity conservation and management worldwide. However, classifying and interpreting large numbers of images is time and resource expensive, particularly in the context of camera trapping. Deep learning models have been used to achieve this task but are often not suited to specific applications due to their inability to generalise to new environments and inconsistent performance. Models need to be developed for specific species cohorts and environments, but the technical skills required to achieve this are a key barrier to the accessibility of this technology to ecologists. Thus, there is a strong need to democratize access to deep learning technologies by providing an easy-to-use software application allowing non-technical users to train custom object detectors. U-Infuse addresses this issue by providing ecologists with the ability to train customised models using publicly available images and/or their own images without specific technical expertise. Auto-annotation and annotation editing functionalities minimize the constraints of manually annotating and pre-processing large numbers of images. U-Infuse is a free and open-source software solution that supports both multiclass and single class training and object detection, allowing ecologists to access deep learning technologies usually only available to computer scientists, on their own device, customised for their application, without sharing intellectual property or sensitive data. It provides ecological practitioners with the ability to (i) easily achieve object detection within a user-friendly GUI, generating a species distribution report, and other useful statistics, (ii) custom train deep learning models using publicly available and custom training data, (iii) achieve supervised auto-annotation of images for further training, with the benefit of editing annotations to ensure quality datasets. Broad adoption of U-Infuse by ecological practitioners will improve ecological image analysis and processing by allowing significantly more image data to be processed with minimal expenditure of time and resources, particularly for camera trap images. Ease of training and use of transfer learning means domain-specific models can be trained rapidly, and frequently updated without the need for computer science expertise, or data sharing, protecting intellectual property and privacy.


2021 ◽  
Vol 10 (1) ◽  
Author(s):  
Xinyang Li ◽  
Guoxun Zhang ◽  
Hui Qiao ◽  
Feng Bao ◽  
Yue Deng ◽  
...  

AbstractThe development of deep learning and open access to a substantial collection of imaging data together provide a potential solution for computational image transformation, which is gradually changing the landscape of optical imaging and biomedical research. However, current implementations of deep learning usually operate in a supervised manner, and their reliance on laborious and error-prone data annotation procedures remains a barrier to more general applicability. Here, we propose an unsupervised image transformation to facilitate the utilization of deep learning for optical microscopy, even in some cases in which supervised models cannot be applied. Through the introduction of a saliency constraint, the unsupervised model, named Unsupervised content-preserving Transformation for Optical Microscopy (UTOM), can learn the mapping between two image domains without requiring paired training data while avoiding distortions of the image content. UTOM shows promising performance in a wide range of biomedical image transformation tasks, including in silico histological staining, fluorescence image restoration, and virtual fluorescence labeling. Quantitative evaluations reveal that UTOM achieves stable and high-fidelity image transformations across different imaging conditions and modalities. We anticipate that our framework will encourage a paradigm shift in training neural networks and enable more applications of artificial intelligence in biomedical imaging.


2020 ◽  
Vol 10 (6) ◽  
pp. 2104
Author(s):  
Michał Tomaszewski ◽  
Paweł Michalski ◽  
Jakub Osuchowski

This article presents an analysis of the effectiveness of object detection in digital images with the application of a limited quantity of input. The possibility of using a limited set of learning data was achieved by developing a detailed scenario of the task, which strictly defined the conditions of detector operation in the considered case of a convolutional neural network. The described solution utilizes known architectures of deep neural networks in the process of learning and object detection. The article presents comparisons of results from detecting the most popular deep neural networks while maintaining a limited training set composed of a specific number of selected images from diagnostic video. The analyzed input material was recorded during an inspection flight conducted along high-voltage lines. The object detector was built for a power insulator. The main contribution of the presented papier is the evidence that a limited training set (in our case, just 60 training frames) could be used for object detection, assuming an outdoor scenario with low variability of environmental conditions. The decision of which network will generate the best result for such a limited training set is not a trivial task. Conducted research suggests that the deep neural networks will achieve different levels of effectiveness depending on the amount of training data. The most beneficial results were obtained for two convolutional neural networks: the faster region-convolutional neural network (faster R-CNN) and the region-based fully convolutional network (R-FCN). Faster R-CNN reached the highest AP (average precision) at a level of 0.8 for 60 frames. The R-FCN model gained a worse AP result; however, it can be noted that the relationship between the number of input samples and the obtained results has a significantly lower influence than in the case of other CNN models, which, in the authors’ assessment, is a desired feature in the case of a limited training set.


2021 ◽  
Vol 23 (06) ◽  
pp. 47-57
Author(s):  
Aditya Kulkarni ◽  
◽  
Manali Munot ◽  
Sai Salunkhe ◽  
Shubham Mhaske ◽  
...  

With the development in technologies right from serial to parallel computing, GPU, AI, and deep learning models a series of tools to process complex images have been developed. The main focus of this research is to compare various algorithms(pre-trained models) and their contributions to process complex images in terms of performance, accuracy, time, and their limitations. The pre-trained models we are using are CNN, R-CNN, R-FCN, and YOLO. These models are python language-based and use libraries like TensorFlow, OpenCV, and free image databases (Microsoft COCO and PAS-CAL VOC 2007/2012). These not only aim at object detection but also on building bounding boxes around appropriate locations. Thus, by this review, we get a better vision of these models and their performance and a good idea of which models are ideal for various situations.


2021 ◽  
Author(s):  
Sung Hyun Noh ◽  
Chansik An ◽  
Dain Kim ◽  
Seung Hyun Lee ◽  
Min-Yung Chang ◽  
...  

Abstract Background A computer algorithm that automatically detects sacroiliac joint abnormalities on plain radiograph would help radiologists avoid missing sacroiliitis. This study aimed to develop and validate a deep learning model to detect and diagnose sacroiliitis on plain radiograph in young patients with low back pain. Methods This Institutional Review Board-approved retrospective study included 478 and 468 plain radiographs from 241 and 433 young (< 40 years) patients who complained of low back pain with and without ankylosing spondylitis, respectively. They were randomly split into training and test datasets with a ratio of 8:2. Radiologists reviewed the images and labeled the coordinates of a bounding box and determined the presence or absence of sacroiliitis for each sacroiliac joint. We fine-tined and optimized the EfficientDet-D4 object detection model pre-trained on the COCO 2107 dataset on the training dataset and validated the final model on the test dataset. Results The mean average precision, an evaluation metric for object detection accuracy, was 0.918 at 0.5 intersection over union. In the diagnosis of sacroiliitis, the area under the curve, sensitivity, specificity, accuracy, and F1-score were 0.932 (95% confidence interval, 0.903–0.961), 96.9% (92.9–99.0), 86.8% (81.5–90.9), 91.1% (87.7–93.7), and 90.2% (85.0–93.9), respectively. Conclusions The EfficientDet, a deep learning-based object detection algorithm, could be used to automatically diagnose sacroiliitis on plain radiograph.


2021 ◽  
Vol 39 (15_suppl) ◽  
pp. e15012-e15012
Author(s):  
Mayur Sarangdhar ◽  
Venkatesh Kolli ◽  
William Seibel ◽  
John Peter Perentesis

e15012 Background: Recent advances in cancer treatment have revolutionized patient outcomes. However, toxicities associated with anti-cancer drugs remain a concern with many anti-cancer drugs now implicated in cardiotoxicity. The complete spectrum of cardiotoxicity associated with anti-cancer drugs is only evident post-approval of drugs. Deep Learning methods can identify novel and emerging safety signals in “real-world” clinical settings. Methods: We used AERS Mine, an open-source data mining platform to identify drug toxicity signatures in the FDA’s Adverse Event Reporting System of 16 million patients. We identified 1.3 million patients on traditional and targeted anti-cancer therapy to analyze therapy-specific cardiotoxicity patterns. Cardiotoxicity training dataset contained 1571 molecules characterized with bioassay against hERG potassium channel and included 350 toxic compounds with an IC50 of < 1μM. We implemented a Deep Belief Network to extract a deep hierarchical representation of the training data, and the Extra Tree Classifier to predict the toxicity of drug candidates. Drugs were encoded using 1024-bit Morgan fingerprint representation using SMILES with search radius of 7 atoms. Pharmacovigilance metrics (Relative Risks and safety signals) were used to establish statistical correlation. Results: This analysis identified signatures of arrhythmias and conduction abnormalities associated with common anti-cancer drugs (e.g. atrial fibrillation with ibrutinib, alkylating agents, immunomodulatory drugs; sinus bradycardia with 5FU, paclitaxel, thalidomide; sinus tachycardia with anthracyclines). Our analysis also identified myositis/myocarditis association with newer immune checkpoint inhibitors (e.g., atezolizumab, durvalumab, cemiplimab, avelumab) paralleling earlier signals for pembrolizumab, nivolumab, and ipilimumab. Deep Learning identified signatures of chemical moieties linked to cardiotoxicity, including common motifs in drugs associated with arrhythmias and conduction abnormalities with an accuracy of 89%. Conclusions: Deep Learning provides a comprehensive insight into emerging cardiotoxicity patterns of approved and investigational drugs, allows detection of ‘rogue’ chemical moieties, and shows promise for novel drug discovery and development.


Electronics ◽  
2020 ◽  
Vol 9 (4) ◽  
pp. 583 ◽  
Author(s):  
Khang Nguyen ◽  
Nhut T. Huynh ◽  
Phat C. Nguyen ◽  
Khanh-Duy Nguyen ◽  
Nguyen D. Vo ◽  
...  

Unmanned aircraft systems or drones enable us to record or capture many scenes from the bird’s-eye view and they have been fast deployed to a wide range of practical domains, i.e., agriculture, aerial photography, fast delivery and surveillance. Object detection task is one of the core steps in understanding videos collected from the drones. However, this task is very challenging due to the unconstrained viewpoints and low resolution of captured videos. While deep-learning modern object detectors have recently achieved great success in general benchmarks, i.e., PASCAL-VOC and MS-COCO, the robustness of these detectors on aerial images captured by drones is not well studied. In this paper, we present an evaluation of state-of-the-art deep-learning detectors including Faster R-CNN (Faster Regional CNN), RFCN (Region-based Fully Convolutional Networks), SNIPER (Scale Normalization for Image Pyramids with Efficient Resampling), Single-Shot Detector (SSD), YOLO (You Only Look Once), RetinaNet, and CenterNet for the object detection in videos captured by drones. We conduct experiments on VisDrone2019 dataset which contains 96 videos with 39,988 annotated frames and provide insights into efficient object detectors for aerial images.


Author(s):  
M. Buyukdemircioglu ◽  
R. Can ◽  
S. Kocaman

Abstract. Automatic detection, segmentation and reconstruction of buildings in urban areas from Earth Observation (EO) data are still challenging for many researchers. Roof is one of the most important element in a building model. The three-dimensional geographical information system (3D GIS) applications generally require the roof type and roof geometry for performing various analyses on the models, such as energy efficiency. The conventional segmentation and classification methods are often based on features like corners, edges and line segments. In parallel to the developments in computer hardware and artificial intelligence (AI) methods including deep learning (DL), image features can be extracted automatically. As a DL technique, convolutional neural networks (CNNs) can also be used for image classification tasks, but require large amount of high quality training data for obtaining accurate results. The main aim of this study was to generate a roof type dataset from very high-resolution (10 cm) orthophotos of Cesme, Turkey, and to classify the roof types using a shallow CNN architecture. The training dataset consists 10,000 roof images and their labels. Six roof type classes such as flat, hip, half-hip, gable, pyramid and complex roofs were used for the classification in the study area. The prediction performance of the shallow CNN model used here was compared with the results obtained from the fine-tuning of three well-known pre-trained networks, i.e. VGG-16, EfficientNetB4, ResNet-50. The results show that although our CNN has slightly lower performance expressed with the overall accuracy, it is still acceptable for many applications using sparse data.


2020 ◽  
Author(s):  
Haiming Tang ◽  
Nanfei Sun ◽  
Steven Shen

Artificial intelligence (AI) has an emerging progress in diagnostic pathology. A large number of studies of applying deep learning models to histopathological images have been published in recent years. While many studies claim high accuracies, they may fall into the pitfalls of overfitting and lack of generalization due to the high variability of the histopathological images. We use the example of Osteosarcoma to illustrate the pitfalls and how the addition of model input variability can help improve model performance. We use the publicly available osteosarcoma dataset to retrain a previously published classification model for osteosarcoma. We partition the same set of images into the training and testing datasets differently than the original study: the test dataset consists of images from one patient while the training dataset consists images of all other patients. The performance of the model on the test set using the new partition schema declines dramatically, indicating a lack of model generalization and overfitting.We also show the influence of training data variability on model performance by collecting a minimal dataset of 10 osteosarcoma subtypes as well as benign tissues and benign bone tumors of differentiation. We show the additions of more and more subtypes into the training data step by step under the same model schema yield a series of coherent models with increasing performances. In conclusion, we bring forward data preprocessing and collection tactics for histopathological images of high variability to avoid the pitfalls of overfitting and build deep learning models of higher generalization abilities.


Sign in / Sign up

Export Citation Format

Share Document