Pano-RSOD: A Dataset and Benchmark for Panoramic Road Scene Object Detection

Yong Li; Guofeng Tong; Huashuai Gao; Yuebin Wang; Liqiang Zhang; Huairong Chen

doi:10.3390/electronics8030329

Electronics ◽

10.3390/electronics8030329 ◽

2019 ◽

Vol 8 (3) ◽

pp. 329 ◽

Cited By ~ 2

Author(s):

Yong Li ◽

Guofeng Tong ◽

Huashuai Gao ◽

Yuebin Wang ◽

Liqiang Zhang ◽

...

Keyword(s):

Deep Learning ◽

Object Detection ◽

Training Data ◽

Training Dataset ◽

Panoramic Image ◽

Information Object ◽

Wide Range ◽

Panoramic Images ◽

Bounding Boxes ◽

Scene Object

Panoramic images have a wide range of applications in many fields because they capture all-round information. Object detection based on panoramic images has advantages for environment perception owing to the characteristics of panoramic images, e.g., a larger perspective. In recent years, deep learning methods have achieved remarkable results in image classification and object detection. Their performance depends on a large amount of training data, so a good training dataset is a prerequisite for these methods to achieve better recognition results. We therefore construct a benchmark named Pano-RSOD for panoramic road scene object detection. Pano-RSOD contains vehicles, pedestrians, traffic signs and guiding arrows. The objects in Pano-RSOD are labelled with bounding boxes in the images. Unlike traditional object detection datasets, Pano-RSOD contains more objects per panoramic image, and its high-resolution images offer 360-degree environmental perception, more annotations, more small objects and diverse road scenes. State-of-the-art deep learning algorithms are trained on Pano-RSOD for object detection, which demonstrates that Pano-RSOD is a useful benchmark and provides a better panoramic image training dataset for object detection tasks, especially for small and deformed objects.

Document Layout Analysis Using Detection Transformers

10.2118/207266-ms ◽

2021 ◽

Author(s):

Prashanth Pillai ◽

Purnaprajna Mangsuli

Keyword(s):

Deep Learning ◽

Object Detection ◽

Superior Performance ◽

Layout Analysis ◽

Bounding Box ◽

Document Layout Analysis ◽

Wide Range ◽

Document Layout ◽

Bounding Boxes ◽

Document Page

Abstract: In the O&G (Oil & Gas) industry, unstructured data sources such as technical reports on hydrocarbon production, daily drilling, well construction, etc. contain valuable information. This information, however, is conveyed through various formats such as tables, forms, text, figures, etc. Detecting these different entities in documents is essential for building a structured representation of the information within and for automated processing of documents at scale. Our work presents a document layout analysis workflow to detect/localize different entities using a deep learning-based framework. The workflow comprises a deep learning-based object-detection framework based on transformers to identify the spatial location of entities in a document page. The key elements of the object-detection pipeline include a residual network backbone for feature extraction and an encoder-decoder transformer based on the latest detection transformers (DETR) to predict object-bounding boxes and category labels. The object detection is formulated as a direct set prediction task using bipartite matching, which also eliminates conventional operations like anchor box generation and non-maximum suppression. The lack of sufficient publicly available document layout data sets that incorporate the artifacts observed in historical O&G technical reports is often a major challenge. We attempt to address this challenge by using a novel training data augmentation methodology. The dense occurrence of elements in a page can often introduce uncertainties, resulting in bounding boxes cutting through text content. We adopt a bounding box post-processing methodology to refine the bounding box coordinates to minimize undercuts. The proposed document layout analysis pipeline was trained to detect entity types such as headings, text blocks, tables, forms, and images/charts in a document page.
A wide range of pages from lithology, stratigraphy, drilling, and field development reports were used for model training. The reports also included a considerable number of historical scanned reports. The trained object-detection model was evaluated on a test data set prepared from the O&G reports. DETR demonstrated superior performance when compared with the Mask R-CNN on our dataset.
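
To illustrate the "direct set prediction task using bipartite matching" mentioned in the abstract above, here is a minimal Python sketch. It is not the authors' pipeline: the cost function (a class mismatch penalty plus an L1 box distance), the dictionary layout, and the names `match_cost` and `bipartite_match` are assumptions made for illustration, and the search is brute force over permutations, whereas practical implementations typically use the Hungarian algorithm.

```python
from itertools import permutations


def match_cost(pred, target):
    """Toy pairwise cost (an assumption): class mismatch penalty plus L1 box distance."""
    class_cost = 0.0 if pred["label"] == target["label"] else 1.0
    box_cost = sum(abs(p - t) for p, t in zip(pred["box"], target["box"]))
    return class_cost + box_cost


def bipartite_match(preds, targets):
    """Find the one-to-one assignment of targets to predictions with minimum total cost.

    Brute force over permutations, so only usable for a handful of boxes.
    Assumes len(preds) >= len(targets).
    """
    best_cost, best_assignment = float("inf"), ()
    for perm in permutations(range(len(preds)), len(targets)):
        cost = sum(match_cost(preds[p], targets[t]) for t, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best_assignment = cost, perm
    # best_assignment[t] is the index of the prediction matched to target t
    return list(best_assignment), best_cost


# Hypothetical predictions and ground-truth entities on one page (normalized boxes)
preds = [
    {"label": "table", "box": (0.1, 0.1, 0.5, 0.4)},
    {"label": "heading", "box": (0.1, 0.0, 0.9, 0.1)},
    {"label": "text", "box": (0.1, 0.5, 0.9, 0.9)},
]
targets = [
    {"label": "heading", "box": (0.1, 0.0, 0.9, 0.08)},
    {"label": "table", "box": (0.1, 0.12, 0.5, 0.4)},
]
assignment, total = bipartite_match(preds, targets)
print(assignment, round(total, 4))
```

Because every ground-truth entity is paired with exactly one prediction, duplicate predictions are penalized during training instead of being filtered afterwards, which is why a set-prediction detector can do without a separate suppression step. In this toy example the third prediction stays unmatched.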

U-Infuse: Democratization of Customizable Deep Learning for Object Detection

Sensors ◽

10.3390/s21082611 ◽

2021 ◽

Vol 21 (8) ◽

pp. 2611

Author(s):

Andrew Shepley ◽

Greg Falzon ◽

Christopher Lawson ◽

Paul Meek ◽

Paul Kwan

Keyword(s):

Deep Learning ◽

Intellectual Property ◽

Object Detection ◽

Image Data ◽

Learning Technologies ◽

Training Data ◽

Learning Models ◽

Ecological Data ◽

Single Class ◽

Large Numbers

Image data is one of the primary sources of ecological data used in biodiversity conservation and management worldwide. However, classifying and interpreting large numbers of images is time and resource expensive, particularly in the context of camera trapping. Deep learning models have been used to achieve this task but are often not suited to specific applications due to their inability to generalise to new environments and inconsistent performance. Models need to be developed for specific species cohorts and environments, but the technical skills required to achieve this are a key barrier to the accessibility of this technology to ecologists. Thus, there is a strong need to democratize access to deep learning technologies by providing an easy-to-use software application allowing non-technical users to train custom object detectors. U-Infuse addresses this issue by providing ecologists with the ability to train customised models using publicly available images and/or their own images without specific technical expertise. Auto-annotation and annotation editing functionalities minimize the constraints of manually annotating and pre-processing large numbers of images. U-Infuse is a free and open-source software solution that supports both multiclass and single class training and object detection, allowing ecologists to access deep learning technologies usually only available to computer scientists, on their own device, customised for their application, without sharing intellectual property or sensitive data. It provides ecological practitioners with the ability to (i) easily achieve object detection within a user-friendly GUI, generating a species distribution report, and other useful statistics, (ii) custom train deep learning models using publicly available and custom training data, (iii) achieve supervised auto-annotation of images for further training, with the benefit of editing annotations to ensure quality datasets. 
Broad adoption of U-Infuse by ecological practitioners will improve ecological image analysis and processing by allowing significantly more image data to be processed with minimal expenditure of time and resources, particularly for camera trap images. Ease of training and use of transfer learning means domain-specific models can be trained rapidly, and frequently updated without the need for computer science expertise, or data sharing, protecting intellectual property and privacy.

Unsupervised content-preserving transformation for optical microscopy

Light Science & Applications ◽

10.1038/s41377-021-00484-y ◽

2021 ◽

Vol 10 (1) ◽

Cited By ~ 1

Author(s):

Xinyang Li ◽

Guoxun Zhang ◽

Hui Qiao ◽

Feng Bao ◽

Yue Deng ◽

...

Keyword(s):

Deep Learning ◽

Optical Microscopy ◽

Training Data ◽

Fluorescence Labeling ◽

Imaging Data ◽

Image Transformation ◽

General Applicability ◽

Data Annotation ◽

Biomedical Image ◽

Wide Range

Abstract: The development of deep learning and open access to a substantial collection of imaging data together provide a potential solution for computational image transformation, which is gradually changing the landscape of optical imaging and biomedical research. However, current implementations of deep learning usually operate in a supervised manner, and their reliance on laborious and error-prone data annotation procedures remains a barrier to more general applicability. Here, we propose an unsupervised image transformation to facilitate the utilization of deep learning for optical microscopy, even in some cases in which supervised models cannot be applied. Through the introduction of a saliency constraint, the unsupervised model, named Unsupervised content-preserving Transformation for Optical Microscopy (UTOM), can learn the mapping between two image domains without requiring paired training data while avoiding distortions of the image content. UTOM shows promising performance in a wide range of biomedical image transformation tasks, including in silico histological staining, fluorescence image restoration, and virtual fluorescence labeling. Quantitative evaluations reveal that UTOM achieves stable and high-fidelity image transformations across different imaging conditions and modalities. We anticipate that our framework will encourage a paradigm shift in training neural networks and enable more applications of artificial intelligence in biomedical imaging.

Evaluation of Power Insulator Detection Efficiency with the Use of Limited Training Dataset

Applied Sciences ◽

10.3390/app10062104 ◽

2020 ◽

Vol 10 (6) ◽

pp. 2104

Author(s):

Michał Tomaszewski ◽

Paweł Michalski ◽

Jakub Osuchowski

Keyword(s):

Neural Network ◽

Neural Networks ◽

Object Detection ◽

Convolutional Neural Network ◽

Deep Neural Networks ◽

Detection Efficiency ◽

Training Data ◽

Training Dataset ◽

Training Set ◽

Convolutional Network

This article presents an analysis of the effectiveness of object detection in digital images when only a limited quantity of input data is available. Using a limited set of learning data was made possible by developing a detailed scenario of the task, which strictly defined the conditions of detector operation in the considered case of a convolutional neural network. The described solution utilizes known architectures of deep neural networks in the process of learning and object detection. The article compares the detection results of the most popular deep neural networks when trained on a limited training set composed of a specific number of selected images from diagnostic video. The analyzed input material was recorded during an inspection flight conducted along high-voltage lines. The object detector was built for a power insulator. The main contribution of the presented paper is the evidence that a limited training set (in our case, just 60 training frames) could be used for object detection, assuming an outdoor scenario with low variability of environmental conditions. Deciding which network will generate the best result for such a limited training set is not a trivial task. The conducted research suggests that deep neural networks achieve different levels of effectiveness depending on the amount of training data. The most beneficial results were obtained for two convolutional neural networks: the faster region-convolutional neural network (faster R-CNN) and the region-based fully convolutional network (R-FCN). Faster R-CNN reached the highest AP (average precision), at a level of 0.8 for 60 frames. The R-FCN model achieved a lower AP; however, its results depend significantly less on the number of input samples than those of the other CNN models, which, in the authors’ assessment, is a desirable feature in the case of a limited training set.
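
The AP (average precision) figure quoted in the abstract above summarizes a detector's precision-recall curve in a single number. The Python sketch below shows one common way to compute it (all-point interpolation); the abstract does not say which AP variant the authors used, so the interpolation scheme, the function name `average_precision`, and the sample detections are assumptions for illustration only.

```python
def average_precision(detections, num_ground_truth):
    """Compute AP from (confidence, is_true_positive) pairs.

    Uses all-point interpolation, which is one of several AP conventions.
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")

    # Rank detections from most to least confident
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)

    tp = fp = 0
    precisions, recalls = [], []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_ground_truth)

    # Interpolate: precision at each rank becomes the best precision at any later rank
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the interpolated precision-recall curve
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Hypothetical detections for one class: (confidence, matched a ground-truth box?)
sample = [(0.9, True), (0.8, False), (0.7, True), (0.6, True), (0.5, False)]
ap = average_precision(sample, num_ground_truth=4)
print(ap)  # 0.625
```

In this example one of the four ground-truth objects is never detected, so recall tops out at 0.75 and the AP cannot exceed that value.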

A Survey on Various Available Object Detection Models and Application In Automatic License Plate Detection

Journal of University of Shanghai for Science and Technology ◽

10.51201/jusst/21/05222 ◽

2021 ◽

Vol 23 (06) ◽

pp. 47-57

Author(s):

Aditya Kulkarni ◽

Manali Munot ◽

Sai Salunkhe ◽

Shubham Mhaske ◽

...

Keyword(s):

Deep Learning ◽

Object Detection ◽

Image Databases ◽

License Plate ◽

Learning Models ◽

Python Language ◽

Performance Accuracy ◽

License Plate Detection ◽

Bounding Boxes ◽

Complex Images

With the development of technologies from serial to parallel computing, GPUs, AI, and deep learning models, a series of tools to process complex images has been developed. The main focus of this research is to compare various algorithms (pre-trained models) and their contributions to processing complex images in terms of performance, accuracy, time, and their limitations. The pre-trained models we are using are CNN, R-CNN, R-FCN, and YOLO. These models are Python-based and use libraries like TensorFlow and OpenCV, and free image databases (Microsoft COCO and PASCAL VOC 2007/2012). They aim not only at object detection but also at building bounding boxes around appropriate locations. Thus, this review gives a better view of these models and their performance, and a good idea of which models are ideal for various situations.

Automated detection of sacroiliitis on plain radiograph using EfficientDet algorithm in young patients with back pain: a pilot study

10.21203/rs.3.rs-272234/v1 ◽

2021 ◽

Author(s):

Sung Hyun Noh ◽

Chansik An ◽

Dain Kim ◽

Seung Hyun Lee ◽

Min-Yung Chang ◽

...

Keyword(s):

Low Back Pain ◽

Deep Learning ◽

Back Pain ◽

Object Detection ◽

Sacroiliac Joint ◽

Plain Radiograph ◽

Young Patients ◽

Training Dataset ◽

Detection Accuracy ◽

Low Back

Abstract. Background: A computer algorithm that automatically detects sacroiliac joint abnormalities on plain radiograph would help radiologists avoid missing sacroiliitis. This study aimed to develop and validate a deep learning model to detect and diagnose sacroiliitis on plain radiograph in young patients with low back pain. Methods: This Institutional Review Board-approved retrospective study included 478 and 468 plain radiographs from 241 and 433 young (< 40 years) patients who complained of low back pain with and without ankylosing spondylitis, respectively. They were randomly split into training and test datasets with a ratio of 8:2. Radiologists reviewed the images, labeled the coordinates of a bounding box, and determined the presence or absence of sacroiliitis for each sacroiliac joint. We fine-tuned and optimized the EfficientDet-D4 object detection model, pre-trained on the COCO 2017 dataset, on the training dataset and validated the final model on the test dataset. Results: The mean average precision, an evaluation metric for object detection accuracy, was 0.918 at 0.5 intersection over union. In the diagnosis of sacroiliitis, the area under the curve, sensitivity, specificity, accuracy, and F1-score were 0.932 (95% confidence interval, 0.903–0.961), 96.9% (92.9–99.0), 86.8% (81.5–90.9), 91.1% (87.7–93.7), and 90.2% (85.0–93.9), respectively. Conclusions: EfficientDet, a deep learning-based object detection algorithm, could be used to automatically diagnose sacroiliitis on plain radiograph.
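
The "0.5 intersection over union" threshold in the results above means a predicted box counts as a correct localization only if its overlap with the labeled box, divided by their combined area, is at least 0.5. A minimal Python sketch of that check follows; the `(x_min, y_min, x_max, y_max)` box layout, the function name `iou`, and the sample coordinates are assumptions for illustration and are not taken from the study.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle, if any
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp to zero when the boxes do not overlap
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical labeled box and two candidate predictions, in pixels
labeled = (0, 0, 10, 10)
close_prediction = (2, 0, 12, 10)   # shifted by 2 px
loose_prediction = (5, 0, 15, 10)   # shifted by 5 px

print(iou(labeled, close_prediction) >= 0.5)  # True  (IoU = 80 / 120)
print(iou(labeled, loose_prediction) >= 0.5)  # False (IoU = 50 / 150)
```

A shift of half the box width already drops the IoU to one third, so the 0.5 threshold is stricter than it may first appear.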

Deep learning to identify and predict cardiotoxicities of anticancer drugs.

Journal of Clinical Oncology ◽

10.1200/jco.2021.39.15_suppl.e15012 ◽

2021 ◽

Vol 39 (15_suppl) ◽

pp. e15012-e15012

Author(s):

Mayur Sarangdhar ◽

Venkatesh Kolli ◽

William Seibel ◽

John Peter Perentesis

Keyword(s):

Deep Learning ◽

Alkylating Agents ◽

Adverse Event Reporting System ◽

Training Data ◽

Training Dataset ◽

Cancer Drugs ◽

Conduction Abnormalities ◽

Tree Classifier ◽

Anti Cancer ◽

Safety Signals

e15012 Background: Recent advances in cancer treatment have revolutionized patient outcomes. However, toxicities associated with anti-cancer drugs remain a concern, with many anti-cancer drugs now implicated in cardiotoxicity. The complete spectrum of cardiotoxicity associated with anti-cancer drugs becomes evident only after drug approval. Deep Learning methods can identify novel and emerging safety signals in “real-world” clinical settings. Methods: We used AERS Mine, an open-source data mining platform, to identify drug toxicity signatures in the FDA’s Adverse Event Reporting System of 16 million patients. We identified 1.3 million patients on traditional and targeted anti-cancer therapy to analyze therapy-specific cardiotoxicity patterns. The cardiotoxicity training dataset contained 1571 molecules characterized by bioassay against the hERG potassium channel and included 350 toxic compounds with an IC50 of < 1μM. We implemented a Deep Belief Network to extract a deep hierarchical representation of the training data, and the Extra Tree Classifier to predict the toxicity of drug candidates. Drugs were encoded from SMILES as 1024-bit Morgan fingerprints with a search radius of 7 atoms. Pharmacovigilance metrics (Relative Risks and safety signals) were used to establish statistical correlation. Results: This analysis identified signatures of arrhythmias and conduction abnormalities associated with common anti-cancer drugs (e.g. atrial fibrillation with ibrutinib, alkylating agents, immunomodulatory drugs; sinus bradycardia with 5FU, paclitaxel, thalidomide; sinus tachycardia with anthracyclines). Our analysis also identified myositis/myocarditis association with newer immune checkpoint inhibitors (e.g., atezolizumab, durvalumab, cemiplimab, avelumab) paralleling earlier signals for pembrolizumab, nivolumab, and ipilimumab.
Deep Learning identified signatures of chemical moieties linked to cardiotoxicity, including common motifs in drugs associated with arrhythmias and conduction abnormalities with an accuracy of 89%. Conclusions: Deep Learning provides a comprehensive insight into emerging cardiotoxicity patterns of approved and investigational drugs, allows detection of ‘rogue’ chemical moieties, and shows promise for novel drug discovery and development.

Detecting Objects from Space: An Evaluation of Deep-Learning Modern Approaches

Electronics ◽

10.3390/electronics9040583 ◽

2020 ◽

Vol 9 (4) ◽

pp. 583 ◽

Cited By ~ 6

Author(s):

Khang Nguyen ◽

Nhut T. Huynh ◽

Phat C. Nguyen ◽

Khanh-Duy Nguyen ◽

Nguyen D. Vo ◽

...

Keyword(s):

Deep Learning ◽

Object Detection ◽

Unmanned Aircraft ◽

Aerial Images ◽

Great Success ◽

Single Shot ◽

Convolutional Networks ◽

Image Pyramids ◽

Fully Convolutional Networks ◽

Wide Range

Unmanned aircraft systems, or drones, enable us to record or capture many scenes from a bird’s-eye view, and they have been rapidly deployed in a wide range of practical domains, e.g., agriculture, aerial photography, fast delivery and surveillance. Object detection is one of the core steps in understanding videos collected from drones. However, this task is very challenging due to the unconstrained viewpoints and low resolution of the captured videos. While modern deep-learning object detectors have recently achieved great success on general benchmarks, e.g., PASCAL-VOC and MS-COCO, the robustness of these detectors on aerial images captured by drones is not well studied. In this paper, we present an evaluation of state-of-the-art deep-learning detectors including Faster R-CNN (Faster Regional CNN), RFCN (Region-based Fully Convolutional Networks), SNIPER (Scale Normalization for Image Pyramids with Efficient Resampling), Single-Shot Detector (SSD), YOLO (You Only Look Once), RetinaNet, and CenterNet for object detection in videos captured by drones. We conduct experiments on the VisDrone2019 dataset, which contains 96 videos with 39,988 annotated frames, and provide insights into efficient object detectors for aerial images.

DEEP LEARNING BASED ROOF TYPE CLASSIFICATION USING VERY HIGH RESOLUTION AERIAL IMAGERY

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xliii-b3-2021-55-2021 ◽

2021 ◽

Vol XLIII-B3-2021 ◽

pp. 55-60

Author(s):

M. Buyukdemircioglu ◽

R. Can ◽

S. Kocaman

Keyword(s):

Deep Learning ◽

High Resolution ◽

Urban Areas ◽

Image Features ◽

Training Data ◽

Fine Tuning ◽

Computer Hardware ◽

Geographical Information ◽

Training Dataset ◽

Very High

Abstract. Automatic detection, segmentation and reconstruction of buildings in urban areas from Earth Observation (EO) data are still challenging for many researchers. The roof is one of the most important elements in a building model. Three-dimensional geographical information system (3D GIS) applications generally require the roof type and roof geometry for performing various analyses on the models, such as energy efficiency. The conventional segmentation and classification methods are often based on features like corners, edges and line segments. In parallel with the developments in computer hardware and artificial intelligence (AI) methods, including deep learning (DL), image features can be extracted automatically. As a DL technique, convolutional neural networks (CNNs) can also be used for image classification tasks, but they require a large amount of high-quality training data to obtain accurate results. The main aim of this study was to generate a roof type dataset from very high-resolution (10 cm) orthophotos of Cesme, Turkey, and to classify the roof types using a shallow CNN architecture. The training dataset consists of 10,000 roof images and their labels. Six roof type classes, namely flat, hip, half-hip, gable, pyramid and complex roofs, were used for the classification in the study area. The prediction performance of the shallow CNN model used here was compared with the results obtained from the fine-tuning of three well-known pre-trained networks, i.e. VGG-16, EfficientNetB4, ResNet-50. The results show that although our CNN has slightly lower performance in terms of overall accuracy, it is still acceptable for many applications using sparse data.

Improving generalization of deep learning models for diagnostic pathology by increasing variability in training data: experiments on osteosarcoma subtypes

10.1101/2020.09.10.20192294 ◽

2020 ◽

Author(s):

Haiming Tang ◽

Nanfei Sun ◽

Steven Shen

Keyword(s):

Deep Learning ◽

Model Performance ◽

High Variability ◽

Training Data ◽

Classification Model ◽

Training Dataset ◽

Learning Models ◽

Diagnostic Pathology ◽

Model Generalization ◽

Histopathological Images

Artificial intelligence (AI) is making emerging progress in diagnostic pathology. A large number of studies applying deep learning models to histopathological images have been published in recent years. While many studies claim high accuracies, they may fall into the pitfalls of overfitting and lack of generalization due to the high variability of histopathological images. We use the example of osteosarcoma to illustrate these pitfalls and to show how adding variability to the model input can help improve model performance. We use the publicly available osteosarcoma dataset to retrain a previously published classification model for osteosarcoma. We partition the same set of images into training and testing datasets differently than the original study: the test dataset consists of images from one patient while the training dataset consists of images of all other patients. The performance of the model on the test set using the new partition schema declines dramatically, indicating a lack of model generalization and overfitting. We also show the influence of training data variability on model performance by collecting a minimal dataset of 10 osteosarcoma subtypes as well as benign tissues and benign bone tumors of differentiation. We show that adding more and more subtypes to the training data step by step under the same model schema yields a series of coherent models with increasing performance. In conclusion, we bring forward data preprocessing and collection tactics for histopathological images of high variability to avoid the pitfalls of overfitting and to build deep learning models with better generalization.
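
The partition described in the abstract above, where all images from one patient form the test set and images from every other patient form the training set, can be sketched in a few lines of Python. The record layout (`patient_id`, `image`, `label`), the function name `leave_one_patient_out`, and the sample records are hypothetical and are used only to illustrate the idea; they do not come from the study's data.

```python
def leave_one_patient_out(samples, test_patient):
    """Split image records so that one patient's images are held out for testing."""
    train = [s for s in samples if s["patient_id"] != test_patient]
    test = [s for s in samples if s["patient_id"] == test_patient]

    # Guard against leakage: no patient may appear on both sides of the split
    train_patients = {s["patient_id"] for s in train}
    test_patients = {s["patient_id"] for s in test}
    assert train_patients.isdisjoint(test_patients)
    return train, test


# Hypothetical image records; several images can come from the same patient
samples = [
    {"patient_id": "P1", "image": "p1_tile_01.png", "label": "viable"},
    {"patient_id": "P1", "image": "p1_tile_02.png", "label": "necrotic"},
    {"patient_id": "P2", "image": "p2_tile_01.png", "label": "viable"},
    {"patient_id": "P3", "image": "p3_tile_01.png", "label": "non-tumor"},
    {"patient_id": "P3", "image": "p3_tile_02.png", "label": "viable"},
]
train, test = leave_one_patient_out(samples, test_patient="P3")
print(len(train), len(test))  # 3 2
```

A random image-level split would instead scatter tiles from the same patient across both sets, letting a model score well by recognizing patient-specific appearance rather than the tissue class, which is one way the overfitting described above can go unnoticed.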
