Enhancing Multi-Camera People Detection by Online Automatic Parametrization Using Detection Transfer and Self-Correlation Maximization †

Rafael Martín-Nieto; Álvaro García-Martín; José Martínez; Juan SanMiguel

doi:10.3390/s18124385

Enhancing Multi-Camera People Detection by Online Automatic Parametrization Using Detection Transfer and Self-Correlation Maximization †

Sensors ◽

10.3390/s18124385 ◽

2018 ◽

Vol 18 (12) ◽

pp. 4385

Author(s):

Rafael Martín-Nieto ◽

Álvaro García-Martín ◽

José Martínez ◽

Juan SanMiguel

Keyword(s):

Critical Parameter ◽

State Of The Art ◽

Detection Threshold ◽

Ground Truth ◽

High Variability ◽

Experimental Results ◽

People Detection ◽

Training Time ◽

Detection Model ◽

Complicated Task

Finding optimal parametrizations for people detectors is a complicated task due to the large number of parameters and the high variability of application scenarios. In this paper, we propose a framework to adapt and improve any detector automatically in multi-camera scenarios where people are observed from various viewpoints. By accurately transferring detector results between camera viewpoints and by self-correlating these transferred results, the best configuration (in this paper, the detection threshold) for each detector-viewpoint pair is identified online without requiring any additional manually labeled ground truth apart from the offline training of the detection model. Such a configuration consists of establishing the confidence detection threshold present in every people detector, which is a critical parameter affecting detection performance. The experimental results demonstrate that the proposed framework improves the performance of four different state-of-the-art detectors (DPM, ACF, Faster R-CNN, and YOLO9000) whose Optimal Fixed Thresholds (OFTs) have been determined and fixed during training time using standard datasets.
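The threshold-selection idea in this abstract can be illustrated with a toy sketch. It is not the authors' algorithm: it assumes detections from one camera have already been projected to a shared ground plane, and it sweeps candidate thresholds to find the one whose surviving detections agree best with the points transferred from a second camera. The function names, the greedy matching, the F1-style agreement score, and the 0.5 matching radius are all illustrative assumptions.

```python
from math import hypot


def agreement(points_a, points_b, radius=0.5):
    """F1-style agreement between two point sets (hypothetical score).

    Points are matched greedily one-to-one when they lie within `radius`
    of each other on the shared ground plane.
    """
    unmatched_b = list(points_b)
    matches = 0
    for ax, ay in points_a:
        for i, (bx, by) in enumerate(unmatched_b):
            if hypot(ax - bx, ay - by) <= radius:
                matches += 1
                del unmatched_b[i]  # each point in B is used at most once
                break
    total = len(points_a) + len(points_b)
    return 2 * matches / total if total else 0.0


def best_threshold(dets_a, points_b, candidates):
    """Return the candidate threshold that maximizes cross-view agreement.

    dets_a: list of (score, x, y) detections from camera A.
    points_b: list of (x, y) detections transferred from camera B.
    """
    def score_at(threshold):
        kept = [(x, y) for s, x, y in dets_a if s >= threshold]
        return agreement(kept, points_b)

    return max(candidates, key=score_at)


if __name__ == "__main__":
    dets_a = [(0.9, 0.0, 0.0), (0.8, 2.0, 2.0), (0.3, 5.0, 5.0), (0.2, 8.0, 8.0)]
    points_b = [(0.1, 0.0), (2.0, 2.1)]
    print(best_threshold(dets_a, points_b, [0.1, 0.5, 0.85]))
```

In this toy data the two low-scoring detections have no counterpart in the second view, so a threshold that discards them yields the highest agreement. No labeled ground truth is involved, which is the property the abstract emphasizes.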

Coarse-to-Fine Adaptive People Detection for Video Sequences by Maximizing Mutual Information †

Sensors ◽

10.3390/s19010004 ◽

2018 ◽

Vol 19 (1) ◽

pp. 4

Author(s):

Álvaro García-Martín ◽

Juan SanMiguel ◽

José Martínez

Keyword(s):

Mutual Information ◽

Detection Threshold ◽

Ground Truth ◽

Training Data ◽

Training Dataset ◽

People Detection ◽

Detection Model ◽

Unseen Data ◽

Bounding Boxes ◽

Coarse To Fine

Applying people detectors to unseen data is challenging since pattern distributions, such as viewpoints, motion, poses, backgrounds, occlusions and people sizes, may differ significantly from those of the training dataset. In this paper, we propose a coarse-to-fine framework to adapt people detectors frame by frame during runtime classification, without requiring any additional manually labeled ground truth apart from the offline training of the detection model. Such adaptation makes use of the mutual information of multiple detectors, i.e., similarities and dissimilarities of detectors estimated and agreed by pair-wise correlating their outputs. Globally, the proposed adaptation discriminates between relevant instants in a video sequence, i.e., identifies the representative frames for an adaptation of the system. Locally, the proposed adaptation identifies the best configuration (i.e., detection threshold) of each detector under analysis by maximizing the mutual information. The proposed coarse-to-fine approach does not require training the detectors for each new scenario and uses standard people detector outputs, i.e., bounding boxes. The experimental results demonstrate that the proposed approach outperforms state-of-the-art detectors whose optimal threshold configurations are previously determined and fixed from offline training data.
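As a rough illustration of what "pair-wise correlating" bounding-box outputs could look like, the sketch below measures how often the boxes of one detector overlap a box of another. It uses intersection-over-union (IoU), a standard overlap measure, as a stand-in; the abstract does not say which similarity the authors use, and `pairwise_agreement` and the 0.5 IoU cutoff are assumptions made here.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def pairwise_agreement(boxes_a, boxes_b, min_iou=0.5):
    """Fraction of detector A's boxes that overlap some box of detector B.

    Hypothetical agreement score: a box counts as agreed when its IoU with
    at least one box from the other detector reaches `min_iou`.
    """
    if not boxes_a:
        return 0.0
    hits = sum(
        1 for a in boxes_a if any(iou(a, b) >= min_iou for b in boxes_b)
    )
    return hits / len(boxes_a)


if __name__ == "__main__":
    detector_a = [(0, 0, 2, 2), (10, 10, 12, 12)]
    detector_b = [(0, 0, 2, 2)]
    print(pairwise_agreement(detector_a, detector_b))
```

A score like this could be recomputed per frame for each candidate threshold, so that the threshold giving the strongest agreement between detectors is kept.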

A text-based multi-span network for reading comprehension

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-200581 ◽

2021 ◽

pp. 1-13

Author(s):

Deguang Chen ◽

Ziping Ma ◽

Lin Wei ◽

Yanbin Zhu ◽

Jinlin Ma ◽

...

Keyword(s):

Reading Comprehension ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

State Of The Art ◽

Market Value ◽

The State ◽

Experimental Results ◽

Training Time ◽

The Public

Text-based reading comprehension models have great research significance and market value and are one of the main directions of natural language processing. Reading comprehension models of single-span answers have recently attracted more attention and achieved significant results. In contrast, multi-span answer models for reading comprehension have been less investigated and their performances need improvement. To address this issue, in this paper, we propose a text-based multi-span network for reading comprehension, ALBERT_SBoundary, and build a multi-span answer corpus, MultiSpan_NMU. We also conduct extensive experiments on the public multi-span corpus, MultiSpan_DROP, and our multi-span answer corpus, MultiSpan_NMU, and compare the proposed method with the state-of-the-art. The experimental results show that our proposed method achieves F1 scores of 84.10 and 92.88 on MultiSpan_DROP and MultiSpan_NMU datasets, respectively, while it also has fewer parameters and a shorter training time.

Automated Ground Truth Generation for Learning-Based Crack Detection on Concrete Surfaces

Applied Sciences ◽

10.3390/app112210966 ◽

2021 ◽

Vol 11 (22) ◽

pp. 10966

Author(s):

Hsiang-Chieh Chen ◽

Zheng-Ting Li

Keyword(s):

Deep Learning ◽

Crack Detection ◽

Ground Truth ◽

Experimental Results ◽

Training Data ◽

Generation Process ◽

Detection Model ◽

Ground Truth Generation ◽

Labeling Approach ◽

The Cost

This article introduces an automated data-labeling approach for generating crack ground truths (GTs) within concrete images. The main algorithm includes generating first-round GTs, pre-training a deep learning-based model, and generating second-round GTs. On the basis of the generated second-round GTs of the training data, a learning-based crack detection model can be trained in a self-supervised manner. The pre-trained deep learning-based model is effective for crack detection after it is re-trained using the second-round GTs. The main contribution of this study is the proposal of an automated GT generation process for training a crack detection model at the pixel level. Experimental results show that the second-round GTs are similar to manually marked labels. Accordingly, the cost of implementing learning-based methods is reduced significantly because data labeling by humans is not required.

Recovering Accurate Labeling Information from Partially Valid Data for Effective Multi-Label Learning

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/191 ◽

2020 ◽

Author(s):

Ximing Li ◽

Yang Wang

Keyword(s):

State Of The Art ◽

Ground Truth ◽

The State ◽

Label Propagation ◽

Experimental Results ◽

Two Stage ◽

Training Instance ◽

Valid Data

Partial Multi-label Learning (PML) aims to induce a multi-label predictor from datasets with noisy supervision, where each training instance is associated with several candidate labels, only some of which are valid. To address the noise issue, existing PML methods basically recover the ground-truth labels by leveraging the ground-truth confidence of each candidate label, i.e., the likelihood of a candidate label being a ground-truth one. However, they neglect the information from non-candidate labels, which potentially contributes to the ground-truth label recovery. In this paper, we propose to recover the ground-truth labels, i.e., to estimate the ground-truth confidences, from the label enrichment, composed of the relevance degrees of candidate labels and irrelevance degrees of non-candidate labels. Upon this observation, we further develop a novel two-stage PML method, namely Partial Multi-Label Learning with Label Enrichment-Recovery (PML3ER), which in the first stage estimates the label enrichment with unconstrained label propagation, and then jointly learns the ground-truth confidence and multi-label predictor given the label enrichment. Experimental results validate that PML3ER outperforms the state-of-the-art PML methods.

RS-SSKD: Self-Supervision Equipped with Knowledge Distillation for Few-Shot Remote Sensing Scene Classification

Sensors ◽

10.3390/s21051566 ◽

2021 ◽

Vol 21 (5) ◽

pp. 1566

Author(s):

Pei Zhang ◽

Ying Li ◽

Dong Wang ◽

Jiyue Wang

Keyword(s):

Remote Sensing ◽

State Of The Art ◽

Ground Truth ◽

Training Data ◽

Scene Classification ◽

Training Time ◽

Shot Classification ◽

Meta Learning ◽

Knowledge Distillation ◽

Few Data

While a growing number of instruments generate more and more airborne or satellite images, the bottleneck in remote sensing (RS) scene classification has shifted from data limits toward a lack of ground truth samples. There are still many challenges when facing unknown environments, especially those with insufficient training data. Few-shot classification offers a different picture under the umbrella of meta-learning: digging rich knowledge from a few data is possible. In this work, we propose a method named RS-SSKD for few-shot RS scene classification from the perspective of generating a powerful representation for the downstream meta-learner. Firstly, we propose a novel two-branch network that takes three pairs of original-transformed images as inputs and incorporates Class Activation Maps (CAMs) to drive the network to mine the most relevant category-specific region. This strategy ensures that the network generates discriminative embeddings. Secondly, we set a round of self-knowledge distillation to prevent overfitting and boost the performance. Our experiments show that the proposed method surpasses current state-of-the-art approaches on two challenging RS scene datasets: NWPU-RESISC45 and RSD46-WHU. Finally, we conduct various ablation experiments to investigate the effect of each component of the proposed method and analyze the training time of state-of-the-art methods and ours.

Browser Security Attacks and Detection Techniques: A Case of Tabnabbing

Science & Technology Journal ◽

10.22232/stj.2020.08.01.03 ◽

2020 ◽

Vol 8 (1) ◽

pp. 33-41

Author(s):

Dr. S. Sarika

Keyword(s):

Credit Card ◽

State Of The Art ◽

Experimental Results ◽

Detection Technique ◽

Security Attacks ◽

Agent Based ◽

Detection Techniques ◽

Browser Security ◽

Cyber Threats ◽

Multi Agent

Phishing is a malicious and deliberate act of sending counterfeit messages or mimicking a webpage. The goal is either to steal sensitive credentials like login information and credit card details or to install malware on a victim’s machine. Browser-based cyber threats have become one of the biggest concerns in networked architectures. The most prolific form of browser attack is tabnabbing, which happens in inactive browser tabs. In a tabnabbing attack, a fake page disguises itself as a genuine page to steal data. This paper presents a multi-agent-based tabnabbing detection technique. The method detects heuristic changes in a webpage when a tabnabbing attack happens and gives a warning to the user. Experimental results show that the method performs better when compared with state-of-the-art tabnabbing detection techniques.

G-Tric: generating three-way synthetic datasets with triclustering solutions

BMC Bioinformatics ◽

10.1186/s12859-020-03925-4 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

João Lobo ◽

Rui Henriques ◽

Sara C. Madeira

Keyword(s):

State Of The Art ◽

Synthetic Data ◽

Ground Truth ◽

Real Data ◽

Three Dimensions ◽

Additional Advantage ◽

Urban Dynamics ◽

Data Generator ◽

Real World Datasets ◽

Synthetic Datasets

Background: Three-way data have started to gain popularity due to their increasing capacity to describe inherently multivariate and temporal events, such as biological responses, social interactions over time, urban dynamics, or complex geophysical phenomena. Triclustering, subspace clustering of three-way data, enables the discovery of patterns corresponding to data subspaces (triclusters) with values correlated across the three dimensions (observations × features × contexts). With an increasing number of algorithms being proposed, effectively comparing them with state-of-the-art algorithms is paramount. These comparisons are usually performed using real data, without a known ground truth, thus limiting the assessments. In this context, we propose a synthetic data generator, G-Tric, allowing the creation of synthetic datasets with configurable properties and the possibility to plant triclusters. The generator is prepared to create datasets resembling real three-way data from biomedical and social data domains, with the additional advantage of further providing the ground truth (triclustering solution) as output. Results: G-Tric can replicate real-world datasets and create new ones that match researchers' needs across several properties, including data type (numeric or symbolic), dimensions, and background distribution. Users can tune the patterns and structure that characterize the planted triclusters (subspaces) and how they interact (overlapping). Data quality can also be controlled by defining the amount of missing values, noise, or errors. Furthermore, a benchmark of datasets resembling real data is made available, together with the corresponding triclustering solutions (planted triclusters) and generating parameters. Conclusions: Triclustering evaluation using G-Tric provides the possibility to combine both intrinsic and extrinsic metrics to compare solutions that produce more reliable analyses. A set of predefined datasets, mimicking widely used three-way data and exploring crucial properties, was generated and made available, highlighting G-Tric’s potential to advance the triclustering state of the art by easing the process of evaluating the quality of new triclustering approaches.

COVID-19 infection map generation and detection from chest X-ray images

Health Information Science and Systems ◽

10.1007/s13755-021-00146-8 ◽

2021 ◽

Vol 9 (1) ◽

Author(s):

Aysen Degerli ◽

Mete Ahishali ◽

Mehmet Yamac ◽

Serkan Kiranyaz ◽

Muhammad E. H. Chowdhury ◽

...

Keyword(s):

State Of The Art ◽

Ground Truth ◽

Clinical Use ◽

X Ray ◽

Learning Techniques ◽

Map Generation ◽

Severity Grading ◽

Chest X Ray ◽

Novel Method ◽

Aided Diagnosis

Computer-aided diagnosis has become a necessity for accurate and immediate coronavirus disease 2019 (COVID-19) detection to aid treatment and prevent the spread of the virus. Numerous studies have proposed to use Deep Learning techniques for COVID-19 diagnosis. However, they have used very limited chest X-ray (CXR) image repositories for evaluation, with a small number (a few hundred) of COVID-19 samples. Moreover, these methods can neither localize nor grade the severity of COVID-19 infection. For this purpose, recent studies proposed to explore the activation maps of deep networks. However, they remain inaccurate for localizing the actual infestation, making them unreliable for clinical use. This study proposes a novel method for the joint localization, severity grading, and detection of COVID-19 from CXR images by generating the so-called infection maps. To accomplish this, we have compiled the largest dataset with 119,316 CXR images including 2951 COVID-19 samples, where the annotation of the ground-truth segmentation masks is performed on CXRs by a novel collaborative human–machine approach. Furthermore, we publicly release the first CXR dataset with the ground-truth segmentation masks of the COVID-19 infected regions. A detailed set of experiments shows that state-of-the-art segmentation networks can learn to localize COVID-19 infection with an F1-score of 83.20%, which is significantly superior to the activation maps created by the previous methods. Finally, the proposed approach achieved a COVID-19 detection performance with 94.96% sensitivity and 99.88% specificity.
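For readers less familiar with the metrics quoted in this abstract, the sketch below gives their standard definitions in terms of confusion-matrix counts. The function names and the example counts are illustrative and are not taken from the paper; for a segmentation F1-score the counts would typically be pixels rather than images.

```python
def sensitivity(tp, fn):
    """True positive rate: share of actual positives that were detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn, fp):
    """True negative rate: share of actual negatives correctly rejected."""
    return tn / (tn + fp) if (tn + fp) else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written in count form."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Made-up counts, chosen only to exercise the formulas.
    print(sensitivity(tp=90, fn=10))
    print(specificity(tn=95, fp=5))
    print(f1_score(tp=8, fp=2, fn=2))
```

High specificity alongside high sensitivity, as reported here, means few healthy cases are flagged while few infected cases are missed.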

Automatic Detection of Discrimination Actions from Social Images

Electronics ◽

10.3390/electronics10030325 ◽

2021 ◽

Vol 10 (3) ◽

pp. 325

Author(s):

Zhihao Wu ◽

Baopeng Zhang ◽

Tianchen Zhou ◽

Yan Li ◽

Jianping Fan

Keyword(s):

Action Recognition ◽

State Of The Art ◽

Automatic Detection ◽

Experimental Results ◽

Practical Approach ◽

Detection And Identification ◽

Art Methods ◽

Image Set ◽

Social Images ◽

Relationship Identification

In this paper, we developed a practical approach for automatic detection of discrimination actions from social images. Firstly, an image set is established, in which various discrimination actions and relations are manually labeled. To the best of our knowledge, this is the first work to create a dataset for discrimination action recognition and relationship identification. Secondly, a practical approach is developed to achieve automatic detection and identification of discrimination actions and relationships from social images. Thirdly, the task of relationship identification is seamlessly integrated with the task of discrimination action recognition into one single network called the Co-operative Visual Translation Embedding++ network (CVTransE++). We also compared our proposed method with numerous state-of-the-art methods, and our experimental results demonstrated that our proposed methods can significantly outperform state-of-the-art approaches.

Multiview deep learning based on tensor decomposition and its application in fault detection of overhead contact systems

The Visual Computer ◽

10.1007/s00371-021-02080-y ◽

2021 ◽

Author(s):

Xuewu Zhang ◽

Yansheng Gong ◽

Chen Qiao ◽

Wenfeng Jing

Keyword(s):

High Speed ◽

Tensor Decomposition ◽

Detection Methods ◽

Detection Accuracy ◽

Feature Maps ◽

Training Time ◽

Detection Model ◽

Railway Line ◽

Result Show ◽

Deep Layers

This article mainly focuses on the most common types of high-speed railway malfunctions in overhead contact systems, namely, unstressed droppers, foreign-body invasions, and pole number-plate malfunctions, to establish a deep-network detection model. By fusing the feature maps of the shallow and deep layers in the pretraining network, global and local features of the malfunction area are combined to enhance the network's ability to identify small objects. Further, in order to share the fully connected layers of the pretraining network and reduce the complexity of the model, Tucker tensor decomposition is used to extract features from the fused-feature map. The operation greatly reduces training time. Through the detection of images collected on the Lanxin railway line, experimental results show that the proposed multiview Faster R-CNN based on tensor decomposition had lower miss probability and higher detection accuracy for the three types of faults. Compared with the object-detection methods YOLOv3, SSD, and the original Faster R-CNN, the average miss probability of the improved Faster R-CNN model in this paper is decreased by 37.83%, 51.27%, and 43.79%, respectively, and average detection accuracy is increased by 3.6%, 9.75%, and 5.9%, respectively.
