Investigating Weak Supervision in Deep Ranking

2019 ◽  
Vol 3 (3) ◽  
pp. 155-164 ◽  
Author(s):  
Yukun Zheng ◽  
Yiqun Liu ◽  
Zhen Fan ◽  
Cheng Luo ◽  
Qingyao Ai ◽  
...  

Abstract. A number of deep neural networks have been proposed to improve the performance of document ranking in information retrieval. However, training these models usually requires large-scale labeled data, so data shortage has become a major hindrance to improving neural ranking models’ performance. Recently, several weakly supervised methods have been proposed to address this challenge, using heuristics or users’ interactions recorded in Search Engine Result Pages (SERPs) to generate weak relevance labels. In this work, we adopt two kinds of weakly supervised relevance, BM25-based relevance and click model-based relevance, and investigate in depth how they differ in the training of neural ranking models. Experimental results show that BM25-based relevance helps models capture more exact-matching signals, while click model-based relevance enhances the rankings of documents that may be preferred by users. We further propose a cascade ranking framework to combine the two kinds of weakly supervised relevance, which significantly improves the ranking performance of neural ranking models and outperforms the best result in the NTCIR-13 We Want Web (WWW) task. This work reveals the potential of constructing better document retrieval systems based on multiple kinds of weak relevance signals.
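As a rough illustration of the BM25-based weak supervision idea described above, one can score each document against a query with BM25 and treat the top-ranked documents as weakly relevant training examples. This is a minimal sketch, not the paper's code; the function names, parameters, and the simple top-k labeling rule are all illustrative assumptions.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """BM25 score of each tokenized document in `docs` against `query` (a token list)."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N  # average document length
    # document frequency of each query term
    df = {t: sum(1 for d in docs if t in d) for t in set(query)}
    scores = []
    for doc in docs:
        tf = Counter(doc)
        s = 0.0
        for t in query:
            if tf[t] == 0:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            norm = k1 * (1 - b + b * len(doc) / avgdl)  # length normalization
            s += idf * tf[t] * (k1 + 1) / (tf[t] + norm)
        scores.append(s)
    return scores

def weak_labels(query, docs, top_k=1):
    """Treat the top_k BM25-ranked documents as weakly relevant (label 1), the rest 0."""
    scores = bm25_scores(query, docs)
    labels = [0] * len(docs)
    for i in sorted(range(len(docs)), key=scores.__getitem__, reverse=True)[:top_k]:
        labels[i] = 1
    return labels
```

Labels produced this way can then feed a pairwise or pointwise neural ranking loss, in the spirit of the BM25-based weak relevance studied in the paper.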

2020 ◽  
Vol 34 (09) ◽  
pp. 13610-13611
Author(s):  
Oktie Hassanzadeh ◽  
Debarun Bhattacharjya ◽  
Mark Feblowitz ◽  
Kavitha Srinivas ◽  
Michael Perrone ◽  
...  

In this demonstration, we present a system for mining causal knowledge from large corpora of text documents, such as millions of news articles. Our system provides a collection of APIs for causal analysis and retrieval. These APIs enable searching for the effects of a given cause and the causes of a given effect, as well as analyzing whether a causal relation exists between a pair of phrases. The analysis includes a score that indicates the likelihood of a causal relation, along with evidence from the input corpus supporting the existence of a causal relation between the input phrases. Our system uses generic unsupervised and weakly supervised methods of causal relation extraction that do not impose semantic constraints on causes and effects. We show example use cases developed for a commercial application in enterprise risk management.
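A toy, pattern-based version of such unsupervised extraction and retrieval might look like the following. This is a hypothetical sketch under strong simplifying assumptions (a couple of surface patterns, exact-string lookup); the demonstrated system is far richer, and none of these names come from it.

```python
import re
from collections import defaultdict

# Naive surface patterns for (cause, effect) pairs; illustrative only.
PATTERNS = [
    re.compile(r"(\w[\w ]*?) (?:causes|leads to|results in) (\w[\w ]*)", re.I),
]

def extract_pairs(sentences):
    """Extract (cause, effect) phrase pairs via pattern matching, lowercased."""
    pairs = []
    for s in sentences:
        for pat in PATTERNS:
            for m in pat.finditer(s):
                pairs.append((m.group(1).strip().lower(), m.group(2).strip().lower()))
    return pairs

class CausalIndex:
    """Index extracted pairs to answer effects-of / causes-of queries."""
    def __init__(self, sentences):
        self.effects_of = defaultdict(set)  # cause -> set of effects
        self.causes_of = defaultdict(set)   # effect -> set of causes
        for c, e in extract_pairs(sentences):
            self.effects_of[c].add(e)
            self.causes_of[e].add(c)

    def could_cause(self, x, y):
        """Crude evidence check for 'Could X cause Y?' by exact lookup."""
        return y.lower() in self.effects_of[x.lower()]
```

In a real system the exact-match lookup would be replaced by scored, evidence-backed retrieval over millions of extracted pairs.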


2020 ◽  
Vol 34 (04) ◽  
pp. 4052-4059
Author(s):  
Lan-Zhe Guo ◽  
Feng Kuang ◽  
Zhang-Xun Liu ◽  
Yu-Feng Li ◽  
Nan Ma ◽  
...  

Weakly supervised learning aims at coping with scarce labeled data. Previous weakly supervised studies typically assume that there is only one kind of weak supervision in the data. In many applications, however, raw data contains more than one kind of weak supervision at the same time. For example, in user experience enhancement at Didi, one of the largest online ride-sharing platforms, the ride comment data contains severe label noise (due to the subjective factors of passengers) and severe label distribution bias (due to sampling bias). We call this problem ‘compound weakly supervised learning’. In this paper, we propose the CWSL method to address this problem based on Didi ride-sharing comment data. Specifically, an instance reweighting strategy is employed to cope with severe label noise in the comment data, assigning small weights to harmful noisy instances. Robust criteria such as AUC, rather than accuracy, are optimized on validation performance to correct the biased label distribution. Alternating optimization and stochastic gradient methods accelerate the optimization on large-scale data. Experiments on Didi ride-sharing comment data clearly validate the effectiveness of CWSL. We hope this work may shed some light on applying weakly supervised learning to complex real situations.
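The two ingredients named above, robust AUC evaluation and loss-based instance reweighting, can be caricatured in a few lines. This is a minimal sketch of the general ideas only, not the authors' CWSL implementation; the fixed down-weight and loss threshold are invented for illustration.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney statistic).
    Robust to class imbalance, unlike plain accuracy."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def reweight(losses, threshold):
    """Down-weight instances whose training loss exceeds `threshold`,
    on the heuristic that unusually large losses often signal label noise."""
    return [0.1 if l > threshold else 1.0 for l in losses]
```

In a CWSL-style pipeline, per-instance weights from `reweight` would scale the training loss, while validation AUC from `auc` would drive the correction of the biased label distribution.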


Author(s):  
Oktie Hassanzadeh ◽  
Debarun Bhattacharjya ◽  
Mark Feblowitz ◽  
Kavitha Srinivas ◽  
Michael Perrone ◽  
...  

In this paper, we study the problem of answering questions of the type "Could X cause Y?", where X and Y are general phrases without any constraints. Answering such questions assists with various decision analysis tasks, such as verifying and extending presumed causal associations used in decision making. Our goal is to analyze the ability of an AI agent, built using state-of-the-art unsupervised methods, to answer causal questions derived from collections of cause-effect pairs provided by human experts. We focus only on unsupervised and weakly supervised methods due to the difficulty of creating a large enough training set with reasonable quality and coverage. The methods we examine rely on a large corpus of text derived from news articles, and range from large-scale application of classic NLP techniques and statistical analysis to the use of neural-network-based phrase embeddings and state-of-the-art neural language models.
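One way phrase embeddings could serve such a question answerer is by comparing the candidate cause X to phrases already known to cause Y. The scorer below is a toy sketch: `causal_score`, the hand-made embeddings, and the `known_causes` table are all hypothetical assumptions, not the paper's method.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return dot / (nu * nv)

def causal_score(x, y, known_causes, embed):
    """Score 'Could X cause Y?' as the best embedding similarity between X
    and any phrase previously extracted as a cause of Y; 0.0 if none known."""
    causes = known_causes.get(y, [])
    if not causes:
        return 0.0
    return max(cosine(embed[x], embed[c]) for c in causes)
```

The intuition: even if "smoking causes cancer" never appears verbatim, a phrase embedded near a known cause of Y can still receive a high score.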


Author(s):  
M. Schmitt ◽  
J. Prexl ◽  
P. Ebel ◽  
L. Liebel ◽  
X. X. Zhu

Abstract. Fully automatic large-scale land cover mapping is one of the core challenges addressed by the remote sensing community. Usually, this task is built on (supervised) machine learning models. However, despite recent growth in the availability of satellite observations, accurate training data remains comparably scarce. On the other hand, numerous global land cover products exist and can often be accessed free of charge. Unfortunately, these maps are typically of much lower resolution than modern-day satellite imagery. Moreover, they always come with a significant amount of noise, as they are not ground truth but products of previous (semi-)automatic prediction tasks. Therefore, this paper makes a case for applying weakly supervised learning strategies to get the most out of available data sources and achieve progress in high-resolution large-scale land cover mapping. Challenges and opportunities are discussed based on the SEN12MS dataset, for which some baseline results are also shown. These baselines indicate that there is still a lot of potential for dedicated approaches designed to deal with remote sensing-specific forms of weak supervision.


2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Weijia Wu ◽  
Jici Xing ◽  
Cheng Yang ◽  
Yuxing Wang ◽  
Hong Zhou

Scene text detection methods based on deep learning have recently shown remarkable improvement. Most text detection methods train deep convolutional neural networks with full masks, which require pixel accuracy for good-quality training. Normally, a skilled engineer needs to drag tens of points to create a full mask for a curved text instance, so data labelling based on full masks is time-consuming and laborious, particularly for curved text. To reduce the labelling cost, this paper proposes a weakly supervised method. Unlike other detectors (e.g., PSENet or TextSnake) that use full masks, our method only needs coarse masks for training. More specifically, the coarse mask for one text instance is a single line across the text region. Compared with full-mask labelling, this saves labelling time, at the cost of discarding much annotation information. To compensate, a network pretrained on synthetic data with full masks is used to enhance the coarse masks in real images, and the enhanced masks are fed back to train our network. Experimental analysis shows that the performance of our method is close to that of fully supervised methods on ICDAR2015, CTW1500, Total-Text, and MSRA-TD500.


2020 ◽  
Vol 140 (4) ◽  
pp. 272-280
Author(s):  
Wataru Ohnishi ◽  
Hiroshi Fujimoto ◽  
Koichi Sakata
