Selecting Examples in Manifold Reduced Feature Space for Active Learning

Author(s):  
C. Silva ◽  
B. Ribeiro
Author(s):  
Agnes Tegen ◽  
Paul Davidsson ◽  
Jan A. Persson

Abstract Advances in the Internet of Things have led to an increased number of devices generating and streaming data. These devices can be useful data sources for activity recognition using machine learning. However, the set of available sensors may vary over time, e.g. due to mobility of the sensors and technical failures. Since the machine learning model uses the data streams from the sensors as input, it must be able to handle a varying number of input variables, i.e. the feature space might change over time. Moreover, the labelled data necessary for training is often costly to acquire. In active learning, the model is given a budget for requesting labels from an oracle, and aims to maximize accuracy by carefully selecting which data instances to label. It is generally assumed that the only role of the oracle is to respond to queries and that it will always do so. In many real-world scenarios, however, the oracle is a human user, and these assumptions are simplifications that might not properly depict the setting. In this work we investigate different interactive machine learning strategies, of which active learning is one, that explore the effects of a more proactive oracle and the factors that might influence a user to provide or withhold labels. We implement five interactive machine learning strategies, as well as hybrid versions of them, and evaluate them on two datasets. The results show that a more proactive user can improve performance, especially when the user is influenced by the accuracy of earlier predictions. The experiments also highlight challenges related to evaluating performance when the set of classes changes over time.
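
The budgeted querying described in this abstract, with an oracle that does not always answer, can be illustrated with a minimal sketch. It is not one of the five strategies from the paper: the function name `query_with_budget`, the fixed uncertainty threshold, and the rule that an unanswered query still consumes budget are all assumptions made for illustration.

```python
def query_with_budget(uncertainties, oracle_answers, budget, threshold):
    """Stream-based active learning sketch (hypothetical helper).

    uncertainties  : model uncertainty for each incoming instance
    oracle_answers : label the user would give, or None if the user
                     withholds the label for that instance
    budget         : maximum number of queries the model may issue
    threshold      : query only when uncertainty >= threshold (assumption)

    Returns a list of (index, label) pairs that were actually obtained.
    """
    labelled = []
    queries_left = budget
    for i, (u, answer) in enumerate(zip(uncertainties, oracle_answers)):
        if queries_left == 0:
            break  # budget exhausted, stop querying
        if u >= threshold:
            # Assumption: a query costs budget even if it goes unanswered.
            queries_left -= 1
            if answer is not None:
                labelled.append((i, answer))
    return labelled


if __name__ == "__main__":
    obtained = query_with_budget(
        uncertainties=[0.9, 0.2, 0.8, 0.7, 0.95],
        oracle_answers=["walk", "sit", None, "run", "walk"],
        budget=3,
        threshold=0.5,
    )
    print(obtained)
```

In this toy run the third instance is queried but the user withholds the label, so one unit of budget is spent without gaining a training example.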


Proceedings ◽  
2019 ◽  
Vol 31 (1) ◽  
pp. 80
Author(s):  
Nela Grimova ◽  
Martin Macas

Active learning is very useful for classification problems where it is hard or time-consuming to acquire class labels for the data needed to train a classifier. The classification of overnight polysomnography records into sleep stages is an example of such an application, because an expert has to annotate a large number of segments of a record. Active learning methods make it possible to iteratively select only the most informative instances for manual classification, so the expert's total effort is reduced. However, the process can be insufficiently initialised because of the high dimensionality of polysomnography (PSG) data, which puts the fast convergence of active learning at risk. To prevent this, we propose a variant of the query-by-committee active learning scenario that takes all features of the data into account, so the feature space does not need to be reduced, while the process is still initialised quickly. The proposed method is compared to random sampling and to margin uncertainty sampling, another well-known active learning method. During the crucial first iteration of the process, the proposed variant of query-by-committee achieved the best results among the compared strategies in most cases.
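
The two query strategies named in this abstract can be sketched in their generic textbook form. This is not the variant proposed by the authors: the helper names, the use of vote entropy as the committee disagreement measure, and the toy inputs are assumptions.

```python
import numpy as np


def margin_scores(proba):
    """Margin uncertainty: gap between the two largest class probabilities.

    proba has shape (n_samples, n_classes). A small margin means the
    classifier is uncertain, so the smallest margins are queried first.
    """
    ordered = np.sort(proba, axis=1)
    return ordered[:, -1] - ordered[:, -2]


def vote_entropy(votes, n_classes):
    """Query-by-committee disagreement measured as vote entropy.

    votes has shape (n_members, n_samples) and holds the integer class
    predicted by each committee member. Higher entropy = more disagreement.
    """
    entropy = np.zeros(votes.shape[1])
    for c in range(n_classes):
        frac = (votes == c).mean(axis=0)  # share of members voting for c
        nonzero = frac > 0
        entropy[nonzero] -= frac[nonzero] * np.log(frac[nonzero])
    return entropy


def select_queries(scores, n_queries, largest=True):
    """Return indices of the n_queries highest (or lowest) scores."""
    order = np.argsort(scores)
    if largest:
        order = order[::-1]
    return order[:n_queries]


if __name__ == "__main__":
    proba = np.array([[0.5, 0.5], [0.9, 0.1], [0.6, 0.4]])
    print(select_queries(margin_scores(proba), 1, largest=False))

    votes = np.array([[0, 0, 0], [0, 1, 1], [0, 1, 2]])
    print(select_queries(vote_entropy(votes, n_classes=3), 1))
```

Margin sampling needs class probabilities from a single model, whereas the committee approach only needs hard votes from several models, for example models trained on different subsets of the labelled data.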


Sensors ◽  
2022 ◽  
Vol 22 (2) ◽  
pp. 414
Author(s):  
Dominique Albert-Weiss ◽  
Ahmad Osman

A pivotal topic in agriculture and food monitoring is the assessment of the quality and ripeness of agricultural products using non-destructive testing techniques. Acoustic testing offers a rapid in situ analysis of the state of the agricultural good, obtaining global information about its interior. While deep learning (DL) methods have outperformed state-of-the-art benchmarks in various applications, the limited adoption of DL algorithms such as convolutional neural networks (CNNs) can be traced back to their high data inefficiency and the absence of annotated data. Active learning is a framework that has been heavily used in machine learning when labelled instances are scarce or cumbersome to obtain. It is of particular interest when the DL algorithm is highly uncertain about the label of an instance. With a human in the loop for guidance, the DL algorithm can be improved continuously in a sample-efficient manner. This paper studies the applicability of active learning to grading ‘Galia’ muskmelons based on their shelf life. We propose k-Determinantal Point Processes (k-DPP), a purely diversity-based method in which the chosen subset size k influences the exploration of the feature space. While obtaining results comparable to uncertainty-based approaches when k is large, we simultaneously obtain a better exploration of the data distribution. While the implementation based on eigendecomposition has a runtime of O(n^3), this can be further reduced to O(n·poly(k)) using rejection sampling. We suggest the use of diversity-based acquisition when only a few labelled samples are available, allowing for better exploration while counteracting the tendency of greedy uncertainty-based methods to miss the training objective.
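
The idea of diversity-based acquisition with a determinantal criterion can be sketched as follows. Note that this is a greedy determinant maximisation over a similarity kernel, a simpler stand-in that favours the same kind of diverse subsets; it is not k-DPP sampling and not the authors' implementation. The RBF kernel, the `gamma` value, and the function names are assumptions.

```python
import numpy as np


def rbf_kernel(X, gamma=1.0):
    """Similarity kernel L[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.exp(-gamma * (diff ** 2).sum(axis=-1))


def greedy_diverse_subset(L, k):
    """Greedily pick up to k indices that maximise det(L[S, S]).

    The determinant of a kernel submatrix is large when the chosen
    items are mutually dissimilar, which is the quantity a k-DPP
    assigns probability in proportion to.
    """
    n = L.shape[0]
    selected = []
    for _ in range(k):
        best_i, best_val = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            val = logdet if sign > 0 else -np.inf
            if val > best_val:
                best_i, best_val = i, val
        if best_i is None:
            break  # every remaining candidate makes the submatrix singular
        selected.append(best_i)
    return selected


if __name__ == "__main__":
    # Three well-separated groups; two of them contain near-duplicates.
    X = np.array([[0.0, 0.0], [0.01, 0.0], [5.0, 5.0], [5.01, 5.0], [10.0, 0.0]])
    print(greedy_diverse_subset(rbf_kernel(X), k=3))
```

With k equal to the number of groups in the toy data, the selection takes one point from each group and skips the near-duplicates, which is the exploration behaviour the abstract attributes to diversity-based acquisition.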


2017 ◽  
Vol 85 (8) ◽  
pp. 814-825 ◽  
Author(s):  
Ajeng J. Puspitasari ◽  
Jonathan W. Kanter ◽  
Andrew M. Busch ◽  
Rachel Leonard ◽  
Shira Dunsiger ◽  
...  

2012 ◽  
Author(s):  
Tom Busey ◽  
Chen Yu ◽  
Francisco Parada ◽  
Brandi Emerick ◽  
John Vanderkolk

2008 ◽  
Author(s):  
Lisa Wagner ◽  
Chandra M. Mehrotra
