An Emotional Scene Retrieval Framework for Lifelog Videos Using Ensemble Clustering

2015 ◽  
Vol 3 (3) ◽  
pp. 1-13 ◽  
Author(s):  
Hiroki Nomiya ◽  
Atsushi Morikuni ◽  
Teruhisa Hochin

A lifelog video retrieval framework is proposed for the better utilization of large amounts of lifelog video data. The proposed method retrieves emotional scenes, such as scenes in which a person in the video is smiling, on the assumption that an important event occurs in most emotional scenes. Emotional scenes are detected on the basis of facial expression recognition using a wide variety of facial features. The authors adopt an unsupervised learning approach called ensemble clustering to recognize the facial expressions, because supervised learning approaches require sufficient training data, which makes them troublesome to apply to large-scale video databases. The retrieval performance of the proposed method is evaluated by means of an emotional scene detection experiment from the viewpoints of accuracy and efficiency. In addition, a prototype retrieval system is implemented based on the proposed emotional scene detection method.
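The abstract gives no implementation details, but the core of ensemble clustering can be sketched as a co-association consensus: several base partitions vote on whether two samples belong together, and the consensus is read off the resulting similarity graph. A minimal sketch follows; the 0.5 threshold and the connected-components step are illustrative choices, not the authors' method:

```python
import numpy as np

def co_association(partitions):
    """Co-association matrix: fraction of base partitions that
    place each pair of samples in the same cluster."""
    partitions = np.asarray(partitions)   # (n_partitions, n_samples)
    n = partitions.shape[1]
    ca = np.zeros((n, n))
    for labels in partitions:
        ca += (labels[:, None] == labels[None, :])
    return ca / len(partitions)

def consensus_clusters(partitions, threshold=0.5):
    """Merge samples whose co-association exceeds the threshold,
    via connected components on the thresholded similarity graph."""
    ca = co_association(partitions)
    n = ca.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for i in range(n):
        if labels[i] >= 0:
            continue
        stack, labels[i] = [i], current   # flood-fill one component
        while stack:
            j = stack.pop()
            for k in np.where(ca[j] > threshold)[0]:
                if labels[k] < 0:
                    labels[k] = current
                    stack.append(k)
        current += 1
    return labels

# three noisy base partitions of six samples (note the label
# permutation in the second one -- co-association is unaffected)
parts = [[0, 0, 0, 1, 1, 1],
         [1, 1, 1, 0, 0, 0],
         [0, 0, 0, 0, 1, 1]]
consensus = consensus_clusters(parts)   # samples 0-2 and 3-5 group together
```

Because the consensus depends only on pairwise co-membership, it is insensitive to how each base clustering happens to number its clusters, which is what makes the approach attractive when no labeled facial-expression data is available.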

2013 ◽  
Vol 1 (4) ◽  
pp. 1-15 ◽  
Author(s):  
Hiroki Nomiya ◽  
Atsushi Morikuni ◽  
Teruhisa Hochin

An emotional scene detection method is proposed in order to retrieve impressive scenes from lifelog videos. The proposed method is based on facial expression recognition, considering that a wide variety of facial expressions can be observed in impressive scenes. Conventional facial expression recognition techniques, which focus on discriminating among typical facial expressions, are inadequate for lifelog video retrieval because of this diversity of facial expressions. The authors thus propose a more flexible and efficient emotional scene detection method using unsupervised facial expression recognition based on cluster ensembles. This approach does not need predefined facial expressions and is able to detect emotional scenes containing a wide variety of facial expressions. The detection performance of the proposed method is evaluated through several emotional scene detection experiments.


2020 ◽  
Vol 2020 ◽  
pp. 1-8
Author(s):  
Chen Zhang ◽  
Bin Hu ◽  
Yucong Suo ◽  
Zhiqiang Zou ◽  
Yimu Ji

In this paper, we study the challenge of image-to-video retrieval, which uses a query image to search for relevant frames in a large collection of videos. A novel framework based on convolutional neural networks (CNNs) is proposed to perform large-scale video retrieval with low storage cost and high search efficiency. Our framework consists of a key-frame extraction algorithm and a feature aggregation strategy. Specifically, the key-frame extraction algorithm uses clustering to remove redundant information from the video data, greatly reducing storage cost. The feature aggregation strategy adopts average pooling to encode deep local convolutional features, followed by coarse-to-fine retrieval, which allows rapid retrieval in a large-scale video database. The results of extensive experiments on two publicly available datasets demonstrate that the proposed method achieves superior efficiency as well as accuracy over other state-of-the-art visual search methods.
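As a rough illustration of the key-frame idea (the concrete algorithm and features are not specified in the abstract), one can cluster per-frame descriptors and keep only the frame nearest each centroid. In the paper the descriptors would be average-pooled CNN features; the 2-D points and plain k-means below are toy stand-ins:

```python
import numpy as np

def kmeans(feats, k, iters=20, seed=0):
    """Plain Lloyd's k-means over per-frame feature vectors."""
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((feats[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([feats[assign == c].mean(0) for c in range(k)])
    return centers, assign

def key_frames(frame_feats, k):
    """One key frame per cluster: the frame closest to its centroid."""
    centers, assign = kmeans(frame_feats, k)
    keys = []
    for c in range(k):
        idx = np.where(assign == c)[0]
        d = ((frame_feats[idx] - centers[c]) ** 2).sum(-1)
        keys.append(idx[np.argmin(d)])
    return sorted(keys)

# six frames drifting through two visually distinct scenes
frames = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0],
                   [5.0, 5.0], [5.1, 5.0], [5.2, 5.0]])
selected = key_frames(frames, 2)   # one representative frame per scene
```

Only the selected frames need to be stored and indexed, which is where the storage saving claimed in the abstract comes from.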


Author(s):  
Kimiaki Shirahama ◽  
Kuniaki Uehara

This paper examines video retrieval based on the Query-By-Example (QBE) approach, where shots relevant to a query are retrieved from large-scale video data based on their similarity to example shots. This involves two crucial problems: the first is that similarity in features does not necessarily imply similarity in semantic content; the second is the expensive computational cost of comparing a huge number of shots to example shots. The authors have developed a method that can filter out a large number of shots irrelevant to a query, based on a video ontology, that is, a knowledge base about the concepts displayed in a shot. The method utilizes various concept relationships (e.g., generalization/specialization, sibling, part-of, and co-occurrence) defined in the video ontology. In addition, although the video ontology assumes that shots are accurately annotated with concepts, accurate annotation is difficult due to the diversity of forms and appearances of the concepts. Dempster-Shafer theory is therefore used to account for the uncertainty in determining the relevance of a shot from its possibly inaccurate annotation. Experimental results on TRECVID 2009 video data validate the effectiveness of the method.
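Dempster's rule of combination, which underlies the uncertainty handling mentioned above, can be stated compactly: masses on intersecting focal elements multiply, and mass falling on the empty intersection (conflict) is renormalized away. The concept labels and mass values below are invented for illustration, not taken from the paper:

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule: combine two mass functions whose focal
    elements are frozensets, renormalising away conflicting mass."""
    combined, conflict = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y               # mass on the empty set
    norm = 1.0 - conflict
    return {s: v / norm for s, v in combined.items()}

# two hypothetical, partially conflicting annotations of one shot
m1 = {frozenset({'car'}): 0.6,
      frozenset({'car', 'truck'}): 0.4}
m2 = {frozenset({'car'}): 0.5,
      frozenset({'truck'}): 0.3,
      frozenset({'car', 'truck'}): 0.2}
fused = dempster_combine(m1, m2)
```

The fused masses sum to one and concentrate belief on 'car', the concept the two sources mostly agree on.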


Database ◽  
2019 ◽  
Vol 2019 ◽  
Author(s):  
Tao Chen ◽  
Mingfen Wu ◽  
Hexi Li

Abstract The automatic extraction of meaningful relations from biomedical literature or clinical records is crucial in various biomedical applications. Most current deep learning approaches for medical relation extraction require large-scale training data to prevent overfitting of the training model. We propose using a pre-trained model and a fine-tuning technique to improve these approaches without additional time-consuming human labeling. Firstly, we describe the architecture of Bidirectional Encoder Representations from Transformers (BERT), an approach for pre-training a model on large-scale unstructured text. We then combine BERT with a one-dimensional convolutional neural network (1d-CNN) to fine-tune the pre-trained model for relation extraction. Extensive experiments on three datasets, namely the BioCreative V chemical disease relation corpus, the traditional Chinese medicine literature corpus, and the i2b2 2012 temporal relation challenge corpus, show that the proposed approach achieves state-of-the-art results (relative improvements of 22.2%, 7.77%, and 38.5% in F1 score, respectively, compared with a traditional 1d-CNN classifier). The source code is available at https://github.com/chentao1999/MedicalRelationExtraction.
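The 1d-CNN head on top of the encoder is conventional: convolve filters along the token axis of the contextual embeddings, apply ReLU and max-over-time pooling, then classify the pooled vector. The sketch below uses random numpy arrays as stand-ins for BERT's output, and all sizes are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_relu_maxpool(tokens, filters):
    """1d convolution over the token axis, ReLU, then max-over-time
    pooling -- a relation-classification head over encoder outputs."""
    n, d = tokens.shape                        # seq length, embedding dim
    k, w = filters.shape[0], filters.shape[1]  # n_filters, window size
    out = np.empty((k, n - w + 1))
    for i in range(n - w + 1):
        window = tokens[i:i + w].reshape(-1)   # flatten (w, d) window
        out[:, i] = filters.reshape(k, -1) @ window
    return np.maximum(out, 0).max(axis=1)      # (k,) pooled features

# stand-ins for BERT's contextual embeddings (hypothetical sizes)
seq = rng.normal(size=(16, 32))     # 16 tokens, 32-dim embeddings
filt = rng.normal(size=(8, 3, 32))  # 8 filters spanning 3 tokens each
pooled = conv1d_relu_maxpool(seq, filt)
logits = rng.normal(size=(2, 8)) @ pooled   # e.g. relation vs. no-relation
```

During fine-tuning, the gradient flows through this head into the pre-trained encoder, which is what lets the approach work with modest task-specific training data.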


Author(s):  
SONGHAO ZHU ◽  
ZHIWEI LIANG ◽  
XIAOYUAN JING

Graph-based semi-supervised learning approaches have proven effective and efficient at coping with the scarcity of labeled data in many real-world application areas, such as video annotation. However, the pairwise similarity metric, a significant factor in these approaches, has not been fully investigated: existing graph-based semi-supervised approaches estimate the pairwise similarity between samples mainly according to the spatial properties of video data, while the temporal property, an essential characteristic of video data, is not embedded in the pairwise similarity measure. Accordingly, a novel framework for video annotation, called Joint Spatio-Temporal Correlation Learning (JSTCL), is proposed in this paper. The framework simultaneously takes into account the spatial and temporal properties of video data to achieve more accurate pairwise similarity values. We apply the proposed framework to video annotation and report superior performance compared to key existing approaches on the benchmark TRECVID data set.
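One common way to combine the two properties, shown here only as an illustration of the idea rather than the JSTCL formulation, is to multiply a feature-space Gaussian kernel by a temporal decay term, so that samples close in both appearance and time receive the largest edge weights:

```python
import numpy as np

def st_similarity(feats, times, sigma=1.0, tau=5.0):
    """Pairwise affinity combining a spatial (feature) Gaussian kernel
    with a temporal decay; sigma and tau are bandwidth parameters."""
    d2 = ((feats[:, None] - feats[None, :]) ** 2).sum(-1)  # feature dists
    dt = np.abs(times[:, None] - times[None, :])           # time gaps
    return np.exp(-d2 / (2 * sigma ** 2)) * np.exp(-dt / tau)

# three samples with identical features but different timestamps
feats = np.array([[0.0], [0.0], [0.0]])
times = np.array([0.0, 1.0, 10.0])
W = st_similarity(feats, times)
```

Even with identical features, the temporally adjacent pair gets a higher affinity than the distant one, which is exactly the temporal information a purely spatial similarity would discard.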


2008 ◽  
Vol 34 (2) ◽  
pp. 193-224 ◽  
Author(s):  
Alessandro Moschitti ◽  
Daniele Pighin ◽  
Roberto Basili

The availability of large-scale data sets of manually annotated predicate-argument structures has recently favored the use of machine learning approaches in the design of automated semantic role labeling (SRL) systems. The main research in this area concerns the design choices for feature representation and for effective decompositions of the task into different learning models. Regarding the former, structural properties of full syntactic parses are widely employed, as they encode different principles suggested by the linking theory between syntax and semantics. The latter choice relates to several learning schemes over global views of the parses; for example, re-ranking stages operating over alternative predicate-argument sequences of the same sentence have been shown to be very effective. In this article, we propose several kernel functions to model parse tree properties in kernel-based machines, for example, perceptrons or support vector machines. In particular, we define different kinds of tree kernels as general approaches to feature engineering in SRL. Moreover, we extensively experiment with such kernels to investigate their contribution to individual stages of an SRL architecture, both in isolation and in combination with other traditional manually coded features. The results for boundary recognition, classification, and re-ranking stages provide systematic evidence of the significant impact of tree kernels on overall accuracy, especially when the amount of training data is small. As a conclusive result, tree kernels allow for a general and easily portable feature engineering method that is applicable to a large family of natural language processing tasks.
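A minimal version of such a tree kernel, in the spirit of the Collins-Duffy subset tree kernel (simplified here, with a damping factor lambda penalising larger fragments), counts matching tree fragments recursively over all node pairs:

```python
def tree_kernel(t1, t2, lam=0.4):
    """Simplified subset-tree kernel. Trees are (label, children)
    tuples; the kernel sums delta over all pairs of nodes, where
    delta counts common fragments rooted at a matching production."""
    def nodes(t, acc):
        acc.append(t)
        for c in t[1]:
            nodes(c, acc)
        return acc

    def delta(a, b):
        if a[0] != b[0]:
            return 0.0
        if not a[1] and not b[1]:          # matching leaves
            return lam
        if [c[0] for c in a[1]] != [c[0] for c in b[1]]:
            return 0.0                     # productions differ
        prod = lam
        for ca, cb in zip(a[1], b[1]):
            prod *= 1.0 + delta(ca, cb)    # fragments may stop or recurse
        return prod

    return sum(delta(a, b) for a in nodes(t1, []) for b in nodes(t2, []))

# two toy parse trees sharing the VP subtree but not the full NP
t1 = ('S', [('NP', [('D', []), ('N', [])]), ('VP', [('V', [])])])
t2 = ('S', [('NP', [('N', [])]), ('VP', [('V', [])])])
```

The self-kernel tree_kernel(t1, t1) exceeds tree_kernel(t1, t2), reflecting the larger set of shared fragments; this implicit fragment count is what replaces manual feature engineering in the kernel-based learners.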


2021 ◽  
Vol 13 (2) ◽  
pp. 275
Author(s):  
Michael Meadows ◽  
Matthew Wilson

Given the high financial and institutional cost of collecting and processing accurate topography data, many large-scale flood hazard assessments continue to rely instead on freely available global Digital Elevation Models, despite the significant vertical biases known to affect them. To predict (and thereby reduce) these biases, we apply a fully-convolutional neural network (FCN), a form of artificial neural network originally developed for image segmentation which is capable of learning from multi-variate spatial patterns at different scales. We assess its potential by training such a model on a wide variety of remote-sensed input data (primarily multi-spectral imagery), using high-resolution, LiDAR-derived Digital Terrain Models published by the New Zealand government as the reference topography data. In parallel, two more widely used machine learning models are also trained, to provide benchmarks against which the novel FCN may be assessed. We find that the FCN outperforms the other models (reducing root mean square error on the testing dataset by 71%), likely due to its ability to learn from spatial patterns at multiple scales rather than on a pixel-by-pixel basis only. Significantly for flood hazard modelling applications, corrections were found to be especially effective along rivers and their floodplains. However, our results also suggest that models are likely to be biased towards the land cover and relief conditions most prevalent in their training data, with further work required to assess the importance of limiting training data inputs to those most representative of the intended application area(s).
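A fully-convolutional correction network can be sketched in a few lines: stacked 'same'-padded convolutions map multi-band input rasters to a per-pixel elevation bias, which is then subtracted from the DEM. Everything below (band count, layer sizes, random weights) is a toy stand-in, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(1)

def conv2d(x, w):
    """'Same'-padded 3x3 2-D convolution: x is (C_in, H, W),
    w is (C_out, C_in, 3, 3); output keeps the spatial size."""
    c_out = w.shape[0]
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for i in range(3):
        for j in range(3):
            out += np.einsum('oc,chw->ohw', w[:, :, i, j],
                             xp[:, i:i + h, j:j + wd])
    return out

# hypothetical inputs: 4 spectral bands plus the biased DEM itself
bands = rng.normal(size=(5, 32, 32))
w1 = rng.normal(size=(8, 5, 3, 3)) * 0.1   # hidden conv layer
w2 = rng.normal(size=(1, 8, 3, 3)) * 0.1   # per-pixel bias output
correction = conv2d(np.maximum(conv2d(bands, w1), 0), w2)[0]
corrected_dem = bands[4] - correction       # subtract predicted bias
```

Because every layer is convolutional, the same trained weights apply to rasters of any size, and stacking layers (or adding pooling) grows the receptive field, which is how such a network picks up spatial patterns beyond a single pixel.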


2020 ◽  
Vol 10 (10) ◽  
pp. 2446-2451
Author(s):  
Hussain Ahmad ◽  
Muhammad Zubair Asghar ◽  
Fahad M. Alotaibi ◽  
Ibrahim A. Hameed

In social media, depression identification can be regarded as a complex task because of the complicated nature of mental disorders. In recent times, this research area has evolved with the growing popularity of social media platforms, which have become a fundamental part of people's day-to-day lives. Social media platforms and their users share a close relationship, so users' personal lives are reflected in these platforms on several levels. Beyond the inherent complexity of recognising mental illness via social media, supervised machine learning approaches such as deep neural networks have yet to be adopted at scale because of the difficulty of procuring sufficient quantities of annotated training data. For these reasons, we attempt to identify the most effective deep learning model from among selected architectures with a record of success in supervised learning. The selected model is then employed to recognise online users who display depression, given the limited unstructured text data that can be extracted from Twitter.




Author(s):  
R. Roscher ◽  
M. Volpi ◽  
C. Mallet ◽  
L. Drees ◽  
J. D. Wegner

Abstract. In order to reach the goal of reliably solving Earth monitoring tasks, automated and efficient machine learning methods are necessary for large-scale scene analysis and interpretation. A typical bottleneck of supervised learning approaches is the availability of accurately (manually) labeled training data, which is particularly important for training state-of-the-art (deep) learning methods. We present SemCity Toulouse, a publicly available, very high resolution, multi-spectral benchmark data set for training and evaluating sophisticated machine learning models. The benchmark acts as a test bed for single building instance segmentation, which has rarely been considered before in densely built urban areas. Additional information is provided in the form of a multi-class semantic segmentation annotation covering the same area plus an adjacent area three times larger. The data set addresses interested researchers from various communities, such as photogrammetry and remote sensing, but also computer vision and machine learning.

