An Emotional Scene Retrieval Framework for Lifelog Videos Using Ensemble Clustering

2015 ◽  
Vol 3 (3) ◽  
pp. 1-13 ◽  
Author(s):  
Hiroki Nomiya ◽  
Atsushi Morikuni ◽  
Teruhisa Hochin

A lifelog video retrieval framework is proposed for the better utilization of large amounts of lifelog video data. The proposed method retrieves emotional scenes, such as scenes in which a person in the video is smiling, on the assumption that an important event occurs in most emotional scenes. Emotional scenes are detected on the basis of facial expression recognition using a wide variety of facial features. The authors adopt an unsupervised learning approach called ensemble clustering to recognize the facial expressions, because supervised learning approaches require sufficient training data, which makes them troublesome to apply to large-scale video databases. The retrieval performance of the proposed method is evaluated by means of an emotional scene detection experiment from the viewpoints of accuracy and efficiency. In addition, a prototype retrieval system is implemented based on the proposed emotional scene detection method.
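The abstract gives no implementation details, but the core of ensemble clustering can be sketched as a co-association consensus: several base partitions vote on whether two samples belong together, and the consensus is read off the resulting similarity graph. A minimal sketch follows; the 0.5 threshold and the connected-components step are illustrative choices, not the authors' method:

```python
import numpy as np

def co_association(partitions):
    """Co-association matrix: fraction of base partitions that
    place each pair of samples in the same cluster."""
    partitions = np.asarray(partitions)   # (n_partitions, n_samples)
    n = partitions.shape[1]
    ca = np.zeros((n, n))
    for labels in partitions:
        ca += (labels[:, None] == labels[None, :])
    return ca / len(partitions)

def consensus_clusters(partitions, threshold=0.5):
    """Merge samples whose co-association exceeds the threshold,
    via connected components on the thresholded similarity graph."""
    ca = co_association(partitions)
    n = ca.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for i in range(n):
        if labels[i] >= 0:
            continue
        stack, labels[i] = [i], current   # flood-fill one component
        while stack:
            j = stack.pop()
            for k in np.where(ca[j] > threshold)[0]:
                if labels[k] < 0:
                    labels[k] = current
                    stack.append(k)
        current += 1
    return labels

# three noisy base partitions of six samples (note the label
# permutation in the second one -- co-association is unaffected)
parts = [[0, 0, 0, 1, 1, 1],
         [1, 1, 1, 0, 0, 0],
         [0, 0, 0, 0, 1, 1]]
consensus = consensus_clusters(parts)   # samples 0-2 and 3-5 group together
```

Because the consensus depends only on pairwise co-membership, it is insensitive to how each base clustering happens to number its clusters, which is what makes the approach attractive when no labeled facial-expression data is available.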

2013 ◽  
Vol 1 (4) ◽  
pp. 1-15 ◽  
Author(s):  
Hiroki Nomiya ◽  
Atsushi Morikuni ◽  
Teruhisa Hochin

An emotional scene detection method is proposed in order to retrieve impressive scenes from lifelog videos. The proposed method is based on facial expression recognition, considering that a wide variety of facial expressions can be observed in impressive scenes. Conventional facial expression recognition techniques, which focus on discriminating among typical facial expressions, are inadequate for lifelog video retrieval because of this diversity of facial expressions. The authors thus propose a more flexible and efficient emotional scene detection method using unsupervised facial expression recognition based on cluster ensembles. This approach does not need predefined facial expressions and is able to detect emotional scenes containing a wide variety of facial expressions. The detection performance of the proposed method is evaluated through several emotional scene detection experiments.


2020 ◽  
Vol 2020 ◽  
pp. 1-8
Author(s):  
Chen Zhang ◽  
Bin Hu ◽  
Yucong Suo ◽  
Zhiqiang Zou ◽  
Yimu Ji

In this paper, we study the challenge of image-to-video retrieval, which uses a query image to search for relevant frames in a large collection of videos. A novel framework based on convolutional neural networks (CNNs) is proposed to perform large-scale video retrieval with low storage cost and high search efficiency. Our framework consists of a key-frame extraction algorithm and a feature aggregation strategy. Specifically, the key-frame extraction algorithm uses clustering to remove redundant information from the video data, greatly reducing storage cost. The feature aggregation strategy adopts average pooling to encode deep local convolutional features, followed by coarse-to-fine retrieval, which allows rapid retrieval in a large-scale video database. The results of extensive experiments on two publicly available datasets demonstrate that the proposed method achieves superior efficiency as well as accuracy over other state-of-the-art visual search methods.
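As a rough illustration of the key-frame idea (the concrete algorithm and features are not specified in the abstract), one can cluster per-frame descriptors and keep only the frame nearest each centroid. In the paper the descriptors would be average-pooled CNN features; the 2-D points and plain k-means below are toy stand-ins:

```python
import numpy as np

def kmeans(feats, k, iters=20, seed=0):
    """Plain Lloyd's k-means over per-frame feature vectors."""
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((feats[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([feats[assign == c].mean(0) for c in range(k)])
    return centers, assign

def key_frames(frame_feats, k):
    """One key frame per cluster: the frame closest to its centroid."""
    centers, assign = kmeans(frame_feats, k)
    keys = []
    for c in range(k):
        idx = np.where(assign == c)[0]
        d = ((frame_feats[idx] - centers[c]) ** 2).sum(-1)
        keys.append(idx[np.argmin(d)])
    return sorted(keys)

# six frames drifting through two visually distinct scenes
frames = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0],
                   [5.0, 5.0], [5.1, 5.0], [5.2, 5.0]])
selected = key_frames(frames, 2)   # one representative frame per scene
```

Only the selected frames need to be stored and indexed, which is where the storage saving claimed in the abstract comes from.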


Author(s):  
Kimiaki Shirahama ◽  
Kuniaki Uehara

This paper examines video retrieval based on the Query-By-Example (QBE) approach, where shots relevant to a query are retrieved from large-scale video data based on their similarity to example shots. This involves two crucial problems: the first is that similarity in features does not necessarily imply similarity in semantic content; the second is the expensive computational cost of comparing a huge number of shots to example shots. The authors have developed a method that can filter out a large number of shots irrelevant to a query, based on a video ontology, that is, a knowledge base about the concepts displayed in a shot. The method utilizes various concept relationships (e.g., generalization/specialization, sibling, part-of, and co-occurrence) defined in the video ontology. In addition, although the video ontology assumes that shots are accurately annotated with concepts, accurate annotation is difficult due to the diversity of forms and appearances of the concepts. Dempster-Shafer theory is therefore used to account for the uncertainty in determining the relevance of a shot from its possibly inaccurate annotation. Experimental results on TRECVID 2009 video data validate the effectiveness of the method.
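Dempster's rule of combination, which underlies the uncertainty handling mentioned above, can be stated compactly: masses on intersecting focal elements multiply, and mass falling on the empty intersection (conflict) is renormalized away. The concept labels and mass values below are invented for illustration, not taken from the paper:

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule: combine two mass functions whose focal
    elements are frozensets, renormalising away conflicting mass."""
    combined, conflict = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y               # mass on the empty set
    norm = 1.0 - conflict
    return {s: v / norm for s, v in combined.items()}

# two hypothetical, partially conflicting annotations of one shot
m1 = {frozenset({'car'}): 0.6,
      frozenset({'car', 'truck'}): 0.4}
m2 = {frozenset({'car'}): 0.5,
      frozenset({'truck'}): 0.3,
      frozenset({'car', 'truck'}): 0.2}
fused = dempster_combine(m1, m2)
```

The fused masses sum to one and concentrate belief on 'car', the concept the two sources mostly agree on.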


Database ◽  
2019 ◽  
Vol 2019 ◽  
Author(s):  
Tao Chen ◽  
Mingfen Wu ◽  
Hexi Li

Abstract The automatic extraction of meaningful relations from biomedical literature or clinical records is crucial in various biomedical applications. Most current deep learning approaches for medical relation extraction require large-scale training data to prevent overfitting of the training model. We propose using a pre-trained model and a fine-tuning technique to improve these approaches without additional time-consuming human labeling. Firstly, we describe the architecture of Bidirectional Encoder Representations from Transformers (BERT), an approach for pre-training a model on large-scale unstructured text. We then combine BERT with a one-dimensional convolutional neural network (1d-CNN) to fine-tune the pre-trained model for relation extraction. Extensive experiments on three datasets, namely the BioCreative V chemical disease relation corpus, the traditional Chinese medicine literature corpus, and the i2b2 2012 temporal relation challenge corpus, show that the proposed approach achieves state-of-the-art results (relative improvements of 22.2%, 7.77%, and 38.5% in F1 score, respectively, compared with a traditional 1d-CNN classifier). The source code is available at https://github.com/chentao1999/MedicalRelationExtraction.
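The 1d-CNN head on top of the encoder is conventional: convolve filters along the token axis of the contextual embeddings, apply ReLU and max-over-time pooling, then classify the pooled vector. The sketch below uses random numpy arrays as stand-ins for BERT's output, and all sizes are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_relu_maxpool(tokens, filters):
    """1d convolution over the token axis, ReLU, then max-over-time
    pooling -- a relation-classification head over encoder outputs."""
    n, d = tokens.shape                        # seq length, embedding dim
    k, w = filters.shape[0], filters.shape[1]  # n_filters, window size
    out = np.empty((k, n - w + 1))
    for i in range(n - w + 1):
        window = tokens[i:i + w].reshape(-1)   # flatten (w, d) window
        out[:, i] = filters.reshape(k, -1) @ window
    return np.maximum(out, 0).max(axis=1)      # (k,) pooled features

# stand-ins for BERT's contextual embeddings (hypothetical sizes)
seq = rng.normal(size=(16, 32))     # 16 tokens, 32-dim embeddings
filt = rng.normal(size=(8, 3, 32))  # 8 filters spanning 3 tokens each
pooled = conv1d_relu_maxpool(seq, filt)
logits = rng.normal(size=(2, 8)) @ pooled   # e.g. relation vs. no-relation
```

During fine-tuning, the gradient flows through this head into the pre-trained encoder, which is what lets the approach work with modest task-specific training data.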


Author(s):  
SONGHAO ZHU ◽  
ZHIWEI LIANG ◽  
XIAOYUAN JING

Graph-based semi-supervised learning approaches have proven effective and efficient at coping with the scarcity of labeled data in many real-world application areas, such as video annotation. However, the pairwise similarity metric, a significant factor in these approaches, has not been fully investigated: existing graph-based semi-supervised approaches estimate the pairwise similarity between samples mainly according to the spatial properties of video data, while the temporal property, an essential characteristic of video data, is not embedded in the pairwise similarity measure. Accordingly, a novel framework for video annotation, called Joint Spatio-Temporal Correlation Learning (JSTCL), is proposed in this paper. The framework simultaneously takes into account the spatial and temporal properties of video data to achieve more accurate pairwise similarity values. We apply the proposed framework to video annotation and report superior performance compared to key existing approaches on the benchmark TRECVID data set.
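One common way to combine the two properties, shown here only as an illustration of the idea rather than the JSTCL formulation, is to multiply a feature-space Gaussian kernel by a temporal decay term, so that samples close in both appearance and time receive the largest edge weights:

```python
import numpy as np

def st_similarity(feats, times, sigma=1.0, tau=5.0):
    """Pairwise affinity combining a spatial (feature) Gaussian kernel
    with a temporal decay; sigma and tau are bandwidth parameters."""
    d2 = ((feats[:, None] - feats[None, :]) ** 2).sum(-1)  # feature dists
    dt = np.abs(times[:, None] - times[None, :])           # time gaps
    return np.exp(-d2 / (2 * sigma ** 2)) * np.exp(-dt / tau)

# three samples with identical features but different timestamps
feats = np.array([[0.0], [0.0], [0.0]])
times = np.array([0.0, 1.0, 10.0])
W = st_similarity(feats, times)
```

Even with identical features, the temporally adjacent pair gets a higher affinity than the distant one, which is exactly the temporal information a purely spatial similarity would discard.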


2008 ◽  
Vol 34 (2) ◽  
pp. 193-224 ◽  
Author(s):  
Alessandro Moschitti ◽  
Daniele Pighin ◽  
Roberto Basili

The availability of large-scale data sets of manually annotated predicate-argument structures has recently favored the use of machine learning approaches in the design of automated semantic role labeling (SRL) systems. The main research in this area concerns the design choices for feature representation and for effective decompositions of the task into different learning models. Regarding the former, structural properties of full syntactic parses are widely employed, as they encode different principles suggested by the linking theory between syntax and semantics. The latter choice relates to several learning schemes over global views of the parses; for example, re-ranking stages operating over alternative predicate-argument sequences of the same sentence have been shown to be very effective. In this article, we propose several kernel functions to model parse tree properties in kernel-based machines, for example, perceptrons or support vector machines. In particular, we define different kinds of tree kernels as general approaches to feature engineering in SRL. Moreover, we extensively experiment with such kernels to investigate their contribution to individual stages of an SRL architecture, both in isolation and in combination with other traditional manually coded features. The results for boundary recognition, classification, and re-ranking stages provide systematic evidence of the significant impact of tree kernels on overall accuracy, especially when the amount of training data is small. As a conclusive result, tree kernels allow for a general and easily portable feature engineering method that is applicable to a large family of natural language processing tasks.
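A minimal version of such a tree kernel, in the spirit of the Collins-Duffy subset tree kernel (simplified here, with a damping factor lambda penalising larger fragments), counts matching tree fragments recursively over all node pairs:

```python
def tree_kernel(t1, t2, lam=0.4):
    """Simplified subset-tree kernel. Trees are (label, children)
    tuples; the kernel sums delta over all pairs of nodes, where
    delta counts common fragments rooted at a matching production."""
    def nodes(t, acc):
        acc.append(t)
        for c in t[1]:
            nodes(c, acc)
        return acc

    def delta(a, b):
        if a[0] != b[0]:
            return 0.0
        if not a[1] and not b[1]:          # matching leaves
            return lam
        if [c[0] for c in a[1]] != [c[0] for c in b[1]]:
            return 0.0                     # productions differ
        prod = lam
        for ca, cb in zip(a[1], b[1]):
            prod *= 1.0 + delta(ca, cb)    # fragments may stop or recurse
        return prod

    return sum(delta(a, b) for a in nodes(t1, []) for b in nodes(t2, []))

# two toy parse trees sharing the VP subtree but not the full NP
t1 = ('S', [('NP', [('D', []), ('N', [])]), ('VP', [('V', [])])])
t2 = ('S', [('NP', [('N', [])]), ('VP', [('V', [])])])
```

The self-kernel tree_kernel(t1, t1) exceeds tree_kernel(t1, t2), reflecting the larger set of shared fragments; this implicit fragment count is what replaces manual feature engineering in the kernel-based learners.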


2021 ◽  
Vol 13 (2) ◽  
pp. 275
Author(s):  
Michael Meadows ◽  
Matthew Wilson

Given the high financial and institutional cost of collecting and processing accurate topography data, many large-scale flood hazard assessments continue to rely instead on freely available global Digital Elevation Models, despite the significant vertical biases known to affect them. To predict (and thereby reduce) these biases, we apply a fully-convolutional neural network (FCN), a form of artificial neural network originally developed for image segmentation which is capable of learning from multi-variate spatial patterns at different scales. We assess its potential by training such a model on a wide variety of remote-sensed input data (primarily multi-spectral imagery), using high-resolution, LiDAR-derived Digital Terrain Models published by the New Zealand government as the reference topography data. In parallel, two more widely used machine learning models are also trained, to provide benchmarks against which the novel FCN may be assessed. We find that the FCN outperforms the other models (reducing root mean square error on the testing dataset by 71%), likely due to its ability to learn from spatial patterns at multiple scales rather than on a pixel-by-pixel basis only. Significantly for flood hazard modelling applications, corrections were found to be especially effective along rivers and their floodplains. However, our results also suggest that models are likely to be biased towards the land cover and relief conditions most prevalent in their training data, with further work required to assess the importance of limiting training data inputs to those most representative of the intended application area(s).
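A fully-convolutional correction network can be sketched in a few lines: stacked 'same'-padded convolutions map multi-band input rasters to a per-pixel elevation bias, which is then subtracted from the DEM. Everything below (band count, layer sizes, random weights) is a toy stand-in, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(1)

def conv2d(x, w):
    """'Same'-padded 3x3 2-D convolution: x is (C_in, H, W),
    w is (C_out, C_in, 3, 3); output keeps the spatial size."""
    c_out = w.shape[0]
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for i in range(3):
        for j in range(3):
            out += np.einsum('oc,chw->ohw', w[:, :, i, j],
                             xp[:, i:i + h, j:j + wd])
    return out

# hypothetical inputs: 4 spectral bands plus the biased DEM itself
bands = rng.normal(size=(5, 32, 32))
w1 = rng.normal(size=(8, 5, 3, 3)) * 0.1   # hidden conv layer
w2 = rng.normal(size=(1, 8, 3, 3)) * 0.1   # per-pixel bias output
correction = conv2d(np.maximum(conv2d(bands, w1), 0), w2)[0]
corrected_dem = bands[4] - correction       # subtract predicted bias
```

Because every layer is convolutional, the same trained weights apply to rasters of any size, and stacking layers (or adding pooling) grows the receptive field, which is how such a network picks up spatial patterns beyond a single pixel.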


2020 ◽  
Vol 10 (10) ◽  
pp. 2446-2451
Author(s):  
Hussain Ahmad ◽  
Muhammad Zubair Asghar ◽  
Fahad M. Alotaibi ◽  
Ibrahim A. Hameed

In social media, depression identification can be regarded as a complex task because of the complicated nature of mental disorders. In recent times, this research area has evolved with the growing popularity of social media platforms, which have become a fundamental part of people's day-to-day lives. Social media platforms and their users share a close relationship, so users' personal lives are reflected in these platforms on several levels. Beyond the inherent complexity of recognising mental illness via social media, supervised machine learning approaches such as deep neural networks have yet to be adopted at scale because of the difficulty of procuring sufficient quantities of annotated training data. For these reasons, we attempt to identify the most effective deep learning model from among selected architectures with a record of success in supervised learning. The selected model is then employed to recognise online users who display depression, given the limited unstructured text data that can be extracted from Twitter.




Author(s):  
R. Roscher ◽  
M. Volpi ◽  
C. Mallet ◽  
L. Drees ◽  
J. D. Wegner

Abstract. In order to reach the goal of reliably solving Earth monitoring tasks, automated and efficient machine learning methods are necessary for large-scale scene analysis and interpretation. A typical bottleneck of supervised learning approaches is the availability of accurately (manually) labeled training data, which is particularly important for training state-of-the-art (deep) learning methods. We present SemCity Toulouse, a publicly available, very high resolution, multi-spectral benchmark data set for training and evaluating sophisticated machine learning models. The benchmark acts as a test bed for single building instance segmentation, which has rarely been considered before in densely built urban areas. Additional information is provided in the form of a multi-class semantic segmentation annotation covering the same area plus an adjacent area three times larger. The data set addresses interested researchers from various communities, such as photogrammetry and remote sensing, but also computer vision and machine learning.

