Creating and Exploring Semantic Annotation for Behaviour Analysis

Sensors ◽  
2018 ◽  
Vol 18 (9) ◽  
pp. 2778 ◽  
Author(s):  
Kristina Yordanova ◽  
Frank Krüger

Providing ground truth is essential for activity recognition and behaviour analysis: it supplies training data for supervised learning methods, context information for knowledge-based methods, and a basis for quantifying recognition performance. Semantic annotation extends simple symbolic labelling by assigning semantic meaning to each label, enabling further reasoning. In this paper, we present a novel approach to semantic annotation by means of plan operators. We provide a step-by-step description of the workflow for manually creating the ground truth annotation. To validate our approach, we create a semantic annotation of the Carnegie Mellon University (CMU) grand challenge dataset, which is often cited but, due to missing and incomplete annotation, almost never used. We show that it is possible to derive hidden properties, behavioural routines, and changes in initial and goal conditions from the annotated dataset. We evaluate the quality of the annotation by calculating the interrater reliability between two annotators who labelled the dataset. The results show very good agreement between the annotators (Cohen’s κ of 0.8). The produced annotation and the semantic models are publicly available, in order to enable further usage of the CMU grand challenge dataset.
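For readers unfamiliar with the statistic, Cohen’s κ compares the observed agreement between two annotators against the agreement expected by chance from each annotator’s label distribution. A minimal sketch (illustrative only, not the authors’ evaluation code):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators label identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

A κ of 0.8, as reported above, is conventionally read as very good agreement beyond chance.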

2020 ◽  
Vol 2020 (6) ◽  
pp. 71-1-71-7
Author(s):  
Christian Kapeller ◽  
Doris Antensteiner ◽  
Svorad Štolc

Industrial machine vision applications frequently employ Photometric Stereo (PS) methods to detect fine surface defects on objects with challenging surface properties. To achieve highly precise results, acquisition setups with a vast number of strobed illumination angles are required. The time-consuming nature of such an undertaking renders it inapt for most industrial applications. We overcome these limitations by carefully tailoring the required light setup to specific applications. Our novel approach facilitates the design of optimized acquisition setups for inline PS inspection systems. The optimal positions of light sources are derived from only a few representative material samples, without the need for extensive amounts of training data. We formulate an energy function that selects the illumination setup yielding the highest PS accuracy; the setup can be tailored for fast acquisition speed or cost efficiency. We thoroughly evaluate the performance of our approach on a public dataset, using the mean angular error (MAE) for surface normals and the root-mean-square (RMS) error for albedos. Our results show that the obtained optimized PS setups can deliver a reconstruction performance close to the ground truth, while requiring only a few acquisitions.
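The mean angular error used in this evaluation measures, in degrees, how far estimated surface normals deviate from the ground-truth normals. A minimal sketch of the metric, assuming unit-length normals given as 3-tuples (illustrative, not the authors’ code):

```python
import math

def mean_angular_error(normals_est, normals_gt):
    """Mean angular error in degrees between paired unit-length surface normals."""
    total = 0.0
    for n_est, n_gt in zip(normals_est, normals_gt):
        dot = sum(a * b for a, b in zip(n_est, n_gt))
        dot = max(-1.0, min(1.0, dot))  # clamp against floating-point drift
        total += math.degrees(math.acos(dot))
    return total / len(normals_est)
```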


2018 ◽  
Author(s):  
Naihui Zhou ◽  
Zachary D Siegel ◽  
Scott Zarecor ◽  
Nigel Lee ◽  
Darwin A Campbell ◽  
...  

Abstract The accuracy of machine learning tasks critically depends on high-quality ground truth data. Producing good ground truth data therefore typically involves trained professionals; however, this can be costly in time, effort, and money. Here we explore the use of crowdsourcing to generate a large volume of good-quality training data. We explore an image analysis task involving the segmentation of corn tassels from images taken in a field setting. We investigate the accuracy, speed, and other quality metrics when this task is performed by students for academic credit, Amazon MTurk workers, and Master Amazon MTurk workers. We conclude that the Amazon MTurk and Master MTurk workers perform significantly better than the for-credit students, with no significant difference between the two MTurk worker types. Furthermore, the quality of the segmentation produced by Amazon MTurk workers rivals that of an expert worker. We provide best practices to assess the quality of ground truth data and to compare data quality produced by different sources. We conclude that properly managed crowdsourcing can be used to establish large volumes of viable ground truth data at low cost and high quality, especially in the context of high-throughput plant phenotyping. We also provide several metrics for assessing the quality of the generated datasets.

Author Summary Food security is a growing global concern. Farmers, plant breeders, and geneticists are hastening to address the challenges presented to agriculture by climate change, dwindling arable land, and population growth. Scientists in the field of plant phenomics are using satellite and drone images to understand how crops respond to a changing environment and to combine genetics and environmental measures to maximize crop growth efficiency. However, the terabytes of image data require new computational methods to extract useful information. Machine learning algorithms are effective in recognizing select parts of images, but they require high-quality data curated by people to train them, a process that can be laborious and costly. We examined how well crowdsourcing works in providing training data for plant phenomics, specifically, segmenting a corn tassel (the male flower of the corn plant) from the often-cluttered images of a cornfield. We provided images to students and to Amazon MTurkers, the latter being an on-demand workforce brokered by Amazon.com and paid on a task-by-task basis. We report on best practices in crowdsourcing image labeling for phenomics, and compare the different groups on measures such as fatigue and accuracy over time. We find that crowdsourcing is a good way of generating quality labeled data, rivaling that of experts.
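One standard way to quantify how closely a crowdsourced segmentation matches an expert’s is intersection-over-union (IoU); the study’s own quality metrics may differ, so this is only an illustrative sketch, with binary masks represented as sets of pixel coordinates:

```python
def iou(mask_a, mask_b):
    """Intersection-over-union of two binary masks, each a set of (row, col) pixels."""
    inter = len(mask_a & mask_b)
    union = len(mask_a | mask_b)
    # Two empty masks agree perfectly by convention.
    return inter / union if union else 1.0
```

An IoU of 1.0 means the two segmentations coincide exactly; values near 0 mean almost no overlap.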


10.14311/1105 ◽  
2009 ◽  
Vol 49 (2) ◽  
Author(s):  
J. Rajnoha

Automatic speech recognition (ASR) systems frequently work in noisy environments. As they are often trained on clean speech data, noise reduction or adaptation techniques are applied to decrease the influence of background disturbance, even under unknown conditions. Speech data mixed with noise recordings from a particular environment are often used for model adaptation. This paper analyses the improvement in recognition performance within such adaptation when multi-condition training data from a real environment are used to train the initial models. Although the quality of such models can decrease with the presence of noise in the training material, they are assumed to include initial information about the noise and consequently to support the adaptation procedure. Experimental results show a significant improvement from the proposed training method in a robust ASR task under unknown noisy conditions: word error rate decreased by 29% and 14% relative to clean-speech training for the non-adapted and adapted systems, respectively.
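The word error rate reported above is the word-level Levenshtein (edit) distance between the reference and hypothesis transcripts, normalized by the reference length. A minimal sketch (illustrative, not the authors’ evaluation code):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j]: edit distance between the first i reference words
    # and the first j hypothesis words.
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,      # deletion
                           dp[i][j - 1] + 1,      # insertion
                           dp[i - 1][j - 1] + cost)  # substitution/match
    return dp[len(ref)][len(hyp)] / len(ref)
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions.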


2021 ◽  
pp. 1-32
Author(s):  
R. Stuart Geiger ◽  
Dominique Cope ◽  
Jamie Ip ◽  
Marsha Lotosh ◽  
Aayush Shah ◽  
...  

Abstract Supervised machine learning, in which models are automatically derived from labeled training data, is only as good as the quality of that data. This study builds on prior work that investigated to what extent ‘best practices’ around labeling training data were followed in applied ML publications within a single domain (social media platforms). In this paper, we expand on that work by studying publications that apply supervised ML across a far broader spectrum of disciplines, focusing on human-labeled data. We report to what extent a random sample of ML application papers across disciplines gives specific details about whether best practices were followed, while acknowledging that a greater range of application fields necessarily produces a greater diversity of labeling and annotation methods. Because much of machine learning research and education focuses only on what is done once a “ground truth” or “gold standard” of training data is available, it is especially relevant to discuss the equally important question of whether such data is reliable in the first place. This determination becomes increasingly complex when applied to a variety of specialized fields, as labeling can range from a task requiring little-to-no background knowledge to one that must be performed by someone with career expertise. Peer Review: https://publons.com/publon/10.1162/qss_a_00144


2021 ◽  
Vol 7 (11) ◽  
pp. 236
Author(s):  
Javier Gibran Apud Baca ◽  
Thomas Jantos ◽  
Mario Theuermann ◽  
Mohamed Amin Hamdad ◽  
Jan Steinbrener ◽  
...  

Accurately estimating the six-degree-of-freedom (6-DoF) pose of objects in images is essential for a variety of applications such as robotics, autonomous driving, and AI- and vision-based autonomous navigation for unmanned aircraft systems (UAS). Developing such algorithms requires large datasets; however, generating them is tedious, as it requires annotating the 6-DoF pose of each object of interest present in the image relative to the camera. Therefore, this work presents a novel approach that automates the data acquisition and annotation process and thus reduces the annotation effort to the duration of the recording. To maximize the quality of the resulting annotations, we employ an optimization-based approach for determining the extrinsic calibration parameters of the camera. Our approach can handle multiple objects in the scene, automatically providing ground-truth labeling for each object and taking into account occlusion effects between different objects. Moreover, our approach can not only generate data for 6-DoF pose estimation and the corresponding 3D models but can also be extended to automatic dataset generation for object detection, instance segmentation, or volume estimation for any kind of object.
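The per-object annotation described above amounts to expressing each object’s pose in the camera frame. With poses written as 4x4 homogeneous transforms, the relative pose follows from composing the inverse camera pose with the object pose; a minimal sketch under that convention (illustrative, not the authors’ pipeline):

```python
import numpy as np

def relative_pose(T_world_cam, T_world_obj):
    """Pose of an object in the camera frame, given world-frame poses
    of the camera and the object as 4x4 homogeneous matrices."""
    return np.linalg.inv(T_world_cam) @ T_world_obj
```

For example, a camera rotated 90 degrees about the z-axis at the origin sees an object at world position (1, 0, 0) at camera-frame position (0, -1, 0).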


2019 ◽  
Vol 6 (5) ◽  
pp. 491
Author(s):  
Lukman Hakim ◽  
Siti Rochimah ◽  
Chastine Fatichah

Abstract Non-functional requirements are considered capable of supporting the success of software development. However, non-functional requirements are often ignored during the software development process, because they are often mixed with functional requirements. In addition, the diversity of quality standards causes confusion in determining quality aspects. The existing approach uses ISO/IEC 9126 as a reference to measure quality aspects; however, ISO/IEC 9126 is an old standard, released in 2001, and previous researchers have revealed ambiguity in six sub-attributes of its hierarchical structure, raising serious doubts about the validity of the standard as a whole. Therefore, the quality standard used as a reference in this study is ISO/IEC 25010. In addition, this study proposes a system for identifying quality aspects of non-functional requirements using 1 hypernym level and 20 synonyms (scenario 1), compared against 2 hypernym levels and 9 synonyms each (scenario 2). The two scenarios produce two different training datasets, which are compared using two testing models, one based on expert ground truth and one on the system, with the KNN and SVM classification methods. The test results show that scenario 1 gives better values than scenario 2 in both testing models, with precision values for expert ground truth, KNN, and SVM of 49.3%, 81.0%, and 74.6%, respectively.
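For illustration, the comparison above can be sketched in miniature: a k-nearest-neighbour majority vote plus the precision metric used in the evaluation. This is a toy sketch with invented data, not the study’s system or dataset:

```python
def precision(y_true, y_pred, positive=1):
    """Precision = true positives / all predicted positives."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fp) if (tp + fp) else 0.0

def knn_predict(train_X, train_y, x, k=3):
    """Classify feature vector x by majority vote among its k nearest
    training points under squared Euclidean distance."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: sum((a - b) ** 2
                                       for a, b in zip(train_X[i], x)))
    votes = [train_y[i] for i in nearest[:k]]
    return max(set(votes), key=votes.count)
```

In the study itself the feature vectors would come from the hypernym/synonym expansion of requirement texts; here plain numeric tuples stand in for them.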


Author(s):  
Matteo Lorenzini ◽  
Marco Rospocher ◽  
Sara Tonelli

Abstract Metadata are fundamental for the indexing, browsing and retrieval of cultural heritage resources in repositories, digital libraries and catalogues. In order to be effectively exploited, metadata information has to meet some quality standards, typically defined in the collection usage guidelines. As manually checking the quality of metadata in a repository may not be affordable, especially in large collections, in this paper we specifically address the problem of automatically assessing the quality of metadata, focusing in particular on textual descriptions of cultural heritage items. We describe a novel approach based on machine learning that tackles this problem by framing it as a binary text classification task aimed at evaluating the accuracy of textual descriptions. We report our assessment of different classifiers using a new dataset that we developed, containing more than 100K descriptions. The dataset was extracted from different collections and domains of the Italian digital library “Cultura Italia” and was annotated with accuracy information in terms of compliance with the cataloguing guidelines. The results empirically confirm that our proposed approach can effectively support curators (F1 ≈ 0.85) in assessing the quality of the textual descriptions of the records in their collections, and provide some insights into how training data, specifically their size and domain, can affect classification performance.
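The binary text classification framing described above can be illustrated with a minimal multinomial Naive Bayes classifier (add-one smoothing, binary labels 0/1). The paper evaluates several classifiers; this toy example and its vocabulary are invented for illustration:

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Train multinomial Naive Bayes; docs are token lists, labels are 0/1."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter(labels)
    for doc, y in zip(docs, labels):
        word_counts[y].update(doc)
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab

def predict_nb(model, doc):
    """Return the class with the highest smoothed log-likelihood."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best, best_lp = None, float("-inf")
    for y in class_counts:
        lp = math.log(class_counts[y] / total_docs)  # log prior
        denom = sum(word_counts[y].values()) + len(vocab)  # add-one smoothing
        for w in doc:
            lp += math.log((word_counts[y][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = y, lp
    return best
```

A real pipeline for this task would also need tokenization and feature selection over the 100K descriptions; those steps are omitted here.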


2020 ◽  
Vol 25 (1) ◽  
pp. 37-42 ◽  
Author(s):  
Ros Whelan ◽  
Eric Prince ◽  
David M. Mirsky ◽  
Robert Naftel ◽  
Aashim Bhatia ◽  
...  

OBJECTIVE Pediatric adamantinomatous craniopharyngiomas (ACPs) are histologically benign brain tumors that confer significant neuroendocrine morbidity. Previous studies have demonstrated that injury to the hypothalamus is associated with worsened quality of life and a shorter lifespan. This insight helps many surgeons define the goals of surgery for patients with ACP. Puget and colleagues proposed a 3-tiered preoperative and postoperative grading system based on the degree of hypothalamic involvement identified on MRI. In a prospective cohort from their institution, the authors found that use of the system to guide operative goals was associated with decreased morbidity. To date, however, the Puget system has not been externally validated. Here, the authors present an interrater reliability study that assesses the generalizability of this system for surgeons planning initial operative intervention for children with craniopharyngiomas.

METHODS A panel of 6 experts, consisting of pediatric neurosurgeons and pediatric neuroradiologists, graded 30 preoperative and postoperative MRI scans according to the Puget system. Interrater reliability was calculated using Fleiss’ κ and Krippendorff’s α statistics.

RESULTS Interrater reliability in the preoperative context demonstrated moderate agreement (κ = 0.50, α = 0.51). Interrater reliability in the postoperative context was 0.27 for both methods of statistical evaluation.

CONCLUSIONS Interrater reliability for the system as defined is moderate. Slight refinements of the Puget MRI grading system, such as collapsing the 3 grades into 2, may improve its reliability, making the system more generalizable.
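Fleiss’ κ generalizes Cohen’s κ to more than two raters: it compares mean per-item agreement against the agreement expected from the overall category proportions. A minimal sketch (illustrative, not the study’s statistical code):

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa; ratings[i][c] = number of raters assigning item i
    to category c. Every item must have the same total number of raters."""
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_cats = len(ratings[0])
    # Mean per-item agreement across rater pairs.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ) / n_items
    # Chance agreement from overall category proportions.
    p_e = sum(
        (sum(row[j] for row in ratings) / (n_items * n_raters)) ** 2
        for j in range(n_cats)
    )
    return (p_bar - p_e) / (1 - p_e)
```

For the study above, each of the 30 scans would be a row and each Puget grade a category, with 6 raters per row.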


Author(s):  
A. V. Ponomarev

Introduction: Large-scale human-computer systems that involve people of various skills and motivation in information processing are currently used in a wide spectrum of applications. An acute problem in such systems is assessing the expected quality of each contributor, for example, in order to penalize incompetent or inaccurate ones and to promote diligent ones.

Purpose: To develop a method of assessing a contributor’s expected quality in community tagging systems. The method should use only the generally unreliable and incomplete information provided by contributors (with ground-truth tags unknown).

Results: A mathematical model is proposed for community image tagging (including a model of a contributor), along with a method of assessing a contributor’s expected quality. The method is based on comparing the tag sets provided by different contributors for the same images; it is a modification of the pairwise comparison method, with the preference relation replaced by a special domination characteristic. Contributors’ expected quality is evaluated as a positive eigenvector of the pairwise domination characteristic matrix. Community tagging simulation has confirmed that the proposed method adequately estimates the expected quality of community tagging system contributors (provided that the contributors’ behaviour fits the proposed model).

Practical relevance: The obtained results can be used in the development of systems based on coordinated community efforts (primarily, community tagging systems).
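The eigenvector computation described above can be sketched with power iteration, which converges to the positive principal eigenvector of a non-negative matrix. This is only an illustrative sketch; the construction of the domination characteristic matrix itself is specific to the paper and not reproduced here:

```python
def principal_eigenvector(M, iters=200):
    """Positive principal eigenvector of a non-negative square matrix via
    power iteration, normalized to sum to 1. In the setting above the
    entries would be contributor quality scores."""
    n = len(M)
    v = [1.0 / n] * n
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        s = sum(w)
        v = [x / s for x in w]  # renormalize each step
    return v
```

By the Perron-Frobenius theorem, for a suitable non-negative matrix this vector is unique up to scale, which makes it a natural ranking of contributors.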


2020 ◽  
Vol 4 (Supplement_1) ◽  
pp. 775-775
Author(s):  
Debra Sheets ◽  
Stuart MacDonald ◽  
Andre Smith

Abstract Choral singing is a novel approach to reduce dementia stigma and social isolation while offering participants a sense of purpose, joy and social connection. The pervasiveness of stigma surrounding dementia remains one of the biggest barriers to living life with dignity following a diagnosis (Alzheimer Society of Canada, 2018). This paper examines how a social inclusion model of dementia care involving an intergenerational choir for people living with dementia, their care partners and high school students can reduce stigma and foster social connections. Multiple methodologies are used to investigate the effects of choir participation on cognition, stress levels, social connections, stigma, and quality of life. Results demonstrate the positive impact of choir participation and indicate that this socially inclusive intervention offers an effective, non-pharmacological alternative for older adults living with dementia in the community. Discussion focuses on the importance of instituting meaningful and engaging dementia-friendly activities at the community level.

