A Domain-Specific Generative Chatbot Trained from Little Data

Jurgita Kapočiūtė-Dzikienė

doi:10.3390/app10072221

A Domain-Specific Generative Chatbot Trained from Little Data

Applied Sciences ◽

10.3390/app10072221 ◽

2020 ◽

Vol 10 (7) ◽

pp. 2221 ◽

Cited By ~ 5

Author(s):

Jurgita Kapočiūtė-Dzikienė

Keyword(s):

Experimental Investigation ◽

English Language ◽

Large Datasets ◽

Word Embedding ◽

Small Data ◽

Domain Specific ◽

Small Domain ◽

Good For ◽

Different Characteristics

Accurate generative chatbots are usually trained on large datasets of question–answer pairs. Although such datasets do not exist for some languages, companies still need chatbot technology on their websites. However, companies usually own small domain-specific datasets (at least in the form of an FAQ) about their products, services, or used technologies. In this research, we seek effective solutions to create generative seq2seq-based chatbots from very small data. Since experiments are carried out in English and morphologically complex Lithuanian languages, we have an opportunity to compare results for languages with very different characteristics. We experimentally explore three encoder–decoder LSTM-based approaches (simple LSTM, stacked LSTM, and BiLSTM), three word embedding types (one-hot encoding, fastText, and BERT embeddings), and five encoder–decoder architectures based on different encoder and decoder vectorization units. Furthermore, all offered approaches are applied to the pre-processed datasets with removed and separated punctuation. The experimental investigation revealed the advantages of the stacked LSTM and BiLSTM encoder architectures and BERT embedding vectorization (especially for the encoder). The best achieved BLEU on English/Lithuanian datasets with removed and separated punctuation was ~0.513/~0.505 and ~0.488/~0.439, respectively. Better results were achieved with the English language, because generating different inflection forms for the morphologically complex Lithuanian is a harder task. The BLEU scores fell into the range defining the quality of the generated answers as good or very good for both languages. This research was performed with very small datasets having little variety in covered topics, which makes this research not only more difficult, but also more interesting. Moreover, to our knowledge, it is the first attempt to train generative chatbots for a morphologically complex language.
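The abstract above reports answer quality as BLEU scores. As a rough illustration of what such a score measures, the sketch below computes an unsmoothed sentence-level BLEU against a single reference, using clipped n-gram precision and a brevity penalty. The function names `ngrams` and `sentence_bleu` are hypothetical, and the abstract does not state which BLEU variant, tokenization, or smoothing the author used, so this should not be read as the paper's evaluation code.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, candidate, max_n=4):
    """Unsmoothed sentence-level BLEU against a single reference (illustrative)."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        # Clipped matches: each candidate n-gram counts at most as often
        # as it occurs in the reference.
        matches = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        if matches == 0 or total == 0:
            return 0.0  # no smoothing: one empty n-gram order zeroes the score
        log_precisions.append(math.log(matches / total))
    # Brevity penalty punishes candidates shorter than the reference.
    if len(candidate) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)
```

For example, `sentence_bleu("the cat sat on the mat".split(), "the cat sat on".split())` matches every candidate n-gram, so the score is determined by the brevity penalty alone. Because very short answers often lack any matching 4-gram, practical evaluations on small datasets typically add smoothing, which this sketch omits.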


A Quality Assessment Framework for Large Datasets of Container-Trips Information

Computer Information Systems and Industrial Management - Lecture Notes in Computer Science ◽

10.1007/978-3-319-45378-1_63 ◽

2016 ◽

pp. 729-740 ◽

Cited By ~ 1

Author(s):

Michail Makridis ◽

Raúl Fidalgo-Merino ◽

José-Antonio Cotelo-Lema ◽

Aris Tsois ◽

Enrico Checchi

Keyword(s):

Quality Assessment ◽

Large Datasets ◽

Data Sources ◽

Assessment Framework ◽

Domain Specific ◽

Weak Points ◽

Assessment Procedures ◽

It Risk ◽

Selection Of

Customs worldwide are facing the challenge of supervising huge volumes of containerized trade arriving in their countries with resources allowing them to inspect only a minimal fraction of it. Risk assessment procedures can support them in the selection of the containers to inspect. The Container-Trip Information (CTI) is an important element for that evaluation, but is usually not available with the needed quality. Therefore, the quality of the CTI records computed from any data sources that may be used (e.g. Container Status Messages) needs to be assessed. This paper presents a quality assessment framework that combines quantitative and qualitative domain-specific metrics to evaluate the quality of large datasets of CTI records and to provide more complete feedback on which aspects need to be revised to improve the quality of the output data. The experimental results show the robustness of the framework in highlighting the weak points in the datasets and in efficiently identifying cases of potentially wrong CTI records.


Food procurement in English-language Canadian public schools: Opportunities and challenges

Canadian Food Studies / La Revue canadienne des études sur l'alimentation ◽

10.15353/cfs-rcea.v6i1.265 ◽

2019 ◽

Vol 6 (1) ◽

pp. 75-99

Author(s):

Shawna Holmes

Keyword(s):

Public Schools ◽

Food Environment ◽

Nutritional Quality ◽

English Language ◽

Nutrient Content ◽

Food Environments ◽

School Food ◽

Provincial Level ◽

School Food Environment

This paper examines the changes to procurement for school food environments in Canada as a response to changes to nutrition regulations at the provincial level. Interviews with those working in school food environments across Canada revealed how changes to the nutrition requirements of foods and beverages sold in schools presented opportunities to not only improve the nutrient content of the items made available in school food environments, but also to include local producers and/or school gardens in procuring for the school food environment. At the same time, some schools struggle to procure nutritionally compliant foods due to increased costs associated with transporting produce to rural, remote, or northern communities as well as logistic difficulties like spoilage. Although the nutrition regulations have facilitated improvements to food environments in some schools, others require more support to improve the overall nutritional quality of the foods and beverages available to students at school.


Productive Vocabulary Knowledge of ESL Learners

Asian Journal of Interdisciplinary Research ◽

10.34256/ajir1814 ◽

2018 ◽

Vol 1 (1) ◽

pp. 32-41 ◽

Cited By ~ 1

Author(s):

Abdulmalik Usman ◽

Dahiru Musa Abdullahi

Keyword(s):

Teaching And Learning ◽

English Language ◽

Vocabulary Knowledge ◽

Writing Quality ◽

Lexical Frequency ◽

Frequency Profile ◽

Esl Learners ◽

The Relationship

The paper investigates the level of productive vocabulary knowledge of ESL learners, their writing quality, and the relationship between vocabulary knowledge and writing quality. A total of 150 final-year students of English language at a university in Nigeria were randomly selected as respondents. The respondents were asked to write an essay of 300 words within one hour. The essays were typed into the Vocab Profiler of Cobb (2002), which was used to analyze the Lexical Frequency Profile of the respondents. The essays were also assessed by independent examiners using a standard rubric. The findings reveal that the level of productive vocabulary knowledge of the respondents is limited. The writing quality of the majority of the respondents is fair, and there is a significant correlation between vocabulary and the writing quality of the subjects. The researchers posit that productive vocabulary is the predictor of writing quality and recommend various techniques through which teaching and learning of vocabulary can be improved.


“The Past, Present, and Future of Accounting History”: A Comment on the State of Accounting History

Accounting Historians Journal ◽

10.2308/aahj-2020-002 ◽

2020 ◽

Vol 47 (1) ◽

pp. 89-95 ◽

Cited By ~ 1

Author(s):

Garry D. Carnegie

Keyword(s):

English Language ◽

Research Field ◽

The State ◽

International Accounting ◽

Scholarly Research ◽

Recent Contribution ◽

Global Society ◽

Accounting History ◽

The Past

This response to the recent contribution by Matthews (2019) entitled “The Past, Present, and Future of Accounting History” specifically deals with the issues associated with concentrating on counting publication numbers in examining the state of a scholarly research field at the start of the 2020s. It outlines several pitfalls with the narrowly focused publications count analysis, in selected English language journals only, as provided by Matthews. The commentary is based on three key arguments: (1) accounting history research and publication is far more than a “numbers game”; (2) trends in the quality of the research undertaken and published are paramount; and (3) international publication and accumulated knowledge in accounting history are indeed more than a collection of English language publications. The author seeks to contribute to discussion and debate between accounting historians and other researchers for the benefit and development of the international accounting history community and global society.


Medical and surgical interventions to improve the quality of life for endometriosis patients: a systematic review

Gynecological Surgery ◽

10.1186/s10397-021-01096-5 ◽

2021 ◽

Vol 18 (1) ◽

Author(s):

Maurizio Nicola D’Alterio ◽

Stefania Saponara ◽

Mirian Agus ◽

Antonio Simone Laganà ◽

Marco Noventa ◽

...

Keyword(s):

Quality Of Life ◽

English Language ◽

Short Form ◽

Work And Family ◽

Surgical Interventions ◽

Medline Search ◽

Accessible Information ◽

Before And After ◽

Surgical Treatments

Endometriosis impairs the quality of life (QoL) of many women, including their social relationships, daily activity, productivity at work, and family planning. The aim of this review was to determine the instruments used to examine QoL in previous clinical studies of endometriosis and to evaluate the effect of medical and surgical interventions for endometriosis on QoL. We conducted a systematic search and review of studies published between January 2010 and December 2020 using MEDLINE. Search terms included “endometriosis” and “quality of life.” We only selected studies that used a standardized questionnaire to evaluate QoL before and after medical or surgical interventions. Only articles in the English language were examined. The initial search identified 720 results. After excluding duplicates and applying inclusion criteria, 37 studies were selected for analysis. We found that the two scales most frequently used to measure QoL were the Short Form-36 health survey questionnaire (SF-36) and the Endometriosis Health Profile-30 (EHP-30). Many medical and surgical treatments demonstrated comparable benefits in pain control and QoL improvement. There is no clear answer as to what is the best treatment for improving QoL because each therapy must be personalized for the patient and depends on the woman’s goals. In conclusion, women must be informed about endometriosis and given easily accessible information to improve treatment adherence and their QoL.


Development and assessment of a telesonography system for musculoskeletal imaging

European Radiology Experimental ◽

10.1186/s41747-021-00227-z ◽

2021 ◽

Vol 5 (1) ◽

Author(s):

Mohammed Obaid ◽

Qianwei Zhang ◽

Scott J. Adams ◽

Reza Fotouhi ◽

Haron Obaid

Keyword(s):

Degrees Of Freedom ◽

Ultrasound Images ◽

Ultrasound Probe ◽

Rural And Remote ◽

Significant Delay ◽

Remote Communities ◽

Internet Connection ◽

Good For ◽

4 Degrees Of Freedom

Background: Telesonography systems have been developed to overcome barriers to accessing diagnostic ultrasound for patients in rural and remote communities. However, most previous telesonography systems have been designed for performing only abdominal and obstetrical exams. In this paper, we describe the development and assessment of a musculoskeletal (MSK) telesonography system. Methods: We developed a 4-degrees-of-freedom (DOF) robot to manipulate an ultrasound probe. The robot was remotely controlled by a radiologist operating a joystick at the master site. The telesonography system was used to scan participants’ forearms, and all participants were conventionally scanned for comparison. Participants and radiologists were surveyed regarding their experience. Images from both scanning methods were independently assessed by an MSK radiologist. Results: All ten ultrasound exams were successfully performed using our developed MSK telesonography system, with no significant delay in movement. The duration (mean ± standard deviation) of telerobotic and conventional exams was 4.6 ± 0.9 and 1.4 ± 0.5 min, respectively (p = 0.039). An MSK radiologist rated quality of real-time ultrasound images transmitted over an internet connection as “very good” for all telesonography exams, and participants rated communication with the radiologist as “very good” or “good” for all exams. Visualisation of anatomic structures was similar between telerobotic and conventional methods, with no statistically significant differences. Conclusions: The MSK telesonography system developed in this study is feasible for performing soft tissue ultrasound exams. The advancement of this system may allow MSK ultrasound exams to be performed over long distances, increasing access to ultrasound for patients in rural and remote communities.


Playing to the Gallery: Emotive Rhetoric in Parliaments

American Political Science Review ◽

10.1017/s0003055421000356 ◽

2021 ◽

pp. 1-15

Author(s):

MORITZ OSNABRÜGGE ◽

SARA B. HOBOLT ◽

TONI RODON

Keyword(s):

Political Representation ◽

Word Embedding ◽

House Of Commons ◽

General Audience ◽

Domain Specific ◽

High Profile ◽

Affective Norms ◽

The Uk

Research has shown that emotions matter in politics, but we know less about when and why politicians use emotive rhetoric in the legislative arena. This article argues that emotive rhetoric is one of the tools politicians can use strategically to appeal to voters. Consequently, we expect that legislators are more likely to use emotive rhetoric in debates that have a large general audience. Our analysis covers two million parliamentary speeches held in the UK House of Commons and the Irish Parliament. We use a dictionary-based method to measure emotive rhetoric, combining the Affective Norms for English Words dictionary with word-embedding techniques to create a domain-specific dictionary. We show that emotive rhetoric is more pronounced in high-profile legislative debates, such as Prime Minister’s Questions. These findings contribute to the study of legislative speech and political representation by suggesting that emotive rhetoric is used by legislators to appeal directly to voters.


Producing ‘good enough’ automated transcripts securely: Extending Bokhove and Downey (2018) to address security concerns

Methodological Innovations ◽

10.1177/2059799120987766 ◽

2021 ◽

Vol 14 (1) ◽

pp. 205979912098776

Author(s):

Joseph Da Silva

Keyword(s):

Cyber Security ◽

English Language ◽

Negative Impact ◽

Early Career ◽

Third Party ◽

Doctoral Research ◽

Computing Services ◽

Cloud Computing Services ◽

Audio Data

Interviews are an established research method across multiple disciplines. Such interviews are typically transcribed orthographically in order to facilitate analysis. Many novice qualitative researchers’ experiences of manual transcription are that it is tedious and time-consuming, although it is generally accepted within much of the literature that quality of analysis is improved through researchers performing this task themselves. This is despite the potential for the exhausting nature of bulk transcription to conversely have a negative impact upon quality. Other researchers have explored the use of automated methods to ease the task of transcription, more recently using cloud-computing services, but such services present challenges to ensuring confidentiality and privacy of data. In the field of cyber-security, these are particularly concerning; however, any researcher dealing with confidential participant speech should also be uneasy with third-party access to such data. As a result, researchers, particularly early-career researchers and students, may find themselves with no option other than manual transcription. This article presents a secure and effective alternative, building on prior work published in this journal, to present a method that significantly reduced, by more than half, interview transcription time for the researcher yet maintained security of audio data. It presents a comparison between this method and a fully manual method, drawing on data from 10 interviews conducted as part of my doctoral research. The method presented requires an investment in specific equipment which currently only supports the English language.


SAT0371 ARE ENGLISH-LANGUAGE VIDEOS ON YOUTUBE A USEFUL SOURCE OF INFORMATION FOR SPONDYLOARTHRITIS?

Annals of the Rheumatic Diseases ◽

10.1136/annrheumdis-2020-eular.3109 ◽

2020 ◽

Vol 79 (Suppl 1) ◽

pp. 1133.1-1133

Author(s):

S. Elangovan ◽

Y. H. Kwan ◽

W. Fong

Keyword(s):

English Language ◽

Healthcare Professionals ◽

Treatment Options ◽

Misleading Information ◽

Information Group ◽

Youtube Videos ◽

Group 2 ◽

Patient Opinion ◽

Group 1

Background: Spondyloarthritis (SpA) is a family of chronic inflammatory disorders. Social media, such as YouTube, is a popular online platform where patients often visit for information. However, the validity of the content uploaded onto YouTube is not known. Objectives: This study aimed to evaluate the content, reliability and quality of the most viewed English-language YouTube videos on SpA. Methods: Keywords “spondyloarthritis”, “spondyloarthropathy” and “ankylosing spondylitis” were searched on YouTube on October 7th, 2019. The top 270 videos were screened. Videos were excluded if they were irrelevant, in non-English language or if they had no audio. Total number of views, duration on YouTube (days), video length, upload date, number of likes, dislikes, subscribers and comments were recorded for videos. A modified 5-point DISCERN tool [1] and the 5-point Global Quality Scale (GQS) score [2] were used to assess the reliability and quality of the videos, with higher scores indicating greater reliability and quality respectively. Results: Two hundred of 270 videos were included in the final analysis [61.5% from healthcare professionals, 37.0% from patients, 1.5% from news channels]. Of the 200 videos, 15 were uploaded within the last year and 112 in the last five years. 120 (60%) were categorized as useful information (Group 1), 6 (3%) as misleading information (Group 2), 52 (26%) as useful patient opinion (Group 3) and 22 (11%) as misleading patient opinion (Group 4). Useful videos were mainly from healthcare professionals or patients (86%). Useful videos (Group 1 and 3) had higher median (IQR) number of subscribers [2700 (14700) vs 211 (457), p < 0.01], reliability scores [3 (1) vs 2 (1), p < 0.01] and GQS scores [3 (1) vs. 2 (1), p < 0.001] compared to misleading videos (Group 2 and 4), respectively. Videos uploaded by healthcare professionals tended to have more useful information [94% (116 of 123) vs. 66% (49 of 74), p < 0.001] and had higher median (IQR) reliability scores [3 (1) vs 2 (1), p < 0.001] and GQS scores [3 (2) vs 2 (1), p < 0.001] compared to patient uploaded videos respectively. Of the 5 (out of 123) videos from healthcare professionals that had misleading information, it was because of outdated information on diagnosis (3 videos) and treatment (5 videos) of SpA. Of the 22 videos that had misleading patient opinion, 9 (41%) wrongly described the clinical features for SpA and 14 (64%) portrayed the current evidence based treatment options as ineffective and described alternative treatment plans (i.e. diet restrictions, complementary and alternative medicine). Conclusion: The majority of English language YouTube videos have useful information on the topic of SpA, however, 31% of patient opinions have inaccurate information on the clinical features and treatment options, and viewers need to be cognisant of these “fake news”. References: [1] Charnock D, Shepperd S, Needham G, Gann R (1999) DISCERN: an instrument for judging the quality of written consumer health information on treatment choices. J Epidemiol Community Health 53(2): 105-111. [2] Bernard A, Langille M, Hughes S, Rose C, Leddin D, Veldhuyzen van Zanten S (2007) A systematic review of patient inflammatory bowel disease information resources on the World Wide Web. Am J Gastroenterol 102(9): 2070-2077. Disclosure of Interests: Sakktivel Elangovan: None declared, Yu Heng Kwan: None declared, Warren Fong Consultant of: Abbvie, Janssen, Novartis, Speakers bureau: Abbvie, Janssen, Novartis


Anorectal Dysfunction in Multiple Sclerosis: A Systematic Review

ISRN Neurology ◽

10.5402/2012/376023 ◽

2012 ◽

Vol 2012 ◽

pp. 1-9 ◽

Cited By ~ 22

Author(s):

Sanober Nusrat ◽

Elsie Gulick ◽

David Levinthal ◽

Klaus Bielefeldt

Keyword(s):

Quality Of Life ◽

Multiple Sclerosis ◽

Fecal Incontinence ◽

English Language ◽

Controlled Trial ◽

Neuromuscular Diseases ◽

Anorectal Dysfunction ◽

Pubmed Database ◽

Marginal Improvement

Constipation and fecal incontinence are common in patients with neuromuscular diseases. Despite their high prevalence and potential impact on overall quality of life, few studies have addressed anorectal dysfunction in patients with multiple sclerosis (MS). The goal of this paper is to define the prevalence, pathophysiology, impact, and potential treatment of constipation and incontinence in MS patients. Methods. The PubMed database was searched for English language publications between January 1973 and December 2011. Articles were reviewed to assess the definition of the study population, duration, type and severity of MS, sex distribution, prevalence, impact, results of physiologic testing, and treatments. Results. The reported prevalence of constipation and fecal incontinence ranged around 40%. Anorectal dysfunction significantly affected patients with nearly 1 in 6 patients limiting social activities or even quitting work due to symptoms. Caregivers listed toileting as a common and significant burden. The only randomized controlled trial showed a marginal improvement of constipation with abdominal massage. All other reports lacked control interventions and only demonstrated improvement in individuals with milder symptoms. Conclusion. Anorectal dysfunction is a common manifestation in MS that significantly affects quality of life. Therapies are at best moderately effective and often cumbersome, highlighting the need for simple and more helpful interventions.
