An open stylometric system based on multilevel text analysis

Cognitive Studies | Études cognitives ◽

10.11649/cs.1430 ◽

2017 ◽

Cited By ~ 4

Author(s):

Maciej Eder ◽

Maciej Piasecki ◽

Tomasz Walkowiak

Keyword(s):

Social Sciences ◽

Text Analysis ◽

Semantic Analysis ◽

Deep Level ◽

General Idea ◽

Linguistic Features ◽

Text Documents ◽

Web Based ◽

Software Packages ◽

Functional Prototype

Stylometric techniques are usually applied to a limited number of typical tasks, such as authorship attribution, genre analysis, or gender studies. However, they could be applied to several tasks beyond this canonical set, if only stylometric tools were more accessible to users from different areas of the humanities and social sciences. This paper presents a general idea, followed by a fully functional prototype, of an open stylometric system whose wide use is facilitated by two aspects: technical and research flexibility. The system relies on a server installation combined with a web-based user interface. This frees the user from the necessity of installing any additional software. At the same time, the system offers a variety of ways in which the input texts can be analysed: they include not only the usual lexical level, but also deep-level linguistic features. This enables a range of possible applications, from typical stylometric tasks to the semantic analysis of text documents. The internal architecture of the system relies on several well-known software packages: a collection of language tools (for text pre-processing), Stylo (for stylometric analysis) and Cluto (for text clustering). The paper presents: (1) the idea behind the system from the user’s perspective; (2) the architecture of the system, with a focus on data processing; (3) features for text description; (4) the use of analytical systems such as Stylo and Cluto. The presentation is illustrated with example applications.

Polish abstract (translated): Otwarty system stylometryczny wykorzystujący wielopoziomową analizę języka [An open stylometric system using multilevel language analysis]. Applications of stylometric methods are generally limited to a few typical research problems, such as authorship attribution, the style of literary genres, or studies of stylistic differences between women and men. They could certainly be applied with success to many other text classification problems, if only these methods and the appropriate tools were more accessible to scholars from the various disciplines of the humanities and social sciences. This article discusses the theoretical assumptions and a fully functional prototype of an open stylometric system whose wide use is made possible by two features: technical flexibility and adaptability to different research questions. The system relies on a server installation coupled with a web-based user interface. This frees the user from having to install any additional software. At the same time, the system offers many ways of analysing texts, not only at the lexical level but also through low-level linguistic features. This allows the system to be used in many different ways, from typical stylometric tests to the semantic analysis of documents. The internal architecture of the system consists of many components known for their functionality, including the Stylo package for stylometric analysis and the Cluto package for advanced cluster analysis. The article discusses: (1) the concept of the whole system as seen from the user’s point of view; (2) the architecture of the system and its components responsible for text processing; (3) the linguistic features used to describe documents; (4) the use of data analysis modules such as Stylo and Cluto. Example applications of the system are also presented.


Using computerized text analysis to examine associations between linguistic features and clients’ distress during psychotherapy.

Journal of Counseling Psychology ◽

10.1037/cou0000440 ◽

2020 ◽

Cited By ~ 1

Author(s):

Natalie Shapira ◽

Gal Lazarus ◽

Yoav Goldberg ◽

Eva Gilboa-Schechtman ◽

Rivka Tuval-Mashiach ◽

...

Keyword(s):

Text Analysis ◽

Linguistic Features ◽

Computerized Text Analysis


Representation and Presentation of Culinary Tradition as Cultural Heritage

Heritage ◽

10.3390/heritage4020036 ◽

2021 ◽

Vol 4 (2) ◽

pp. 612-640

Author(s):

Nikolaos Partarakis ◽

Danai Kaplanidi ◽

Paraskevi Doulgeraki ◽

Effie Karuzaki ◽

Argyro Petraki ◽

...

Keyword(s):

Social Sciences ◽

Knowledge Representation ◽

Cultural Heritage ◽

Systematic Approach ◽

Semantic Representation ◽

Ethnographic Research ◽

Abstract Form ◽

Web Based ◽

Social Sciences And Humanities ◽

Authoring Environment

This paper presents a knowledge representation framework and provides tools to allow the representation and presentation of the tangible and intangible dimensions of culinary tradition as cultural heritage including the socio-historic context of its evolution. The representation framework adheres to and extends the knowledge representation standards for the Cultural Heritage (CH) domain while providing a widely accessible web-based authoring environment to facilitate the representation activities. In strong collaboration with social sciences and humanities, this work allows the exploitation of ethnographic research outcomes by providing a systematic approach for the representation of culinary tradition in the form of recipes, both in an abstract form for their preservation and in a semantic representation of their execution captured on-site during ethnographic research.


Text Analysis of Project Completion Reports

10.18235/0003611 ◽

2021 ◽

Author(s):

César E. Montiel Olea ◽

Leonardo R. Corral

Keyword(s):

Text Analysis ◽

Text Documents ◽

Project Completion ◽

Analysis Tools ◽

Development Effectiveness ◽

Different Types ◽

Potential Applications ◽

Project Cycle ◽

Main Instrument ◽

Unique Dataset

Project Completion Reports (PCRs) are the main instrument through which different multilateral organizations measure the success of a project once it closes. PCRs are important for development effectiveness because they help in understanding achievements, failures, and challenges within the project cycle, and these lessons can feed back into the design and execution of new projects. The aim of this paper is to introduce text analysis tools for the exploration of PCR documents. We describe and apply different text analysis tools to explore the content of a sample of PCRs. We seek to illustrate a way in which PCRs can be summarized and analyzed using innovative tools applied to a unique dataset. We believe that the methods presented in this investigation have numerous potential applications to different types of text documents routinely prepared within the Inter-American Development Bank (IDB).


Developing and Analyzing a Spanish Corpus for Forensic Purposes

Linguistic Evidence in Security Law and Intelligence ◽

10.5195/lesli.2019.19 ◽

2019 ◽

Vol 3 ◽

Author(s):

Ángela Almela ◽

Gema Alcaraz-Mármol ◽

Arancha García-Pinar ◽

Clara Pallejá

Keyword(s):

Data Collection ◽

Language Processing ◽

Ad Hoc ◽

Semantic Analysis ◽

Linguistic Features ◽

Natural Language Processing Tool ◽

Check Method ◽

Spanish Universities ◽

Gender Based ◽

Main Instrument

In this paper, the methods for developing a database of Spanish writing that can be used for forensic linguistic research are presented, including our data collection procedures. Specifically, the main instrument used for data collection has been translated into Spanish and adapted from Chaski (2001). It consists of ten tasks, by means of which the subjects are asked to write formal and informal texts about different topics. To date, 93 undergraduates from Spanish universities have participated in the study, as have prisoners convicted of gender-based abuse. A twofold analysis has been performed, since the data collected have been approached from a semantic and a morphosyntactic perspective. Regarding the semantic analysis, psycholinguistic categories have been used, many of them taken from the LIWC dictionary (Pennebaker et al., 2001). In order to obtain a more comprehensive depiction of the linguistic data, some other ad hoc categories have been created, based on the corpus itself, using a double-check method for their validation so as to ensure inter-rater reliability. Furthermore, as regards morphosyntactic analysis, the natural language processing tool ALIAS TATTLER is being developed for Spanish. Results show that it is possible to differentiate non-abusers from abusers with high accuracy based on linguistic features.


Emotional talk about robotic technologies on Reddit: Sentiment analysis of life domains, motives, and temporal themes

New Media & Society ◽

10.1177/14614448211067259 ◽

2021 ◽

pp. 146144482110672

Author(s):

Nina Savela ◽

David Garcia ◽

Max Pellert ◽

Atte Oksanen

Keyword(s):

Social Sciences ◽

Social Media ◽

Text Analysis ◽

Regression Models ◽

Life Domains ◽

Robotic Technology ◽

Logistic Regression Models ◽

Negative Comments ◽

Media Platform ◽

Robotic Technologies

This study, grounded in computational social sciences and social psychology, investigated sentiment and life-domain, motivational, and temporal themes in social media discussions about robotic technologies. We retrieved text comments from the Reddit social media platform in March 2019 based on the following six robotic technology concepts: robot (N = 3,433,554), AI (N = 2,821,614), automation (N = 879,092), bot (N = 21,559,939), intelligent agent (N = 15,119), and software agent (N = 18,324). The comments were processed using the VADER and LIWC text analysis tools and analyzed further with logistic regression models. Compared to the other four concepts, robot and AI were used less often in a positive context. Comments addressing the themes of leisure, money, and the future were associated with positive comments, and the themes of home, power, and the past with negative comments. The results show how context and terminology affect emotionality in conversations about robotic technology.


Proposing a Semantic Analysis based Sanskrit Compiler by mapping Sanskrit's linguistic features with Compiler phases

10.1109/icesc51422.2021.9532969 ◽

2021 ◽

Author(s):

Akshay Chavan ◽

Pranathi Kunadi ◽

Nidhi Wader ◽

Shirish Sane

Keyword(s):

Semantic Analysis ◽

Linguistic Features


Web-Based Semantic Analysis of Chinese News Video

Advances in Multimedia Information Processing - PCM 2006 - Lecture Notes in Computer Science ◽

10.1007/11922162_58 ◽

2006 ◽

pp. 502-509

Author(s):

Huamin Feng ◽

Zongqiang Pang ◽

Kun Qiu ◽

Guosen Song

Keyword(s):

Semantic Analysis ◽

Web Based ◽

News Video


HD-eXplosion: visualization of hydrogen–deuterium exchange data as chiclet and volcano plots with statistical filtering

Bioinformatics ◽

10.1093/bioinformatics/btaa892 ◽

2020 ◽

Author(s):

Naifu Zhang ◽

Xiaohe Yu ◽

Xinchao Zhang ◽

Sheena D’Arcy

Keyword(s):

Source Code ◽

Deuterium Exchange ◽

Hydrogen Deuterium Exchange ◽

Web Based ◽

Data Presentation ◽

Software Packages ◽

Volcano Plots ◽

Hydrogen Deuterium ◽

Exchange Data ◽

Publication Quality

Summary: Hydrogen–Deuterium eXchange coupled to mass spectrometry is a powerful tool for the analysis of protein dynamics and interactions. Bottom-up experiments looking at deuterium uptake differences between various conditions are the most common. These produce multi-dimensional data that can be challenging to depict in a single visual format. Each user must also set significance thresholds to define meaningful differences and make these apparent in data presentation. To assist in this process, we have created HD-eXplosion, an open-source, web-based application for the generation of chiclet and volcano plots with statistical filters. HD-eXplosion fills a void in available software packages and produces customizable plots that are publication quality. Availability and implementation: The HD-eXplosion application is available at http://hd-explosion.utdallas.edu. The source code can be found at https://github.com/HD-Explosion.


Integration of Web-Based Learning into Higher Education Institutions in Uganda

International Journal of Web-Based Learning and Teaching Technologies ◽

10.4018/ijwltt.2018070103 ◽

2018 ◽

Vol 13 (3) ◽

pp. 33-50

Author(s):

Baguma Asuman ◽

Md. Shahadat Hossain Khan ◽

Che Kum Clement

Keyword(s):

Social Sciences ◽

Higher Education ◽

Teaching And Learning ◽

Educational Institutions ◽

Web Based ◽

Web Based Learning ◽

Learning Contexts ◽

New Knowledge ◽

Quantitative And Qualitative Methods ◽

Analyze Data

This article reports on the barriers encountered by teachers and the possible solutions to the integration of web-based learning (WBL) into higher educational institutions in Uganda. A total of 50 teachers in the departments of ICT, management, and social sciences from five different universities were purposively selected. A self-designed questionnaire was adapted to collect participants' responses. Both quantitative and qualitative methods were used to analyze data. The findings indicate that teachers had a positive attitude toward incorporating WBL into the teaching and learning process, but they encountered some difficulties, which were identified as slow internet speeds, insufficient web-based tools, lack of technical support, etc. The article further identifies possible enablers to overcome these difficulties and provides empirical evidence that adds new knowledge to the existing literature. It also provides recommendations for overcoming difficulties in order to enhance and incorporate WBL in the teaching and learning contexts of higher education in Uganda in particular and in developing countries in general.


Deployment of a Web Based Critiquing System for Essay Writing in Hybrid Learning Environment

Handbook of Research on Hybrid Learning Models ◽

10.4018/978-1-60566-380-7.ch024 ◽

2010 ◽

pp. 393-405

Author(s):

Fion S.L. Lee ◽

Kelvin C.K. Wong ◽

William K.W. Cheung ◽

Cynthia F.K. Lee

Keyword(s):

Latent Semantic Analysis ◽

Semantic Analysis ◽

Hybrid Learning ◽

Writing Pedagogy ◽

Learning Approach ◽

School Students ◽

Web Based ◽

Essay Writing ◽

Feasibility Evaluation ◽

Hybrid Learning Environment

This chapter describes the use of a Web-based essay critiquing system and its integration into a series of composition workshops for a group of secondary school students in Hong Kong. It begins with a review and application of the hybrid learning approach, followed by a description of latent semantic analysis, a methodology for corpus preparation. Then, the distributed computing architecture for the essay critiquing system is described. The chapter explicates the way in which the system is integrated with a writing pedagogy implemented in the workshop, and the result of the feasibility evaluation is derived. The positive result confirms the benefits of hybrid learning.
