Recommendation to the SSH community: Take a linguist on board

2021 ◽  
Vol 45 (1) ◽  
Author(s):  
Jeannine Beeken

In this paper we address how Natural Language Processing (NLP) approaches and language technology can contribute to data services in different ways: from providing social science users with new approaches and tools to explore oral and textual data, to enhancing the search, findability, and retrieval of data sources. Using linguistic approaches, we are able to process data, for example with Automated Speech Recognition (ASR) and named entity recognizers (NER), extract key concepts and terms, and improve search strategies. We provide examples of how computational linguistics contributes to and facilitates the mining and analysis of oral or textual material, for example (transcribed) interviews or oral histories, and show how free open-source (OS) tools can be used very easily to gain a quick overview of the key features of a text, which can be further exploited as useful metadata.
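As a toy illustration of the last point, even a frequency-based term extractor in plain Python can give a quick overview of the key features of an interview transcript. This is a minimal sketch using only the standard library; the stop-word list and sample text are invented here, and the paper's actual open-source tools are not named in the abstract:

```python
from collections import Counter
import re

# Tiny invented stop-word list; real tools ship much larger ones.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "was", "were", "i", "we"}

def key_terms(text, n=5):
    """Return the n most frequent content words as a quick overview of a text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    content = [t for t in tokens if t not in STOPWORDS and len(t) > 2]
    return Counter(content).most_common(n)

# Invented snippet standing in for a transcribed oral-history interview.
interview = (
    "We moved to the village in 1952. The village school was small, "
    "and the school taught children from three villages."
)
print(key_terms(interview, 3))
```

The extracted terms ("village", "school", ...) are exactly the kind of lightweight output that can be stored as descriptive metadata alongside the transcript.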

Author(s):  
Ayush Srivastav ◽  
Hera Khan ◽  
Amit Kumar Mishra

The chapter provides an eloquent account of the major methodologies and advances in the field of Natural Language Processing. The most popular models that have been used over time for Natural Language Processing tasks are discussed along with their applications to specific tasks. The chapter begins with the fundamental concepts of regex and tokenization. It provides an insight into text preprocessing and its methodologies, such as stemming and lemmatization and stop-word removal, followed by part-of-speech tagging and named entity recognition. Further, this chapter elaborates on the concept of word embedding, its various types, and some common frameworks such as word2vec, GloVe, and fastText. A brief description of classification algorithms used in Natural Language Processing is provided next, followed by neural networks and their advanced forms, such as recursive neural networks and seq2seq models, that are used in computational linguistics. A brief description of chatbots and memory networks concludes the chapter.
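The opening steps of the pipeline the chapter describes (tokenization, stop-word removal, stemming) can be sketched in a few lines of Python. Note that the suffix-stripping stemmer below is a deliberately naive toy, not the Porter-style algorithms such a chapter would actually cover, and the stop-word list is invented:

```python
import re

STOP_WORDS = {"the", "is", "a", "an", "and", "of", "are"}

def tokenize(text):
    # Regex-based tokenization: lowercase alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())

def naive_stem(word):
    # Toy suffix stripping, NOT the Porter algorithm.
    for suffix in ("ing", "ies", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            if suffix == "ies":
                return word[:-3] + "y"
            return word[: -len(suffix)]
    return word

def preprocess(text):
    # tokenization -> stop-word removal -> stemming
    return [naive_stem(t) for t in tokenize(text) if t not in STOP_WORDS]

print(preprocess("The cats are chasing studies of embeddings"))
```

Over-stemming ("chasing" losing its final vowel) is visible even in this toy, which is precisely why lemmatization is usually discussed alongside stemming.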


2020 ◽  
Vol 46 (1) ◽  
pp. 1-10
Author(s):  
Dhafar Hamed Abd ◽  
Ahmed T. Sadiq ◽  
Ayad R. Abbas

Text classification and sentiment analysis are nowadays considered among the most popular Natural Language Processing (NLP) tasks. Such techniques play a significant role in human activities and have an impact on daily behaviour. Articles in different fields, such as politics and business, represent different opinions according to the writer's tendency, and a huge amount of data can be acquired through that differentiation, enabling the political orientation of an online article to be managed automatically. However, no corpus for political categorization has previously been directed towards this task in Arabic, due to the lack of rich representative resources for training an Arabic text classifier. We therefore introduce the Political Arabic Articles Dataset (PAAD) of textual data collected from newspapers, social networks, general forums, and ideology websites. The dataset consists of 206 articles distributed into three categories (Reform, Conservative, and Revolutionary) that we offer to the research community on Arabic computational linguistics. We anticipate that this dataset will be a great aid for a variety of NLP tasks on Modern Standard Arabic, in particular political text classification. We present the data in raw form and as an Excel file in four versions: V1 raw data, V2 preprocessing, V3 root stemming, and V4 light stemming.
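A minimal bag-of-words Naive Bayes classifier illustrates the kind of three-way political classification such a dataset is intended to support. This is a sketch only: the training snippets below are invented English placeholders rather than PAAD articles, and the paper does not prescribe this particular classifier:

```python
import math
from collections import Counter, defaultdict

# Toy stand-ins for the three PAAD categories; texts are invented.
train = [
    ("reform", "gradual change through new policy and law"),
    ("conservative", "preserve tradition and existing law"),
    ("revolutionary", "radical change overthrow the old order"),
]

class NaiveBayes:
    def __init__(self, samples):
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter()
        self.vocab = set()
        for label, text in samples:
            words = text.split()
            self.class_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)

    def predict(self, text):
        def log_prob(label):
            total = sum(self.word_counts[label].values())
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.class_counts[label] / sum(self.class_counts.values()))
            for w in text.split():
                # Laplace smoothing over the shared vocabulary
                score += math.log((self.word_counts[label][w] + 1) / (total + len(self.vocab)))
            return score
        return max(self.class_counts, key=log_prob)

clf = NaiveBayes(train)
print(clf.predict("radical overthrow of the order"))
```

The dataset's V2-V4 variants (preprocessed, root-stemmed, light-stemmed) would feed into exactly this step: the choice of stemming changes what `text.split()` sees and hence the class statistics.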


Author(s):  
Lars Borin ◽  
Dimitrios Kokkinakis

In this chapter, the authors describe the development and application of language technology for intelligent information access to the content of digitized cultural heritage collections in the form of Swedish classical literary works. This technology offers sophisticated and flexible support functions to literary scholars and researchers. The authors focus on one kind of text processing technology (named entity recognition) and one research field (literary onomastics), but try to argue that the techniques involved are quite general and can be further developed in a number of directions. This way, the authors aim at supporting the users of digitized literature collections with tools that enable semantic search, browsing and indexing of texts. In this sense, the authors offer new ways for exploring the large volumes of literary texts being made available through national cultural heritage digitization projects.
Keywords: Language technology; Computational linguistics; Natural language processing; Literary onomastics; Named entity recognition; Corpus linguistics; Corpus annotation; Digital resources; Text technology; Cultural heritage
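The combination the chapter describes (entity recognition feeding semantic search) can be sketched as an inverted index from recognized names to the works that mention them. This is a toy: the capitalized-token heuristic below stands in for a trained NER model, and the corpus snippets are invented, not the authors' Swedish collection:

```python
import re
from collections import defaultdict

# Invented snippets standing in for digitized literary works.
corpus = {
    "work1": "Anna travelled from Stockholm to Uppsala with Karl.",
    "work2": "Karl later wrote letters to Anna from Gothenburg.",
}

def naive_entities(text):
    """Very naive name spotting: capitalized alphabetic tokens.
    A real system would use a trained NER model with type labels."""
    return set(re.findall(r"[A-Z][a-z]+", text))

# Inverted index: entity name -> set of work identifiers.
index = defaultdict(set)
for doc_id, text in corpus.items():
    for name in naive_entities(text):
        index[name].add(doc_id)

print(sorted(index["Anna"]))  # which works mention Anna
```

For literary onomastics, the index keys are themselves the object of study: the distribution of personal names and place names across an author's works.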


2019 ◽  
Vol 27 (3) ◽  
pp. 457-470 ◽  
Author(s):  
Stephen Wu ◽  
Kirk Roberts ◽  
Surabhi Datta ◽  
Jingcheng Du ◽  
Zongcheng Ji ◽  
...  

Abstract
Objective: This article methodically reviews the literature on deep learning (DL) for natural language processing (NLP) in the clinical domain, providing quantitative analysis to answer 3 research questions concerning methods, scope, and context of current research.
Materials and Methods: We searched MEDLINE, EMBASE, Scopus, the Association for Computing Machinery Digital Library, and the Association for Computational Linguistics Anthology for articles using DL-based approaches to NLP problems in electronic health records. After screening 1,737 articles, we collected data on 25 variables across 212 papers.
Results: DL in clinical NLP publications more than doubled each year, through 2018. Recurrent neural networks (60.8%) and word2vec embeddings (74.1%) were the most popular methods; the information extraction tasks of text classification, named entity recognition, and relation extraction were dominant (89.2%). However, there was a “long tail” of other methods and specific tasks. Most contributions were methodological variants or applications, but 20.8% were new methods of some kind. The earliest adopters were in the NLP community, but the medical informatics community was the most prolific.
Discussion: Our analysis shows growing acceptance of deep learning as a baseline for NLP research, and of DL-based NLP in the medical community. A number of common associations were substantiated (eg, the preference of recurrent neural networks for sequence-labeling named entity recognition), while others were surprisingly nuanced (eg, the scarcity of French language clinical NLP with deep learning).
Conclusion: Deep learning has not yet fully penetrated clinical NLP and is growing rapidly. This review highlighted both the popular and unique trends in this active field.


Linguistics ◽  
2019 ◽  
Author(s):  
Jane Chandlee

Much like the term “computational linguistics”, the term “computational phonology” has come to mean different things to different people. Research grounded in a variety of methodologies and formalisms can be included in its scope. The common thread of the research that falls under this umbrella term is the use of computational methods to investigate questions of interest in phonology, primarily how to delimit the set of possible phonological patterns from the larger set of “logically possible” patterns and how those patterns are learned. Computational phonology arguably began with the foundational result that Sound Pattern of English (SPE) rules are regular relations (provided they can’t recursively apply to their own structural change), which means they can be modeled with finite-state transducers (FSTs) and that a system of ordered rules can be composed into a single FST. The significance of this result can be seen in the prominence of finite-state models both in theoretical phonology research and in more applied areas like natural language processing and human language technology. The shift in the field of phonology from rule-based grammars to constraint-based frameworks like Optimality Theory (OT) initially sparked interest in the question of how to model OT with FSTs and thereby preserve the noted restriction of phonology to the complexity level of regular. But an additional point of interest for computational work on OT stemmed from the ways in which its architecture readily lends itself to the development of learning algorithms and models, including statistical approaches that address recognized challenges such as gradient acceptability, process optionality, and the learning of underlying forms and hidden structure. Another line of research has taken on the question of to what extent phonology is not just regular, but subregular, meaning describable with proper subclasses of the regular languages and relations. 
The advantages of subregular modeling of phonological phenomena are argued to be stronger typological explanations, in that the computational properties that establish the subclasses as properly subregular restrict the kinds of phenomena that can be described in desirable ways. Also, these same restrictions lead directly to provably correct learning algorithms. Once again this work has made extensive use of the finite-state formalism, but it has also employed logical characterizations that more readily extend from strings to non-linear phenomena such as autosegmental representations and syllable structure.
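The foundational result mentioned above (SPE-style rules as regular relations, with a system of ordered rules composing into a single FST) can be illustrated with regex rewrites in Python. Real computational phonology work uses finite-state toolkits such as foma or HFST; the two rules below are invented for illustration, and function composition here stands in for genuine transducer composition:

```python
import re

# Two toy ordered SPE-style rules (invented for illustration):
#   Rule 1: t -> d / V _ V   (intervocalic voicing)
#   Rule 2: d -> r / V _ V   (fed by Rule 1)
V = "aeiou"

def rule1(form):
    return re.sub(rf"([{V}])t([{V}])", r"\1d\2", form)

def rule2(form):
    return re.sub(rf"([{V}])d([{V}])", r"\1r\2", form)

def compose(*rules):
    """Apply rules in order; with true FSTs this composition
    yields a single transducer computing the same mapping."""
    def grammar(form):
        for rule in rules:
            form = rule(form)
        return form
    return grammar

grammar = compose(rule1, rule2)
print(grammar("ata"))  # Rule 1 feeds Rule 2: t -> d -> r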


Author(s):  
Piotr Malak

Digital humanities and information visualization rely on huge sets of digital data, mostly delivered in text form. Although computational linguistics provides many valuable tools for text processing, the initial phase (text preprocessing) is very involved and time-consuming. The problems arise due to a human factor: they are not always outright errors; there is also inconsistency in forms, which affects data quality. In this chapter, the author describes and discusses the main issues that arise during the preprocessing phase of gathering textual data for InfoVis. Selected examples of InfoVis applications are presented. Alongside the problems found in raw, original data, possible solutions are also discussed. Canonical approaches used in text preprocessing, common issues affecting the process, and ways to prevent them are also presented, as is the quality of data from different sources. The content of this chapter is the result of several years of practical experience in natural language processing gained during the realization of different projects and evaluation campaigns.
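A typical repair of the kind discussed (human-introduced inconsistency in forms rather than outright error) can be sketched with standard-library normalization. The specific cleanup steps are illustrative assumptions, not the author's actual pipeline:

```python
import re
import unicodedata

def normalize(record):
    """Toy cleanup of inconsistencies often seen in raw textual data:
    compatibility characters (e.g. non-breaking spaces), curly quotes,
    and irregular whitespace."""
    text = unicodedata.normalize("NFKC", record)       # NBSP -> plain space, etc.
    text = text.replace("\u201c", '"').replace("\u201d", '"')  # curly -> straight quotes
    text = re.sub(r"\s+", " ", text).strip()           # collapse runs of whitespace
    return text

raw = "  \u201cDigital  Humanities\u201d data\u00a0quality  "
print(normalize(raw))
```

Two records that differ only in such invisible variants would otherwise count as distinct values and silently distort any visualization built on top of them.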


Author(s):  
Kesavan Vadakalur Elumalai ◽  
Niladri Sekhar Das ◽  
Mufleh Salem M. Alqahtani ◽  
Anas Maktabi

Part-of-speech (POS) tagging is an indispensable method of text processing. The main aim is to assign a part of speech to each word after considering its actual contextual syntactic-cum-semantic role in the piece of text where it occurs (Siemund & Claridge 1997). This is a useful strategy in language processing, language technology, machine learning, machine translation, and computational linguistics, as it generates a kind of output that enables a system to work with natural language texts with greater accuracy and success. Part-of-speech tagging is also known as ‘grammatical annotation’ and ‘word category disambiguation’ in some areas of linguistics where the analysis of the form and function of words is an important avenue for better comprehension and application of texts. Since the primary task of POS tagging involves assigning a tag to each word, manually or automatically, in a piece of natural language text, it has to pay adequate attention to the contexts where words are used. This is a tough challenge for a system, as a system normally fails to know how a word carries specific linguistic information in a text and what kind of larger syntactic frame it requires for its operation. The present paper takes this issue into consideration and tries to critically explore whether some of the well-known POS tagging systems are capable of handling this kind of challenge, and whether these systems are at all successful in assigning appropriate POS tags to words without accessing information from extratextual domains. The novelty of the paper lies in its attempt to look into some of the POS tagging schemes proposed so far to see if the systems are actually successful in dealing with the complexities involved in tagging words in texts.
It also checks whether the performance of these systems is better than manual POS tagging, and verifies whether the information and insights gathered from such enterprises are at all useful for enhancing our understanding of the identity and function of words used in texts. All of this is addressed in the paper with reference to some of the POS taggers available to us. Moreover, the paper tries to show how a POS-tagged text is useful in various applications, thereby creating a sense of awareness of the multifunctionality of tagged texts among language users.
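The core difficulty discussed here, word category disambiguation from context, can be shown with a toy lexicon-plus-context tagger. The lexicon, tagset, and disambiguation heuristic below are invented for illustration and are far simpler than the trained taggers the paper surveys:

```python
# Tiny lexicon; "book" is deliberately ambiguous between NOUN and VERB.
LEXICON = {
    "the": "DET", "a": "DET", "i": "PRON", "will": "AUX",
    "flight": "NOUN", "book": ("NOUN", "VERB"),
}

def tag(tokens):
    tags = []
    for tok in tokens:
        entry = LEXICON.get(tok.lower(), "NOUN")  # toy default for unknowns
        if isinstance(entry, tuple):
            # Disambiguate by left context: after a determiner prefer NOUN,
            # otherwise (e.g. after an auxiliary) prefer VERB.
            prev = tags[-1] if tags else None
            entry = "NOUN" if prev == "DET" else "VERB"
        tags.append(entry)
    return list(zip(tokens, tags))

print(tag("I will book the flight".split()))
print(tag("The book is here".split()))
```

Even this crude left-context rule resolves "book" differently in the two sentences, which is exactly the intratextual information the paper asks whether taggers can exploit without recourse to extratextual domains.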


In the present world, information technology is predominant and has the ability to change the future of the globe. As a futuristic technology, computational linguistics can change communication models among human beings. Owing to the changing context and the development of Natural Language Processing, various new doors have opened in the field of computational linguistics. Computational Linguistics (CL) increases the applicability of language technology to man-machine interaction. Globalization can convert the world into a small village, and for the interchange of human knowledge among various communities, automatic language processing plays a vital role. In this paper, we attempt to discuss the various dimensions of language technology and computational linguistics.

