Using Enhanced Lexicon-Based Approaches for the Determination of Aspect Categories and Their Polarities in Arabic Reviews

Author(s):  
Mohammad Al Smadi ◽  
Islam Obaidat ◽  
Mahmoud Al-Ayyoub ◽  
Rami Mohawesh ◽  
Yaser Jararweh

Sentiment Analysis (SA) is the process of determining whether the sentiment of a text written in a natural language is positive, negative, or neutral. It is one of the most interesting subfields of natural language processing (NLP) and Web mining due to its diverse applications and the challenges of applying it to the massive amounts of textual data available online (especially on social networks). Most current work on SA focuses on the English language and operates at the sentence level or the document level. This work focuses on a less studied variant of SA, aspect-based SA (ABSA), for the Arabic language. Specifically, it considers two ABSA tasks: aspect category determination and aspect category polarity determination, and makes use of the publicly available human annotated Arabic dataset (HAAD) along with the baseline experiments conducted by its providers. Several lexicon-based approaches are presented for the two tasks at hand, and some of them significantly outperform the best-known results on the given dataset. Enhancements of 9% and 46% were achieved in aspect category determination and aspect category polarity determination, respectively.
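As an illustration of the lexicon-based idea, a minimal sketch of aspect category detection and polarity scoring might look like the following. The English vocabulary, category keywords, and scores are toy assumptions for readability, not the paper's actual Arabic lexicon or HAAD categories:

```python
# Toy lexicon-based aspect category and polarity scorer.
# Lexicon entries and category keywords are illustrative only.

SENTIMENT_LEXICON = {
    "excellent": 1.0, "great": 0.8, "good": 0.5,
    "bad": -0.5, "poor": -0.8, "terrible": -1.0,
}

CATEGORY_KEYWORDS = {
    "plot": {"plot", "story", "narrative"},
    "style": {"style", "writing", "prose"},
}

def detect_categories(tokens):
    """Return the aspect categories whose keywords appear in the text."""
    found = set()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if keywords & set(tokens):
            found.add(category)
    return found

def polarity(tokens):
    """Sum lexicon scores; the sign of the total gives the polarity label."""
    score = sum(SENTIMENT_LEXICON.get(t, 0.0) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

tokens = "the plot was excellent but the writing felt poor".split()
print(sorted(detect_categories(tokens)))  # ['plot', 'style']
print(polarity(tokens))                   # positive (1.0 - 0.8 > 0)
```

Real lexicon-based ABSA systems refine this skeleton with negation handling, intensifiers, and per-category sentiment aggregation.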

Author(s):  
Santosh Kumar Mishra ◽  
Rijul Dhir ◽  
Sriparna Saha ◽  
Pushpak Bhattacharyya

Image captioning is the process of generating a textual description of an image, aiming to describe the salient parts of the given image. It is an important problem because it combines computer vision and natural language processing: computer vision is used for understanding the image, and natural language processing for language modeling. A lot of work has been done on image captioning for the English language. In this article, we develop a model for image captioning in the Hindi language. Hindi is the official language of India and the fourth most spoken language in the world, spoken in India and South Asia. To the best of our knowledge, this is the first attempt to generate image captions in the Hindi language. A dataset is manually created by translating the well-known MSCOCO dataset from English to Hindi. Finally, different types of attention-based architectures are developed for image captioning in Hindi; these attention mechanisms have never before been used for the Hindi language. The results of the proposed model are compared with several baselines in terms of BLEU scores, and they show that our model performs better than the others. Manual evaluation of the obtained captions in terms of adequacy and fluency also reveals the effectiveness of the proposed approach. Availability of resources: the code for the article is available at https://github.com/santosh1821cs03/Image_Captioning_Hindi_Language ; the dataset will be made available at http://www.iitp.ac.in/∼ai-nlp-ml/resources.html .
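The soft-attention idea behind such captioning architectures can be sketched in a few lines: at each decoding step, every image-region feature is scored against the decoder state, the scores are normalised with a softmax, and a weighted average (the context vector) is taken. The dot-product scoring and the toy 2-d features below are illustrative simplifications, not the authors' architecture:

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(region_features, decoder_state):
    """Dot-product attention: score each image region against the
    decoder state, normalise, and return the weighted context vector."""
    scores = [dot(h, decoder_state) for h in region_features]
    weights = softmax(scores)
    dim = len(decoder_state)
    context = [sum(w * h[i] for w, h in zip(weights, region_features))
               for i in range(dim)]
    return context, weights

# Three toy 2-d region features; the decoder state "looks at" region 2.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
state = [0.0, 2.0]
context, weights = attend(regions, state)
print(weights)  # the second region receives the largest weight
```

Attention-based captioners learn the scoring function (e.g. additive or multiplicative attention) rather than using a raw dot product, but the weighted-average mechanics are the same.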


Students’ life is incomplete without exams, because exams help students evaluate themselves and thus proceed further in their studies. The starting step in conducting such examinations is creating a question paper. Question papers are still generated in the traditional way: lecturers and professors, the teaching staff, do it manually, wasting a terrible amount of time selecting which types of questions to generate. Creating a question paper is difficult, as it involves a lot of resource utilization and exhaustion. These tasks can be automated. We are seeing a lot of development in new, exciting technologies, and these technologies can make the process of automation easier. For automation, we use Machine Learning and Natural Language Processing, as the whole task involves using and manipulating textual data. In this solution, we provide our model with a textual paragraph from which questions are to be selectively generated, and we develop the multiple choices for the users using a certain distinctive process.
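One common technique for this kind of automation (shown here as a generic sketch, not necessarily the authors' distinctive process) is to blank a keyword in a sentence and draw the distractor options from the remaining content words:

```python
import random

# Toy fill-in-the-blank MCQ generator: blank the longest content word
# and use other content words as distractors.  The stopword list and
# keyword heuristic are deliberately crude, for illustration only.

STOPWORDS = {"the", "a", "an", "of", "is", "are", "in", "and", "to"}

def make_mcq(sentence, rng):
    words = sentence.rstrip(".").split()
    content = [w for w in words if w.lower() not in STOPWORDS]
    answer = max(content, key=len)          # crude keyword choice
    distractors = [w for w in content if w != answer]
    rng.shuffle(distractors)
    options = distractors[:3] + [answer]
    rng.shuffle(options)
    stem = " ".join("_____" if w == answer else w for w in words)
    return stem, options, answer

stem, options, answer = make_mcq(
    "Photosynthesis converts sunlight into chemical energy in plants.",
    random.Random(0))
print(stem)
print(options, "answer:", answer)
```

Practical systems replace the length heuristic with keyword extraction (e.g. TF-IDF or named entities) and pick semantically similar distractors, but the blank-and-distract skeleton is the same.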


The software development procedure begins with requirement analysis. The requirements process spans analysing the requirements to sketching the design of the program, which is very critical work for programmers and software engineers. Moreover, many errors made during the requirement analysis cycle are transferred to other stages, which makes the process far more costly than initially specified. The reason is that software requirement specifications are written in natural language. To minimize these errors, the software requirements can be transferred to a computerized form as UML diagrams. To this end, a tool has been designed that provides semi-automated aid for designers, producing a UML class model from software specifications using Natural Language Processing techniques. The proposed technique outlines the class diagram in a well-known configuration and also points out the relationships between classes. In this research, we propose to enhance the procedure of producing UML diagrams from natural language, which will help software developers analyze the software requirements with fewer errors and more efficiently. The proposed approach uses parser analysis and a Part-of-Speech (POS) tagger to analyze the requirements entered by the user in the English language, and then extracts the verbs, phrases, etc. in the user text. The obtained results showed that the proposed method performs better than other methods published in the literature, giving a better analysis of the given requirements and better diagram presentation, which can help software engineers. Key words: Part of Speech, UML
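The noun/verb heuristic behind such NL-to-UML extraction can be sketched as follows: nouns become candidate classes, and a verb between two nouns becomes a candidate relationship. The tiny hand-coded tag lexicon below stands in for a real POS tagger (such as NLTK's or spaCy's), and the example requirement and class names are illustrative assumptions:

```python
# Toy NL-to-UML extraction: nouns -> candidate classes,
# verbs between nouns -> candidate relationships.
# The POS dictionary is a stand-in for a real tagger.

POS = {
    "customer": "NOUN", "order": "NOUN", "product": "NOUN",
    "places": "VERB", "contains": "VERB",
    "a": "DET", "an": "DET", "the": "DET",
}

def extract_model(sentence):
    tokens = sentence.rstrip(".").lower().split()
    tagged = [(t, POS.get(t, "OTHER")) for t in tokens]
    classes = [t for t, tag in tagged if tag == "NOUN"]
    relations = []
    for i, (t, tag) in enumerate(tagged):
        if tag == "VERB":
            # Nearest noun to the left and right of the verb.
            left = next((w for w, g in reversed(tagged[:i]) if g == "NOUN"), None)
            right = next((w for w, g in tagged[i + 1:] if g == "NOUN"), None)
            if left and right:
                relations.append((left, t, right))
    return classes, relations

classes, relations = extract_model("A customer places an order.")
print(classes)    # ['customer', 'order']
print(relations)  # [('customer', 'places', 'order')]
```

A full system would also use the parse tree to distinguish attributes from classes and to classify relationships (association, aggregation, inheritance), but noun/verb extraction is the core step.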


2017 ◽  
Vol 26 (01) ◽  
pp. 228-234 ◽  
Author(s):  
A. Névéol ◽  
P. Zweigenbaum

Summary Objectives: To summarize recent research and present a selection of the best papers published in 2016 in the field of clinical Natural Language Processing (NLP). Method: A survey of the literature was performed by the two section editors of the IMIA Yearbook NLP section. Bibliographic databases were searched for papers with a focus on NLP efforts applied to clinical texts or aimed at a clinical outcome. Papers were automatically ranked and then manually reviewed based on titles and abstracts. A shortlist of candidate best papers was first selected by the section editors before being peer-reviewed by independent external reviewers. Results: The five clinical NLP best papers provide contributions that range from emerging original foundational methods to transitioning solid, established research results to a practical clinical setting. They offer a framework for abbreviation disambiguation and coreference resolution, a classification method to identify clinically useful sentences, an analysis of counseling conversations to improve support to patients with mental disorders, and grounding of gradable adjectives. Conclusions: Clinical NLP continued to thrive in 2016, with an increasing number of contributions towards applications compared to fundamental methods. Fundamental work addresses increasingly complex problems such as lexical semantics, coreference resolution, and discourse analysis. Research results translate into freely available tools, mainly for English.


Author(s):  
Kiran Raj R

Today, everyone has a personal device to access the web, and every user tries to access the information they require through the internet. Most of this information is in the form of a database, and a user with limited knowledge of databases will have difficulty accessing the data within one. Hence, there is a need for a system that permits such users to access the information in the database. The proposed method is to develop a system whose input is a natural-language question and whose output is an SQL query, which is used to access the database and retrieve the information with ease. Tokenization, parts-of-speech tagging, lemmatization, parsing, and mapping are the steps involved in the process. The proposed project gives a view of using Natural Language Processing (NLP) and mapping queries in the English language, via regular expressions, to SQL.
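The final mapping step can be illustrated with a toy pattern table. The regular expressions, table names, and SQL templates below are assumptions for illustration, not the system's actual rules:

```python
import re

# Toy NL-to-SQL mapping: a regular expression over a normalised
# English question selects an SQL template, and the captured groups
# fill in table, column, and value names.

PATTERNS = [
    (re.compile(r"show all (\w+)"),
     "SELECT * FROM {0};"),
    (re.compile(r"how many (\w+) are there"),
     "SELECT COUNT(*) FROM {0};"),
    (re.compile(r"show (\w+) where (\w+) is (\w+)"),
     "SELECT * FROM {0} WHERE {1} = '{2}';"),
]

def to_sql(question):
    """Return the SQL for the first matching pattern, else None."""
    q = question.lower().rstrip("?.")
    for pattern, template in PATTERNS:
        match = pattern.search(q)
        if match:
            return template.format(*match.groups())
    return None

print(to_sql("Show all employees"))          # SELECT * FROM employees;
print(to_sql("How many orders are there?"))  # SELECT COUNT(*) FROM orders;
```

In the described pipeline, tokenization, POS tagging, and lemmatization would normalise the question ("showing" to "show", plural table names, etc.) before this pattern-matching stage runs.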


Reading Comprehension (RC) plays an important role in Natural Language Processing (NLP), as it involves reading and understanding text written in natural language. Reading Comprehension systems comprehend a given document and answer questions in the context of that document. This paper proposes a Reading Comprehension system for Kannada documents. The RC system analyses text in the Kannada script and allows users to pose questions to it in Kannada. This system is aimed at people whose primary language is Kannada, who would otherwise have difficulty parsing through vast Kannada documents for the information they require. This paper discusses the proposed model, built using Term Frequency - Inverse Document Frequency (TF-IDF), and its performance in extracting answers from the context document. The proposed model captures the grammatical structure of Kannada to provide the most accurate answers to the user.
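The TF-IDF answer-extraction idea can be sketched as follows: each sentence of the context document is treated as a "document", and the sentence with the highest cosine similarity to the question over TF-IDF vectors is returned as the answer. English text stands in for Kannada purely for readability, and this is a sketch of the general technique, not the paper's exact model:

```python
import math
from collections import Counter

def tokenize(s):
    return s.lower().replace(".", "").replace("?", "").split()

def tfidf_vectors(sentences):
    """Build a TF-IDF vector (as a dict) for each sentence."""
    docs = [Counter(tokenize(s)) for s in sentences]
    n = len(docs)
    df = Counter(w for d in docs for w in d)          # document frequency
    idf = {w: math.log(n / df[w]) + 1.0 for w in df}  # smoothed IDF
    return [{w: tf * idf[w] for w, tf in d.items()} for d in docs], idf

def cosine(a, b):
    num = sum(a[w] * b.get(w, 0.0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return num / (na * nb) if na and nb else 0.0

def answer(question, sentences):
    """Return the context sentence most similar to the question."""
    vecs, idf = tfidf_vectors(sentences)
    q = Counter(tokenize(question))
    qvec = {w: tf * idf.get(w, 0.0) for w, tf in q.items()}
    best = max(range(len(sentences)), key=lambda i: cosine(qvec, vecs[i]))
    return sentences[best]

context = [
    "Bengaluru is the capital of Karnataka.",
    "Kannada is the official language of Karnataka.",
    "The state flower of Karnataka is the lotus.",
]
print(answer("what is the official language of Karnataka", context))
```

Rare question words ("official", "language") get high IDF weights, so the matching sentence dominates the similarity score even though common words appear everywhere.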


2019 ◽  
Vol 31 (2) ◽  
pp. 571-574
Author(s):  
Ardian Fera

A preposition is a word or set of words that indicates location or some other relationship between a noun or pronoun and other parts of the sentence. It shows the relationship between one thing and another, linking nouns, pronouns, and phrases to other words in a sentence. Prepositions are abstract words with no concrete meaning; they merely show the relationships between groups of words, and a single preposition can convey many different variations in meaning. The proper interpretation of prepositions is an important issue for automatic natural language understanding. Although the complexity of preposition usage has been argued for and documented by various scholars in linguistics, psycholinguistics, and computational linguistics, very few studies have been done on the function of prepositions in natural language processing (NLP) applications. The reason is that prepositions are probably the most polysemous category, and thus their linguistic realizations are difficult to predict and their cross-linguistic regularities difficult to identify. Prepositions play a major role in the syntactic structures of the English language, and they often make an essential contribution to sentence meaning by signifying temporal and spatial relationships, as well as abstract relations involving cause and purpose, agent and instrument, manner and accompaniment, support, and much more. They are sensitive linguistic elements that are culturally acceptable and very well known to all members of the same linguistic community. According to cognitive semantics, the figurative senses of a preposition are extended from its spatial senses through conceptual metaphors. In a pedagogical context, it may be useful to draw learners' attention to those aspects of a preposition's spatial sense that are especially relevant to its metaphorization processes.
Prepositions have type restrictions on their arguments, they assign thematic roles, and they have a semantic content, possibly underspecified. The only difference from the other open-class categories, such as nouns, verbs, or adjectives, is that prepositions do not have any morphology.


Author(s):  
Sameerah Talafha ◽  
Banafsheh Rekabdar

Arabic poetry generation is a very challenging task, since the linguistic structure of the Arabic language poses a severe challenge for many researchers and developers in the Natural Language Processing (NLP) field. In this paper, we propose a poetry generation model with extended phonetic and semantic embeddings (phonetic CNN subword embeddings). We show that phonetic CNN subword embeddings contribute more effectively to overall model performance than FastText subword embeddings. Our poetry generation model consists of a two-stage approach: (1) generating the first verse, which explicitly incorporates the theme-related phrase, and (2) generating the other verses with the proposed Hierarchy-Attention Sequence-to-Sequence model (HAS2S), which adequately captures word, phrase, and verse information between contexts. A comprehensive human evaluation confirms that the poems generated by our model outperform those of the base models in criteria such as Meaning, Coherence, Fluency, and Poeticness. Extensive quantitative experiments using Bi-Lingual Evaluation Understudy (BLEU) scores also demonstrate significant improvements over strong baselines.


Author(s):  
GÖKHAN YURTALAN ◽  
MURAT KOYUNCU ◽  
ÇİĞDEM TURHAN

Sentiment analysis attempts to resolve the senses or emotions that a writer or speaker intends to convey to people about an object or event. It generally uses natural language processing and/or artificial intelligence techniques for processing electronic documents and mining the opinions expressed in the content. In recent years, researchers have conducted many successful sentiment analysis studies for the English language, which consider many words and word groups that set emotion polarities arising from the English grammar structure, and then use datasets to test their performance. However, there are only a limited number of studies for the Turkish language, and these studies show lower performance than those for English. The reasons for this may be the incorrect translation of datasets from English into Turkish and the neglect of Turkish's special grammar structures. In this study, special Turkish words and linguistic constructs that affect the polarity of a sentence are determined with the aid of a Turkish linguist, and an appropriate lexicon-based polarity determination and calculation approach is introduced for this language. The proposed methodology is tested using different datasets collected from Twitter, and the test results show that the proposed system achieves better accuracy than previously developed lexicon-based sentiment analysis systems for Turkish. The authors conclude that the analysis of word groups in particular increases the overall performance of the system significantly.

