Extracting Semantic Relationships in Greek Literary Texts

2021, Vol 13 (16), pp. 9391
Author(s): Despina Christou, Grigorios Tsoumakas

In the era of Big Data, the digitization of texts and the advancements in Artificial Intelligence (AI) and Natural Language Processing (NLP) are enabling the automatic analysis of literary works, allowing us to delve into the structure of artifacts and to compare, explore, manage and preserve the richness of our written heritage. This paper proposes a deep-learning-based approach to discovering semantic relationships in literary texts (19th-century Greek literature), facilitating the analysis, organization and management of collections through the automation of metadata extraction. Moreover, we provide a new annotated dataset used to train our model. Our proposed model, REDSandT_Lit, recognizes six distinct relationships, extracting the richest set of relations from literary texts to date. It efficiently captures the semantic characteristics of the investigated time period by fine-tuning the state-of-the-art transformer-based Language Model (LM) for Modern Greek on our corpora. Extensive experiments and comparisons with existing models on our dataset reveal that REDSandT_Lit has superior performance (90% accuracy), manages to capture infrequent relations (100% F-score on long-tail relations) and can also correct mislabelled sentences. Our results suggest that our approach efficiently handles the peculiarities of literary texts, and that it is a promising tool for managing and preserving cultural information in various settings.
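As a rough illustration of the kind of pipeline described above, the sketch below fine-tunes a pre-trained Greek transformer LM for six-way relation classification with the Hugging Face transformers library. The checkpoint name and the relation inventory are illustrative assumptions, not the authors' exact setup.

    # Minimal sketch: sentence-level relation classification with a Greek transformer LM.
    # Model name and relation labels are illustrative assumptions.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    MODEL = "nlpaueb/bert-base-greek-uncased-v1"   # assumed pre-trained Greek LM
    RELATIONS = ["parent_of", "spouse_of", "sibling_of",
                 "lives_in", "works_as", "no_relation"]  # hypothetical 6-label inventory

    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=len(RELATIONS))

    # A sentence from the corpus with the two candidate entities marked in-line.
    sentence = "[E1] Ο Γιάννης [/E1] ήταν γιος του [E2] Πέτρου [/E2]."
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    print(RELATIONS[int(logits.argmax(dim=-1))])   # predicted relation (head is untrained here)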

2021, Vol 11 (2), pp. 864
Author(s): Junhyung Moon, Gyuyoung Park, Jongpil Jeong

In business process management, monitoring services are an important element that can prevent various problems in companies and industries before they occur. Process-aware information systems record execution logs, which can be used to predict how a process will unfold. The ultimate goal of the proposed method is to predict the continuation of a running process instance, forecasting the next events from previously completed event-log data so that companies can respond flexibly to unwanted deviations in their workflows. To solve the next-event prediction problem, we use a fully attention-based Transformer, which has performed well in recent natural language processing approaches: the name attribute of each event is treated as a natural-language token, and after specific pre-processing steps the proposed deep learning model is trained to predict the next event. Experiments on various business process log datasets demonstrate the superior performance of the proposed method. We name the proposed process prediction model “POP-ON”.
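A minimal sketch of next-event prediction with an attention-only encoder is shown below; the activity vocabulary, model sizes and pooling choice are illustrative assumptions rather than POP-ON's actual configuration.

    # Sketch: predict the next activity in a running case from the prefix of events seen so far.
    import torch
    import torch.nn as nn

    ACTIVITIES = ["register", "check_credit", "approve", "reject", "notify", "<pad>"]
    PAD = ACTIVITIES.index("<pad>")

    class NextEventTransformer(nn.Module):
        def __init__(self, vocab, d_model=64, nhead=4, layers=2):
            super().__init__()
            self.emb = nn.Embedding(vocab, d_model, padding_idx=PAD)
            enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            self.encoder = nn.TransformerEncoder(enc_layer, layers)
            self.head = nn.Linear(d_model, vocab)

        def forward(self, x):                      # x: (batch, prefix_len) activity ids
            h = self.encoder(self.emb(x))          # contextualize the event prefix
            return self.head(h[:, -1])             # logits for the next activity

    model = NextEventTransformer(len(ACTIVITIES))
    prefix = torch.tensor([[0, 1]])                # "register" -> "check_credit" -> ?
    print(ACTIVITIES[int(model(prefix).argmax(-1))])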


Information, 2021, Vol 12 (3), pp. 136
Author(s): Shuang Liu, Nannan Tan, Yaqian Ge, Niko Lukač

Question-answering systems based on knowledge graphs are an extremely challenging task in the field of natural language processing. Most existing Chinese Knowledge Base Question Answering (KBQA) systems can only return the knowledge stored in the knowledge base through extractive methods. However, such processing does not conform to natural reading habits and cannot solve the out-of-vocabulary (OOV) problem. In this paper, a new generative question-answering method based on a knowledge graph is proposed, consisting of three parts: knowledge vocabulary construction, data pre-processing, and answer generation. In vocabulary construction, BiLSTM-CRF is used to identify the entities in the source text, find the triples containing each entity, count the word frequencies, and build the vocabulary. In data pre-processing, a pre-trained language model, BERT, combined with word-frequency semantic features is adopted to obtain word vectors. In answer generation, a combination of the vocabulary constructed from the knowledge graph and a pointer-generator network (PGN) is proposed to point to the corresponding entities when generating the answer. The experimental results show that the proposed method achieves superior performance on the WebQA dataset compared with other methods.
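The core of a pointer-generator decoder is to blend a vocabulary distribution with a copy distribution over source tokens, which is what lets a model emit OOV entities drawn from the knowledge graph. Below is a minimal, illustrative sketch of that mixing step; the tensor names and sizes are assumptions, not the paper's implementation.

    # Sketch of the pointer-generator mixing step: a soft switch between generating
    # from the fixed vocabulary and copying a source/entity token.
    import torch

    vocab_size, src_len = 10, 4
    p_gen      = torch.tensor(0.7)                                # generation probability
    vocab_dist = torch.softmax(torch.randn(vocab_size), dim=-1)   # decoder vocabulary distribution
    attn_dist  = torch.softmax(torch.randn(src_len), dim=-1)      # attention over source tokens
    src_ids    = torch.tensor([2, 5, 5, 9])                       # vocab ids of the source tokens

    final_dist = p_gen * vocab_dist
    final_dist = final_dist.scatter_add(0, src_ids, (1 - p_gen) * attn_dist)  # add copy mass
    print(final_dist.sum())                                       # still sums to 1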


AI, 2021, Vol 2 (1), pp. 1-16
Author(s): Juan Cruz-Benito, Sanjay Vishwakarma, Francisco Martin-Fernandez, Ismael Faro

In recent years, the use of deep learning in language models has gained much attention. Some research projects claim that they can generate text that can be interpreted as human writing, enabling new possibilities in many application areas. Among the different areas related to language processing, one of the most notable for applying this type of modeling is programming languages. For years, the machine learning community has been researching this software engineering area, pursuing goals like auto-completing, generating, fixing, or evaluating code written by humans. Considering the increasing popularity of deep-learning-enabled language models, we found a lack of empirical papers comparing different deep learning architectures for creating and using language models based on programming code. This paper compares neural network architectures such as Average Stochastic Gradient Descent (ASGD) Weight-Dropped LSTMs (AWD-LSTMs), AWD Quasi-Recurrent Neural Networks (QRNNs), and Transformers, combined with transfer learning and different forms of tokenization, to see how they behave when building language models on a Python dataset for code-generation and fill-mask tasks. Considering the results, we discuss the strengths and weaknesses of each approach and the gaps we found when evaluating the language models or applying them in a real programming context.
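As a flavour of the fill-mask task on source code mentioned above, the following snippet runs masked-token prediction over a line of Python with a publicly available code model; the checkpoint named here is an assumption for demonstration and is not one of the models trained in the paper.

    # Illustrative fill-mask over Python source code with a pre-trained code LM.
    from transformers import pipeline

    fill = pipeline("fill-mask", model="huggingface/CodeBERTa-small-v1")  # assumed public checkpoint
    masked_line = "def add(a, b):\n    return a <mask> b"
    for candidate in fill(masked_line, top_k=3):
        print(candidate["token_str"], round(candidate["score"], 3))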


2020, Vol 14 (4), pp. 471-484
Author(s): Suraj Shetiya, Saravanan Thirumuruganathan, Nick Koudas, Gautam Das

Accurate selectivity estimation for string predicates is a long-standing research challenge in databases. Supporting pattern matching on strings (such as prefix, substring, and suffix) makes this problem much more challenging, thereby necessitating a dedicated study. Traditional approaches often build pruned summary data structures such as tries, followed by selectivity estimation using statistical correlations. However, this produces insufficiently accurate cardinality estimates, resulting in the selection of sub-optimal plans by the query optimizer. Recently proposed deep learning based approaches leverage techniques from natural language processing, such as embeddings, to encode the strings and use them to train a model. While this is an improvement over traditional approaches, there is still large scope for improvement. We propose Astrid, a framework for string selectivity estimation that synthesizes ideas from traditional and deep learning based approaches. We make two complementary contributions. First, we propose an embedding algorithm that is query-type (prefix, substring, and suffix) and selectivity aware. Consider three strings 'ab', 'abc' and 'abd' whose prefix frequencies are 1000, 800 and 100 respectively. Our approach ensures that the embedding of 'ab' is closer to that of 'abc' than to that of 'abd'. Second, we describe how neural language models could be used for selectivity estimation. While they work well for prefix queries, their performance for substring queries is sub-optimal. We modify the objective function of the neural language model so that it can be used for estimating the selectivities of pattern matching queries. We also propose a novel and efficient algorithm for optimizing the new objective function. We conduct extensive experiments over benchmark datasets and show that our proposed approaches achieve state-of-the-art results.
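The selectivity-aware embedding idea from the 'ab'/'abc'/'abd' example can be sketched with a triplet-style objective in which strings of similar prefix frequency are pulled together; the toy encoder and loss below are illustrative assumptions, not Astrid's actual algorithm.

    # Sketch: pull 'ab' towards 'abc' (similar prefix frequency) and away from 'abd'.
    import torch
    import torch.nn as nn

    def encode(s, dim=16):
        # Toy character-bag encoder standing in for a learned string encoder.
        v = torch.zeros(dim)
        for ch in s:
            v[ord(ch) % dim] += 1.0
        return v

    anchor, positive, negative = encode("ab"), encode("abc"), encode("abd")
    # Frequencies 1000 vs. 800 vs. 100: 'abc' is the selectivity-similar positive example.
    loss = nn.TripletMarginLoss(margin=1.0)(anchor[None], positive[None], negative[None])
    print(float(loss))   # training would minimize this over many sampled triplets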


2018, Vol 25 (6), pp. 726-733
Author(s): Maria S. Karyaeva, Pavel I. Braslavski, Valery A. Sokolov

The ability to identify semantic relations between words has made the word2vec model widely used in NLP tasks. The idea behind word2vec is the simple assumption that two words are more similar if they share similar contexts. Each word is represented as a vector, so vectors with the closest coordinates can be interpreted as similar words. This makes it possible to establish semantic relations (synonymy, hypernymy, hyponymy and other relations) by automatic extraction. Extracting semantic relations by hand is a time-consuming and biased task that requires the help of experts. Unfortunately, the word2vec model provides an associative list of words that does not consist of related words only. In this paper, we show additional criteria that may be applicable to solving this problem. Observations and experiments with well-known characteristics, such as word frequency and position in the associative list, can be useful for improving the extraction of semantic relations for the Russian language using word embeddings. In the experiments, a word2vec model trained on the Flibusta corpus is used, with pairs from Wiktionary serving as examples of semantic relationships. Semantically related words are applicable to thesauri, ontologies and intelligent systems for natural language processing.
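A sketch of the associative-list idea with gensim is shown below; the toy corpus, query word and rank/frequency filtering thresholds are illustrative assumptions rather than the paper's exact procedure.

    # Sketch: query a word2vec model for an associative list, then filter candidates
    # by rank in the list and by corpus frequency before treating them as related words.
    from gensim.models import Word2Vec

    sentences = [["кошка", "ловит", "мышь"], ["собака", "охраняет", "дом"]]  # toy corpus
    model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=50)

    candidates = model.wv.most_similar("кошка", topn=10)          # associative list
    related = [(w, sim) for rank, (w, sim) in enumerate(candidates)
               if rank < 5 and model.wv.get_vecattr(w, "count") >= 1]  # rank + frequency criteria
    print(related)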


2019, Vol 8 (4), pp. 10289-10293

Sentiment analysis is a tool for determining the polarity or emotion of a sentence. It is a field of natural language processing that focuses on the study of opinions. In this study, the researchers addressed one key challenge in sentiment analysis: taking into account the ending punctuation marks present in a sentence. Ending punctuation marks play a significant role in emotion recognition and intensity-level recognition. The research made use of tweets expressing opinions about Philippine President Rodrigo Duterte. These downloaded tweets served as the inputs and were initially subjected to a pre-processing stage to prepare the sentences for processing. A language model was created to serve as the classifier for determining the scores of the tweets, which give the polarity of each sentence. Accuracy is very important in sentiment analysis, so to increase the chance of correctly identifying the polarity of the tweets, the input underwent intensity-level recognition, which determines the intensifiers and negations within the sentences. The system was evaluated with an overall performance of 80.27%.
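A toy illustration of how ending punctuation, intensifiers and negations can adjust a polarity score is sketched below; the lexicon, weights and scoring rules are invented for illustration and are not the study's actual classifier.

    # Toy rule-based scorer: base lexicon polarity, flipped by negation, scaled by
    # intensifiers and by the ending punctuation of the sentence.
    LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
    INTENSIFIERS = {"very": 1.5, "really": 1.5}
    NEGATIONS = {"not", "never"}

    def polarity(sentence: str) -> float:
        tokens = sentence.lower().rstrip("!?.").split()
        score, boost, negate = 0.0, 1.0, False
        for tok in tokens:
            if tok in NEGATIONS:
                negate = True
            elif tok in INTENSIFIERS:
                boost *= INTENSIFIERS[tok]
            elif tok in LEXICON:
                value = LEXICON[tok] * boost
                score += -value if negate else value
                boost, negate = 1.0, False
        if sentence.endswith("!"):        # ending punctuation raises the intensity level
            score *= 1.5
        return score

    print(polarity("The speech was really good!"))   # stronger positive
    print(polarity("The speech was not good."))      # negated -> negative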


Author(s): Kaan Ant, Ugur Sogukpinar, Mehmet Fatif Amasyali

The use of databases containing semantic relationships between words is becoming increasingly widespread as a way of making natural language processing more effective. Instead of the bag-of-words approach, the suggested semantic spaces give the distances between words, but they do not express the types of relations. In this study, it is shown how semantic spaces can be used to find the type of relationship, and the approach is compared with the template method. According to results obtained on a very large scale, semantic spaces are more successful for the is_a and opposite relations, while the template approach is more successful for the at_location, made_of and non-relational types.
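One common way to use a semantic space for relation typing is to train a simple classifier on the offset between the two word vectors of a pair; the sketch below illustrates that general technique with toy vectors and is an assumption, not the authors' exact method.

    # Sketch: classify the relation type of a word pair from the offset of its word vectors.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    vec = {w: rng.normal(size=20) for w in
           ["dog", "animal", "hot", "cold", "paris", "france", "cat", "mammal"]}

    train_pairs = [("dog", "animal", "is_a"), ("cat", "mammal", "is_a"),
                   ("hot", "cold", "opposite"), ("paris", "france", "at_location")]
    X = [vec[a] - vec[b] for a, b, _ in train_pairs]   # offset (difference) features
    y = [label for _, _, label in train_pairs]

    clf = LogisticRegression(max_iter=1000).fit(X, y)
    print(clf.predict([vec["dog"] - vec["animal"]]))   # expected: ['is_a']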


2021, Vol 28
Author(s): Yuyang Xue, Xiucai Ye, Lesong Wei, Xin Zhang, Tetsuya Sakurai, ...

With its superior performance, the Transformer model, which is based on the encoder-decoder paradigm, has become mainstream in natural language processing. Meanwhile, bioinformatics has embraced machine learning and made great progress in drug design and protein property prediction. Cell-penetrating peptides (CPPs) are a kind of membrane-permeable peptide that conveniently acts as a 'postman' in drug penetration tasks. However, only a small number of CPPs have been discovered, let alone applied in practice to drug permeability. Therefore, correctly identifying CPPs opens up a new way to carry macromolecules into cells without other potentially harmful materials in the drug. Most previous work uses only simple machine learning techniques and hand-crafted features to construct a classifier. In CPPFormer, we adopt the attention structure of the Transformer, rebuild the network according to the characteristics of CPPs, notably their short length, and use an automatic feature extractor together with a few manually engineered features to jointly guide the prediction. Compared with all previous methods and other classic text classification models, the empirical results show that our proposed deep-model-based method achieves the best performance, with 92.16% accuracy on the CPP924 dataset, and performs well across various evaluation metrics.
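A minimal sketch of a Transformer-style CPP classifier over amino-acid sequences is shown below; the embedding sizes, layer counts and maximum length are assumptions for illustration and do not reproduce CPPFormer's architecture or its hand-crafted features.

    # Sketch: encode a short peptide as amino-acid tokens and classify CPP vs. non-CPP.
    import torch
    import torch.nn as nn

    AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
    AA_TO_ID = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}   # 0 reserved for padding

    class PeptideClassifier(nn.Module):
        def __init__(self, d_model=32, nhead=4, layers=2, max_len=64):
            super().__init__()
            self.emb = nn.Embedding(len(AMINO_ACIDS) + 1, d_model, padding_idx=0)
            self.pos = nn.Embedding(max_len, d_model)
            layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, layers)
            self.head = nn.Linear(d_model, 2)                    # CPP vs. non-CPP

        def forward(self, seq_ids):                              # (batch, length)
            pos = torch.arange(seq_ids.size(1), device=seq_ids.device)
            h = self.encoder(self.emb(seq_ids) + self.pos(pos))
            return self.head(h.mean(dim=1))                      # mean-pool then classify

    peptide = "GRKKRRQRRR"                                       # example short peptide (illustrative)
    ids = torch.tensor([[AA_TO_ID[aa] for aa in peptide]])
    print(PeptideClassifier()(ids))                              # untrained logits, shape (1, 2)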

