Interpreting Design Structure in Patents Using an Ontology Library

Author(s):  
Zhen Li ◽  
Derrick Tate

Patents contain valuable information for engineering design. However, the increasing number of annual patent publications makes it difficult for any individual designer to assimilate all of the up-to-date knowledge hidden in patent documents. In this paper, we propose a computational approach to interpreting the design structure embedded in patent claims using pre-developed ontology libraries. The study combines natural language processing (NLP) techniques, text data mining, ontological engineering, and our rule-based tree generation method. Data sources and adopted tools include online patent documents, knowledge gathered from engineering textbooks, WordNet, the part-of-speech tagger developed by the Stanford NLP group, and Graphviz. We show that the proposed framework can not only help minimize the manual work required for obtaining design structures but also enable automatic dissimilarity comparison between patents.
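The paper's ontology-based pipeline is not reproduced here, but one ingredient it relies on can be sketched: extracting candidate design components from POS-tagged claim text. The chunking rule and the pre-tagged example claim below are illustrative assumptions, not the authors' actual rules.

```python
# Minimal sketch: extract candidate design components (noun phrases)
# from a POS-tagged patent claim using a simple chunking rule.
# The tagged input and the rule are illustrative assumptions.

def chunk_noun_phrases(tagged_tokens):
    """Group runs of adjectives/nouns (JJ*, NN*) into noun-phrase chunks."""
    chunks, current = [], []
    for word, tag in tagged_tokens:
        if tag.startswith(("JJ", "NN")):
            current.append(word)
        else:
            if current:
                chunks.append(" ".join(current))
                current = []
    if current:
        chunks.append(" ".join(current))
    return chunks

claim = [("A", "DT"), ("rotary", "JJ"), ("shaft", "NN"), ("coupled", "VBN"),
         ("to", "TO"), ("a", "DT"), ("drive", "NN"), ("motor", "NN")]
print(chunk_noun_phrases(claim))  # → ['rotary shaft', 'drive motor']
```

In a fuller pipeline, chunks like these would be matched against the ontology library before the tree generation step.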

Author(s):  
G Deena ◽  
K Raja ◽  
K Kannan

In this competitive world, education has become part of everyday life. The process of imparting knowledge to the learner through education is the core idea of the Teaching-Learning Process (TLP). An assessment is one way to identify the learner's weak spots in the area under discussion, and assessment questions carry great weight in judging the learner's skill. Manually prepared questions are not assured of excellence and fairness in assessing the learner's cognitive skill. Question generation is the most important part of the teaching-learning process, and generating test questions is clearly its toughest part. Methods: We propose an Automatic Question Generation (AQG) system that automatically and dynamically generates assessment questions from an input file. Objective: The proposed system generates test questions mapped to Bloom's taxonomy to determine the learner's cognitive level. Cloze-type questions are generated using part-of-speech tags and a random function. Rule-based approaches and Natural Language Processing (NLP) techniques are implemented to generate procedural questions for the lowest Bloom's cognitive levels. Analysis: The outputs are dynamic in nature, creating a different set of questions at each execution. The input paragraphs are selected from the computer science domain, and output efficiency is measured using precision and recall.
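The cloze-generation idea described above — choose a content word by its POS tag and blank it out via a random function — can be sketched as follows. The tagged sentence, target tags, and seeding are illustrative assumptions; the paper's actual tagger and selection rules are not reproduced.

```python
import random

# Sketch of cloze-question generation: pick a noun by its POS tag at
# random and blank it out. Tagged input and tag set are illustrative.

def make_cloze(tagged_sentence, target_tags=("NN", "NNS"), seed=0):
    rng = random.Random(seed)  # seeded here for reproducibility
    candidates = [i for i, (_, t) in enumerate(tagged_sentence) if t in target_tags]
    if not candidates:
        return None
    i = rng.choice(candidates)
    answer = tagged_sentence[i][0]
    words = [w for w, _ in tagged_sentence]
    words[i] = "_____"
    return " ".join(words), answer

sent = [("A", "DT"), ("compiler", "NN"), ("translates", "VBZ"),
        ("source", "NN"), ("code", "NN")]
question, answer = make_cloze(sent)
print(question, "| answer:", answer)
```

Varying the seed (or omitting it) yields a different question on each execution, matching the dynamic behaviour the abstract describes.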


2019 ◽  
Author(s):  
Alexandre M. R. Cunha ◽  
Kele T. Belloze ◽  
Gustavo P. Guedes

Textual data sources may assist in the detection of adverse events not predicted for a particular drug. However, given the amount of information available across several sources, it is reasonable to adopt a computational approach to analyzing these sources in search of adverse events. In this scenario, we created an extension of CoreNLP to process Brazilian Portuguese texts from the pharmacovigilance area. We trained three natural language models: a part-of-speech tagger, a parser, and a named entity recognizer. Preliminary results indicate success in generating dependency trees for phrases in the pharmacovigilance area and in identifying pharmacovigilance named entities.


2018 ◽  
Vol 2 (3) ◽  
pp. 157
Author(s):  
Ahmad Subhan Yazid ◽  
Agung Fatwanto

Indonesian plays a fundamental role in communication, but ambiguity poses a problem for its machine-learning implementations. In Natural Language Processing, Part of Speech (POS) tagging plays a role in reducing this problem. This study uses a rule-based method to determine the best word class for ambiguous words in Indonesian. The research follows several stages: knowledge inventory, algorithm design, implementation, testing, analysis, and conclusions. The data used is an Indonesian corpus developed by the Language Department of the Faculty of Computer Science, University of Indonesia. The data is then processed and presented descriptively according to specified rules. The result is a POS tagging algorithm comprising 71 rules expressed in flowchart and descriptive sentence notation. In testing, the algorithm correctly labeled 92 of 100 tested words (92%). The results of the implementation are influenced by the availability of rules, word-class tagsets, and corpus data.
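A single rule in the spirit of the study's 71 can be sketched: pick a word class for an ambiguous Indonesian word from the tag of its right neighbour. The two-entry lexicon and the rule for "bisa" (modal "can" vs. noun "venom") are simplified assumptions, not the study's actual rule set.

```python
# Illustrative rule-based disambiguation for one ambiguous Indonesian
# word. Lexicon and rule are invented for illustration.

LEXICON = {"bisa": {"MD", "NN"}}  # "bisa": modal ("can") or noun ("venom")

def disambiguate(word, next_tag):
    tags = LEXICON.get(word)
    if not tags:
        return None
    if len(tags) == 1:
        return next(iter(tags))
    # Rule: "bisa" directly before a verb acts as a modal, otherwise a noun.
    return "MD" if next_tag == "VB" else "NN"

print(disambiguate("bisa", "VB"))  # → MD  (as in "bisa makan": can eat)
print(disambiguate("bisa", "NN"))  # → NN  (as in "bisa ular": snake venom)
```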


Information ◽  
2021 ◽  
Vol 12 (12) ◽  
pp. 520
Author(s):  
Jakobus S. du Toit ◽  
Martin J. Puttkammer

The creation of linguistic resources is crucial to the continued growth of research and development efforts in the field of natural language processing, especially for resource-scarce languages. In this paper, we describe the curation and annotation of corpora and the development of multiple linguistic technologies for four official South African languages, namely isiNdebele, Siswati, isiXhosa, and isiZulu. Development efforts included sourcing parallel data for these languages and annotating each on the token, orthographic, morphological, and morphosyntactic levels. These sets were in turn used to create and evaluate three core technologies, viz. a lemmatizer, a part-of-speech tagger, and a morphological analyzer for each of the languages. We report on the quality of these technologies, which improve on rule-based technologies previously developed as part of a similar initiative in 2013. These resources are made publicly accessible through a local resource agency with the intention of fostering further development of both resources and technologies that may benefit the NLP industry in South Africa.


Author(s):  
Umrinderpal Singh ◽  
Vishal Goyal

A Part of Speech tagger assigns a tag to every input word in a given sentence. The tags cover the parts of speech of a particular language, such as noun, pronoun, verb, adjective, and conjunction, and may have subcategories. Part of Speech tagging is a basic preprocessing task for most Natural Language Processing (NLP) applications, such as Information Retrieval, Machine Translation, and Grammar Checking. The task belongs to a larger set of problems, namely sequence labeling problems. Part of Speech tagging for Punjabi is not widely explored territory. We discuss rule-based and HMM-based Part of Speech taggers for Punjabi and compare the accuracies of the two approaches. The system is developed using 35 different standard part-of-speech tags. We evaluate our system on unseen data, achieving a state-of-the-art accuracy of 93.3%.
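The HMM approach the abstract mentions can be sketched with a toy Viterbi decoder: tags are hidden states, words are observations, and decoding finds the most probable tag sequence. The transition and emission probabilities below are invented for illustration, not estimated from a Punjabi corpus.

```python
# Toy HMM POS tagger with Viterbi decoding. All probabilities are
# invented for illustration; a real tagger estimates them from a corpus.

def viterbi(words, tags, start_p, trans_p, emit_p):
    # V[t] maps tag -> (best path probability, backpointer to previous tag)
    V = [{t: (start_p[t] * emit_p[t].get(words[0], 1e-6), None) for t in tags}]
    for w in words[1:]:
        V.append({t: max(((V[-1][prev][0] * trans_p[prev][t] *
                           emit_p[t].get(w, 1e-6)), prev) for prev in tags)
                  for t in tags})
    # Backtrack from the best final tag.
    best = max(tags, key=lambda t: V[-1][t][0])
    path = [best]
    for step in reversed(V[1:]):
        path.append(step[path[-1]][1])
    return list(reversed(path))

tags = ["N", "V"]
start_p = {"N": 0.7, "V": 0.3}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
emit_p = {"N": {"dogs": 0.5, "run": 0.1}, "V": {"dogs": 0.05, "run": 0.6}}
print(viterbi(["dogs", "run"], tags, start_p, trans_p, emit_p))  # → ['N', 'V']
```

A production tagger would work in log space to avoid underflow on long sentences and smooth the estimates for unseen words.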


Today we all depend on the internet for our daily activities: booking hotels and air tickets, finding particular places, travelling, cooking, education, banking, and so on. To find a specific thing immediately, we require filtering tools. E-learning is a new and rapidly growing medium in the modern education system that is based entirely on the internet. While surfing the internet, students may be distracted by offensive and irrelevant websites; filters play a vital role in avoiding such distractions. This paper proposes a filter tool that carries out web scraping of text data, data cleaning, natural language processing, and filtering of non-learning sites in real time. We collect the text from paragraphs, images, and video tags. The extracted textual data takes the form of sentences, which are part-of-speech (POS) processed by NLP. Word Sense Disambiguation (WSD) is then used to find the correct sense, in the sentence at hand, of words that may have multiple meanings. The tool creates a knowledge base of student-related sites using NLP and SVM classification, and we have created a keyword database of all learning sites. Lastly, the tool classifies sites into two categories, learning and non-learning, using a Support Vector Machine.
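The WSD step can be sketched with a simplified Lesk-style algorithm: choose the sense whose gloss shares the most words with the sentence context. The two glosses for "bank" below are illustrative stand-ins; the paper's tool draws senses from WordNet.

```python
# Simplified Lesk-style word sense disambiguation: pick the sense whose
# gloss has the largest word overlap with the context. Glosses are
# illustrative stand-ins for WordNet sense definitions.

def lesk(context_words, senses):
    ctx = set(w.lower() for w in context_words)
    return max(senses, key=lambda s: len(ctx & set(senses[s].lower().split())))

senses = {
    "finance": "an institution that accepts deposits and lends money",
    "river":   "the sloping land beside a body of water",
}
sentence = "she sat on the bank beside the calm water".split()
print(lesk(sentence, senses))  # → river
```

The disambiguated senses would then feed the keyword database that the SVM classifier uses to separate learning from non-learning sites.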


Electronics ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. 56
Author(s):  
Hongwei Li ◽  
Hongyan Mao ◽  
Jingzi Wang

Part-of-Speech (POS) tagging is one of the most important tasks in the field of natural language processing (NLP). The POS tag of a word depends not only on the word itself but also on its position, its surrounding words, and their POS tags. POS tagging can be an upstream task for other NLP tasks, further improving their performance. Therefore, it is important to improve the accuracy of POS tagging. In POS tagging, bidirectional Long Short-Term Memory (Bi-LSTM) is commonly used and achieves good performance. However, Bi-LSTM is not as powerful as the Transformer in leveraging contextual information, since Bi-LSTM simply concatenates the contextual information from left-to-right and right-to-left. In this study, we propose a novel approach to improve the accuracy of POS tagging. For each token, all possible POS tags are obtained without considering context, and rules are then applied to prune these possible POS tags, which we call rule-based data preprocessing. In this way, the number of possible POS tags of most tokens can be reduced to one, and those tokens are considered correctly tagged. Finally, the POS tags of the remaining tokens are masked, and a Transformer-based model is used to predict only the masked POS tags, which enables it to leverage bidirectional contexts. Our experimental results show that our approach leads to better performance than other methods using Bi-LSTM.
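The described preprocessing can be sketched as: start from all dictionary tags per token, prune with rules, accept tokens left with a single candidate, and mask the rest for the downstream model. The tag dictionary and the single rule below are invented for illustration; the paper's rules and Transformer model are not reproduced.

```python
# Sketch of rule-based data preprocessing before masked POS prediction.
# Dictionary and rule are invented for illustration.

TAG_DICT = {"the": {"DT"}, "can": {"MD", "NN", "VB"}, "fish": {"NN", "VB"},
            "swims": {"VBZ"}}

def preprocess(tokens):
    result = []
    for i, tok in enumerate(tokens):
        cands = set(TAG_DICT.get(tok, {"UNK"}))
        # Rule: a token right after a determiner cannot be a modal or verb.
        if i > 0 and tokens[i - 1] == "the":
            cands -= {"MD", "VB"}
        # One candidate left: tag resolved; otherwise mask for the model.
        result.append(cands.pop() if len(cands) == 1 else "[MASK]")
    return result

print(preprocess(["the", "can", "swims"]))  # → ['DT', 'NN', 'VBZ']
print(preprocess(["can", "fish"]))          # → ['[MASK]', '[MASK]']
```

Only the `[MASK]` positions would be fed to the Transformer for prediction, letting it attend to both the resolved tags and the surrounding words.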


2020 ◽  
Vol 17 (4) ◽  
pp. 1842-1846
Author(s):  
Praveen Edward James ◽  
Mun Hou Kit ◽  
Chockalingam Aravind Vaithilingam ◽  
Alan Tan Wee Chiat

Natural Language Processing (NLP) systems involve Natural Language Understanding (NLU), Dialogue Management (DM), and Natural Language Generation (NLG). This work integrates learning from examples with rule-based processing to design an NLP system. The design involves a three-stage processing framework combining syntactic generation, semantic extraction, and strong rule-based control. The syntactic generator generates syntax by aligning sentences with Part-of-Speech (POS) tags, limited by the number of words in the lexicon. The semantic extractor extracts meaningful keywords from the queries raised. These two modules are controlled through generalized rules by the rule-based controller module. The system is evaluated on different domains. The results reveal that the accuracy of the system is 92.33% on average. The design process is simple, and the processing time is 2.12 seconds, which is minimal compared to similar statistical models. The performance of an NLP tool in a certain task can be estimated by the quality of its predictions on unseen data. The results reveal performance similar to existing systems, indicating the possibility of use for similar tasks. The system supports a vocabulary of about 700 words and can be used as an NLP module in a spoken dialogue system for various domains or task areas.


Electronics ◽  
2021 ◽  
Vol 10 (12) ◽  
pp. 1372
Author(s):  
Sanjanasri JP ◽  
Vijay Krishna Menon ◽  
Soman KP ◽  
Rajendran S ◽  
Agnieszka Wolk

Linguists have long focused on qualitative comparison of the semantics of different languages. Evaluating semantic interpretation across disparate language pairs like English and Tamil is an even more formidable task than for Slavic languages. The concept of word embedding in Natural Language Processing (NLP) has enabled a felicitous opportunity to quantify linguistic semantics. Multilingual tasks can be performed by projecting the word embeddings of one language onto the semantic space of the other. This research presents a suite of data-efficient deep learning approaches to deduce the transfer function from the embedding space of English to that of Tamil, deploying three popular embedding algorithms: Word2Vec, GloVe, and FastText. A novel evaluation paradigm was devised to assess the effectiveness of the generated embeddings, using the original embeddings as ground truth. The transferability of the proposed model to other target languages was assessed via pre-trained Word2Vec embeddings for Hindi and Chinese. We empirically show that with a bilingual dictionary of a thousand words and a correspondingly small monolingual target (Tamil) corpus, useful embeddings can be generated by transfer learning from a well-trained source (English) embedding. Furthermore, we demonstrate the usability of the generated target embeddings in a few NLP use-case tasks, such as text summarization, part-of-speech (POS) tagging, and bilingual dictionary induction (BDI), bearing in mind that these are not the only possible applications.
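The simplest form of such a transfer function — a linear map fitted on dictionary pairs by least squares — can be sketched on toy data. The 2-D vectors and the hidden rotation below are synthetic; the paper learns richer, deep transfer functions on real Word2Vec/GloVe/FastText embeddings.

```python
import numpy as np

# Sketch of cross-lingual embedding transfer: learn a linear map W that
# projects "source" vectors onto "target" vectors of bilingual dictionary
# pairs via least squares. Data is synthetic 2-D toy data.

rng = np.random.default_rng(0)
W_true = np.array([[0.0, -1.0], [1.0, 0.0]])  # hidden 90-degree rotation
X = rng.normal(size=(1000, 2))                # "English" embeddings
Y = X @ W_true.T                              # aligned "Tamil" embeddings

# Solve min_W ||X W^T - Y||_F over the dictionary pairs.
W_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T
print(np.allclose(W_hat, W_true))  # → True on this noiseless toy data

# Project a new source-language vector into the target space.
new_word = np.array([1.0, 0.0])
print(new_word @ W_hat.T)
```

On real embeddings the map is only approximate, which is why the paper devises an evaluation paradigm against the original embeddings as ground truth.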

