Interpreting Design Structure in Patents Using an Ontology Library

Author(s):  
Zhen Li ◽  
Derrick Tate

Patents contain valuable information for engineering design. However, the increasing number of annual patent publications makes it difficult for any individual designer to assimilate all of the up-to-date knowledge hidden in patent documents. In this paper, we propose a computational approach to interpreting the design structure embedded in patent claims using pre-developed ontology libraries. The study combines natural language processing (NLP) techniques, text data mining, ontological engineering, and our rule-based tree generation method. Data sources and adopted tools include online patent documents, knowledge gathered from engineering textbooks, WordNet, the part-of-speech tagger developed by the Stanford NLP group, and Graphviz. We show that the proposed framework can not only help minimize the manual work required for obtaining design structures but also enable automatic dissimilarity comparison between patents.
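The paper's ontology-based pipeline is not reproduced here, but one ingredient it relies on can be sketched: extracting candidate design components from POS-tagged claim text. The chunking rule and the pre-tagged example claim below are illustrative assumptions, not the authors' actual rules.

```python
# Minimal sketch: extract candidate design components (noun phrases)
# from a POS-tagged patent claim using a simple chunking rule.
# The tagged input and the rule are illustrative assumptions.

def chunk_noun_phrases(tagged_tokens):
    """Group runs of adjectives/nouns (JJ*, NN*) into noun-phrase chunks."""
    chunks, current = [], []
    for word, tag in tagged_tokens:
        if tag.startswith(("JJ", "NN")):
            current.append(word)
        else:
            if current:
                chunks.append(" ".join(current))
                current = []
    if current:
        chunks.append(" ".join(current))
    return chunks

claim = [("A", "DT"), ("rotary", "JJ"), ("shaft", "NN"), ("coupled", "VBN"),
         ("to", "TO"), ("a", "DT"), ("drive", "NN"), ("motor", "NN")]
print(chunk_noun_phrases(claim))  # → ['rotary shaft', 'drive motor']
```

In a fuller pipeline, chunks like these would be matched against the ontology library before the tree generation step.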

Author(s):  
G Deena ◽  
K Raja ◽  
K Kannan

In this competitive world, education has become part of everyday life. The process of imparting knowledge to the learner through education is the core idea of the Teaching-Learning Process (TLP). An assessment is one way to identify the learner's weak spots in the area under discussion, and assessment questions carry great weight in judging the learner's skill. Manually prepared questions are not assured of excellence and fairness in assessing the learner's cognitive skill. Question generation is the most important part of the teaching-learning process, and generating test questions is clearly its toughest part. Methods: We propose an Automatic Question Generation (AQG) system that automatically and dynamically generates assessment questions from an input file. Objective: The proposed system generates test questions mapped to Bloom's taxonomy to determine the learner's cognitive level. Cloze-type questions are generated using part-of-speech tags and a random function. Rule-based approaches and Natural Language Processing (NLP) techniques are implemented to generate procedural questions for the lowest Bloom's cognitive levels. Analysis: The outputs are dynamic in nature, creating a different set of questions at each execution. The input paragraphs are selected from the computer science domain, and output efficiency is measured using precision and recall.
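The cloze-generation idea described above — choose a content word by its POS tag and blank it out via a random function — can be sketched as follows. The tagged sentence, target tags, and seeding are illustrative assumptions; the paper's actual tagger and selection rules are not reproduced.

```python
import random

# Sketch of cloze-question generation: pick a noun by its POS tag at
# random and blank it out. Tagged input and tag set are illustrative.

def make_cloze(tagged_sentence, target_tags=("NN", "NNS"), seed=0):
    rng = random.Random(seed)  # seeded here for reproducibility
    candidates = [i for i, (_, t) in enumerate(tagged_sentence) if t in target_tags]
    if not candidates:
        return None
    i = rng.choice(candidates)
    answer = tagged_sentence[i][0]
    words = [w for w, _ in tagged_sentence]
    words[i] = "_____"
    return " ".join(words), answer

sent = [("A", "DT"), ("compiler", "NN"), ("translates", "VBZ"),
        ("source", "NN"), ("code", "NN")]
question, answer = make_cloze(sent)
print(question, "| answer:", answer)
```

Varying the seed (or omitting it) yields a different question on each execution, matching the dynamic behaviour the abstract describes.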


2019 ◽  
Author(s):  
Alexandre M. R. Cunha ◽  
Kele T. Belloze ◽  
Gustavo P. Guedes

Textual data sources may assist in the detection of adverse events not predicted for a particular drug. However, given the amount of information available across several sources, it is reasonable to adopt a computational approach to analyzing these sources in search of adverse events. In this scenario, we created an extension of CoreNLP to process Brazilian Portuguese texts from the pharmacovigilance area. We trained three natural language models: a part-of-speech tagger, a parser, and a named entity recognizer. Preliminary results indicate success in generating dependency trees for phrases in the pharmacovigilance area and in identifying pharmacovigilance named entities.


2018 ◽  
Vol 2 (3) ◽  
pp. 157
Author(s):  
Ahmad Subhan Yazid ◽  
Agung Fatwanto

Indonesian plays a fundamental role in communication, but ambiguity poses a problem for its machine-learning implementations. In Natural Language Processing, Part of Speech (POS) tagging plays a role in reducing this problem. This study uses a rule-based method to determine the best word class for ambiguous words in Indonesian. The research follows several stages: knowledge inventory, algorithm design, implementation, testing, analysis, and conclusions. The data used is an Indonesian corpus developed by the Language Department of the Faculty of Computer Science, University of Indonesia. The data is then processed and presented descriptively according to specified rules. The result is a POS tagging algorithm comprising 71 rules expressed in flowchart and descriptive sentence notation. In testing, the algorithm correctly labeled 92 of 100 tested words (92%). The results of the implementation are influenced by the availability of rules, word-class tagsets, and corpus data.
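A single rule in the spirit of the study's 71 can be sketched: pick a word class for an ambiguous Indonesian word from the tag of its right neighbour. The two-entry lexicon and the rule for "bisa" (modal "can" vs. noun "venom") are simplified assumptions, not the study's actual rule set.

```python
# Illustrative rule-based disambiguation for one ambiguous Indonesian
# word. Lexicon and rule are invented for illustration.

LEXICON = {"bisa": {"MD", "NN"}}  # "bisa": modal ("can") or noun ("venom")

def disambiguate(word, next_tag):
    tags = LEXICON.get(word)
    if not tags:
        return None
    if len(tags) == 1:
        return next(iter(tags))
    # Rule: "bisa" directly before a verb acts as a modal, otherwise a noun.
    return "MD" if next_tag == "VB" else "NN"

print(disambiguate("bisa", "VB"))  # → MD  (as in "bisa makan": can eat)
print(disambiguate("bisa", "NN"))  # → NN  (as in "bisa ular": snake venom)
```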


Information ◽  
2021 ◽  
Vol 12 (12) ◽  
pp. 520
Author(s):  
Jakobus S. du Toit ◽  
Martin J. Puttkammer

The creation of linguistic resources is crucial to the continued growth of research and development efforts in the field of natural language processing, especially for resource-scarce languages. In this paper, we describe the curation and annotation of corpora and the development of multiple linguistic technologies for four official South African languages, namely isiNdebele, Siswati, isiXhosa, and isiZulu. Development efforts included sourcing parallel data for these languages and annotating each on the token, orthographic, morphological, and morphosyntactic levels. These sets were in turn used to create and evaluate three core technologies, viz. a lemmatizer, a part-of-speech tagger, and a morphological analyzer for each of the languages. We report on the quality of these technologies, which improve on rule-based technologies previously developed as part of a similar initiative in 2013. These resources are made publicly accessible through a local resource agency with the intention of fostering further development of both resources and technologies that may benefit the NLP industry in South Africa.


Author(s):  
Umrinderpal Singh ◽  
Vishal Goyal

A Part of Speech tagger assigns a tag to every input word in a given sentence. The tags cover the parts of speech of a particular language, such as noun, pronoun, verb, adjective, and conjunction, and may have subcategories. Part of Speech tagging is a basic preprocessing task for most Natural Language Processing (NLP) applications, such as Information Retrieval, Machine Translation, and Grammar Checking. The task belongs to a larger set of problems, namely sequence labeling problems. Part of Speech tagging for Punjabi is not widely explored territory. We discuss rule-based and HMM-based Part of Speech taggers for Punjabi and compare the accuracies of the two approaches. The system is developed using 35 different standard part-of-speech tags. We evaluate our system on unseen data, achieving a state-of-the-art accuracy of 93.3%.
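The HMM approach the abstract mentions can be sketched with a toy Viterbi decoder: tags are hidden states, words are observations, and decoding finds the most probable tag sequence. The transition and emission probabilities below are invented for illustration, not estimated from a Punjabi corpus.

```python
# Toy HMM POS tagger with Viterbi decoding. All probabilities are
# invented for illustration; a real tagger estimates them from a corpus.

def viterbi(words, tags, start_p, trans_p, emit_p):
    # V[t] maps tag -> (best path probability, backpointer to previous tag)
    V = [{t: (start_p[t] * emit_p[t].get(words[0], 1e-6), None) for t in tags}]
    for w in words[1:]:
        V.append({t: max(((V[-1][prev][0] * trans_p[prev][t] *
                           emit_p[t].get(w, 1e-6)), prev) for prev in tags)
                  for t in tags})
    # Backtrack from the best final tag.
    best = max(tags, key=lambda t: V[-1][t][0])
    path = [best]
    for step in reversed(V[1:]):
        path.append(step[path[-1]][1])
    return list(reversed(path))

tags = ["N", "V"]
start_p = {"N": 0.7, "V": 0.3}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
emit_p = {"N": {"dogs": 0.5, "run": 0.1}, "V": {"dogs": 0.05, "run": 0.6}}
print(viterbi(["dogs", "run"], tags, start_p, trans_p, emit_p))  # → ['N', 'V']
```

A production tagger would work in log space to avoid underflow on long sentences and smooth the estimates for unseen words.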


Today we all depend on the internet for our daily activities: booking hotels and air tickets, finding particular places, travelling, cooking, education, banking, and so on. To find a specific thing immediately, we require filtering tools. E-learning is a new and rapidly growing medium in the modern education system that is based entirely on the internet. While surfing the internet, students may be distracted by offensive and irrelevant websites; filters play a vital role in avoiding such distractions. This paper proposes a filter tool that carries out web scraping of text data, data cleaning, natural language processing, and filtering of non-learning sites in real time. We collect the text from paragraphs, images, and video tags. The extracted textual data takes the form of sentences, which are part-of-speech (POS) processed by NLP. Word Sense Disambiguation (WSD) is then used to find the correct sense, in the sentence at hand, of words that may have multiple meanings. The tool creates a knowledge base of student-related sites using NLP and SVM classification, and we have created a keyword database of all learning sites. Lastly, the tool classifies sites into two categories, learning and non-learning, using a Support Vector Machine.
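The WSD step can be sketched with a simplified Lesk-style algorithm: choose the sense whose gloss shares the most words with the sentence context. The two glosses for "bank" below are illustrative stand-ins; the paper's tool draws senses from WordNet.

```python
# Simplified Lesk-style word sense disambiguation: pick the sense whose
# gloss has the largest word overlap with the context. Glosses are
# illustrative stand-ins for WordNet sense definitions.

def lesk(context_words, senses):
    ctx = set(w.lower() for w in context_words)
    return max(senses, key=lambda s: len(ctx & set(senses[s].lower().split())))

senses = {
    "finance": "an institution that accepts deposits and lends money",
    "river":   "the sloping land beside a body of water",
}
sentence = "she sat on the bank beside the calm water".split()
print(lesk(sentence, senses))  # → river
```

The disambiguated senses would then feed the keyword database that the SVM classifier uses to separate learning from non-learning sites.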


Electronics ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. 56
Author(s):  
Hongwei Li ◽  
Hongyan Mao ◽  
Jingzi Wang

Part-of-Speech (POS) tagging is one of the most important tasks in the field of natural language processing (NLP). The POS tag of a word depends not only on the word itself but also on its position, its surrounding words, and their POS tags. POS tagging can be an upstream task for other NLP tasks, further improving their performance. Therefore, it is important to improve the accuracy of POS tagging. In POS tagging, bidirectional Long Short-Term Memory (Bi-LSTM) is commonly used and achieves good performance. However, Bi-LSTM is not as powerful as the Transformer in leveraging contextual information, since Bi-LSTM simply concatenates the contextual information from left-to-right and right-to-left. In this study, we propose a novel approach to improve the accuracy of POS tagging. For each token, all possible POS tags are obtained without considering context, and rules are then applied to prune these possible POS tags, which we call rule-based data preprocessing. In this way, the number of possible POS tags of most tokens can be reduced to one, and those tokens are considered correctly tagged. Finally, the POS tags of the remaining tokens are masked, and a Transformer-based model is used to predict only the masked POS tags, which enables it to leverage bidirectional contexts. Our experimental results show that our approach leads to better performance than other methods using Bi-LSTM.
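The described preprocessing can be sketched as: start from all dictionary tags per token, prune with rules, accept tokens left with a single candidate, and mask the rest for the downstream model. The tag dictionary and the single rule below are invented for illustration; the paper's rules and Transformer model are not reproduced.

```python
# Sketch of rule-based data preprocessing before masked POS prediction.
# Dictionary and rule are invented for illustration.

TAG_DICT = {"the": {"DT"}, "can": {"MD", "NN", "VB"}, "fish": {"NN", "VB"},
            "swims": {"VBZ"}}

def preprocess(tokens):
    result = []
    for i, tok in enumerate(tokens):
        cands = set(TAG_DICT.get(tok, {"UNK"}))
        # Rule: a token right after a determiner cannot be a modal or verb.
        if i > 0 and tokens[i - 1] == "the":
            cands -= {"MD", "VB"}
        # One candidate left: tag resolved; otherwise mask for the model.
        result.append(cands.pop() if len(cands) == 1 else "[MASK]")
    return result

print(preprocess(["the", "can", "swims"]))  # → ['DT', 'NN', 'VBZ']
print(preprocess(["can", "fish"]))          # → ['[MASK]', '[MASK]']
```

Only the `[MASK]` positions would be fed to the Transformer for prediction, letting it attend to both the resolved tags and the surrounding words.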


2020 ◽  
Vol 17 (4) ◽  
pp. 1842-1846
Author(s):  
Praveen Edward James ◽  
Mun Hou Kit ◽  
Chockalingam Aravind Vaithilingam ◽  
Alan Tan Wee Chiat

Natural Language Processing (NLP) systems involve Natural Language Understanding (NLU), Dialogue Management (DM), and Natural Language Generation (NLG). This work integrates learning from examples with rule-based processing to design an NLP system. The design involves a three-stage processing framework combining syntactic generation, semantic extraction, and strong rule-based control. The syntactic generator generates syntax by aligning sentences with Part-of-Speech (POS) tags, limited by the number of words in the lexicon. The semantic extractor extracts meaningful keywords from the queries raised. These two modules are controlled through generalized rules by the rule-based controller module. The system is evaluated on different domains. The results reveal that the accuracy of the system is 92.33% on average. The design process is simple, and the processing time is 2.12 seconds, which is minimal compared to similar statistical models. The performance of an NLP tool in a certain task can be estimated by the quality of its predictions on unseen data. The results reveal performance similar to existing systems, indicating the possibility of use for similar tasks. The system supports a vocabulary of about 700 words and can be used as an NLP module in a spoken dialogue system for various domains or task areas.


Electronics ◽  
2021 ◽  
Vol 10 (12) ◽  
pp. 1372
Author(s):  
Sanjanasri JP ◽  
Vijay Krishna Menon ◽  
Soman KP ◽  
Rajendran S ◽  
Agnieszka Wolk

Linguists have long focused on qualitative comparison of the semantics of different languages. Evaluating semantic interpretation across disparate language pairs like English and Tamil is an even more formidable task than for Slavic languages. The concept of word embedding in Natural Language Processing (NLP) has enabled a felicitous opportunity to quantify linguistic semantics. Multilingual tasks can be performed by projecting the word embeddings of one language onto the semantic space of the other. This research presents a suite of data-efficient deep learning approaches to deduce the transfer function from the embedding space of English to that of Tamil, deploying three popular embedding algorithms: Word2Vec, GloVe, and FastText. A novel evaluation paradigm was devised to assess the effectiveness of the generated embeddings, using the original embeddings as ground truth. The transferability of the proposed model to other target languages was assessed via pre-trained Word2Vec embeddings for Hindi and Chinese. We empirically show that with a bilingual dictionary of a thousand words and a correspondingly small monolingual target (Tamil) corpus, useful embeddings can be generated by transfer learning from a well-trained source (English) embedding. Furthermore, we demonstrate the usability of the generated target embeddings in a few NLP use-case tasks, such as text summarization, part-of-speech (POS) tagging, and bilingual dictionary induction (BDI), bearing in mind that these are not the only possible applications.
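The simplest form of such a transfer function — a linear map fitted on dictionary pairs by least squares — can be sketched on toy data. The 2-D vectors and the hidden rotation below are synthetic; the paper learns richer, deep transfer functions on real Word2Vec/GloVe/FastText embeddings.

```python
import numpy as np

# Sketch of cross-lingual embedding transfer: learn a linear map W that
# projects "source" vectors onto "target" vectors of bilingual dictionary
# pairs via least squares. Data is synthetic 2-D toy data.

rng = np.random.default_rng(0)
W_true = np.array([[0.0, -1.0], [1.0, 0.0]])  # hidden 90-degree rotation
X = rng.normal(size=(1000, 2))                # "English" embeddings
Y = X @ W_true.T                              # aligned "Tamil" embeddings

# Solve min_W ||X W^T - Y||_F over the dictionary pairs.
W_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T
print(np.allclose(W_hat, W_true))  # → True on this noiseless toy data

# Project a new source-language vector into the target space.
new_word = np.array([1.0, 0.0])
print(new_word @ W_hat.T)
```

On real embeddings the map is only approximate, which is why the paper devises an evaluation paradigm against the original embeddings as ground truth.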

