Detecting Paraphrases in Marathi Language

Mapping Intimacies ◽

10.54646/bijscit.003 ◽

2020 ◽

pp. 7-17

Author(s):

Shruti Srivastava ◽

Sharvari Govilkar ◽

Keyword(s):

Semantic Similarity ◽

Question Answering ◽

Real Life ◽

Indian Languages ◽

Plagiarism Detection ◽

Statistical Similarity ◽

Universal Networking Language ◽

Textual Content ◽

Factual Data ◽

Semantic Significance

Paraphrases are sentences that differ in their textual content or in the arrangement of their words but convey the same meaning. Identifying paraphrases is important in many real-life applications such as Information Retrieval, Plagiarism Detection, Text Summarization and Question Answering. A large amount of work on Paraphrase Detection has been done for English and many Indian languages. However, there is no existing system to identify paraphrases in Marathi; this is the first such endeavor for the Marathi language. Because paraphrases are differently structured sentences and Marathi is a semantically strong language, the system is designed to check both the statistical and the semantic similarity of Marathi sentences. The statistical similarity measure does not need any prior knowledge, as it is based only on the factual data of the sentences. The factual data is calculated from the degree of closeness between the word-set, word-order, word-vector and word-distance. Universal Networking Language (UNL) expresses the semantic significance of a sentence without any syntactic point of interest. Hence, the semantic similarity calculated from the UNL graphs generated for two Marathi sentences indicates the semantic equality of the two sentences. The total paraphrase score is calculated by combining the statistical and semantic similarity scores, and it gives the judgement of paraphrase or non-paraphrase for the Marathi sentences.
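
The statistical measures named in this abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so the choices below are assumptions: Jaccard overlap stands in for word-set similarity, cosine similarity of word counts stands in for word-vector similarity, and `combined_score` with its `weight` parameter is a hypothetical way to join a statistical and a semantic score. Word-order, word-distance and the UNL-based semantic score are not modelled here.

```python
import math
from collections import Counter


def word_set_similarity(a, b):
    """Jaccard overlap of the two sentences' word sets (assumed measure)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def word_vector_similarity(a, b):
    """Cosine similarity of bag-of-words count vectors (assumed measure)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def combined_score(statistical, semantic, weight=0.5):
    """Weighted average of the two scores; the weight is a hypothetical choice."""
    return weight * statistical + (1 - weight) * semantic


# Illustrative sentences: the same three words in a different order.
s1 = "राम शाळेत जातो".split()
s2 = "राम जातो शाळेत".split()

print(word_set_similarity(s1, s2))
print(word_vector_similarity(s1, s2))
```

Both measures ignore word order, which is one reason a system like the one described would also need a word-order component.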

Semantic Textual Similarity in Japanese Clinical Domain Texts Using BERT

Methods of Information in Medicine ◽

10.1055/s-0041-1731390 ◽

2021 ◽

Author(s):

Faith Wavinya Mutinda ◽

Shuntaro Yada ◽

Shoko Wakamiya ◽

Eiji Aramaki

Keyword(s):

Semantic Similarity ◽

Electronic Medical Records ◽

Language Processing ◽

Medical Records ◽

Question Answering ◽

Case Reports ◽

Plagiarism Detection ◽

Wide Range ◽

Clinical Domain ◽

Semantic Textual Similarity

Background: Semantic textual similarity (STS) captures the degree of semantic similarity between texts. It plays an important role in many natural language processing applications such as text summarization, question answering, machine translation, information retrieval, dialog systems, plagiarism detection, and query ranking. STS has been widely studied in the general English domain. However, there exist few resources for STS tasks in the clinical domain and in languages other than English, such as Japanese. Objective: The objective of this study is to capture semantic similarity between Japanese clinical texts (Japanese clinical STS) by creating a Japanese dataset that is publicly available. Materials: We created two datasets for Japanese clinical STS: (1) Japanese case reports (CR dataset) and (2) Japanese electronic medical records (EMR dataset). The CR dataset was created from publicly available case reports extracted from the CiNii database. The EMR dataset was created from Japanese electronic medical records. Methods: We used an approach based on bidirectional encoder representations from transformers (BERT) to capture the semantic similarity between the clinical domain texts. BERT is a popular approach for transfer learning and has been proven to be effective in achieving high accuracy for small datasets. We implemented two Japanese pretrained BERT models: a general Japanese BERT and a clinical Japanese BERT. The general Japanese BERT is pretrained on Japanese Wikipedia texts while the clinical Japanese BERT is pretrained on Japanese clinical texts. Results: The BERT models performed well in capturing semantic similarity in our datasets. The general Japanese BERT outperformed the clinical Japanese BERT and achieved a high correlation with human score (0.904 in the CR dataset and 0.875 in the EMR dataset). It was unexpected that the general Japanese BERT outperformed the clinical Japanese BERT on the clinical domain datasets. This could be due to the fact that the general Japanese BERT is pretrained on a wide range of texts compared with the clinical Japanese BERT.
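
The evaluation step reported here, correlation between model scores and human scores, can be sketched as follows. The abstract does not say which correlation coefficient was used, so Pearson correlation is an assumption, and the score lists are made-up illustrative values, not data from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical similarity scores for five sentence pairs (not from the paper).
human_scores = [0.0, 1.5, 3.0, 4.0, 5.0]
model_scores = [0.4, 1.2, 2.7, 4.3, 4.8]

print(round(pearson(human_scores, model_scores), 3))
```

A value close to 1 means the model ranks and spaces the sentence pairs much as the human annotators did.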

Disney in December

Teaching Children Mathematics ◽

10.5951/teacchilmath.19.5.0290 ◽

2012 ◽

Vol 19 (5) ◽

pp. 290-291 ◽

Cited By ~ 1

Author(s):

Sue E. Hoge ◽

Karin E. Perry

Keyword(s):

Problem Solving ◽

Common Core ◽

Common Core State Standards ◽

Real Life ◽

State Standards ◽

Core State ◽

The Common ◽

Factual Data

Math by the Month is a regular department of the journal. It features collections of short activities focused on a monthly theme. These articles aim for an inquiry or problem-solving orientation that includes at least four activities each for K–Grade 2, Grades 3–4, and Grades 5–6. This month's problem set aligns with the Common Core State Standards for Mathematics, includes factual data from Disney Parks, and makes connections between mathematics and real-life applications.

A Multilingual Semantic Similarity-Based Approach for Question-Answering Systems

Knowledge Science, Engineering and Management - Lecture Notes in Computer Science ◽

10.1007/978-3-030-29551-6_54 ◽

2019 ◽

pp. 604-614

Author(s):

Wafa Wali ◽

Fatma Ghorbel ◽

Bilel Gragouri ◽

Fayçal Hamdi ◽

Elisabeth Metais

Keyword(s):

Semantic Similarity ◽

Question Answering ◽

Question Answering Systems

LIS4: Lesk Inspired Sense Specific Semantic Similarity using WordNet

Journal of Information & Knowledge Management ◽

10.1142/s0219649221500064 ◽

2021 ◽

pp. 2150006

Author(s):

Saravanakumar Kandasamy ◽

Aswani Kumar Cherukuri

Keyword(s):

Information Retrieval ◽

Natural Language Processing ◽

Natural Language ◽

Semantic Similarity ◽

Language Processing ◽

Gold Standard ◽

Question Answering ◽

Knowledge Based ◽

Benchmark Datasets ◽

Processing Information

Quantifying the semantic similarity between concepts is an essential part of domains such as Natural Language Processing, Information Retrieval and Question Answering, where it helps to understand texts and their relationships better. Over the last few decades, many measures have been proposed that incorporate various corpus-based and knowledge-based resources. WordNet and Wikipedia are two such knowledge-based resources. The contribution of WordNet to these domains is enormous because of its richness in defining a word and all of its relationships with other words. In this paper, we propose an approach to quantify the similarity between concepts that exploits the synsets and the gloss definitions of different concepts using WordNet. Our method considers the gloss definitions, the contextual words that help define a word, the synsets of the contextual words and the confidence of occurrence of a word in another word's definition when calculating the similarity. An evaluation on different gold standard benchmark datasets shows the efficiency of our system in comparison with other existing taxonomical and definitional measures.
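
The idea of comparing concepts through their definitions can be shown with a simplified Lesk-style gloss overlap. This is a sketch of the general technique only, not the LIS4 measure: the glosses in `TOY_GLOSSES` are hand-written stand-ins rather than WordNet entries, and the normalization by the shorter gloss is an assumption.

```python
# Small stop-word list so that function words do not count as overlap.
STOPWORDS = {"a", "an", "the", "of", "or", "and", "that", "for", "in", "to", "is"}

# Hand-written toy glosses (hypothetical, not taken from WordNet).
TOY_GLOSSES = {
    "car": "a motor vehicle with four wheels used for transporting people",
    "bus": "a large motor vehicle used for transporting many people",
    "banana": "an elongated yellow fruit that grows in clusters",
}


def gloss_words(concept):
    """Content words of a concept's gloss, lowercased."""
    words = TOY_GLOSSES[concept].lower().split()
    return {w for w in words if w not in STOPWORDS}


def gloss_overlap(c1, c2):
    """Shared gloss words divided by the size of the smaller gloss."""
    g1, g2 = gloss_words(c1), gloss_words(c2)
    if not g1 or not g2:
        return 0.0
    return len(g1 & g2) / min(len(g1), len(g2))


print(gloss_overlap("car", "bus"))
print(gloss_overlap("car", "banana"))
```

With these toy glosses, "car" and "bus" share most of their defining words while "car" and "banana" share none, which is the signal a gloss-based measure builds on.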

Sentence Similarity Metric and its Application in FAQ System

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.718-720.2248 ◽

2013 ◽

Vol 718-720 ◽

pp. 2248-2251

Author(s):

Pei Ying Zhang

Keyword(s):

Semantic Similarity ◽

Question Answering ◽

Similarity Metric ◽

Question Answering System ◽

Sentence Similarity

An FAQ system is a question answering system that finds the matching question sentence in a question-answer collection and returns the corresponding answer to the user. Matching questions to the corresponding question-answer pairs has become a major challenge in FAQ systems. This paper proposes a sentence similarity metric between questions based on their semantic similarity as well as the length of the questions. Experiments show that this method can improve the accuracy and intelligence of the answering system and has some practical value.
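
One way a question-length term could be combined with a semantic score is sketched below. The abstract does not state the formula, so both the length ratio and the weighting parameter `alpha` are assumptions made for illustration; the semantic score is passed in as a number in [0, 1] and could come from any measure.

```python
def length_ratio(q1, q2):
    """Ratio of the shorter question length to the longer one, in words."""
    n1, n2 = len(q1.split()), len(q2.split())
    if max(n1, n2) == 0:
        return 1.0
    return min(n1, n2) / max(n1, n2)


def question_similarity(semantic_sim, q1, q2, alpha=0.8):
    """Blend a semantic score in [0, 1] with the length ratio (assumed form)."""
    return alpha * semantic_sim + (1 - alpha) * length_ratio(q1, q2)


user_question = "how do I reset my password"
faq_question = "how can I reset a forgotten account password"

# 0.7 is a placeholder semantic score, not a measured value.
print(question_similarity(0.7, user_question, faq_question))
```

The length term penalizes candidate questions that are much longer or shorter than the user's question even when they share meaning-bearing words.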

Semantic Similarity/Relatedness for Cross Language Plagiarism Detection

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v1.i2.pp371-374 ◽

2016 ◽

Vol 1 (2) ◽

pp. 371 ◽

Cited By ~ 2

Author(s):

Hanane Ezzikouri ◽

Mohammed Erritali ◽

Mohamed Oukessou

Keyword(s):

Semantic Similarity ◽

Semantic Relatedness ◽

Application Programming Interface ◽

Plagiarism Detection ◽

French And English ◽

Unique Interpretation ◽

Application Programming ◽

Cross Language ◽

Similarity Distance ◽

Programming Interface

Utterances in natural language are generally highly ambiguous, and a unique interpretation can usually be determined only by taking into account the context in which the utterance occurred. Automatically determining the correct sense of a polysemous word is a complicated problem, especially in multilingual corpora. This paper presents an application programming interface for several Semantic Relatedness/Similarity metrics that measure the semantic similarity/distance between multilingual words and concepts, so that they can later be applied to sentences and paragraphs in Cross Language Plagiarism Detection (CLPD), using WordNet for the English-French and English-Arabic multilingual plagiarism cases.

Question Classification using Naive Bayes Classifier and Creating Missing Classes using Semantic Similarity in Question Answering System

International Journal of Engineering Trends and Technology ◽

10.14445/22315381/ijett-v23p231 ◽

2015 ◽

Vol 23 (4) ◽

pp. 155-160 ◽

Cited By ~ 2

Author(s):

Jeena Mathew ◽

Shine N Das

Keyword(s):

Semantic Similarity ◽

Question Answering ◽

Naive Bayes ◽

Naïve Bayes ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

Question Answering System ◽

Question Classification

Enhancing the Performance of Semantic Search in Bengali using Neural Net and other Classification Techniques

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.b3566.029320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 4170-4176

Keyword(s):

Question Answering ◽

Semantic Search ◽

The Internet ◽

Neural Net ◽

Indian Languages ◽

Classification Techniques ◽

User Query ◽

Impressive Result ◽

Semantic Similarity Analysis ◽

Bengali Language

Searching is one of the most important ways for any user to obtain information from the internet. In ‘Syntactic Search’, a keyword-based matching technique is used. Search accuracy is improved by applying filters such as location, preference and user history. However, the user query or question and the best available answer or result in the internet domain may have no terms in common, or only a negligible number of common terms. In such cases syntactic search cannot give the desired output, and ‘Semantic Search’ becomes important. The execution of semantic search is challenging because resources such as WordNet, ontologies and annotations are unavailable. This work describes an end-to-end algorithm to improve the accuracy of semantic search. Four classification techniques are used: ANN, Decision Tree, SVM and Naïve Bayes. The dataset is provided by the TDIL project of the Ministry of Electronics and IT, Govt. of India. The repository contains 86 categories of text with more than a million sentences. After an impressive result was obtained for the Bengali language, test runs were done for other Indian languages and very good results were achieved. This research is extremely useful for automatic question answering systems, semantic similarity analysis, e-governance and m-governance.
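
The limitation of keyword-based matching described in this abstract can be seen in a minimal sketch. The scoring function and the example texts are assumptions made for illustration (English is used for readability) and are not taken from the paper.

```python
def keyword_overlap(query, document):
    """Fraction of distinct query terms that also appear in the document."""
    query_terms = set(query.lower().split())
    document_terms = set(document.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & document_terms) / len(query_terms)


query = "how to fix a flat tyre"
relevant_but_different_words = "repairing punctured bicycle wheels"
shares_keywords = "fix a flat tyre with a patch kit"

print(keyword_overlap(query, relevant_but_different_words))
print(keyword_overlap(query, shares_keywords))
```

The first document answers the query but shares no terms with it, so a purely syntactic score cannot retrieve it; this is the gap that semantic search aims to close.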
