N-gram probability effects in a cloze task

2014 ◽  
Vol 9 (3) ◽  
pp. 437-472 ◽  
Author(s):  
Cyrus Shaoul ◽  
R. Harald Baayen ◽  
Chris F. Westbury

What knowledge influences our choice of words when we write or speak? Predicting which word a person will produce next is not easy, even when the linguistic context is known. One task that has been used to assess context-dependent word choice is the fill-in-the-blank task, also called the cloze task. The cloze probability of a specific context is an empirical measure obtained by asking many people to fill in the blank. In this paper we harness the power of large corpora to examine the influence of corpus-derived probabilistic information from a word’s micro-context on word choice. We asked young adults to complete short phrases called n-grams, with up to 20 responses per phrase. The probability of the response word and the conditional probability of the response given the context were predictive of the frequency with which each response was produced. Furthermore, the order in which participants generated multiple completions of the same context was also predicted by the conditional probability. These results suggest that word choice in cloze tasks taps into implicit knowledge of a person’s past experience with that word in various contexts. Moreover, the importance of n-gram conditional probabilities in our analysis is further evidence of implicit knowledge about multi-word sequences, and supports theories of language processing that involve anticipating or predicting based on context.
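The corpus-derived conditional probabilities at the heart of this analysis can be estimated with simple maximum-likelihood counts. A minimal sketch (the toy corpus and function names are ours, not the authors'):

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def conditional_probability(corpus_tokens, context, word, n=3):
    """P(word | context) estimated as count(context + word) / count(context),
    where the context is an (n-1)-gram."""
    context = tuple(context)
    full = Counter(ngrams(corpus_tokens, n))
    prefix = Counter(ngrams(corpus_tokens, n - 1))
    if prefix[context] == 0:
        return 0.0
    return full[context + (word,)] / prefix[context]

corpus = "the cat sat on the mat and the cat sat on the sofa".split()
p = conditional_probability(corpus, ("cat", "sat"), "on", n=3)  # -> 1.0
```

In a real study these counts would come from a large corpus; with sparse contexts, a smoothed estimator would replace the raw ratio.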

Author(s):  
E. D. Avedyan ◽  
Le Thi Trang Linh

The article presents analytical results on decision-making by the majority voting algorithm (MVA). Particular attention is paid to the case of an even number of experts. The conditional probabilities of the MVA for two hypotheses are derived for an even number of experts, and their properties are investigated as functions of the conditional probability of a correct decision by independent, equally qualified experts and of the number of experts. An approach is proposed for calculating the probability that the MVA reaches the correct decision when the conditional probabilities of accepting each hypothesis differ across the statistically mutually independent experts. The findings are illustrated with numerical and graphical calculations.
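For equally qualified, statistically independent experts who are each correct with probability p, the MVA's probability of a correct decision is a binomial tail sum. A minimal sketch, with the even-n tie case reduced to a single flag (a simplification of the article's analysis):

```python
from math import comb

def majority_correct_probability(p, n, tie_counts_as_correct=False):
    """Probability that a majority of n independent experts, each correct
    with probability p, reaches the correct decision. For even n, an exact
    n/2-vs-n/2 tie is resolved according to the flag."""
    prob = sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))
    if n % 2 == 0 and tie_counts_as_correct:
        k = n // 2
        prob += comb(n, k) * p**k * (1 - p)**k
    return prob
```

For example, with p = 0.8 and n = 5 the majority is correct with probability about 0.942; for even n the tie-breaking rule visibly changes the result, which is the regime the article studies.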


2021 ◽  
pp. 103048
Author(s):  
Nidal Nasser ◽  
Lutful Karim ◽  
Ahmed El Ouadrhiri ◽  
Asmaa Ali ◽  
Nargis Khan

2020 ◽  
Vol 30 (1) ◽  
pp. 192-208 ◽  
Author(s):  
Hamza Aldabbas ◽  
Abdullah Bajahzar ◽  
Meshrif Alruily ◽  
Ali Adil Qureshi ◽  
Rana M. Amir Latif ◽  
...  

Abstract To maintain a competitive edge in the mobile application market, evaluating the quality needs of an app is essential. User feedback on these applications plays an essential role in the mobile application development industry. The rapid growth of web technology has given people the opportunity to interact and to express their reviews, ratings, and feedback about applications. In this paper we scraped 506,259 user reviews and application ratings from the Google Play Store, across 14 different categories. The statistical information was measured using several common machine learning algorithms: logistic regression, random forest, and multinomial naïve Bayes. Parameters including accuracy, precision, recall, and F1 score were used to evaluate bigram, trigram, and n-gram features, and the statistical results of these algorithms were compared. Each algorithm was analyzed in turn and its results evaluated. It is concluded that logistic regression is the best algorithm for review analysis of Google Play Store applications, achieving the highest accuracy when classifying reviews into three classes: positive, negative, and neutral.
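The kind of n-gram classification pipeline the abstract describes can be sketched with a pure-Python multinomial naïve Bayes, one of the three algorithms compared (the toy reviews and class labels are ours, not the study's data):

```python
from collections import Counter, defaultdict
import math

def ngram_features(text, n_max=2):
    """Unigram and bigram features of a review, as in the paper's
    bigram/trigram/n-gram comparisons."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        feats += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats

class MultinomialNB:
    """Multinomial naive Bayes over n-gram counts, with add-one smoothing."""
    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        vocab = set()
        for text, label in zip(texts, labels):
            for f in ngram_features(text):
                self.feature_counts[label][f] += 1
                vocab.add(f)
        self.vocab_size = len(vocab)
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # class prior
            denom = sum(self.feature_counts[label].values()) + self.vocab_size
            for f in ngram_features(text):
                score += math.log((self.feature_counts[label][f] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best

reviews = ["love this app", "great app love it", "crashes all the time",
           "terrible crashes", "it is okay", "okay but fine"]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
clf = MultinomialNB().fit(reviews, labels)
```

The study's preferred logistic regression classifier would consume the same n-gram count features; only the final model differs.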


2020 ◽  
Vol 34 (05) ◽  
pp. 7391-7398
Author(s):  
Muhammad Asif Ali ◽  
Yifang Sun ◽  
Bing Li ◽  
Wei Wang

Fine-Grained Named Entity Typing (FG-NET) is a key component in Natural Language Processing (NLP). It aims at classifying an entity mention into a wide range of entity types. Owing to the large number of entity types, distant supervision is used to collect training data for this task, which noisily assigns type labels to entity mentions irrespective of context. To alleviate these noisy labels, existing approaches to FG-NET analyze entity mentions entirely independently of each other and assign type labels based solely on the mention's sentence-specific context. This is inadequate for highly overlapping and/or noisy type labels, as it hinders information passing across sentence boundaries. To address this, we propose an edge-weighted attentive graph convolution network that refines the noisy mention representations by attending over corpus-level contextual clues prior to the final classification. Experimental evaluation shows that the proposed model outperforms existing research by a relative score of up to 10.2% and 8.3% for macro-F1 and micro-F1, respectively.
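The core operation, attention-weighted aggregation over an edge-weighted graph, can be sketched in a few lines. This is a conceptual illustration only; the layer shape, attention form, and variable names are our assumptions, not the authors' architecture:

```python
import numpy as np

def edge_weighted_attentive_gcn_layer(H, A, W, a):
    """One edge-weighted, attention-based graph convolution layer:
    neighbor messages are scaled by learned attention scores and by the
    given edge weights before aggregation. Assumes A includes self-loops.
    H: (n, d) node features, A: (n, n) nonnegative edge weights,
    W: (d, d_out) projection, a: (2 * d_out,) attention vector."""
    Z = H @ W                                  # project node features
    n = Z.shape[0]
    # Raw attention logits for every pair (i attends to j).
    logits = np.array([[np.dot(a, np.concatenate([Z[i], Z[j]]))
                        for j in range(n)] for i in range(n)])
    logits = np.where(A > 0, logits, -np.inf)  # mask non-edges
    # Softmax over each node's neighbors, rescaled by edge weights.
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    att = e * A
    att = att / att.sum(axis=1, keepdims=True)
    return np.maximum(att @ Z, 0.0)            # ReLU after aggregation

rng = np.random.default_rng(0)
H = rng.normal(size=(4, 3))                    # 4 mentions, 3-dim features
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1.0]])                 # weighted adjacency + self-loops
W = rng.normal(size=(3, 2))
a = rng.normal(size=(4,))
out = edge_weighted_attentive_gcn_layer(H, A, W, a)
```

In the paper, the graph would connect mentions across sentence boundaries so that corpus-level clues flow between them before classification.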


Author(s):  
Saugata Bose ◽  
Ritambhra Korpal

In this chapter, an initiative is proposed in which natural language processing (NLP) techniques and supervised machine learning algorithms are combined to detect external plagiarism. The major emphasis is on constructing a framework to detect plagiarism in monolingual texts by implementing an n-gram frequency comparison approach. The framework is based on 120 characteristics extracted during pre-processing using simple NLP approaches. Afterward, filter metrics are applied to select the most relevant features, and a supervised classification algorithm is then used to classify the documents into four levels of plagiarism. A confusion matrix is built to estimate the false positives and false negatives. Finally, the authors show the suitability of a C4.5 decision-tree classifier, which achieves higher accuracy than naïve Bayes. The framework achieved 89% accuracy with low false positive and false negative rates, and shows higher precision and recall than the passage similarity, sentence similarity, and search space reduction methods.
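The n-gram frequency comparison underlying the framework can be illustrated with a containment score (the 120-feature extraction and the four-level classification are not reproduced here; the helper names are ours):

```python
def word_ngrams(tokens, n):
    """Set of contiguous word n-grams in a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def containment(suspicious, source, n=3):
    """Fraction of the suspicious document's word n-grams that also occur
    in the source document, the containment measure commonly used for
    external plagiarism detection."""
    s = word_ngrams(suspicious.lower().split(), n)
    t = word_ngrams(source.lower().split(), n)
    return len(s & t) / len(s) if s else 0.0
```

Scores like this, computed at several n, would feed the feature vector that the C4.5 classifier maps to a plagiarism level.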


Information ◽  
2019 ◽  
Vol 10 (10) ◽  
pp. 317 ◽  
Author(s):  
Karol Nowakowski ◽  
Michal Ptaszynski ◽  
Fumito Masui

Word segmentation is an essential task in automatic language processing for languages where there are no explicit word boundary markers, or where space-delimited orthographic words are too coarse-grained. In this paper we introduce the MiNgMatch Segmenter—a fast word segmentation algorithm, which reduces the problem of identifying word boundaries to finding the shortest sequence of lexical n-grams matching the input text. In order to validate our method in a low-resource scenario involving extremely sparse data, we tested it with a small corpus of text in the critically endangered language of the Ainu people living in northern parts of Japan. Furthermore, we performed a series of experiments comparing our algorithm with systems utilizing state-of-the-art lexical n-gram-based language modelling techniques (namely, Stupid Backoff model and a model with modified Kneser-Ney smoothing), as well as a neural model performing word segmentation as character sequence labelling. The experimental results we obtained demonstrate the high performance of our algorithm, comparable with the other best-performing models. Given its low computational cost and competitive results, we believe that the proposed approach could be extended to other languages, and possibly also to other Natural Language Processing tasks, such as speech recognition.
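The reduction described, finding the shortest sequence of lexicon entries covering the input, is a classic dynamic program. A minimal sketch with single words standing in for lexical n-grams (the toy lexicon is ours, not drawn from the Ainu corpus):

```python
import math

def mingmatch_segment(text, lexicon):
    """Sketch of the idea behind the MiNgMatch Segmenter: find the
    shortest sequence of lexicon entries that exactly covers the
    unsegmented text. Dynamic programming; not the authors' implementation."""
    n = len(text)
    best = [math.inf] * (n + 1)   # fewest entries covering text[:i]
    back = [0] * (n + 1)          # split point achieving best[i]
    best[0] = 0
    for i in range(1, n + 1):
        for j in range(i):
            if text[j:i] in lexicon and best[j] + 1 < best[i]:
                best[i] = best[j] + 1
                back[i] = j
    if math.isinf(best[n]):
        return None               # no full cover exists
    out, i = [], n
    while i > 0:                  # recover the segmentation
        out.append(text[back[i]:i])
        i = back[i]
    return out[::-1]

lexicon = {"irankarapte", "iran", "karapte", "e", "iwanke", "ya"}
segmented = mingmatch_segment("eiwankeya", lexicon)  # -> ["e", "iwanke", "ya"]
```

With entries that are themselves multi-word n-grams, the same objective prefers longer matched chunks, which is what makes the approach fast and data-light.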


2016 ◽  
Vol 10 (2) ◽  
pp. 284-300 ◽  
Author(s):  
MARK J. SCHERVISH ◽  
TEDDY SEIDENFELD ◽  
JOSEPH B. KADANE

Abstract Let κ be an uncountable cardinal. Using the theory of conditional probability associated with de Finetti (1974) and Dubins (1975), subject to several structural assumptions for creating sufficiently many measurable sets, and assuming that κ is not a weakly inaccessible cardinal, we show that each probability that is not κ-additive has conditional probabilities that fail to be conglomerable in a partition of cardinality no greater than κ. This generalizes a result of Schervish, Seidenfeld, & Kadane (1984), which established that each finite but not countably additive probability has conditional probabilities that fail to be conglomerable in some countable partition.
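For reference, conglomerability in a partition can be stated as follows (notation ours, not the paper's):

```latex
% P is conglomerable in a partition \pi = \{h_\alpha\} if, for every
% event E, the unconditional probability is bracketed by the extremes
% of the conditional probabilities taken across the partition:
\inf_{\alpha} P(E \mid h_\alpha) \;\le\; P(E) \;\le\; \sup_{\alpha} P(E \mid h_\alpha)
```

The paper exhibits probabilities for which this bracketing fails in some partition of cardinality at most κ.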


Author(s):  
Kenny Easwaran

Conditional probability has been put to many uses in philosophy, and several proposals have been made regarding its relation to unconditional probability, especially in cases involving infinitely many alternatives that may have probability 0. This chapter briefly summarizes some of the literature connecting conditional probabilities to probabilities of conditionals and to Humphreys' Paradox for chances, and then investigates in greater depth the issues around probability 0. Approaches due to Popper, Rényi, and Kolmogorov are considered. Some of the limitations and alternative formulations of each are discussed, in particular the issues arising around the property of “conglomerability” and the idea that conditional probabilities may depend on a conditioning algebra rather than just an event.


2019 ◽  
Vol 29 (7) ◽  
pp. 938-971 ◽  
Author(s):  
Kenta Cho ◽  
Bart Jacobs

Abstract The notions of disintegration and Bayesian inversion are fundamental in conditional probability theory. They produce channels, as conditional probabilities, from a joint state or from an already given channel (in the opposite direction). These notions exist in the literature in concrete situations, but are presented here in abstract graphical formulations. The resulting abstract descriptions are used for proving basic results in conditional probability theory. The existence of disintegration and Bayesian inversion is discussed for discrete probability, and also for measure-theoretic probability, via standard Borel spaces and via likelihoods. Finally, the usefulness of disintegration and Bayesian inversion is illustrated in several examples.
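For discrete probability, Bayesian inversion of a channel along a prior is just Bayes' rule computed pointwise. A concrete sketch with dictionaries standing in for distributions (the weather example is ours, not from the paper):

```python
def bayesian_inversion(prior, channel):
    """Invert a discrete channel c : X -> Dist(Y) along a prior on X,
    returning the inverted channel Y -> Dist(X) together with the
    pushforward distribution on Y."""
    # Pushforward: p(y) = sum_x prior(x) * channel(x)(y)
    pushforward = {}
    for x, px in prior.items():
        for y, pyx in channel[x].items():
            pushforward[y] = pushforward.get(y, 0.0) + px * pyx
    # Inversion: p(x | y) = prior(x) * channel(x)(y) / p(y)
    inverted = {y: {x: prior[x] * channel[x].get(y, 0.0) / py
                    for x in prior}
                for y, py in pushforward.items()}
    return inverted, pushforward

prior = {"rain": 0.3, "dry": 0.7}
channel = {"rain": {"wet": 0.9, "not_wet": 0.1},
           "dry":  {"wet": 0.2, "not_wet": 0.8}}
inverted, marginal = bayesian_inversion(prior, channel)
```

The measure-theoretic versions discussed in the paper generalize exactly this computation, replacing sums by integrals against a likelihood.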


2020 ◽  
pp. 1-25
Author(s):  
Kamila POLIŠENSKÁ ◽  
Shula CHIAT ◽  
Jakub SZEWCZYK ◽  
Katherine E. TWOMEY

Abstract Theories of language processing differ with respect to the role of abstract syntax and semantics vs surface-level lexical co-occurrence (n-gram) frequency. The contribution of each of these factors has been demonstrated in previous studies of children and adults, but none have investigated them jointly. This study evaluated the role of all three factors in a sentence repetition task performed by children aged 4–7 and 11–12 years. It was found that semantic plausibility benefitted performance in both age groups; syntactic complexity disadvantaged the younger group but benefitted the older group; while contrary to previous findings, n-gram frequency did not facilitate, and in a post-hoc analysis even hampered, performance. This new evidence suggests that n-gram frequency effects might be restricted to the highly constrained and frequent n-grams used in previous investigations, and that semantics and morphosyntax play a more powerful role than n-gram frequency, supporting the role of abstract linguistic knowledge in children's sentence processing.

