Unilateral Weighted Jaccard Coefficient for NLP

Author(s):  
Julio Santisteban ◽  
Javier Tejada-Carcamo
Author(s):  
Hussain A. Jaber ◽  
Ilyas Çankaya ◽  
Hadeel K. Aljobouri ◽  
Orhan M. Koçak ◽  
Oktay Algin

Background: Cluster analysis is a robust tool for exploring the underlying structure of data and grouping similar objects. In functional magnetic resonance imaging (fMRI) research, clustering approaches attempt to group voxels by their time-course signals, so that the voxels in a cluster share a similar hemodynamic response over time. Objective: In this work, a novel unsupervised learning approach is proposed that applies the Enhanced Neural Gas (ENG) algorithm to fMRI data for comparison with the Neural Gas (NG) method, which has yet to be utilized for that aim. The ENG algorithm builds on the network structure of NG and concentrates on an efficient prototype-based clustering approach. Methods: The comparison outcomes on real auditory fMRI data show that ENG outperforms the NG and statistical parametric mapping (SPM) methods because it is insensitive to the ordering of the input data sequence, to different initializations for selecting a set of neurons, and to the presence of extreme values (outliers). The findings also demonstrate its capability to determine the true number of clusters effectively. Results: Four validation indices are applied to evaluate the performance of the proposed ENG method on fMRI data and to compare it with a clustering approach (the NG algorithm) and a model-based data analysis (SPM). These validation indices are the Jaccard Coefficient (JC), Receiver Operating Characteristic (ROC), Minimum Description Length (MDL) value, and Minimum Square Error (MSE). Conclusion: The ENG technique can address the shortcomings of applying NG to fMRI data, identify the active area of the human brain effectively, and determine the locations of the cluster centers based on the MDL value during network learning.


The Holocene ◽  
2021 ◽  
pp. 095968362110259
Author(s):  
Attila J Trájer

The late Bronze Age eruption of the Thera volcano was among the largest eruptions of the Holocene era. This catastrophic event may have wiped out all organisms on ancient Santorini and could have seriously affected the sand fly fauna of the Aegean islands. To investigate these effects, the survival possibility of the sand fly fauna in the Santorini islands was assessed and a biogeographic investigation of the sand fly fauna of eleven Aegean islands was conducted. It was found that only the south and east slopes of the massifs of Thira could have provided refuge for sand fly populations. The expression-based heat map of the Jaccard coefficient matrix data showed that the Santorini islands and the neighbouring islands of Anafi and Folegandros had clearly different z-score patterns compared to the other islands. This could be a late sign of the devastating effect of the Minoan eruption and/or a consequence of the distance of these islands from the mainland. Neither the glacial seashore patterns nor the geographic-climatic conditions can explain the present sand fly fauna of the Aegean Archipelago. If the sand fly populations of ancient Santorini survived the Minoan cataclysm, it could indicate that the environmental tolerance and resilience of sand fly populations can be high, and that local geological and geomorphological conditions can play a greater role in the survival of sand fly species than previously assumed.


1987 ◽  
Vol 65 (3) ◽  
pp. 691-707 ◽  
Author(s):  
A. F. L. Nemec ◽  
R. O. Brinkhurst

A data matrix of 23 generic or subgeneric taxa versus 24 characters and a shorter matrix of 15 characters were analyzed by means of ordination, cluster analyses, parsimony, and compatibility methods (the last two of which are phylogenetic tree reconstruction methods), and the results were compared with one another and with traditional methods. Various measures of fit for evaluating the parsimony methods were employed. There were few compatible characters in the data set, and much homoplasy, but most analyses separated a group based on Stylaria from the rest of the family, which could then be separated into four groups, recognized here for the first time as tribes (Naidini, Derini, Pristinini, and Chaetogastrini). There was less consistency of results within these groups. Modern methods produced results that do not conflict with traditional groupings. The Jaccard coefficient minimizes the significance of symplesiomorphy, and complete linkage avoids chaining effects and corresponds to actual similarities, unlike single or average linkage methods, respectively. Ordination complements cluster analysis. The Wagner parsimony method was superior to the less flexible Camin–Sokal approach and produced better measure-of-fit statistics. All of the aforementioned methods contain areas susceptible to subjective decisions but, nevertheless, they lead to a complete disclosure of both the methods used and the assumptions made, and facilitate objective hypothesis testing rather than the presentation of conflicting phylogenies based on the different, undisclosed premises of manual approaches.


Author(s):  
Aya Taleb ◽  
Rizik M. H. Al-Sayyed ◽  
Hamed S. Al-Bdour

In this research, a new technique is proposed to improve the accuracy of link prediction for most networks; it is based on a prediction ensemble approach that merges models by voting. The proposed ensemble, called the Jaccard, Katz, and Random models Wrapper (JKRW), increases prediction accuracy and provides better predictions for populations of different sizes, including small, medium, and large data. The proposed model has been tested and evaluated using the area under the curve (AUC) and accuracy (ACC) measures. These measures were also applied to the other models used in this study, which were built on the Jaccard Coefficient, Katz, Adamic/Adar, and Preferential Attachment. Results from applying the evaluation metrics verify the improvement in JKRW effectiveness and stability in comparison to the other tested models. The results from applying the Wilcoxon signed-rank method (a non-parametric paired test) indicate that JKRW differs significantly from the other models in the different populations at the 0.95 confidence level.
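
As an illustration of the Jaccard Coefficient baseline mentioned above, the following is a minimal sketch of the standard neighbour-set Jaccard score for a candidate link. It is not the JKRW ensemble itself, whose voting scheme the abstract does not specify; the function name `jaccard_link_score` and the toy graph `adj` are hypothetical.

```python
def jaccard_link_score(adj, u, v):
    """Standard Jaccard score for a candidate link (u, v):
    |N(u) & N(v)| / |N(u) | N(v)|, where N(x) is the neighbour set of x.
    `adj` maps each node to the set of its neighbours (assumed format)."""
    nu = adj.get(u, set())
    nv = adj.get(v, set())
    union = nu | nv
    # Two isolated nodes share nothing; score them 0.0 by convention.
    return len(nu & nv) / len(union) if union else 0.0


# Hypothetical toy graph: a and d are not linked but share both neighbours.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}

score_ad = jaccard_link_score(adj, "a", "d")  # identical neighbour sets
score_ab = jaccard_link_score(adj, "a", "b")  # one shared neighbour out of four
```

A higher score suggests that the two nodes are more likely to become linked; an ensemble such as the one described would combine scores like this with those of other predictors.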


2020 ◽  
Vol 38 ◽  
Author(s):  
T. SCHNEIDER ◽  
M.A. RIZZARDI ◽  
S.P. BRAMMER ◽  
S.M. SCHEFFER-BASSO ◽  
A.L. NUNES

ABSTRACT: In view of the rapid evolution of Conyza sumatrensis populations resistant to glyphosate, it is necessary to understand their genetic diversity in order to improve strategies for managing this weed. We investigated the genetic dissimilarity among 15 biotypes of C. sumatrensis from different geographic regions using microsatellite loci. The biotypes were cultivated in a greenhouse to obtain plant material for DNA extraction. Nineteen microsatellite markers (SSR) were developed for C. sumatrensis biotypes. The genetic dissimilarity was estimated by the Jaccard coefficient (JC) and the biotypes were grouped by the UPGMA method. The results demonstrated a high dissimilarity (JC = 7.14 to 82.62) in the analyzed material, with the biotypes forming five groups: one group was formed by the susceptible biotype alone, and the others grouped biotypes from distinct locations together. The high genetic diversity of C. sumatrensis indicates that the biotypes may respond differently to different management strategies, and that the mechanisms of resistance to herbicides and the evolution of populations due to adaptability may be some of the factors involved in the genetic variability of the species.


2013 ◽  
Vol 43 (6) ◽  
pp. 978-984 ◽  
Author(s):  
Vanice Dias Oliveira ◽  
Allivia Rouse Carregosa Rabbani ◽  
Ana Veruska Cruz da Silva ◽  
Ana da Silva Lédo

The objective of this research was to genetically characterize individuals of physic nut cultivated in experimental areas in Sergipe, Brazil, by means of RAPD molecular markers. Leaves of 40 individuals were collected and DNA was isolated using the 2% CTAB method. Thirty RAPD primers were used for DNA amplification, and the resulting data were used to estimate the genetic similarity among pairs of individuals with the Jaccard coefficient and to group them by the UPGMA method. In addition, the genetic structure and diversity of the populations were assessed using AMOVA. Of the 100 fragments generated, 29 were polymorphic. An average similarity of 0.54 among the individuals was found, and the similarities ranged from 0.18 to 1.00. One of the clusters (U5) was a unit cluster, formed by the most divergent individual. AMOVA indicated that there is more variation within (63%) the population. In conclusion, it was possible to verify genetic variability in physic nut in these experimental areas using RAPD markers.


Author(s):  
T. V. Kuzmina ◽  
E. Iu. Toropova

The aim of the study was to determine the influence of plant species and year conditions on the biological diversity and number of insects inhabiting the crown layer of woody plants of the Rosaceae family in the northern forest-steppe of the Ob region. The research was carried out in 2017–2018. During the flowering period of woody entomophilous plants, insects were collected by sweeping with an entomological net in the crowns and the under-crown space (25 strokes in four repetitions). In the crowns of the introduced woody plants Pyrus ussuriensis (Ussuri pear), Prunus maackii (Maak plum), Amelanchier alnifolia (alder irga), Spiraea betulifolia (spiraea birch leaf), and Physocarpus opulifolius, growing on the territory of the arboretum of RAS in the northern forest-steppe of the Ob region, a total of 2597 insect specimens from 7 orders and more than 30 families were found. The largest number of insects belonged to the order Diptera (49.4%). Representatives of the orders Thysanoptera (23.7%) and Hymenoptera (11.4%) made a significant contribution to the formation of the entomofauna. The entomofauna of different species of woody plants from the Rosaceae family differed in the taxonomic groups of insects and their numbers. A high degree of entomofauna similarity (Jaccard coefficient of 0.75) was found between Amelanchier alnifolia and Pyrus ussuriensis, which have similar flowering periods. A low degree of similarity was found between Amelanchier alnifolia and Spiraea betulifolia (0.32) and between Pyrus ussuriensis and Physocarpus opulifolius (0.33). The plant species had the greatest influence on the biological diversity and the number of entomocomplexes (38.1 and 26.1%, respectively), which indicates the adaptation of insects to a woody plant of the Rosaceae family. The conditions of the year significantly (by 9.8%) influenced the biological diversity of insects during the flowering period.


Author(s):  
Bharathi Garimella ◽  
G. V. S. N. R. V. Prasad ◽  
M. H. M. Krishna Prasad

Churn prediction based on telecom data has received great attention because of the increasing number of telecom providers, but inconsistent, sparse, and very large data make churn prediction complicated and challenging. Hence, an effective and optimal churn prediction mechanism, named the adaptive firefly-spider optimization (adaptive FSO) algorithm, is proposed in this research to predict churn using telecom data. The proposed adaptive FSO algorithm is designed by integrating spider monkey optimization (SMO), the firefly optimization algorithm (FA), and the adaptive concept. The input data is initially given to the master node of the Spark framework. Feature selection is carried out using Kendall's correlation to select the appropriate features for further processing. Then, the selected unique features are given to the master node to perform churn prediction. Here, churn prediction is made using a deep convolutional neural network (DCNN), which is trained by the proposed adaptive FSO algorithm. The developed model was evaluated with metrics such as the dice coefficient, accuracy, and Jaccard coefficient while varying the training data percentage and the selected features. The proposed adaptive FSO-based DCNN showed improved results, with a dice coefficient of 99.76%, accuracy of 98.65%, and Jaccard coefficient of 99.52%.
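
The two overlap metrics reported above are linked by a standard identity, sketched here for two sets A and B (for example, predicted and actual churners; this set interpretation is an assumption, since the abstract does not define the metrics). The reported values are consistent with it, provided both were computed on the same pair of sets rather than averaged over several runs.

```latex
\[
D = \frac{2\,|A \cap B|}{|A| + |B|}, \qquad
J = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}
\quad\Longrightarrow\quad
J = \frac{D}{2 - D}.
\]
\[
D = 0.9976 \;\Rightarrow\; J = \frac{0.9976}{2 - 0.9976} = \frac{0.9976}{1.0024} \approx 0.9952 .
\]
```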


Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-15
Author(s):  
Zhen Zhang ◽  
Jin Du ◽  
Qingchun Meng ◽  
Xiaoxia Rong ◽  
Xiaodan Fan

With the growth of online commerce, companies have created virtual communities (VCs) where users can create posts and reply to posts about the company’s products. VCs can be represented as networks, with users as nodes and relationships between users as edges. Information propagates through edges. In VC studies, it is important to know how the number of topics concerning the product grows over time and what network features make a user more influential than others in the information-spreading process. The existing literature has not provided a quantitative method with which to determine key points during the topic emergence process. Also, few researchers have considered the link between multilayer physical features and the nodes’ spreading influence. In this paper, we present two new ideas to enrich network theory as applied to VCs: a novel application of an adjusted coefficient of determination to topic growth and an adjustment to the Jaccard coefficient to measure the connection between two users. A two-layer network model was first used to study the spread of topics through a VC. A random forest method was then applied to rank various factors that might determine an individual user’s importance in topic spreading through a VC. Our research provides insightful ways for enterprises to mine information from VCs.


2013 ◽  
Vol 13 ◽  
pp. 40-46
Author(s):  
Dipak Khanal ◽  
Yubak Dhoj GC ◽  
Marc Sporleder ◽  
Resham B Thapa

A survey was conducted to study the abundance and distribution of white grubs in three districts representing different ecological domains in the country during June-July 2010. Two light traps were installed for two nights in two locations each of Makawanpur, Tanahu and Chitwan districts, and a season-long light trap was installed at Mangalpur of Chitwan district from April to September 2010 to assess scarab beetle flight activity. The 'simple matching coefficient' revealed a high similarity (>70%) between the two sites in each of the districts, while a similarity of 29-50% was observed between sites of different districts. The Jaccard coefficient revealed the same trend. However, the coefficients were much lower: above 40% when comparing sites within a district, and below 20% when comparing sites among the districts. The dominant species in Chitwan were Anomala dimidiata Hope (24%) followed by Maladera affinis Blanchard (23.75%), Anomala varicolor (Gyllenhal) Rutelinae (23%), Heteronychus lioderus Redtenbacher (14%) and Holotrichia sp (7%). The flight activity and species composition of scarab beetles in the three districts appeared to be different. The Journal of Agriculture and Environment Vol:13, Jun.2012, Page 40-46 DOI: http://dx.doi.org/10.3126/aej.v13i0.7586
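
The gap between the two coefficients reported above follows from their definitions: the simple matching coefficient counts shared absences as agreement, whereas the Jaccard coefficient ignores them. A minimal sketch on made-up presence/absence data (the vectors `site_a` and `site_b` are hypothetical, not the survey's records):

```python
def simple_matching(a, b):
    """Fraction of species on which two sites agree (both present or both absent)."""
    if len(a) != len(b):
        raise ValueError("presence/absence vectors must have equal length")
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def jaccard(a, b):
    """Species present at both sites divided by species present at either site."""
    if len(a) != len(b):
        raise ValueError("presence/absence vectors must have equal length")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Two empty sites are treated as identical by convention.
    return both / either if either else 1.0


# Hypothetical presence (1) / absence (0) records for ten species at two sites.
site_a = [1, 1, 0, 0, 0, 0, 1, 0, 0, 0]
site_b = [1, 0, 0, 0, 0, 0, 1, 1, 0, 0]

smc = simple_matching(site_a, site_b)  # 8 agreements out of 10 species
jac = jaccard(site_a, site_b)          # 2 shared out of 4 species seen at either site
```

Because most species are absent from both sites in sparse trap data, the simple matching coefficient is inflated by joint absences, which is one plausible reason the Jaccard values in the survey are lower while showing the same trend.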

