T Structured Semantic Weight Relationship Algorithm Combined with Decision Trees for Data Extraction

2019 ◽  
Vol 16 (2) ◽  
pp. 735-739 ◽  
Author(s):  
Saravana C. S. Kumar ◽  
P. Amudhavalli ◽  
R. Santhosh ◽  
C. Kalaiarasan

In Semantic Web Mining, the main objective is to extract relevant data from the web. As the complexity of the data increases, the focus is on achieving better accuracy in extracting the required information. In Semantic Web Mining the primary input is the user query, but the accuracy depends on the domain classification of the extracted data. To achieve higher accuracy, a new approach is proposed in which the algorithm, which shares its name with the article title, maintains a semantic structure combined with decision trees. Each word in the training set is tokenized, and the relationships between words are established using cosine similarity weights as well as the possible terms of the same word. Cosine similarity is calculated not only between words but also between groups of training sentences. The paper explains in detail how training sentences are grouped and how the weights between them are established using the proposed approach.
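The cosine similarity weighting mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the whitespace tokenizer, the bag-of-words representation, and the function names `tokenize`, `cosine_similarity` and `group_similarity` are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(sentence):
    """Lowercase and split on whitespace (a simplifying assumption)."""
    return sentence.lower().split()


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two token lists treated as term-count vectors."""
    va, vb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_similarity(group_a, group_b):
    """Similarity between two groups of sentences, pooling each group's tokens."""
    tokens_a = [t for s in group_a for t in tokenize(s)]
    tokens_b = [t for s in group_b for t in tokenize(s)]
    return cosine_similarity(tokens_a, tokens_b)
```

Pooling the tokens of a group into one vector is only one possible way to compare groups of training sentences; the paper may define the group weight differently.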

2020 ◽  
Vol 19 (4) ◽  
pp. 493-519
Author(s):  
Yaroslav D. Sovetkin ◽  

Managerial innovations have become a topic of interest for many scholars, but the concept remains underdeveloped and poorly managed in the academic and business communities in Russia. This paper offers a composite approach to the definition and classification of managerial innovations, formed on the basis of an exploration of the evolution of the concept “managerial innovation” and an assessment of its relationship with the more general concept “innovation”. The suggested approach is based on a three-stage bibliographic analysis of the scientific literature. In the course of the bibliographic research, scientific articles were selected according to key words, period of publication and citation index. 140 scientific publications were identified and collected for the period from 1975 to 2019, covering citation indexes from 0 to 12 476 in the Web of Science citation database and from 4 to 2 185 in the Scopus database. On the basis of this bibliographic research, the author introduces his definitions of innovation and managerial innovation and explains the connection between them. Different approaches to the classification of managerial innovations were studied, and on their basis a new approach to classification was proposed. The findings can be useful for different avenues of further research on managerial innovations.



The Dark Web ◽  
2018 ◽  
pp. 199-226 ◽  
Author(s):  
B. Umamageswari ◽  
R. Kalpana

Web mining is performed on huge amounts of data extracted from the WWW. Researchers have developed several state-of-the-art approaches for web data extraction. So far, the literature has focused mainly on techniques for data region extraction. Applications that are fed with the extracted data require data spread across multiple web pages, which should be crawled automatically. For this to happen, we need to extract not only data regions but also the navigation links. Data extraction techniques are designed for specific HTML tags, which calls into question their universal applicability for information extraction from differently formatted web pages. This chapter covers the web data extraction techniques available for different kinds of data-rich pages, a classification of those techniques, and a comparison of them across many useful dimensions.
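The point about extracting navigation links, so that a crawler can follow data spread across several pages, can be sketched as follows. This is a minimal illustration using only the Python standard library, not a technique taken from the chapter; the class name `LinkExtractor`, the helper `extract_links` and the example URL are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return all navigation links found in an HTML string."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links
```

A crawler would call `extract_links` on each fetched page and queue the results. Note that this sketch is itself tied to one HTML tag, which is exactly the limitation the abstract raises.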


2020 ◽  
Vol 13 (4) ◽  
pp. 588-594
Author(s):  
Saravana Kumar Coimbatore Shanmugam ◽  
Santhosh Rajendran ◽  
Amudhavalli Padmanabhan ◽  
Kalaiarasan Chellan

Background: The growth of internet data has raised the priority of data extraction accuracy. Accuracy here is the match between what the user requested and what has been retrieved. The large data sets that need to be analyzed make retrieving the required information a challenging task. Objective: To propose a new algorithm that improves on traditional methods in classifying the category or group to which each training sentence belongs. Method: The category to which an input sentence belongs is identified by analyzing the noun and verb of each training sentence. NLP is applied to each training sentence, and the group or category classification is achieved using the proposed GENI algorithm, so that the classifier is trained efficiently to extract the information the user requested. Results: The input sentences are transformed into a data table by applying the GENI algorithm for group categorization. When the results are plotted with the R tool, the accuracy of the group extracted by the classifier using the GENI approach is higher than that of Naive Bayes and Decision Trees. Conclusion: Extracting user-requested data remains a challenging task when the user query is complex. Existing techniques rely more on fixed attributes, and working with fixed attributes makes it too complex or impossible to determine the common group from the base sentence. Existing techniques are better suited to smaller datasets, whereas the proposed GENI algorithm places no restrictions on the group categorization of larger data sets.


Author(s):  
Sunita ◽  
Gurvinder Singh ◽  
Vijay Rana

Pattern mining is the mechanism of extracting useful information from a large dataset. It is a sub-field of web mining, which has grown with the sequential, incremental growth of the Internet. The proposed work consists of an analysis of techniques used to extract useful URLs in place of noisy data. Noisy data extraction from the user query is considered along with redundancy handling. The redundancy handling mechanism employed in the existing literature is known as ambiguity handling. The clustering mechanisms employed in the existing system include k-means and semantic search. These mechanisms are static, which degrades performance in terms of execution time. A performance improvement mechanism is suggested in this work. The MPV (Most-Probable-Values) clustering and N-gram techniques considered in the existing literature can be further improved using the research methodology specified here. In the proposed system, results are based on MPV clustering with N-gram techniques. N-gram analysis examines the instances of a word or phrase across all query data. The results are reported in terms of execution time and the number of URLs retrieved for web page classification.
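The N-gram step described in this abstract, counting instances of a word or phrase across all query data, can be sketched briefly. The MPV clustering itself is not specified in the abstract and is not reproduced here; the function names `ngrams` and `count_ngrams` and the whitespace tokenization are assumptions for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token tuples from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(queries, n):
    """Count how often each n-gram occurs across a collection of query strings."""
    counts = Counter()
    for query in queries:
        counts.update(ngrams(query.lower().split(), n))
    return counts
```

For example, `count_ngrams(["web page classification", "web page mining"], 2)` counts the bigram `("web", "page")` twice, once per query.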


2021 ◽  
Vol 12 ◽  
Author(s):  
Zheying Zhang ◽  
Manman Lu ◽  
Yu Qin ◽  
Wuji Gao ◽  
Li Tao ◽  
...  

Cancer immunotherapy works by stimulating and strengthening the body’s anti-tumor immune response to eliminate cancer cells. Over the past few decades, immunotherapy has shown remarkable efficacy in the treatment of cancer, particularly the success of immune checkpoint blockade targeting CTLA-4, PD-1 and PD-L1, which has led to a breakthrough in tumor immunotherapy. Tumor neoantigens, a new approach to tumor immunotherapy, include antigens produced by tumor viruses integrated into the genome and antigens produced by mutant proteins, which are abundantly expressed only in tumor cells and have strong immunogenicity and tumor heterogeneity. A growing number of studies have highlighted the relationship between neoantigens and T cells’ recognition of cancer cells. Vaccines developed against neoantigens are now being used in clinical trials in various solid tumors. In this review, we summarize the latest advances in the classification of immunotherapy and the process of classification, identification and synthesis of tumor-specific neoantigens, as well as their role in current cancer immunotherapy. Finally, the application prospects and existing problems of neoantigens are discussed.


2015 ◽  
Vol 27 (6) ◽  
pp. 895-907 ◽  
Author(s):  
Junqiang Su ◽  
Guolian Liu ◽  
Bugao Xu

Purpose – The purpose of this paper is to develop individualized prototypes of apparel patterns for young females from 3D body scanning data. Design/methodology/approach – The authors present a new pattern-making approach composed of three major steps: establishing the relationships between body features and the corresponding elements in a prototype (e.g. a curve or a point); classifying the relationships into grades that provide alternatives to fit a variety of bodies; and assembling the individual elements into a personalized prototype. Findings – The experiment demonstrated that this method can be used for customized prototype development from 3D body scanning in a relatively easy way. Research limitations/implications – Currently, the subjects of this study included only young Chinese females, and the regression models are suitable only for similar body types, though the research method could be extended to other somatotypes and age groups. Social implications – This approach can be used in made-to-measure, mass customization, and quick response for apparel pattern making. The technology in this paper facilitates the generation of an individualized pattern prototype from 3D body scanning data. Originality/value – Starting from the relationship between the features of a human body and the elements of a pattern prototype, the authors present a new approach to developing an individualized pattern prototype by classifying the features into grades.


Author(s):  
Peter Adebayo Idowu ◽  
Jeremiah Ademola Balogun

This chapter presents a predictive model for classifying the CD4 count level of HIV patients receiving ART/HAART treatment in Nigeria. Following a review of the literature, the factors that determine CD4 count were identified and validated by experts, while historical data explaining the relationship between the factors and CD4 count level was collected. The predictive model for CD4 count level was formulated from the identified factors using C4.5 decision tree (DT), support vector machine (SVM), and multi-layer perceptron (MLP) classifiers, which were built using WEKA software and validated. The results showed that the decision tree algorithm revealed five (5) important variables, namely age group, white blood cell count, viral load, time of diagnosing HIV, and age of the patient. The MLP had the best performance with a value of 100%, followed by the SVM with an accuracy of 91.1%, and both were observed to outperform the DT algorithm used.
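C4.5 decision trees choose the attribute to split on by gain ratio, which is one way a tree can surface "important variables" such as those listed above. The sketch below computes gain ratio for a single categorical attribute. It is an illustration of the general criterion, not the chapter's WEKA configuration, and the attribute values and class labels in the usage note are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(items):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(items)
    return -sum((c / total) * math.log2(c / total) for c in Counter(items).values())


def gain_ratio(values, labels):
    """C4.5 gain ratio of one categorical attribute with respect to class labels."""
    total = len(labels)
    partitions = defaultdict(list)
    for value, label in zip(values, labels):
        partitions[value].append(label)
    # Expected entropy of the labels after splitting on the attribute.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(values)  # penalizes attributes with many distinct values
    return info_gain / split_info if split_info > 0 else 0.0
```

With the made-up data `values = ["young", "young", "old", "old"]` and `labels = ["low", "low", "high", "high"]`, the attribute separates the classes perfectly and the gain ratio is 1.0.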


2020 ◽  
Vol 2020 ◽  
pp. 1-8 ◽  
Author(s):  
Wei Long ◽  
Lingna Zhou ◽  
Ying Wang ◽  
Jiaxuan Liu ◽  
Huaiyan Wang ◽  
...  

Purpose. The relationship between mutations and phenotypic characteristics remains unclear in patients with congenital hypothyroidism (CH), and no study has reported whether the outcome of transient CH (TCH) or permanent CH (PCH) is determined by mutations. Methods. We searched the literature up to April 2019. Eligible studies were selected and data extraction was performed. We estimated the relationship between mutations and phenotypic characteristics in pooled patients with CH. Results. Two hundred forty-one cases were pooled from 41 eligible studies. The thyroid morphology, classification of mutated genes, and types of mutations differed between 94 patients with TCH and 147 patients with PCH. Heterozygous missense mutations prevailed in PAX8, TSHR, FOXE1, and NKX2-5, and patients with these mutated genes had a higher risk of PCH (OR = 37.38, 95% CI 5.04–277.21, P<0.001). TCH and PCH had equal shares in patients with mutated DUOX2 or DUOXA2. Dual-site and multisite mutations were frequently detected in DUOX2. High phenotypic heterogeneity was observed in mutated DUOX2, even among patients with the same mutations. However, no relationship was found between mutations and transient or permanent outcome in patients with mutated DUOX2. Conclusion. Transient or permanent outcomes were influenced by the biological function of the mutated genes rather than by the types of mutations among patients with CH. Patients whose mutations were related to thyroid dysgenesis (TD) were more likely to have PCH. The relationship between mutations and phenotypic characteristics is complicated, and phenotypic characteristics may be affected by mutations and other factors.
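An odds ratio with a 95% confidence interval, as reported above, is commonly computed from a 2×2 table using the log (Woolf) method. The abstract does not give the underlying counts or state which method the authors used, so the sketch below is a general illustration only; the function name `odds_ratio_ci` and the cell names `a`, `b`, `c`, `d` are assumptions.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and confidence interval from a 2x2 table (Woolf log method).

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR).
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper
```

Because the interval is symmetric on the log scale, the geometric mean of the two bounds equals the odds ratio. The reported values are consistent with this property, since the square root of 5.04 × 277.21 is approximately 37.38.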

