Analyses on Text Data Related to the Safety of Drug Use Based on Text Mining Techniques

As the amount of data inside and outside an enterprise grows rapidly, it is becoming important to analyze all of it seamlessly for total business intelligence. The data fall into two categories, structured and unstructured, and total business intelligence requires analyzing both together. In particular, because most business data are unstructured text documents, including Web pages on the Internet, we need a Text OLAP solution that performs multidimensional analysis of text documents in the same way as for structured relational data. We first survey representative works that demonstrate how text mining and information retrieval, the major technologies for handling text data, can be applied to multidimensional analysis of text documents. We then survey representative works that demonstrate how unstructured text documents and structured relational data can be associated and consolidated to obtain total business intelligence. Finally, we present a future business intelligence platform architecture along with related research topics. We expect that the proposed total heterogeneous business intelligence architecture, which integrates information retrieval, text mining, and information extraction technologies together with relational OLAP technologies, will make a better platform for total business intelligence.

Text Mining to Support Consulting Services for Client Company State Recognition

International Journal of Automation Technology ◽

10.20965/ijat.2020.p0779 ◽

2020 ◽

Vol 14 (5) ◽

pp. 779-790

Author(s):

Ruriko Watanabe ◽

Nobutada Fujii ◽

Daisuke Kokuryo ◽

Toshiya Kaihara ◽

Yoichi Abe ◽

...

Keyword(s):

Text Mining ◽

Support System ◽

Computer Experiments ◽

Text Data ◽

Logistic Regression Models ◽

Service Companies ◽

Consulting Services ◽

Customer Information ◽

Problem Detection ◽

Specialized Service

This study was conducted to devise a method for supporting consulting service companies in their response to client demands irrespective of the expertise of consultants. With the emphasis on revitalizing small and medium-sized enterprises, support systems for the consulting services that serve them are becoming more important. Those systems must support solutions to the difficulties that enterprises must address. Consulting companies can respond to a wide variety of management consultations. Nevertheless, because the consultation contents are highly specialized, service proposals and problem detection depend on the experience and intuition of the consultant, and stable service often cannot be provided. A support system must provide stable services independent of the ability of consultants. In this study, analyzing customer information that describes the contents of consultations with client companies is the first step in constructing a support system that can predict future problems. Text data such as a consultant’s visit history, e-mail consultations, and call center records are used for the analyses because their contents can explain current problems and might also indicate future problems. This report describes a method to analyze such text data using text mining. The target problem is fraud, which includes uncertainty: cases in which it is not clear whether a fraud problem has occurred at the company. To address this uncertainty, a method using logistic regression models is proposed that represents inferred values as probabilities rather than as binary discriminated data, because some misidentified companies might in fact have some difficulty. Computer experiments are conducted to verify the effectiveness of the proposed method and to compare consultants’ forecasts with achieved results. The verification experiment yields two results. First, the proposed method is applicable to problems that include uncertainty. Second, it may be possible to discover companies with a fraud problem of which they are unaware.
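
The idea of reporting a probability rather than a binary label can be illustrated with a minimal sketch. It assumes a bag-of-words representation and a small logistic regression trained by stochastic gradient descent; the example notes, labels, and all function names are hypothetical and are not taken from the study, whose actual features and model settings are not given in the abstract.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase whitespace tokenizer; a stand-in for real text preprocessing.
    return text.lower().split()

def build_vocab(docs):
    # Map every distinct token to a column index.
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}

def vectorize(text, vocab):
    # Term-count vector; tokens outside the vocabulary are ignored.
    vec = [0.0] * len(vocab)
    for tok, count in Counter(tokenize(text)).items():
        if tok in vocab:
            vec[vocab[tok]] = float(count)
    return vec

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(X, y, lr=0.5, epochs=500):
    # Plain stochastic gradient descent on the logistic loss.
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict_proba(text, vocab, w, b):
    # Returns a probability in (0, 1), not a hard yes/no decision.
    x = vectorize(text, vocab)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical consultation notes: 1 = problem later confirmed, 0 = no problem.
notes = [
    "invoice mismatch unexplained cash withdrawal",
    "accounts missing unexplained transfer",
    "routine visit staffing plan",
    "training request new staffing",
]
labels = [1, 1, 0, 0]

vocab = build_vocab(notes)
w, b = train([vectorize(n, vocab) for n in notes], labels)
p_flag = predict_proba("unexplained cash transfer", vocab, w, b)
p_ok = predict_proba("staffing plan visit", vocab, w, b)
print(round(p_flag, 3), round(p_ok, 3))
```

Because the output is a probability, companies can be ranked by risk, and a company scored near the middle can be flagged for review instead of being forced into one class.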

Dual Scaling in Data Mining from Text Databases

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2006.p0451 ◽

2006 ◽

Vol 10 (4) ◽

pp. 451-457 ◽

Cited By ~ 3

Author(s):

Junzo Watada ◽

Keisuke Aoki ◽

Masahiro Kawano ◽

Muhammad Suzuri Hitam ◽

...

Keyword(s):

Multivariate Analysis ◽

Text Mining ◽

Kansei Engineering ◽

Semantic Meaning ◽

Dual Scaling ◽

Text Documents ◽

Text Data ◽

Text Document ◽

Text Information ◽

Quantification Model

The availability of multimedia text document information has spread text mining among researchers. Text documents integrate numerical and linguistic data, making text mining interesting and challenging. We propose text mining based on a fuzzy quantification model and a fuzzy thesaurus. In text mining, we focus on: 1) Sentences in Japanese text that are broken down into words. 2) A fuzzy thesaurus for finding words that match keywords in text. 3) Fuzzy multivariate analysis to analyze semantic meaning in predefined case studies. We use a fuzzy thesaurus to translate words written in Chinese and Japanese characters into keywords. This speeds up processing without requiring a dictionary to separate words. Fuzzy multivariate analysis is used to analyze the processed data and to extract latent, mutually related structures in text data, i.e., to extract otherwise obscured knowledge. We apply dual scaling to mining library and Web page text information, and propose integrating the result into Kansei engineering for possible application in sales, marketing, and production.

The Method to Analyze Freely Described Data from Questionnaires

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2009.p0268 ◽

2009 ◽

Vol 13 (3) ◽

pp. 268-274 ◽

Cited By ~ 3

Author(s):

Masaomi Kimura ◽

Keyword(s):

Text Mining ◽

Call Center ◽

Clustering Algorithms ◽

Web Pages ◽

Research Papers ◽

Text Data ◽

Textual Data ◽

The Way ◽

Newspaper Articles

Text mining has been growing, mainly due to the need to extract useful information from vast amounts of textual data. Our target here is a collection of freely described data from questionnaires. Unlike research papers, newspaper articles, call-center logs, and web pages, which are the usual targets of text mining analysis, the freely described data in questionnaire responses have specific characteristics: each piece of data consists of a small number of short sentences, and the wide variety of content precludes the use of clustering algorithms to classify it. In this paper, we propose a way to extract the opinions expressed by multiple respondents, based on the modification relationships contained in each sentence of the freely described data. After introducing our approach, we also present some applications of the method.

Text Mining Data from Students to Reveal Meaningful Information for Educators

Studies in Business and Economics ◽

10.29117/sbe.2021.0125 ◽

2021 ◽

Vol 24 (1) ◽

pp. 5-30

Author(s):

Zainab M. AlQenaei ◽

David E. Monarchi

Keyword(s):

Text Mining ◽

Learning Strategies ◽

Undergraduate Students ◽

Text Analysis ◽

Past Research ◽

Text Data ◽

Final Grade ◽

Meaningful Information ◽

Novel Approach ◽

Academic Profiles

Academic institutions adopt different advising tools for various objectives. Past research used both numeric and text data to predict students’ performance. Moreover, numerous research projects have been conducted to find different learning strategies and profiles of students. Those strategies of learning together with academic profiles assisted in the advising process. This research proposes an approach to supplement these activities by text mining students’ essays to better understand different students’ profiles across different courses (subjects). Text analysis was performed on 99 essays written by undergraduate students in three different courses. The essays and terms were projected in a 20-dimensional vector space. The 20 dimensions were used as independent variables in a regression analysis to predict a student’s final grade in a course. Further analyses were performed on the dimensions found statistically significant. This study is a preliminary analysis to demonstrate a novel approach of extracting meaningful information by text mining essays written by students to develop an advising tool that can be used by educators.

A Study on the Emotional Analysis of Abandoned Surrogacy Events Based on Text Mining

E3S Web of Conferences ◽

10.1051/e3sconf/202129002034 ◽

2021 ◽

Vol 290 ◽

pp. 02034

Author(s):

Guanlan Liang ◽

Xunbing Shen

Keyword(s):

Text Mining ◽

Web Crawler ◽

Text Data ◽

The Web

In late January 2021, news that actress Zheng Shuang had children through surrogacy abroad and had wanted to give them up sparked a public outcry. This paper takes the comments on the topic of Zheng Shuang’s surrogacy and abandonment as its research object. First, web crawler technology is used to collect the comment text. Then the ROSTCM software is used to analyze the text data, exploring the topics discussed by Weibo users after the abandonment event and their emotional tendencies toward it.

A Novel Metric to Quantify the Effect of Pathway Enrichment Evaluation With Respect to Biomedical Text-Mined Terms: Development and Feasibility Study (Preprint)

10.2196/preprints.28247 ◽

2021 ◽

Author(s):

Xuan Qin ◽

Xinzhi Yao ◽

Jingbo Xia

Keyword(s):

Natural Language Processing ◽

Text Mining ◽

Drug Use ◽

Natural Language ◽

Language Processing ◽

Entity Recognition ◽

Biomedical Text ◽

Biomedical Text Mining ◽

Related Gene ◽

Pathway Enrichment

BACKGROUND Natural language processing has long been applied in various applications for biomedical knowledge inference and discovery. Enrichment analysis based on named entity recognition is a classic application for inferring enriched associations in terms of specific biomedical entities such as gene, chemical, and mutation. OBJECTIVE The aim of this study was to investigate the effect of pathway enrichment evaluation with respect to biomedical text-mining results and to develop a novel metric to quantify the effect. METHODS Four biomedical text mining methods were selected to represent natural language processing methods on drug-related gene mining. Subsequently, a pathway enrichment experiment was performed by using the mined genes, and a series of inverse pathway frequency (IPF) metrics was proposed accordingly to evaluate the effect of pathway enrichment. Thereafter, 7 IPF metrics and traditional P value metrics were compared in simulation experiments to test the robustness of the proposed metrics. RESULTS IPF metrics were evaluated in a case study of rapamycin-related gene set. By applying the best IPF metrics in a pathway enrichment simulation test, a novel discovery of drug efficacy of rapamycin for breast cancer was replicated from the data chosen prior to the year 2000. Our findings show the effectiveness of the best IPF metric in support of knowledge discovery in new drug use. Further, the mechanism underlying the drug-disease association was visualized by Cytoscape. CONCLUSIONS The results of this study suggest the effectiveness of the proposed IPF metrics in pathway enrichment evaluation as well as its application in drug use discovery.

Text Complexity Classification Data Mining Model Based on Dynamic Quantitative Relationship between Modality and English Context

Mathematical Problems in Engineering ◽

10.1155/2021/4805537 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Dan Zhang

Keyword(s):

Machine Learning ◽

Text Mining ◽

Complexity Analysis ◽

Quantitative Relationship ◽

Mobile Internet ◽

Unstructured Data ◽

Theoretical Research ◽

Text Complexity ◽

Text Data ◽

Mining Technology

With the rapid development of mobile internet technology, dynamic data contain a large amount of unstructured data, such as text data and multimedia data, so it is essential to analyze and process these unstructured data to obtain potentially valuable information. This article first starts with the theoretical research on text complexity analysis and analyzes the source of text complexity and its five characteristics of dynamics, complexity, concealment, sentiment, and ambiguity, combined with the expression of user needs in the network environment. Second, based on the specific process of text mining, namely data collection, data processing, and data visualization, it proposes subdividing user demand analysis into three stages of text complexity acquisition, recognition, and expression, to obtain a text complexity analysis based on text mining technology. After that, based on computational linguistics and mathematical-statistical analysis, combined with machine learning and information retrieval technology, text in any format is converted into a content format that can be used for machine learning, and patterns or knowledge are derived from this content format. Then, through the comparison and study of text mining technologies, combined with the hierarchical structure model of text complexity analysis, a quantitative relationship complexity analysis framework based on text mining technology is proposed, which is embodied in the use of web crawler technology. Experimental results show that the collected quantitative relationship information is identified and expressed in order to convert it into product features. Market data and text data can be integrated to help improve model performance, and the use of text data can further improve prediction accuracy.

English poems categorization using text mining and rough set theory

Bulletin of Electrical Engineering and Informatics ◽

10.11591/eei.v9i4.1898 ◽

2020 ◽

Vol 9 (4) ◽

pp. 1701-1710

Author(s):

Saif Ali Alsaidi ◽

Ahmed T. Sadeq ◽

Hasanen S. Abdullah

Keyword(s):

Text Mining ◽

Set Theory ◽

Rough Set ◽

Text Categorization ◽

Rough Set Theory ◽

Learning Algorithm ◽

Machine Learning Algorithm ◽

Text Data ◽

Data Set ◽

Proposed Model

In recent years, text mining has become an important topic because of the growth of digital text data from many sources, such as government documents, e-mail, social media, and websites. English poems are one such kind of text data, and text categorization can be used to categorize them. Text categorization is a method that classifies documents into one or more predefined categories based on their text content. In this paper, we address the problem of assigning an English poem to one of the poem categories by using text mining techniques and a machine learning algorithm. Our data set consists of seven categories of poems and is divided into two parts: training (learning) data and testing data. In the proposed model, we apply text preprocessing to the document files to reduce the number of features and the dimensionality. The preprocessing converts the poem text to features and removes irrelevant features using text mining steps (tokenization, stop-word removal, and stemming). To reduce the feature vector further, we use two methods for feature selection, and we use rough set theory as the machine learning algorithm to perform the categorization. The proposed model achieves 88% classification success.
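
The three preprocessing steps named in this abstract (tokenization, stop-word removal, stemming) can be sketched as follows. This is only an illustration under stated assumptions: the stop-word list is a tiny hypothetical sample, and the suffix-stripping stemmer is a deliberately naive stand-in for a real algorithm such as Porter's, which the abstract does not name. Feature selection and the rough set classifier are not shown.

```python
import re

# Hypothetical, deliberately small stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "are", "on", "with"}

# Suffixes tried in order; a naive stand-in for a real stemming algorithm.
SUFFIXES = ("ing", "ed", "ly", "es", "s")

def tokenize(text):
    # Lowercase the text and keep runs of ASCII letters as tokens.
    return re.findall(r"[a-z]+", text.lower())

def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]

def stem(token):
    # Strip the first matching suffix, keeping at least three characters.
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    # tokenize -> remove stop words -> stem
    return [stem(t) for t in remove_stop_words(tokenize(text))]

features = preprocess("The winds are whispering in the darkened woods")
print(features)  # ['wind', 'whisper', 'darken', 'wood']
```

Each step shrinks the vocabulary: stop-word removal drops terms that carry little category information, and stemming merges inflected forms into a single feature.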

CyberCan: A New Dictionary for Cantonese Social Media Text Segmentation

10.31235/osf.io/tyjr7 ◽

2021 ◽

Author(s):

Fei Shen ◽

Wenting Yu ◽

Chen Min ◽

Qianying Ye ◽

Chuanli Xia ◽

...

Keyword(s):

Social Media ◽

Text Mining ◽

Word Segmentation ◽

Unstructured Data ◽

Text Segmentation ◽

Chinese Word ◽

Chinese Word Segmentation ◽

Text Data ◽

Social Media Text

Text mining has been a dominant approach to extracting useful information from massive unstructured data online. But existing tools for Chinese word segmentation are not ideal for processing social media text data in Cantonese. This project developed CyberCan (https://github.com/shenfei1010/CyberCan), a lexicon of contemporary Cantonese based on more than 100 million pieces of internet texts. We compared the performance of CyberCan with existing Mandarin and Cantonese lexicons in terms of their word segmentation performance. Findings suggest that CyberCan outperforms all existing lexicons by a considerable margin.
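
To illustrate how a lexicon drives word segmentation, here is a minimal sketch of forward maximum matching, one common dictionary-based approach. The abstract does not state which segmentation algorithm was used in the comparison, so this technique is an assumption, and the three-word lexicon below is a hypothetical toy sample, not taken from CyberCan.

```python
def segment(text, lexicon, max_len=None):
    # Forward maximum matching: at each position, take the longest
    # lexicon entry that matches; fall back to a single character.
    if max_len is None:
        max_len = max((len(word) for word in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens

# Hypothetical toy lexicon of Cantonese words (illustration only).
toy_lexicon = {"我哋", "今日", "飲茶"}

words = segment("我哋今日去飲茶", toy_lexicon)
print(words)  # ['我哋', '今日', '去', '飲茶']
```

With an empty lexicon the same text falls apart into single characters, which is why lexicon coverage of the target variety matters so much for segmentation quality.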
