Predicting Project’s Uncertainty Risk in the Bidding Process by Integrating Unstructured Text Data and Structured Numerical Data Using Text Mining

2017 ◽  
Vol 7 (11) ◽  
pp. 1141 ◽  
Author(s):  
JeeHee Lee ◽  
June-Seong Yi
Author(s):  
Byung-Kwon Park ◽  
Il-Yeol Song

As data grows rapidly inside and outside the enterprise, seamlessly analyzing both structured and unstructured data is becoming essential for total business intelligence. Because most business data consists of unstructured text documents, including Web pages on the Internet, a Text OLAP solution is needed to perform multidimensional analysis of text documents in the same way as structured relational data. We first survey representative works that demonstrate how text mining and information retrieval, the major technologies for handling text data, can be applied to multidimensional analysis of text documents. We then survey representative works that demonstrate how unstructured text documents and structured relational data can be associated and consolidated to obtain total business intelligence. Finally, we present a future business intelligence platform architecture along with related research topics. We expect that the proposed total heterogeneous business intelligence architecture, which integrates information retrieval, text mining, and information extraction technologies with relational OLAP technologies, will provide a better platform for total business intelligence.


Filomat ◽  
2018 ◽  
Vol 32 (5) ◽  
pp. 1973-1982
Author(s):  
Zhiyuan Zhang ◽  
Hong Wang ◽  
Xingjie Feng

Multidimensional text data contains both structured attributes and unstructured text. Unlike traditional numerical data, it is not straightforward to apply online analytical processing (OLAP) to multidimensional text data. Although some OLAP methods, such as topic cube, have been proposed to make effective use of both the structured information and the valuable text data, these methods cannot capture the relations between topic words. Considering that topics usually consist of several subtopics, and that each subtopic usually contains some topic words, we use a topic network, in which related topic words are connected, to express the complex relations of topics. This paper introduces a new concept, the topic network cube, to perform OLAP analysis on multidimensional text databases. First, we propose a method called GL-LDA, based on the Gibbs sampling outputs of Labeled LDA, to measure the relations between topic words. Second, we give a storage model for the topic network cube that can efficiently generate topic networks using GL-LDA. Third, we show how to perform OLAP analysis on the topic network cube. Experimental results show that multidimensional text databases can be analyzed easily and effectively at different granularities using just a few simple SQL statements, and that the output network provides rich and useful information about topics.
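
The idea of querying a topic network at different granularities with a few SQL statements can be illustrated with a minimal sketch. It is not the paper's storage model or GL-LDA: the `topic_edge` table, its columns (`year`, `source`, `word_a`, `word_b`, `weight`), and the sample rows are all hypothetical, and the edge weights are assumed to have been computed elsewhere. The sketch only shows how a roll-up over one dimension could aggregate edges into a coarser network.

```python
import sqlite3

def build_cube(rows):
    """Load weighted topic-word edges into an in-memory table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE topic_edge ("
        "year INTEGER, source TEXT, word_a TEXT, word_b TEXT, weight REAL)"
    )
    conn.executemany("INSERT INTO topic_edge VALUES (?, ?, ?, ?, ?)", rows)
    return conn

def roll_up_by_year(conn, year):
    """Roll up over the 'source' dimension: sum edge weights per word pair for one year."""
    cur = conn.execute(
        "SELECT word_a, word_b, SUM(weight) FROM topic_edge "
        "WHERE year = ? GROUP BY word_a, word_b ORDER BY word_a, word_b",
        (year,),
    )
    return cur.fetchall()

# Made-up edges: (year, source, word_a, word_b, weight)
SAMPLE_EDGES = [
    (2017, "news", "engine", "failure", 2.0),
    (2017, "report", "engine", "failure", 1.0),
    (2017, "news", "engine", "delay", 1.5),
    (2018, "news", "engine", "failure", 4.0),
]

if __name__ == "__main__":
    cube = build_cube(SAMPLE_EDGES)
    for word_a, word_b, weight in roll_up_by_year(cube, 2017):
        print(word_a, word_b, weight)
```

Dropping the `WHERE` clause, or grouping by `source` as well, would give coarser or finer views of the same edges.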


2016 ◽  
pp. 1-32 ◽  
Author(s):  
Lipika Dey ◽  
Ishan Verma

Business Intelligence (BI) refers to an organization's capability to gather and analyze data about business operations and transactions in order to evaluate its performance. The abundance of information both within the enterprise and outside of it has necessitated a change in traditional Business Intelligence practices. There is a need to exploit heterogeneous resources. Text data like news, analyst reports, etc. helps in better interpretation of business data. In this chapter, the authors present a futuristic BI framework that facilitates acquisition, indexing, and analysis of heterogeneous data for extracting business intelligence. It enables integration of unstructured text data and structured business data seamlessly to generate insights. The authors propose methods that can help in extraction of events or significant happenings from both unstructured and structured data, correlate the events, and thereafter reason to generate insights. The insights extracted could be validated as cause-effect pairs based on the statistical significance of co-occurrence of events.
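
One way to check whether two events co-occur more often than chance would suggest is sketched below. The chapter does not name a specific test, so the Pearson chi-square test on a 2x2 table is an assumption made here, and the function names and the period data are hypothetical. A significant statistic indicates association only; it does not establish cause and effect by itself.

```python
def cooccurrence_table(events_a, events_b, periods):
    """Count periods by whether event A and event B were observed in each."""
    both = sum(1 for p in periods if p in events_a and p in events_b)
    only_a = sum(1 for p in periods if p in events_a and p not in events_b)
    only_b = sum(1 for p in periods if p not in events_a and p in events_b)
    neither = sum(1 for p in periods if p not in events_a and p not in events_b)
    return both, only_a, only_b, neither

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a row or column is empty, so no association can be measured
    return n * (a * d - b * c) ** 2 / denom

# Critical value for 1 degree of freedom at the 0.05 significance level.
CHI2_CRITICAL_05 = 3.841

if __name__ == "__main__":
    periods = range(20)                # hypothetical: 20 weekly windows
    news_events = set(range(8))        # weeks with a text-derived event
    sales_events = set(range(1, 9))    # weeks with a structured-data event
    table = cooccurrence_table(news_events, sales_events, periods)
    stat = chi_square_2x2(*table)
    print(table, stat > CHI2_CRITICAL_05)
```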


2018 ◽  
Vol 8 (1) ◽  
Author(s):  
Carlo C. Valente ◽  
Florian F. Bauer ◽  
Fritz Venter ◽  
Bruce Watson ◽  
Hélène H. Nieuwoudt

2018 ◽  
Vol 7 (2.21) ◽  
pp. 417
Author(s):  
K Kousalya ◽  
Shaik Javed Parvez

In the present scenario, growing data is largely unstructured, which makes it difficult to handle such a wide range of data. This paper proposes processing unstructured text data effectively in Hadoop MapReduce using Python. Apache Hadoop is an open-source platform that widely uses the MapReduce framework. MapReduce is popular and effective for processing unstructured data in parallel. There are two stages in MapReduce, namely transform and repository. The input is split into small blocks, and worker nodes process the individual blocks in parallel. MapReduce is generally Java-based, whereas Hadoop Streaming allows mappers and reducers to be written in other languages such as Python. In this paper, we show an alternative way of processing growing unstructured content data using Python. We also compare the performance of Java-based and non-Java-based programs.
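
A word count is the usual way to show the Hadoop Streaming pattern the abstract refers to. The sketch below is not the paper's code. It assumes the standard Streaming contract: the mapper and reducer read lines from standard input and write tab-separated key/value lines to standard output, and the reducer receives its input sorted by key. The `run_local` helper and the `map`/`reduce` command-line argument are hypothetical conveniences for trying it without a cluster.

```python
import sys
from itertools import groupby

def mapper(lines):
    """Emit one 'word<TAB>1' record per token, as a Streaming mapper would."""
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"

def reducer(sorted_records):
    """Sum counts per word; records must arrive sorted by key."""
    pairs = (rec.rstrip("\n").split("\t", 1) for rec in sorted_records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

def run_local(lines):
    """Simulate map, sort, and reduce in one process for quick testing."""
    return list(reducer(sorted(mapper(lines))))

if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: `python wordcount.py map` or `python wordcount.py reduce`.
    step = mapper if sys.argv[1] == "map" else reducer
    for record in step(sys.stdin):
        print(record)
```

On a cluster, the same file could be passed to the Hadoop Streaming jar as both the mapper command and the reducer command, with the framework handling the split, sort, and shuffle between the two stages.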


2020 ◽  
Vol 14 (5) ◽  
pp. 779-790
Author(s):  
Ruriko Watanabe ◽  
Nobutada Fujii ◽  
Daisuke Kokuryo ◽  
Toshiya Kaihara ◽  
Yoichi Abe ◽  
...  

This study was conducted to devise a method for supporting consulting service companies in their response to client demands irrespective of the expertise of consultants. With the emphasis on revitalizing small and medium-sized enterprises, support systems for the consulting services that serve them are becoming increasingly important. Those systems must support solutions to the difficulties that enterprises need to address. Consulting companies can respond to a wide variety of management consultations. Nevertheless, because the consultation contents are highly specialized, service proposals and problem detection depend on the experience and intuition of the consultant, and stable service often cannot be provided. A support system must provide stable services independent of the ability of consultants. In this study, analyzing customer information that describes the contents of consultations with client companies is the first step in constructing a support system that can predict future problems. Text data such as a consultant’s visit history, consultation contents by e-mail, and call center records are used for the analyses because they can explain current problems and might also indicate future problems. This report describes a method to analyze text data using text mining. The target problem is fraud, which involves uncertainty: there are cases in which it is not clear whether a fraud problem has occurred at the company. To address this uncertainty, a method using logistic regression models is proposed to represent inferred values as probabilities rather than as binary discriminated data, because the possibility exists that some misidentified companies might have some difficulty. Computer experiments are conducted to verify the effectiveness of the proposed method and to compare consultants’ forecasted and achieved results. The results of the verification experiment are as follows.
First, the proposed method is applicable to problems that include uncertainties. Second, the possibility exists of discovering companies with a fraud problem of which they are unaware.
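
The idea of reporting a probability instead of a binary label can be sketched with a small bag-of-words logistic regression. This is not the study's model or data: the whitespace tokenizer, the batch gradient descent settings, and the four toy consultation notes are all assumptions made for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(docs, labels, epochs=200, lr=0.5):
    """Fit bag-of-words logistic regression by batch gradient descent."""
    feats = [Counter(tokenize(d)) for d in docs]
    weights = {w: 0.0 for x in feats for w in x}
    bias = 0.0
    n = len(docs)
    for _ in range(epochs):
        grad_w = dict.fromkeys(weights, 0.0)
        grad_b = 0.0
        for x, y in zip(feats, labels):
            z = bias + sum(weights[w] * c for w, c in x.items())
            err = sigmoid(z) - y          # gradient of the log loss w.r.t. z
            grad_b += err
            for w, c in x.items():
                grad_w[w] += err * c
        bias -= lr * grad_b / n
        for w in weights:
            weights[w] -= lr * grad_w[w] / n
    return weights, bias

def predict_proba(text, weights, bias):
    """Return the estimated probability of the positive class; unseen words are ignored."""
    counts = Counter(tokenize(text))
    return sigmoid(bias + sum(weights.get(w, 0.0) * c for w, c in counts.items()))

# Made-up notes: label 1 marks a note associated with a problem, 0 otherwise.
TOY_DOCS = [
    "invoice mismatch unexplained payment",
    "unexplained payment delay audit",
    "routine visit tax filing",
    "routine training schedule visit",
]
TOY_LABELS = [1, 1, 0, 0]

if __name__ == "__main__":
    w, b = train_logistic(TOY_DOCS, TOY_LABELS)
    print(predict_proba("unexplained payment", w, b))
```

Keeping the output as a probability lets borderline companies be ranked and reviewed rather than discarded by a hard threshold.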

