OLAP on multidimensional text databases: Topic network cube and its applications

Zhiyuan Zhang; Hong Wang; Xingjie Feng

doi:10.2298/fil1805973z

Filomat ◽

10.2298/fil1805973z ◽

2018 ◽

Vol 32 (5) ◽

pp. 1973-1982

Author(s):

Zhiyuan Zhang ◽

Hong Wang ◽

Xingjie Feng

Keyword(s):

Gibbs Sampling ◽

Numerical Data ◽

Experimental Results ◽

Predicting Project’s Uncertainty Risk in the Bidding Process by Integrating Unstructured Text Data and Structured Numerical Data Using Text Mining

Applied Sciences ◽

10.3390/app7111141 ◽

2017 ◽

Vol 7 (11) ◽

pp. 1141 ◽

Cited By ~ 7

Author(s):

JeeHee Lee ◽

June-Seong Yi

Keyword(s):

Text Mining ◽

Numerical Data ◽

Text Data ◽

Unstructured Text ◽

Bidding Process

Analysis of unstructured text data for a person social profile

Proceedings of the International Conference on Electronic Governance and Open Society Challenges in Eurasia - eGose '17 ◽

10.1145/3129757.3129758 ◽

2017 ◽

Cited By ~ 1

Author(s):

Alexey Y. Timonin ◽

Alexander S. Bozhday ◽

Alexander M. Bershadsky

Keyword(s):

Text Data ◽

Unstructured Text

Incorporating Text OLAP in Business Intelligence

Business Intelligence Applications and the Web - Advances in Business Information Systems and Analytics ◽

10.4018/978-1-61350-038-5.ch004 ◽

2011 ◽

pp. 77-101 ◽

Cited By ~ 1

Author(s):

Byung-Kwon Park ◽

Il-Yeol Song

Keyword(s):

Information Retrieval ◽

Text Mining ◽

Business Intelligence ◽

Multidimensional Analysis ◽

Web Pages ◽

Data Types ◽

Text Documents ◽

Text Data ◽

Platform Architecture ◽

Unstructured Text

As the amount of data grows rapidly inside and outside the enterprise, it is becoming important to analyze all of it seamlessly for total business intelligence. The data fall into two categories: structured and unstructured. Because most business data are unstructured text documents, including Web pages on the Internet, we need a Text OLAP solution that performs multidimensional analysis of text documents in the same way as for structured relational data. We first survey representative works that demonstrate how text mining and information retrieval, the major technologies for handling text data, can be applied to multidimensional analysis of text documents. We then survey representative works that demonstrate how unstructured text documents and structured relational data can be associated and consolidated to obtain total business intelligence. Finally, we present a future business intelligence platform architecture as well as related research topics. We expect that the proposed total heterogeneous business intelligence architecture, which integrates information retrieval, text mining, and information extraction technologies together with relational OLAP technologies, will make a better platform for total business intelligence.

Text-Driven Reasoning and Multi-Structured Data Analytics for Business Intelligence

Business Intelligence ◽

10.4018/978-1-4666-9562-7.ch001 ◽

2016 ◽

pp. 1-32 ◽

Cited By ~ 1

Author(s):

Lipika Dey ◽

Ishan Verma

Keyword(s):

Business Intelligence ◽

Data Analytics ◽

Statistical Significance ◽

Heterogeneous Data ◽

Structured Data ◽

Text Data ◽

Business Operations ◽

Unstructured Text ◽

Business Data ◽

Heterogeneous Resources

Business Intelligence (BI) refers to an organization's capability to gather and analyze data about business operations and transactions in order to evaluate its performance. The abundance of information both within the enterprise and outside of it has necessitated a change in traditional Business Intelligence practices. There is a need to exploit heterogeneous resources. Text data such as news and analyst reports help in better interpretation of business data. In this chapter, the authors present a futuristic BI framework that facilitates acquisition, indexing, and analysis of heterogeneous data for extracting business intelligence. It enables seamless integration of unstructured text data and structured business data to generate insights. The authors propose methods that help extract events or significant happenings from both unstructured and structured data, correlate the events, and then reason over them to generate insights. The extracted insights can be validated as cause-effect pairs based on the statistical significance of the co-occurrence of events.
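The abstract does not say which significance test is applied to event co-occurrence. As a minimal sketch, assuming a Pearson chi-square test on a 2x2 contingency table of time windows (the function name, the counts, and the choice of test are all illustrative assumptions, not the authors' method), the check could look like this:

```python
def chi_square_2x2(both: int, only_a: int, only_b: int, neither: int) -> float:
    """Pearson chi-square statistic for a 2x2 co-occurrence table.

    both    : windows in which events A and B both occur
    only_a  : windows with A but not B
    only_b  : windows with B but not A
    neither : windows with neither event
    """
    n = both + only_a + only_b + neither
    # Row and column totals of the contingency table.
    denom = (both + only_a) * (only_b + neither) * (both + only_b) * (only_a + neither)
    if denom == 0:
        # A degenerate table (an event always or never occurs) carries no signal.
        return 0.0
    return n * (both * neither - only_a * only_b) ** 2 / denom


# Critical value of the chi-square distribution, 1 degree of freedom, alpha = 0.05.
CRITICAL_005 = 3.841

# Hypothetical counts: A and B co-occur far more often than chance would suggest.
stat = chi_square_2x2(both=20, only_a=5, only_b=5, neither=20)
is_significant = stat > CRITICAL_005
```

A significant statistic only flags a candidate pair; ordering the events in time and reasoning about cause and effect is a separate step.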

Weighted k-Prototypes Clustering Algorithm Based on the Hybrid Dissimilarity Coefficient

Mathematical Problems in Engineering ◽

10.1155/2020/5143797 ◽

2020 ◽

Vol 2020 ◽

pp. 1-13

Author(s):

Ziqi Jia ◽

Ling Song

Keyword(s):

Categorical Data ◽

Clustering Algorithm ◽

Numerical Data ◽

Experimental Results ◽

Cluster Center ◽

Real Dataset ◽

Dissimilarity Coefficient ◽

Initial Cluster ◽

Data Objects ◽

Selection Of

The k-prototypes algorithm is a hybrid clustering algorithm that can process both categorical and numerical data. In this study, the method of selecting initial cluster centers was improved and a new hybrid dissimilarity coefficient was proposed. Based on this coefficient, a weighted k-prototypes clustering algorithm (WKPCA) was proposed. WKPCA not only improves the selection of initial cluster centers but also puts forward a new method for calculating the dissimilarity between data objects and cluster centers. A real dataset from UCI was used to test the WKPCA algorithm. Experimental results show that WKPCA is more efficient and robust than other k-prototypes algorithms.
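For orientation, the sketch below shows the mixed dissimilarity commonly used in baseline k-prototypes: squared Euclidean distance on the numerical attributes plus a weight `gamma` times the number of categorical mismatches. It is not the hybrid dissimilarity coefficient proposed for WKPCA, which the abstract does not define; the function names and example values are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Prototype = Tuple[Sequence[float], Sequence[str]]


def mixed_dissimilarity(
    num_a: Sequence[float],
    cat_a: Sequence[str],
    num_b: Sequence[float],
    cat_b: Sequence[str],
    gamma: float = 1.0,
) -> float:
    """Baseline k-prototypes measure for one object and one prototype."""
    # Numerical part: squared Euclidean distance.
    numeric = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    # Categorical part: simple matching (count of attributes that differ).
    mismatches = sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    return numeric + gamma * mismatches


def nearest_prototype(
    num: Sequence[float],
    cat: Sequence[str],
    prototypes: List[Prototype],
    gamma: float = 1.0,
) -> int:
    """Index of the prototype with the smallest mixed dissimilarity."""
    distances = [
        mixed_dissimilarity(num, cat, p_num, p_cat, gamma)
        for p_num, p_cat in prototypes
    ]
    return min(range(len(distances)), key=distances.__getitem__)
```

The weight `gamma` balances the two parts: a larger value lets categorical mismatches dominate the assignment, a smaller value favors the numerical attributes.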

Discovering Business Processes in CRM Systems by Leveraging Unstructured Text Data

2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS) ◽

10.1109/hpcc/smartcity/dss.2018.00257 ◽

2018 ◽

Cited By ~ 2

Author(s):

Rolf B. Banziger ◽

Artie Basukoski ◽

Thierry Chaussalet

Keyword(s):

Business Processes ◽

Text Data ◽

Unstructured Text

Modelling the sensory space of varietal wines: Mining of large, unstructured text data and visualisation of style patterns

Scientific Reports ◽

10.1038/s41598-018-23347-w ◽

2018 ◽

Vol 8 (1) ◽

Author(s):

Carlo C. Valente ◽

Florian F. Bauer ◽

Fritz Venter ◽

Bruce Watson ◽

Hélène H. Nieuwoudt

Keyword(s):

Text Data ◽

Unstructured Text ◽

Sensory Space

A graph construction study for graph-based semi-supervised learning: Case study on unstructured text data

2019 IEEE International Conference on Big Data (Big Data) ◽

10.1109/bigdata47090.2019.9006465 ◽

2019 ◽

Author(s):

Sumedh Yadav ◽

Gautam Kumar ◽

Shivam Kumar

Keyword(s):

Supervised Learning ◽

Text Data ◽

Unstructured Text

Effective processing of unstructured data using python in Hadoop map reduce

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.21.12456 ◽

2018 ◽

Vol 7 (2.21) ◽

pp. 417

Author(s):

K Kousalya ◽

Shaik Javed Parvez

Keyword(s):

Open Source ◽

Unstructured Data ◽

Map Reduce ◽

Text Data ◽

Apache Hadoop ◽

Unstructured Text ◽

Wide Range ◽

Two Stages

At present, the growing volume of data is largely unstructured, which makes handling such a wide range of data difficult. This paper proposes processing unstructured text data effectively in Hadoop MapReduce using Python. Apache Hadoop is an open source platform that widely uses the MapReduce framework. MapReduce is popular and effective for processing unstructured data in parallel. There are two stages in MapReduce, namely transform and repository. The input is split into small blocks, and worker nodes process the individual blocks in parallel. MapReduce is generally based on Java, while Hadoop Streaming allows mappers and reducers to be written in other languages such as Python. In this paper, we show an alternative way of processing the growing volume of unstructured content by using Python. We also compare the performance of Java-based and non-Java-based programs.
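The abstract gives no code, so the following is only a sketch of the kind of Python logic a Hadoop Streaming job runs, using word count as an assumed example task. In a real streaming job the mapper and reducer are separate scripts that read lines from standard input and write tab-separated key/value lines to standard output, and the framework sorts by key between them; here the two steps are plain functions (hypothetical names) so the flow can be tried locally.

```python
from itertools import groupby
from typing import Iterable, Iterator, Tuple


def map_lines(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Mapper step: emit (word, 1) for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def reduce_pairs(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Reducer step: sum the counts for each word.

    The sorted() call stands in for the sort that the framework performs
    between the map and reduce steps; groupby needs equal keys to be adjacent.
    """
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)


def format_output(pairs: Iterable[Tuple[str, int]]) -> Iterator[str]:
    """Render results as the tab-separated lines a streaming reducer would print."""
    for word, total in pairs:
        yield f"{word}\t{total}"


# Local dry run on two hypothetical input lines.
sample = ["Hadoop streams text", "text streams"]
result = list(format_output(reduce_pairs(map_lines(sample))))
```

Keeping the map and reduce logic in pure functions makes it easy to test before wrapping each in a script that iterates over `sys.stdin`.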

TLabel

International Journal of Data Warehousing and Mining ◽

10.4018/ijdwm.2016100103 ◽

2016 ◽

Vol 12 (4) ◽

pp. 54-74 ◽

Cited By ~ 1

Author(s):

Lamia Oukid ◽

Omar Boussaid ◽

Nadjia Benblidia ◽

Fadila Bentayeb

Keyword(s):

Text Categorization ◽

Numerical Data ◽

Semantic Content ◽

Aggregation Operators ◽

Structured Data ◽

Wide Range ◽

Textual Data ◽

On Line ◽

Analytical Processing ◽

On Line Analytical Processing

Data Warehousing technologies and On-Line Analytical Processing (OLAP) feature a wide range of techniques for the analysis of structured data. However, these techniques are inadequate when it comes to analyzing textual data. Indeed, classical aggregation operators have earned their spurs in the online analysis of numerical data, but are unsuitable for the analysis of textual data. To alleviate this shortcoming, on-line analytical processing in text cubes requires new analysis operators adapted to textual data. In this paper, the authors propose a new aggregation operator named Text Label (TLabel), based on text categorization. Their operator aggregates textual data in several classes of documents. Each class is associated with a label that represents the semantic content of the textual data of the class. TLabel is founded on a tailoring of text mining techniques to OLAP. To validate their operator, the authors perform an experimental study and the preliminary results show the interest of their approach for Text OLAP.
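To make the idea of aggregating text into labeled classes concrete, here is a toy sketch. It assumes the documents of a cube cell have already been assigned to classes by some categorization step, and it labels each class with its most frequent terms. This is an illustrative stand-in only: the abstract does not describe how TLabel categorizes documents or derives labels, and the function name and sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def label_classes(docs_by_class: Dict[str, List[str]], top_n: int = 2) -> Dict[str, List[str]]:
    """Summarize each class of documents by its top_n most frequent terms."""
    labels = {}
    for cls, docs in docs_by_class.items():
        counts = Counter(word for doc in docs for word in doc.lower().split())
        # Sort by descending count, then alphabetically, so ties are deterministic.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        labels[cls] = [word for word, _ in ranked[:top_n]]
    return labels


# Hypothetical cell of a text cube whose documents fall into two classes.
cell = {
    "class_1": ["olap cube olap", "cube text olap"],
    "class_2": ["mining text mining"],
}
cell_labels = label_classes(cell)
```

A practical operator would at least filter stop words and weight terms (for example by TF-IDF) rather than rely on raw counts.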
