Scalable business intelligence with graph collections

2016 ◽  
Vol 58 (4) ◽  
Author(s):  
André Petermann ◽  
Martin Junghanns

Using graph data models for business intelligence applications is a novel and promising approach. In contrast to traditional data warehouse models, graph models enable the mining of relationship patterns. In our prior work, we introduced an approach to graph-based data integration and analytics called BIIIG (Business Intelligence with Integrated Instance Graphs). In this work, we compare state-of-the-art systems for graph data management and analytics with regard to their support for our approach in Big Data scenarios. To exemplify the analytical value of graph models for business intelligence, we propose an analytical workflow to extract knowledge from graph-integrated business data. Finally, we show how we use Gradoop, a novel framework for distributed graph analytics, to implement our approach.
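To illustrate the kind of relationship-pattern mining the abstract describes: Gradoop itself is a Java/Apache Flink framework, so the following minimal Python sketch using networkx is only a conceptual stand-in, and the labels (Customer, SalesOrder, sentTo, etc.) are invented examples, not BIIIG's actual schema.

```python
# Conceptual sketch only: business objects from different source systems
# become vertices of one integrated instance graph; their relationships
# become typed edges that patterns can be mined over.
import networkx as nx

g = nx.DiGraph()
g.add_node("c1", type="Customer")
g.add_node("q1", type="Quotation")
g.add_node("o1", type="SalesOrder")
g.add_edge("q1", "c1", type="sentTo")
g.add_edge("o1", "q1", type="createdFrom")
g.add_edge("o1", "c1", type="soldTo")

# A simple relationship pattern a warehouse star schema cannot express
# directly: orders whose quotation was sent to the same customer the
# order was eventually sold to.
for o in (n for n, d in g.nodes(data=True) if d["type"] == "SalesOrder"):
    sold_to = {v for _, v, d in g.out_edges(o, data=True) if d["type"] == "soldTo"}
    quotes = {v for _, v, d in g.out_edges(o, data=True) if d["type"] == "createdFrom"}
    for q in quotes:
        sent_to = {v for _, v, d in g.out_edges(q, data=True) if d["type"] == "sentTo"}
        for c in sold_to & sent_to:
            print(f"order {o} traces via quotation {q} to customer {c}")
```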

Transformation is the second step of the ETL (extract, transform, load) process that populates a data warehouse. Its role is to apply a series of operations that clean, format, and unify the types and values coming from multiple, heterogeneous data sources. The goal is to make the data conform to the data warehouse schema and thus avoid ambiguity during storage and analytical operations. Transforming data from structured, semi-structured, and unstructured sources requires two levels of treatment: a schema-to-schema transformation that produces a unified schema for all selected data sources, and a data-to-data transformation that unifies all gathered types and values. To realize these steps, this paper proposes a process for switching from one database schema to another as part of the schema-to-schema transformation, and a meta-model based on the MDA approach to describe the main data-to-data transformation operations. The output of these transformations is loaded into one of the four NoSQL schema types, chosen to best meet the constraints and requirements of Big Data.
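As a minimal sketch of the two treatment levels described above (the source names, field mappings, and target document format below are illustrative assumptions, not the paper's MDA meta-model):

```python
# Sketch: a schema-to-schema mapping followed by data-to-data type
# unification, targeting a document-oriented NoSQL store.
from datetime import date

# Schema-to-schema: map heterogeneous source fields onto one unified schema.
SCHEMA_MAP = {
    "crm":  {"cust_name": "customer", "created": "order_date", "total": "amount"},
    "shop": {"buyer":     "customer", "date":    "order_date", "sum":   "amount"},
}

# Data-to-data: unify types so every source yields the same value formats.
def unify(field, value):
    if field == "order_date" and isinstance(value, str):
        return date.fromisoformat(value)  # "2020-08-01" -> date object
    if field == "amount":
        return float(value)               # ints and numeric strings -> float
    return value

def transform(source, row):
    return {
        unified: unify(unified, row[src])
        for src, unified in SCHEMA_MAP[source].items()
    }

print(transform("crm",  {"cust_name": "Acme", "created": "2020-08-01", "total": "99.9"}))
print(transform("shop", {"buyer": "Acme", "date": "2020-08-02", "sum": 100}))
```

Both source rows come out with identical field names and value types, which is the precondition for loading them into a single NoSQL target schema.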


Author(s):  
Deepika Prakash

Three technologies—business intelligence, big data, and machine learning—developed independently and address different types of problems. Data warehouses have been used as systems for business intelligence, and NoSQL databases are used for big data. In this chapter, the authors explore the convergence of business intelligence and big data. Traditionally, a data warehouse is implemented on a ROLAP or MOLAP platform. Whereas MOLAP suffers from its proprietary architecture, ROLAP suffers from the inherent disadvantages of an RDBMS. To mitigate the drawbacks of ROLAP, the authors propose implementing a data warehouse on a NoSQL database, choosing Cassandra as their database. They start by identifying a generic information model that captures the requirements of the system to-be. They then propose mapping rules that map the components of the information model to the Cassandra data model, and finally show a small implementation using an example.
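A hedged sketch of what one such mapping rule could look like (the function, its parameter names, and the column types are assumptions for illustration, not the chapter's actual rules): a fact and its dimensions map onto a single Cassandra table, with frequently filtered dimensions forming the partition key and the remaining dimensions as clustering columns, so a typical query touches one partition.

```python
# Sketch: emit a CQL table definition from an information-model fact.
def fact_to_cql(fact, measures, partition_dims, clustering_dims):
    cols = (
        [f"{d} text" for d in partition_dims + clustering_dims]
        + [f"{m} double" for m in measures]
    )
    pk = f"(({', '.join(partition_dims)}), {', '.join(clustering_dims)})"
    return (
        f"CREATE TABLE {fact} (\n  "
        + ",\n  ".join(cols)
        + f",\n  PRIMARY KEY {pk}\n);"
    )

print(fact_to_cql(
    "sales",
    measures=["quantity", "revenue"],
    partition_dims=["region"],
    clustering_dims=["year", "product"],
))
```

The trade-off this encodes is typical for Cassandra: the table is denormalized around one query pattern, so each distinct analytical access path generally gets its own table rather than a shared normalized schema.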


Author(s):  
Jorge Bernardino ◽  
Joaquim Lapa ◽  
Ana Almeida

A big data warehouse enables the analysis of large amounts of information that typically comes from the organization's transactional systems (OLTP). However, today's data warehouse systems do not have the capacity to handle the massive amount of data that is currently produced. Business intelligence (BI) is a collection of decision support technologies that enable executives, managers, and analysts to make better and faster decisions. Organizations must make good use of business intelligence platforms to quickly acquire the desired information from huge volumes of data, reducing the time and increasing the efficiency of decision-making processes. In this chapter, the authors present a comparative analysis of the capabilities of commercial and open source BI tools, in order to aid organizations in selecting the most suitable BI platform. They evaluate and compare six major open source BI platforms: Actuate, Jaspersoft, Jedox/Palo, Pentaho, SpagoBI, and Vanilla; and six major commercial BI platforms: IBM Cognos, Microsoft BI, MicroStrategy, Oracle BI, SAP BI, and SAS BI & Analytics.


Entity Resolution (ER) is the process of identifying records that refer to the same real-world entity. It plays a key role in many applications, such as data warehousing, data integration, and business intelligence. Comparing every record with every other record is infeasible, especially for a big dataset; blocking techniques have been introduced to overcome this problem. In this paper, we propose a novel Efficient Multi-Phase Blocking Strategy (EMPBS) for resolving duplicates in big data. To our knowledge, some state-of-the-art blocking techniques (e.g., Q-grams) may produce overlapping blocks, which cause redundant comparisons and hence increase time complexity. Our proposed blocking strategy has disjoint blocks and a lower time complexity than the Q-grams and standard blocking techniques. In addition, EMPBS is general and places no restrictions on the type of blocking keys. EMPBS consists of three phases. The first generates three efficient single blocking keys. The second phase takes the output of the first phase as input and constructs compound keys, each the concatenation of two single blocking keys; its output is three compound blocking keys. The last phase generates the Efficient Multi-Phase Blocking Key (EMPBK) as the union of two compound blocking keys. The implementation of EMPBS shows promising results in terms of Reduction Ratio (RR): it achieves a higher RR than any single blocking key alone while maintaining nearly the same precision and recall, reducing the average number of comparisons performed with a single blocking key by about 84%. To evaluate EMPBS, we developed a duplicate-generation tool (DupGen) that accepts a clean semi-structured file as input and generates labeled duplicate records according to certain criteria.
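A minimal sketch of the three phases on toy data (the concrete key definitions k1-k3 and field choices below are assumptions for illustration; the paper defines its own efficient keys):

```python
# Sketch of EMPBS-style multi-phase blocking.
from itertools import combinations

records = [
    {"id": 1, "first": "John", "last": "Smith", "zip": "10001"},
    {"id": 2, "first": "Jon",  "last": "Smith", "zip": "10001"},
    {"id": 3, "first": "Mary", "last": "Jones", "zip": "94107"},
]

# Phase 1: three single blocking keys (here: simple field prefixes).
def k1(r): return r["last"][:3].lower()
def k2(r): return r["first"][:1].lower()
def k3(r): return r["zip"][:3]

# Phase 2: compound keys, each the concatenation of two single keys.
def c12(r): return k1(r) + "|" + k2(r)
def c13(r): return k1(r) + "|" + k3(r)

# Phase 3: the multi-phase key unions the candidate pairs of two
# compound keys; each single key function yields disjoint blocks.
def candidate_pairs(key_fn):
    blocks = {}
    for r in records:
        blocks.setdefault(key_fn(r), []).append(r["id"])
    return {p for ids in blocks.values() for p in combinations(sorted(ids), 2)}

pairs = candidate_pairs(c12) | candidate_pairs(c13)
total = len(records) * (len(records) - 1) // 2
print(f"candidates: {sorted(pairs)}, reduction ratio: {1 - len(pairs)/total:.2f}")
```

On this toy input only the (1, 2) pair survives blocking, so two of the three exhaustive comparisons are avoided; the reduction ratio reported by the paper measures exactly this saving at scale.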


Author(s):  
Harkiran Kaur ◽  
Kawaljeet Singh ◽  
Tejinder Kaur

Background: Numerous E-Migrants databases assist migrants in locating their peers in various countries, thereby contributing largely to the communication of migrants staying overseas. Presently, these traditional E-Migrants databases face the issues of non-scalability, difficult search mechanisms, and burdensome information-update routines. Furthermore, analysis of migrants' profiles in these databases has remained unaddressed to date and hence generates no knowledge.

Objective: To design and develop an efficient and multidimensional knowledge discovery framework for E-Migrants databases.

Method: In the proposed technique, the results of complex calculations related to the On-Line Analytical Processing (OLAP) operations most probably required by end users are stored in the form of decision trees at the pre-processing stage of data analysis. While browsing the cube, these pre-computed results are retrieved, offering a Dynamic Cubing feature to end users at runtime. This data-tuning step reduces query processing time and increases the efficiency of the required data warehouse operations.

Results: Experiments conducted with a data warehouse of around 1000 migrants' profiles confirm the knowledge discovery power of this proposal. Using the proposed methodology, the authors have designed a framework efficient enough to incorporate the amendments made to the E-Migrants data warehouse at regular intervals, a capability totally missing in the traditional E-Migrants databases.

Conclusion: The proposed methodology facilitates migrants in generating dynamic knowledge and visualizing it in the form of dynamic cubes. By applying business intelligence mechanisms blended with tuned OLAP operations, the authors have managed to transform traditional datasets into an intelligent migrants data warehouse.
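To make the pre-computation idea concrete, here is a minimal sketch (the paper stores pre-computed results as decision trees; this illustration substitutes a plain dictionary cache of materialized cuboids, and all field names are invented):

```python
# Sketch: materialize every cuboid up front so cube browsing is a lookup.
from itertools import combinations
from collections import defaultdict

profiles = [
    {"country": "Canada", "year": 2018, "gender": "F"},
    {"country": "Canada", "year": 2019, "gender": "M"},
    {"country": "UAE",    "year": 2018, "gender": "F"},
]
DIMS = ("country", "year", "gender")

# Pre-processing: count profiles for every dimension subset (cuboid).
cube = defaultdict(int)
for r in range(len(DIMS) + 1):
    for dims in combinations(DIMS, r):
        for p in profiles:
            cube[(dims, tuple(p[d] for d in dims))] += 1

# Browsing: roll-up / drill-down answers are cache hits, not re-scans.
def query(**coords):
    dims = tuple(sorted(coords, key=DIMS.index))
    return cube[(dims, tuple(coords[d] for d in dims))]

print(query(country="Canada"))             # roll-up over year and gender -> 2
print(query(country="Canada", year=2018))  # drill-down -> 1
```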


Author(s):  
Xabier Rodríguez-Martínez ◽  
Enrique Pascual-San-José ◽  
Mariano Campoy-Quiles

This review article presents the state of the art in high-throughput computational and experimental screening routines with applications in organic solar cells, including materials discovery, device optimization, and machine-learning algorithms.


Mathematics ◽  
2020 ◽  
Vol 8 (8) ◽  
pp. 1303 ◽  
Author(s):  
Carl Leake ◽  
Hunter Johnston ◽  
Daniele Mortari

This article presents a reformulation of the Theory of Functional Connections: a general methodology for functional interpolation that can embed a set of user-specified linear constraints. The reformulation presented in this paper exploits the underlying functional structure presented in the seminal paper on the Theory of Functional Connections to ease the derivation of these interpolating functionals—called constrained expressions—and provides rigorous terminology that lends itself to straightforward derivations of mathematical proofs regarding the properties of these constrained expressions. Furthermore, the extension of the technique to n dimensions, along with the corresponding proofs, is immediate through a recursive application of the univariate formulation. In all, the results of this reformulation are compared to prior work to highlight the novelty and mathematical convenience of using this approach. Finally, the methodology presented in this paper is applied to two partial differential equations with different boundary conditions, and, where data are available, the results are compared to state-of-the-art methods.
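For readers unfamiliar with constrained expressions, the classic univariate two-point example from the original theory (shown here in its widely cited form, not in this paper's reformulated notation) illustrates the idea:

```latex
% Constrained expression embedding the two point constraints
% y(x_0) = y_0 and y(x_f) = y_f for ANY choice of the free function g:
\[
  y(x) \;=\; g(x)
  \;+\; \frac{x_f - x}{x_f - x_0}\,\bigl(y_0 - g(x_0)\bigr)
  \;+\; \frac{x - x_0}{x_f - x_0}\,\bigl(y_f - g(x_f)\bigr)
\]
```

Substituting x = x_0 or x = x_f recovers the constraints identically, so g can be optimized freely, for example to solve a boundary value problem while the constraints are satisfied by construction.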


Author(s):  
Marcus Paradies ◽  
Stefan Plantikow ◽  
Oskar van Rest
