A New Big Data Benchmark for OLAP Cube Design Using Data Pre-Aggregation Techniques

2020 ◽  
Vol 10 (23) ◽  
pp. 8674
Author(s):  
Roberto Tardío ◽  
Alejandro Maté ◽  
Juan Trujillo

In recent years, several new technologies have enabled OLAP processing over Big Data sources. Among these technologies, we highlight those that allow data pre-aggregation because of their demonstrated performance in data querying. This is the case of Apache Kylin, a Hadoop-based technology that supports sub-second queries over fact tables with billions of rows combined with ultra-high-cardinality dimensions. However, taking advantage of data pre-aggregation techniques to design analytic models for Big Data OLAP is not a trivial task. It requires very advanced knowledge of the underlying technologies and of user querying patterns. A wrong design of the OLAP cube significantly alters several key performance metrics, including: (i) the analytic capabilities of the cube (the time and ability to provide an answer to a query), (ii) the size of the OLAP cube, and (iii) the time required to build the OLAP cube. Therefore, in this paper we (i) propose a benchmark to help Big Data OLAP designers choose the most suitable cube design for their goals, (ii) identify and describe the main requirements and trade-offs for effectively designing a Big Data OLAP cube that takes advantage of data pre-aggregation techniques, and (iii) validate our benchmark in a case study.
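
Conceptually, a pre-aggregated cube trades build time and storage for query latency: every materialized combination of dimensions (cuboid) is an aggregate table that can answer matching queries without scanning the fact table. As a minimal sketch of this idea only, not of Apache Kylin's actual cuboid build process, the PySpark snippet below materializes a cube over a hypothetical sales fact table; the paths and column names are assumptions for illustration.

```python
# Minimal sketch of data pre-aggregation for OLAP (illustrative only;
# Apache Kylin builds its cuboids internally, this just shows the idea).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("cube-preaggregation-sketch").getOrCreate()

# Hypothetical fact table: one row per sale (billions of rows in practice).
fact = spark.read.parquet("hdfs:///warehouse/sales_fact")  # assumed path

# Pre-aggregate every combination of the chosen dimensions (2^3 = 8 cuboids).
# Adding a high-cardinality dimension multiplies cube size and build time,
# which is exactly the trade-off the benchmark measures.
cube = (
    fact.cube("store_id", "product_category", "sale_date")
        .agg(F.sum("amount").alias("total_amount"),
             F.count("*").alias("num_sales"))
)

cube.write.mode("overwrite").parquet("hdfs:///warehouse/sales_cube")  # assumed path

# At query time, an aggregate query is answered from the small pre-aggregated
# cube instead of scanning the full fact table.
```

Each additional dimension doubles the number of possible cuboids, which is why the design choices evaluated by the benchmark have such a strong effect on cube size and build time.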

Author(s):  
Panos Constantinides

This paper explores the strategic importance of information systems for managing such crises as the H1N1 outbreak and the Haiti earthquake in the healthcare service chain. The paper synthesizes the literature on crisis management and information systems for emergency response and draws some key lessons for healthcare service chains. The paper illustrates these lessons by using data from an empirical case study in the region of Crete in Greece. The author concludes by discussing some future directions in managing crises in the healthcare service chain, including the importance of distributive, adaptive crisis management through new technologies like mashups.


2019 ◽  
Vol 6 (1) ◽  
Author(s):  
Tawfiq Hasanin ◽  
Taghi M. Khoshgoftaar ◽  
Joffrey L. Leevy ◽  
Richard A. Bauder

Severe class imbalance between majority and minority classes in Big Data can bias the predictive performance of Machine Learning algorithms toward the majority (negative) class. Where the minority (positive) class holds greater value than the majority (negative) class and the occurrence of false negatives incurs a greater penalty than false positives, the bias may lead to adverse consequences. Our paper incorporates two case studies, each utilizing three learners, six sampling approaches, two performance metrics, and five sampled distribution ratios, to uniquely investigate the effect of severe class imbalance on Big Data analytics. The learners (Gradient-Boosted Trees, Logistic Regression, Random Forest) were implemented within the Apache Spark framework. The first case study is based on a Medicare fraud detection dataset. The second case study, unlike the first, includes training data from one source (SlowlorisBig Dataset) and test data from a separate source (POST dataset). Results from the Medicare case study are not conclusive regarding the best sampling approach using Area Under the Receiver Operating Characteristic Curve and Geometric Mean performance metrics. However, it should be noted that the Random Undersampling approach performs adequately in the first case study. For the SlowlorisBig case study, Random Undersampling convincingly outperforms the other five sampling approaches (Random Oversampling, Synthetic Minority Over-sampling TEchnique, SMOTE-borderline1, SMOTE-borderline2, ADAptive SYNthetic) when measuring performance with Area Under the Receiver Operating Characteristic Curve and Geometric Mean metrics. Based on its classification performance in both case studies, Random Undersampling is the best choice as it results in models with a significantly smaller number of samples, thus reducing computational burden and training time.
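
As a minimal sketch of the Random Undersampling step under assumed column names (label 1 marking the minority, positive class), the PySpark snippet below balances the training data before fitting a Spark ML learner; it illustrates the technique, not the authors' exact pipeline.

```python
# Random Undersampling sketch in PySpark (illustrative; paths and column names assumed).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rus-sketch").getOrCreate()
df = spark.read.parquet("hdfs:///data/train")  # assumed path; contains a "label" column

pos = df.filter(df.label == 1)   # minority (positive) class
neg = df.filter(df.label == 0)   # majority (negative) class

# Downsample the majority class to a chosen sampled distribution ratio,
# here roughly 50:50; the study compared several such ratios.
n_pos, n_neg = pos.count(), neg.count()
fraction = min(1.0, n_pos / n_neg)
neg_sampled = neg.sample(withReplacement=False, fraction=fraction, seed=42)

balanced = pos.unionByName(neg_sampled)

# The balanced set is then fed to Spark ML learners such as
# GBTClassifier, LogisticRegression, or RandomForestClassifier.
```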


Author(s):  
Amine Rahmani ◽  
Abdelmalek Amine ◽  
Reda Mohamed Hamou

In recent years, with the emergence of new technologies such as big data, privacy concerns have grown considerably. Big data implies the dematerialization of data, and classical security solutions are no longer efficient in this setting. Nowadays, sharing data is almost as easy as saying hello, and the amount of data shared over the web keeps growing from day to day, which creates a wide gap between the purpose of sharing data and the fact that such data contain sensitive information. Researchers have therefore turned their attention to new issues and domains in order to minimize this gap; in other words, they aim to ensure good data utility by preserving the meaning of the data while hiding sensitive information to prevent identity disclosure. Many techniques have been used for this purpose, some mathematical and others based on data mining algorithms. This paper deals with the problem of hiding sensitive data in shared structured medical data using a new bio-inspired algorithm modeled on the natural phenomenon of cell apoptosis in the human body.
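
The abstract does not detail the apoptosis-inspired algorithm itself; purely as an illustrative baseline of the stated goal (preserving the utility of shared medical records while hiding sensitive information to prevent identity disclosure), the Python sketch below suppresses direct identifiers and generalizes a quasi-identifier. All field names are invented, and this is not the authors' method.

```python
# Baseline sketch of hiding sensitive fields in a structured medical record.
# This is NOT the apoptosis-inspired algorithm from the paper, only an
# illustration of the goal; all field names are invented.
from typing import Dict, Any

DIRECT_IDENTIFIERS = {"name", "ssn", "phone"}      # fields to suppress entirely
QUASI_IDENTIFIERS = {"age"}                        # fields to generalize

def generalize_age(age: int, bucket: int = 10) -> str:
    """Replace an exact age with a coarse range, e.g. 37 -> '30-39'."""
    low = (age // bucket) * bucket
    return f"{low}-{low + bucket - 1}"

def sanitize(record: Dict[str, Any]) -> Dict[str, Any]:
    out = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            out[field] = "*"                       # suppress direct identifiers
        elif field in QUASI_IDENTIFIERS:
            out[field] = generalize_age(int(value))
        else:
            out[field] = value                     # keep useful clinical data
    return out

print(sanitize({"name": "J. Doe", "ssn": "123-45-6789",
                "age": 37, "diagnosis": "hypertension"}))
# {'name': '*', 'ssn': '*', 'age': '30-39', 'diagnosis': 'hypertension'}
```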


Author(s):  
Shigemi Kagawa ◽  
Daisuke Nishijima ◽  
Yuya Nakamoto

In order to achieve climate change mitigation goals, reducing greenhouse gas (GHG) emissions from Japan's household sector is critical. Accomplishing a transition to low-carbon and energy-efficient consumer goods is particularly valuable as a policy tool for reducing emissions in the residential sector. This case study presents an analysis of the lifetime of personal vehicles in Japan and considers the optimal scenario in terms of retention and disposal, specifically as it relates to GHG emissions. Using data from Japan, the case study shows the critical importance of including whole-of-life energy and carbon calculations when assessing the contributions that new technologies can make towards low-carbon mobility transitions. While energy-efficiency gains are important, policies that encourage replacing older technologies can overlook the energy and carbon embedded in the production phase. Without this perspective, policy designed to reduce GHG emissions may instead result in increased emissions and further exacerbate global climate change.
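
The whole-of-life argument can be made concrete with a simple keep-versus-replace comparison: replacement pays off only once the use-phase savings of the newer vehicle outweigh the emissions embedded in producing it. The figures in the Python sketch below are hypothetical and serve only to illustrate the arithmetic, not the study's results.

```python
# Whole-of-life GHG comparison sketch (all figures hypothetical).
production_emissions = 6.0      # t CO2e embedded in manufacturing the new car
old_use_rate = 2.4              # t CO2e per year driving the existing car
new_use_rate = 1.6              # t CO2e per year driving the new car
years = 8                       # remaining years the old car could be kept

keep_old = old_use_rate * years
replace_now = production_emissions + new_use_rate * years

print(f"Keep existing car : {keep_old:.1f} t CO2e over {years} years")
print(f"Replace today     : {replace_now:.1f} t CO2e over {years} years")

# Break-even horizon: years of use needed before the efficiency gain
# repays the production emissions of the new vehicle.
break_even = production_emissions / (old_use_rate - new_use_rate)
print(f"Break-even after  : {break_even:.1f} years")
```

With these illustrative numbers the replacement only breaks even after 7.5 years of use, so an early retirement of the existing vehicle would increase total emissions.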


2019 ◽  
Vol 12 (2) ◽  
pp. 131-156 ◽  
Author(s):  
Päivikki Kuoppakangas ◽  
Tony Kinder ◽  
Jari Stenvall ◽  
Ilpo Laitinen ◽  
Olli-Pekka Ruuskanen ◽  
...  

This study examines public organisations planning big data-driven transformations in their service provision. Without radical structural change or changes to managerial systems, leaders face dilemmas: simply bolting on big data makes little difference. This study is based on a qualitative empirical case study using data collected from the cities of Helsinki and Tampere in Finland. The three core dilemma pairs detected and connected to the big data-related organisational changes are: (1) repetitive continuity vs. visionary change, (2) risk-taking vs. security-seeking, and (3) technology-based development vs. human-based development. This study suggests that organisational readiness involves not only capabilities; rather, readiness involves absorbing knowledge, making decisions, handling ambiguities, and managing dilemmas. Thus, big data-related transformations in public organisations require embracing the world of dilemmas, since selected and cancelled experiments may each have valuable outcomes. The capability to act on intentions is a prerequisite for readiness; however, a preparedness to detect and address dilemmas is central to big data-related transformations, and the ability to make dilemma decisions is thus a more complicated characteristic of readiness. In conclusion, our data analysis suggests that traditional public organisation and change management approaches produce unsolved dilemmas in big data-related organisational changes.


2021 ◽  
Author(s):  
Christina Borowiec

Usage of big data with before-after methods of analysis makes it possible to evaluate the effect of major transport investments on system performance. In employing before-after methods to investigate the impact of lane closures on congestion and travel reliability, changes and trade-offs in performance indicators are quantified and policy action effectiveness is evaluated. This is illustrated through a case study of two separate lane closure interventions on the Gardiner Expressway in Toronto, Ontario. Models using a regression framework were developed for the pre-, peri-, and post-closure test periods of the first intervention and pre- and peri-closure periods of the second intervention. Results suggest the impacts of policy actions on system performance are strong, and that congestion and travel reliability counterintuitively move in different directions. Reduced demand effects are observed, prompting discussion on how highways and congestion should be managed and whether or not municipalities should add capacity to regional assets.
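
As a minimal sketch of such a before-after framework, the Python snippet below regresses a congestion indicator on dummy variables for the peri- and post-closure test periods; the file name, variables, and controls are assumptions for illustration, not the author's exact model specification.

```python
# Sketch of a before-after regression with period dummies
# (illustrative; data file, variables, and controls are assumed).
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical daily panel for an expressway segment, with a travel-time-based
# performance indicator and a "period" column relative to the first closure.
df = pd.read_csv("gardiner_segment_daily.csv")  # assumed file

# period: "pre", "peri", or "post" relative to the lane closure intervention.
df["peri"] = (df["period"] == "peri").astype(int)
df["post"] = (df["period"] == "post").astype(int)

# Congestion indicator regressed on period dummies plus simple controls;
# the coefficients on peri/post estimate the closure's effect.
model = smf.ols(
    "travel_time_index ~ peri + post + weekday + precipitation",
    data=df,
).fit()

print(model.summary())
```

A parallel model on a reliability indicator (for example a buffer-time index) would show whether congestion and reliability move in the different directions the study reports.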


Author(s):  
Jorge Lima de Magalhães ◽  
Juliana Satie Oliveira Igarashi ◽  
Zulmira Hartz ◽  
Adelaide Maria de Souza Antunes ◽  
Elizabeth Valverde Macedo

The informational and digital era of Big Data presents organizations with an unprecedented, non-trivial challenge for data and information management. To manage, protect, and ensure the validation of these data, it is imperative to develop new technologies for project management and to implement them in organizations. This chapter presents a case study in the pharmaceutical industry and proposes a methodology for validating emerging technologies in computerized systems. Data validation and security for project management are increasingly in demand, yet the time and human resources available in organizations are not infinite, so the activities and resources dedicated to maintaining the validated state of a system must be prioritized. The authors propose a risk analysis to help companies with validation and present a methodology for risk analysis from the point of view of computerized systems validation, applied to a Warehouse Management module in a validated SAP ERP.
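
As a minimal sketch of how a risk analysis can prioritize validation effort, the Python snippet below scores warehouse-management functions with a severity × probability × detectability product, a pattern common in GxP risk assessments; the functions and scores are invented, and this is not necessarily the exact methodology proposed in the chapter.

```python
# Risk-scoring sketch for prioritizing computerized-system validation effort.
# Function names and scores are invented; the severity x probability x
# detectability product is a common pattern, not necessarily the chapter's method.
from dataclasses import dataclass

@dataclass
class Function:
    name: str
    severity: int       # impact on product quality / data integrity (1-5)
    probability: int    # likelihood of failure (1-5)
    detectability: int  # 5 = hard to detect, 1 = easily detected

    @property
    def risk_priority(self) -> int:
        return self.severity * self.probability * self.detectability

functions = [
    Function("Goods receipt posting", 5, 2, 4),
    Function("Batch/lot traceability", 5, 3, 3),
    Function("Bin-to-bin transfer", 2, 3, 2),
]

# Validate (and re-validate after changes) the highest-risk functions first.
for f in sorted(functions, key=lambda f: f.risk_priority, reverse=True):
    print(f"{f.name:28s} RPN = {f.risk_priority}")
```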

