scholarly journals Development and Evaluation of a Big Data Framework for Performance Management in Mobile Networks

IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 226380-226396
Author(s):  
Diana Martinez-Mosquera ◽  
Rosa Navarrete ◽  
Sergio Lujan-Mora
2021 ◽  
Vol 7 ◽  
pp. e652
Author(s):  
Diana Martinez-Mosquera ◽  
Rosa Navarrete ◽  
Sergio Luján-Mora

The eXtensible Markup Language (XML) files are widely used by the industry due to their flexibility in representing numerous kinds of data. Multiple applications such as financial records, social networks, and mobile networks use complex XML schemas with nested types, contents, and/or extension bases on existing complex elements or large real-world files. A great number of these files are generated each day and this has influenced the development of Big Data tools for their parsing and reporting, such as Apache Hive and Apache Spark. For these reasons, multiple studies have proposed new techniques and evaluated the processing of XML files with Big Data systems. However, a more usual approach in such works involves the simplest XML schemas, even though, real data sets are composed of complex schemas. Therefore, to shed light on complex XML schema processing for real-life applications with Big Data tools, we present an approach that combines three techniques. This comprises three main methods for parsing XML files: cataloging, deserialization, and positional explode. For cataloging, the elements of the XML schema are mapped into root, arrays, structures, values, and attributes. Based on these elements, the deserialization and positional explode are straightforwardly implemented. To demonstrate the validity of our proposal, we develop a case study by implementing a test environment to illustrate the methods using real data sets provided from performance management of two mobile network vendors. Our main results state the validity of the proposed method for different versions of Apache Hive and Apache Spark, obtain the query execution times for Apache Hive internal and external tables and Apache Spark data frames, and compare the query performance in Apache Hive with that of Apache Spark. Another contribution made is a case study in which a novel solution is proposed for data analysis in the performance management systems of mobile networks.


2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Tahani Daghistani ◽  
Huda AlGhamdi ◽  
Riyad Alshammari ◽  
Raed H. AlHazme

AbstractOutpatients who fail to attend their appointments have a negative impact on the healthcare outcome. Thus, healthcare organizations facing new opportunities, one of them is to improve the quality of healthcare. The main challenges is predictive analysis using techniques capable of handle the huge data generated. We propose a big data framework for identifying subject outpatients’ no-show via feature engineering and machine learning (MLlib) in the Spark platform. This study evaluates the performance of five machine learning techniques, using the (2,011,813‬) outpatients’ visits data. Conducting several experiments and using different validation methods, the Gradient Boosting (GB) performed best, resulting in an increase of accuracy and ROC to 79% and 81%, respectively. In addition, we showed that exploring and evaluating the performance of the machine learning models using various evaluation methods is critical as the accuracy of prediction can significantly differ. The aim of this paper is exploring factors that affect no-show rate and can be used to formulate predictions using big data machine learning techniques.


IEEE Access ◽  
2018 ◽  
Vol 6 ◽  
pp. 71132-71142
Author(s):  
Gerard Mor ◽  
Jordi Vilaplana ◽  
Stoyan Danov ◽  
Jordi Cipriano ◽  
Francesc Solsona ◽  
...  

The term “Big data” refers to “the high volume of data sets that are relatively complex in nature and having challenges in processing and analyzing the data using conventional database management tools”. In the digital universe, the data volume and variety that, we deal today have grown-up massively from different sources such as Business Informatics, Social-Media Networks, Images from High Definition TV, data from Mobile Networks, Banking data from ATM Machines, Genomics and GPS Trails, Telemetry from automobiles, Meteorology, Financial market data etc. Data Scientists confirm that 80% of the data that we have gathered today are in unstructured format, i.e. in the form of images, pixel data, Videos, geo-spatial data, PDF files etc. Because of the massive growth of data and its different formats, organizations are having multiple challenges in capturing, storing, mining, analyzing, and visualizing the Big data. This paper aims to exemplify the key challenges faced by most organizations and the significance of implementing the emerging Big data techniques for effective extraction of business intelligence to make better and faster decisions


Author(s):  
J. Boehm ◽  
K. Liu ◽  
C. Alis

In the geospatial domain we have now reached the point where data volumes we handle have clearly grown beyond the capacity of most desktop computers. This is particularly true in the area of point cloud processing. It is therefore naturally lucrative to explore established big data frameworks for big geospatial data. The very first hurdle is the import of geospatial data into big data frameworks, commonly referred to as data ingestion. Geospatial data is typically encoded in specialised binary file formats, which are not naturally supported by the existing big data frameworks. Instead such file formats are supported by software libraries that are restricted to single CPU execution. We present an approach that allows the use of existing point cloud file format libraries on the Apache Spark big data framework. We demonstrate the ingestion of large volumes of point cloud data into a compute cluster. The approach uses a map function to distribute the data ingestion across the nodes of a cluster. We test the capabilities of the proposed method to load billions of points into a commodity hardware compute cluster and we discuss the implications on scalability and performance. The performance is benchmarked against an existing native Apache Spark data import implementation.


Sign in / Sign up

Export Citation Format

Share Document