Typhoon Quantitative Rainfall Prediction from Big Data Analytics by Using the Apache Hadoop Spark Parallel Computing Framework

Chih-Chiang Wei; Tzu-Hao Chou

doi:10.3390/atmos11080870

Atmosphere ◽

10.3390/atmos11080870 ◽

2020 ◽

Vol 11 (8) ◽

pp. 870 ◽

Cited By ~ 1

Author(s):

Chih-Chiang Wei ◽

Tzu-Hao Chou

Keyword(s):

Machine Learning ◽

Big Data ◽

Prediction Models ◽

Processing Unit ◽

Central Processing ◽

Rainfall Prediction ◽

Typhoon Rainfall ◽

Computing Framework ◽

Spark Framework ◽

Big Data Technology

Situated on the main tracks of typhoons in the Northwestern Pacific Ocean, Taiwan frequently suffers disasters caused by heavy rainfall during typhoons, so accurate and timely typhoon rainfall prediction is essential. The purpose of this study was to develop a Hadoop Spark distributed framework based on big-data technology to accelerate the computation of typhoon rainfall prediction models. The study used deep neural networks (DNNs) and multiple linear regressions (MLRs) to establish rainfall prediction models and evaluate their accuracy. The Hadoop Spark distributed cluster-computing framework consisted of the Hadoop Distributed File System, the MapReduce framework, and Spark, a new-generation technology used to improve the efficiency of distributed computing. The research area was Northern Taiwan, where four surface observation stations served as the experimental sites. The study collected data on 271 typhoon events from 1961 to 2017. The following results were obtained: (1) prediction errors increased with prediction duration in both the DNN and MLR models; and (2) the Hadoop Spark system was faster than the standalone systems (a single I7 central processing unit (CPU) and a single E3 CPU). When a model requires complex computation (e.g., DNN parameter calibration), the big-data-based Hadoop Spark framework can be used to establish highly efficient computation environments. In summary, this study used the big-data Hadoop Spark framework with machine learning to develop rainfall prediction models with improved computing efficiency. The proposed system can therefore support real-time typhoon rainfall prediction with high timeliness and accuracy.
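
As a rough illustration of why a model such as MLR suits a MapReduce-style cluster, the sketch below fits a linear regression by letting each data partition compute partial sums of X^T X and X^T y (the map step), adding the partial results together (the reduce step), and solving the small normal-equation system once. It is plain Python, not Spark or Hadoop code; the function names, the partition layout, and the toy data are assumptions made for illustration and are not taken from the study.

```python
from functools import reduce


def partial_sums(rows):
    """Map step: accumulate X^T X and X^T y for one partition.

    Each row is (features, target); a constant 1.0 is prepended for the intercept.
    """
    n = len(rows[0][0]) + 1
    xtx = [[0.0] * n for _ in range(n)]
    xty = [0.0] * n
    for features, target in rows:
        x = [1.0] + list(features)
        for i in range(n):
            xty[i] += x[i] * target
            for j in range(n):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty


def merge(a, b):
    """Reduce step: element-wise sum of two partial results."""
    xtx = [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    xty = [p + q for p, q in zip(a[1], b[1])]
    return xtx, xty


def solve(matrix, vector):
    """Solve matrix @ w = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    aug = [row[:] + [v] for row, v in zip(matrix, vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (aug[i][n] - s) / aug[i][i]
    return w


def fit_mlr(partitions):
    """Fit intercept and coefficients from a list of data partitions."""
    xtx, xty = reduce(merge, map(partial_sums, partitions))
    return solve(xtx, xty)


# Toy data generated from y = 2 + 3*a - 1*b, split into two partitions
# (illustrative values only, not rainfall observations).
partitions = [
    [((0.0, 0.0), 2.0), ((1.0, 0.0), 5.0)],
    [((0.0, 1.0), 1.0), ((2.0, 3.0), 5.0)],
]
weights = fit_mlr(partitions)
```

Because the partial sums depend only on the rows inside each partition, the map step can run on separate workers and only small matrices need to be exchanged. A DNN does not reduce to a single pass like this, which is one plausible reason its calibration benefits more from a cluster.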

Development of Heavy Rain Damage Prediction Model Using Machine Learning Based on Big Data

Advances in Meteorology ◽

10.1155/2018/5024930 ◽

2018 ◽

Vol 2018 ◽

pp. 1-11 ◽

Cited By ~ 12

Author(s):

Changhyun Choi ◽

Jeonghwan Kim ◽

Jongsung Kim ◽

Donghyun Kim ◽

Younghye Bae ◽

...

Keyword(s):

Machine Learning ◽

Big Data ◽

Prediction Model ◽

Prediction Models ◽

Meteorological Data ◽

Heavy Rain ◽

Machine Learning Techniques ◽

Damage Prediction ◽

Explanatory Variables ◽

The Republic

Prediction models of heavy rain damage using machine learning based on big data were developed for the Seoul Capital Area in the Republic of Korea. We used data on the occurrence of heavy rain damage from 1994 to 2015 as dependent variables and weather big data as explanatory variables. The models were developed by applying machine learning techniques such as decision trees, bagging, random forests, and boosting. In an evaluation of each model's prediction performance, the boosting model using meteorological data from the past 1 to 4 days had the highest AUC (95.87%) and was selected as the final model. By using the prediction model developed in this study to predict the occurrence of heavy rain damage for each administrative region, we can greatly reduce the damage through proactive disaster management.
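
The AUC reported above can be read as the probability that a randomly chosen damage case receives a higher model score than a randomly chosen no-damage case. A minimal sketch of that calculation follows, assuming binary labels (1 = damage occurred) and real-valued model scores; the function name and the toy numbers are illustrative and do not come from the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: 3 damage days and 3 no-damage days with made-up model scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1]
example_auc = auc(labels, scores)
```

The pairwise form is quadratic in the number of cases, so it is meant for reading the definition rather than for large data sets.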

Tweets Analysis with Big Data Technology and Machine Learning to Evaluate Smart and Sustainable Urban Mobility Actions in Barcelona

Complex, Intelligent and Software Intensive Systems - Advances in Intelligent Systems and Computing ◽

10.1007/978-3-030-50454-0_53 ◽

2020 ◽

pp. 510-519

Author(s):

Beniamino Di Martino ◽

Luigi Colucci Cante ◽

Mariangela Graziano ◽

Regina Enrich Sard

Keyword(s):

Machine Learning ◽

Big Data ◽

Urban Mobility ◽

Sustainable Urban Mobility ◽

Big Data Technology

Cloud Computing Model for Big Geological Data Processing

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.475-476.306 ◽

2013 ◽

Vol 475-476 ◽

pp. 306-311 ◽

Cited By ~ 2

Author(s):

Miao Miao Song ◽

Zhe Li ◽

Bin Zhou ◽

Chao Ling Li

Keyword(s):

Cloud Computing ◽

Big Data ◽

Parallel Processing ◽

Data Processing ◽

Processing Unit ◽

Geological Data ◽

Computing Model ◽

Operation Speed ◽

Graphics Processing ◽

Big Data Technology

Geological data are diverse in type, huge in volume, and complex in format. Geological data analysis is mainly divided into three parts: mine forecasting, mine evaluation, and mine positioning. Traditional geological data analysis models are constrained by limited storage space and computational efficiency and cannot meet the need for fast operations on large amounts of geological data. "Big data technology" provides an ideal solution for the management, information extraction, and comprehensive analysis of vast amounts of geological data. To supply the mass storage capacity and high-speed computing power that "big data technology" requires, we built an intelligent system for geological data analysis based on a cloud computing model with double parallel processing by MapReduce and GPUs. A Hadoop cluster handles the storage of large amounts of geological data, and an efficient parallel processing method based on GPUs (graphics processing units) was designed and applied within the MapReduce framework, completing the MapReduce and GPU double parallel processing cloud computing model and improving the operation speed of the system. Theoretical modeling and experimental verification indicate that the system meets the requirements of geological data analysis for operation precision, data volume, and operation speed.

Improving Tourist Arrival Prediction: A Big Data and Artificial Neural Network Approach

Journal of Travel Research ◽

10.1177/0047287520921244 ◽

2020 ◽

pp. 004728752092124 ◽

Cited By ~ 2

Author(s):

Wolfram Höpken ◽

Tobias Eberle ◽

Matthias Fuchs ◽

Maria Lexhagen

Keyword(s):

Neural Network ◽

Machine Learning ◽

Artificial Neural Network ◽

Big Data ◽

Web Search ◽

Prediction Models ◽

Arima Models ◽

Study Results ◽

Artificial Neural ◽

Prediction Approach

Because of high fluctuations in tourism demand, accurate predictions of tourist arrivals are of high importance for tourism organizations. The study at hand presents an approach to enhance autoregressive prediction models by including travelers’ web search traffic as an external input attribute for tourist arrival prediction. The study proposes a novel method to identify relevant search terms and to aggregate them into a compound web-search index, which is used as an additional input to an autoregressive prediction approach. As methods to predict tourist arrivals, the study compares autoregressive integrated moving average (ARIMA) models with the machine learning–based technique artificial neural network (ANN). Study results show that (1) Google Trends data, mirroring travelers’ online search behavior (i.e., a big data information source), significantly increase the performance of tourist arrival prediction compared to autoregressive approaches using past arrivals alone, and (2) the machine learning technique ANN has the capacity to outperform ARIMA models.

Bi-LSTM Sentiment Classifier for Climate Change Issues in South Korea

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1056.0782s619 ◽

2019 ◽

Vol 8 (2S6) ◽

pp. 295-299

Keyword(s):

Climate Change ◽

Machine Learning ◽

Big Data ◽

South Korea ◽

Sentiment Analysis ◽

Training Data ◽

Learning Models ◽

Wide Range ◽

Machine Learning Models ◽

Big Data Technology

Sentiment analysis of SNS data can reveal the thoughts of a wide variety of people. Analysis using SNS can therefore predict social problems and identify their complex causes more accurately. In addition, big data technology can capture SNS information as it is generated in real time, allowing a wide range of people’s opinions to be understood without delay, and it can supplement traditional opinion surveys. The incumbent government mainly uses SNS to promote its policies; however, measures are needed to actively reflect SNS in the process of carrying out policy. This paper therefore developed a sentiment classifier that can identify public feelings about climate change on SNS. To that end, based on a dictionary formulated on the theme of climate change, we collected climate change SNS data for learning and tagged them with seven sentiments. Sentiment classifier models were then developed from the training data using machine learning models. The analysis showed that the Bi-LSTM model performed better than the shallow models. It achieved the highest accuracy (85.10%) across the seven sentiment classes, outperforming traditional machine learning (Naive Bayes and SVM) by approximately 34.53 and 7.14 percentage points (%p), respectively. These findings substantiate the applicability of the proposed Bi-LSTM-based sentiment classifier to the analysis of sentiments relevant to diverse climate change issues.
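
The margins quoted above are stated in percentage points, so the baseline accuracies they imply can be recovered by simple subtraction. The short sketch below does that arithmetic, assuming the margins are absolute differences from the 85.10% Bi-LSTM accuracy; the resulting baseline figures are derived here and are not stated in the abstract, and the function name is illustrative.

```python
def implied_baseline(model_accuracy_pct, margin_pp):
    """Baseline accuracy implied by an absolute percentage-point margin."""
    return round(model_accuracy_pct - margin_pp, 2)


bi_lstm_accuracy = 85.10  # reported accuracy, in percent

# Margins reported in the abstract, in percentage points.
naive_bayes_accuracy = implied_baseline(bi_lstm_accuracy, 34.53)
svm_accuracy = implied_baseline(bi_lstm_accuracy, 7.14)
```

Under that assumption, Naive Bayes would sit near 50.57% and SVM near 77.96%.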

Big Data on Machine Learning – A Review

Engineering and Scientific International Journal ◽

10.30726/esij/v8.i3.2021.83018 ◽

2021 ◽

Vol 8 (3) ◽

Author(s):

Balasree K ◽

Dharmarajan K

Keyword(s):

Machine Learning ◽

Big Data ◽

Data Storage ◽

Data Analytics ◽

Rapid Development ◽

Learning Algorithms ◽

Big Data Analytics ◽

Machine Learning Algorithms ◽

Data Sets ◽

Big Data Technology

With the rapid development of Big Data technology in recent years, this paper discusses the role that Machine Learning (ML) methods and algorithms play in Big Data processing and Big Data analytics. Developments in the evolutionary and computing fields complement each other. Because data are growing rapidly and data sets are very high in velocity and variety, solutions need to be studied and provided to handle them, gain knowledge from them, and extract value. Big Data analytics involves appropriate data storage and a computational framework, enhanced by scalable Machine Learning algorithms, to reveal hidden information and secret correlations in massive amounts of data. This kind of analytic information is useful for organizations and companies seeking deeper knowledge, development, and an advantage over the competition, and such analytics can support accurate predictions from the data. This paper presents a detailed review of state-of-the-art developments and an overview of the advantages and challenges of Machine Learning algorithms in Big Data analytics.

Machine Learning Enabled Adaptive Optimization of a Transonic Compressor Rotor With Precompression

Journal of Turbomachinery ◽

10.1115/1.4041808 ◽

2019 ◽

Vol 141 (5) ◽

Cited By ~ 3

Author(s):

Michael Joly ◽

Soumalya Sarkar ◽

Dhagash Mehta

Keyword(s):

Machine Learning ◽

Design Space Exploration ◽

Surrogate Models ◽

Processing Unit ◽

Adaptive Optimization ◽

Transonic Compressor ◽

Central Processing ◽

Compressor Rotor ◽

Self Tuning ◽

The Stability

In aerodynamic design, accurate and robust surrogate models are important to accelerate computationally expensive computational fluid dynamics (CFD)-based optimization. In this paper, a machine learning framework is presented to speed up the design optimization of a highly loaded transonic compressor rotor. The approach is threefold: (1) dynamic selection and self-tuning among several surrogate models; (2) classification to anticipate failure of the performance evaluation; and (3) adaptive selection of new candidates to perform CFD evaluation for updating the surrogate, which facilitates design space exploration and reduces surrogate uncertainty. The framework is demonstrated with a multipoint optimization of the transonic NASA rotor 37, yielding increased compressor efficiency in less than 48 h on 100 central processing unit cores. The optimized rotor geometry features precompression that relocates and attenuates the shock, without the stability penalty or undesired reacceleration usually observed in the literature.

Complex Power System Status Monitoring and Evaluation Using Big Data Platform and Machine Learning Algorithms: A Review and a Case Study

Complexity ◽

10.1155/2018/8496187 ◽

2018 ◽

Vol 2018 ◽

pp. 1-21 ◽

Cited By ~ 7

Author(s):

Yuanjun Guo ◽

Zhile Yang ◽

Shengzhong Feng ◽

Jinxing Hu

Keyword(s):

Machine Learning ◽

Big Data ◽

Smart Grid ◽

Power System ◽

Data Management ◽

Power Grid ◽

Technical Solution ◽

Data Platform ◽

Complex Power ◽

Big Data Technology

Efficient and valuable strategies derived from the large amount of available data are urgently needed for a sustainable electricity system that includes smart grid technologies and very complex power system situations. Big Data technologies, including Big Data management and utilization based on the growing volume of data collected from every component of the power grid, are crucial for the successful deployment and monitoring of the system. This paper reviews the key technologies of Big Data management and intelligent machine learning methods for complex power systems. Based on a comprehensive study of power systems and Big Data, several challenges that must be met to unlock the potential of Big Data technology in smart grid applications are summarized. The paper proposes a modified and optimized structure for the Big Data processing platform according to the power data sources and their different structures. Numerous open-source Big Data analytical tools and software packages are integrated as modules of the analytic engine, and self-developed advanced algorithms are also designed. The proposed framework comprises a data interface, Big Data management, an analytic engine, applications, and a display module. To fully investigate the proposed structure, three major applications are introduced: development of power grid topology and parallel computing using CIM files, high-efficiency load-shedding calculation, and power system transmission line tripping analysis using 3D visualization. The real-system cases demonstrate the effectiveness and great potential of the Big Data platform, which allows data resources to achieve their full value for strategies and decision-making in the smart grid. The proposed platform can provide a technical solution for the multidisciplinary cooperation of Big Data technology and smart grid monitoring.

A MACHINE LEARNING MODEL FOR AN EARTHQUAKE FORECASTING USING PARALLEL PROCESSING

Proceedings of the International Conference on Emerging Trends in Engineering & Technology (IConETech-2020) ◽

10.47412/dhhv5862 ◽

2020 ◽

Author(s):

Manoj Kollam ◽

Ajay Joshi

Keyword(s):

Machine Learning ◽

Parallel Processing ◽

Geographical Location ◽

Model Performance ◽

Economic Loss ◽

Central Processing Unit ◽

Earthquake Forecasting ◽

Processing Unit ◽

Daily Lives ◽

Central Processing

Earthquakes are a devastating natural hazard capable of wiping out thousands of lives and causing economic loss in the affected geographical location. Seismic stations gather data continuously, whether or not an event occurs. The gathered data are processed by the model to forecast the occurrence of earthquakes. This paper presents a model to forecast earthquakes using parallel processing. Machine Learning is rapidly taking over a variety of aspects of our daily lives. Even though Machine Learning methods can be used for analyzing data, in event forecasting scenarios such as earthquakes their performance is limited because the data grow day by day, so ML alone is not a perfect solution. To increase model performance and accuracy, a new ML model is designed using parallel processing. The drawbacks of ML on a central processing unit (CPU) can be overcome by a graphics processing unit (GPU) implementation, since parallelism is naturally provided by the Compute Unified Device Architecture (CUDA), a framework for developing computational algorithms that utilize the GPU. A hybrid state vector machine (H-SVM) algorithm implemented with parallel processing through CUDA is used to forecast earthquakes. Our experiments show that the GPU-based implementation achieved typical speedups of 3 to 70 times over a conventional CPU. Results of different experiments are discussed along with their consequences.
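
For reference, a speedup figure of this kind is conventionally the ratio of the CPU running time to the GPU running time for the same workload. The sketch below shows that ratio; the abstract reports no raw timings, so the run names and the numbers in seconds are invented here purely to land on the two ends of the quoted range.

```python
def speedup(cpu_seconds, gpu_seconds):
    """Ratio of CPU time to GPU time for the same workload."""
    if cpu_seconds <= 0 or gpu_seconds <= 0:
        raise ValueError("timings must be positive")
    return cpu_seconds / gpu_seconds


# Hypothetical (cpu_seconds, gpu_seconds) pairs, NOT measurements from the paper.
hypothetical_runs = {
    "small_batch": (6.0, 2.0),
    "large_batch": (140.0, 2.0),
}
speedups = {name: speedup(c, g) for name, (c, g) in hypothetical_runs.items()}
```

A wide range such as this often reflects workload size, since small inputs leave a GPU underused, although the abstract does not say what drives the variation.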

Privacy-Aware Data Forensics of VRUs Using Machine Learning and Big Data Analytics

Security and Communication Networks ◽

10.1155/2021/3320436 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Muhammad Babar ◽

Muhammad Usman Tariq ◽

Ahmed S. Almasoud ◽

Mohammad Dahman Alshehri

Keyword(s):

Machine Learning ◽

Big Data ◽

Traffic Control ◽

Data Analytics ◽

Data Privacy ◽

Big Data Analytics ◽

Processing Unit ◽

Privacy And Security ◽

User Data ◽

Data Ingestion

The current spread of big data has enabled the realization of AI and machine learning. With the rise of big data and machine learning, the idea of improving the accuracy and efficacy of AI applications is also gaining prominence. In traffic applications, machine learning solutions provide improved safety in hazardous traffic circumstances. The existing architectures face various challenges, of which data privacy is the foremost for vulnerable road users (VRUs). The key reason for failure in traffic control for pedestrians is flawed handling of user privacy. User data are at risk and are exposed to several privacy and security gaps. If an intruder succeeds in infiltrating the system, exposed data can be maliciously manipulated, fabricated, and misrepresented for illegitimate purposes. In this study, a machine learning-based architecture is proposed to analyze and process big data efficiently in a secure environment. The proposed model considers the privacy of users during big data processing. The architecture is a layered framework with a parallel and distributed module that uses machine learning on big data to achieve secure big data analytics. It includes a distinct unit for privacy management that uses a machine learning classifier, and a stream processing unit is integrated to process the information. The proposed system is implemented using real-time datasets from various sources and experimentally tested with reliable datasets that demonstrate the effectiveness of the proposed architecture. The data ingestion results are also highlighted along with training and validation results.
