Large-Scale Data-Driven Financial Risk Modeling Using Big Data Technology

Large-Scale Data-Driven Financial Risk Assessment

Applied Data Science ◽

10.1007/978-3-030-11821-1_21 ◽

2019 ◽

pp. 387-408 ◽

Cited By ~ 1

Author(s):

Wolfgang Breymann ◽

Nils Bundi ◽

Jonas Heitz ◽

Johannes Micheler ◽

Kurt Stockinger

Keyword(s):

Risk Assessment ◽

Financial Risk ◽

Large Scale ◽

Data Driven ◽

Large Scale Data ◽

Scale Data


Analyzing Bangkok city taxi ride: reforming fares for profit sustainability using big data driven model

Journal Of Big Data ◽

10.1186/s40537-020-00396-5 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Thananut Phiboonbanakit ◽

Teerayut Horanont

Keyword(s):

Big Data ◽

Traffic Congestion ◽

Large Scale ◽

Data Driven ◽

Taxi Driver ◽

Long Distance ◽

Taxi Drivers ◽

Advantages And Disadvantages ◽

Large Scale Data ◽

Scale Data

With the trend toward the use of large-scale vehicle probe data, an urban-scale analysis can now provide useful information for taxi drivers and passengers. Unfortunately, traffic congestion has become a critical problem in cities. Road traffic congestion reduces productivity in transportation services, and the daily profit earned is consequently reduced. At the same time, the cost of living is increasing rapidly. These issues make it difficult for workers in all occupations to manage daily expenses, particularly taxi drivers, whose occupation is classified as low income compared with others. Such facts are a sign of economic inefficiency. To this end, this study aims to assist taxi agencies and the government in improving taxi driver profits in Bangkok using large-scale data. To deal with these large-scale data, we propose a big data-driven model. With this model, we first calculate costs using a cost–distance algorithm and trip reconstruction. The data are then modeled to understand distance-based profits with respect to the departure time and traffic conditions. Next, several cost predictive models using machine learning are evaluated using the ground truth from 50 taxis for a 1-month period. The experiment results show that more frequent trips over a short distance yield higher profits than long-distance trips. Finally, a solution to improve taxi driver profits is determined. We also compare the advantages and disadvantages of a unified solution.


Support Vector Machines in Big Data Classification: A Systematic Literature Review

10.21203/rs.3.rs-663359/v1 ◽

2021 ◽

Author(s):

Mohammad Hassan Almaspoor ◽

Ali Safaei ◽

Afshin Salajegheh ◽

Behrouz Minaei-Bidgoli

Keyword(s):

Machine Learning ◽

Big Data ◽

Large Scale ◽

Support Vector ◽

Research Areas ◽

Large Scale Data ◽

Training Samples ◽

Big Data Classification ◽

Scale Data

Classification is one of the most important and widely used tasks in machine learning; its purpose is to create a rule, based on a set of training samples, for assigning data to pre-existing categories. Employed successfully in many scientific and engineering areas, the Support Vector Machine (SVM) is among the most promising classification methods in machine learning. With the advent of big data, many machine learning methods have been challenged by big data characteristics. The standard SVM was proposed for batch learning, in which all data are available at the same time. The SVM has a high time complexity, i.e., increasing the number of training samples increases the need for computational resources and memory. Hence, many attempts have been made to adapt the SVM to online learning conditions and to large-scale data. This paper focuses on the analysis, identification, and classification of existing methods for adapting the SVM to online conditions and large-scale data. These methods might be employed to classify big data and to suggest research areas for future studies. Considering its advantages, the SVM can be among the first options for big data classification. For this purpose, appropriate techniques should be developed for data preprocessing in order to convert data into an appropriate form for learning. Existing frameworks for parallel and distributed processing should also be employed so that SVMs can be made scalable and properly online in order to handle big data.
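To make the contrast between batch and online SVM training concrete, the following is a minimal sketch of a linear SVM trained one sample at a time by stochastic subgradient descent on the regularised hinge loss. It is not taken from the reviewed paper: the function names (synthetic_stream, train_online_svm, predict), the learning rate, the regularisation strength, and the toy data are all assumptions chosen for illustration.

```python
import random


def synthetic_stream(n, seed=0):
    """Yield n labelled points (x, y), y in {-1, +1}, from two Gaussian blobs (toy data)."""
    rng = random.Random(seed)
    for _ in range(n):
        y = rng.choice([-1, 1])
        yield [2.0 * y + rng.gauss(0.0, 0.5), 2.0 * y + rng.gauss(0.0, 0.5)], y


def train_online_svm(stream, n_features, lr=0.1, lam=0.01):
    """Linear SVM updated per sample; memory use is O(n_features), independent of stream length.

    lr (step size) and lam (L2 regularisation strength) are illustrative defaults.
    """
    w = [0.0] * n_features
    b = 0.0
    for x, y in stream:
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        # Gradient step on the L2 regulariser: shrink the weights slightly.
        w = [(1.0 - lr * lam) * wi for wi in w]
        # Subgradient step on the hinge loss, only when the margin is violated.
        if margin < 1.0:
            w = [wi + lr * y * xi for wi, xi in zip(w, x)]
            b += lr * y
    return w, b


def predict(w, b, x):
    """Return the predicted label (+1 or -1) for a single point."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0.0 else -1
```

Because each sample is seen once and then discarded, a call such as train_online_svm(synthetic_stream(2000), n_features=2) never holds the full training set in memory, which is the property that makes this family of methods attractive for large-scale data.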


Data Lake Ecosystem Workflow

10.21079/11681/40203 ◽

2021 ◽

Author(s):

R. Salter ◽

Quyen Dong ◽

Cody Coleman ◽

Maria Seale ◽

Alicia Ruvinsky ◽

...

Keyword(s):

Big Data ◽

Language Processing ◽

Data Analytics ◽

Large Scale ◽

Big Data Analytics ◽

Lake Ecosystem ◽

Data Governance ◽

Government Organizations ◽

Large Scale Data ◽

Scale Data

The Engineer Research and Development Center, Information Technology Laboratory’s (ERDC-ITL’s) Big Data Analytics team specializes in the analysis of large-scale datasets with capabilities across four research areas that require vast amounts of data to inform and drive analysis: large-scale data governance, deep learning and machine learning, natural language processing, and automated data labeling. Unfortunately, data transfer between government organizations is a complex and time-consuming process requiring coordination of multiple parties across multiple offices and organizations. Past successes in large-scale data analytics have placed a significant demand on ERDC-ITL researchers, highlighting that few individuals fully understand how to successfully transfer data between government organizations; future project success therefore depends on a small group of individuals to efficiently execute a complicated process. The Big Data Analytics team set out to develop a standardized workflow for the transfer of large-scale datasets to ERDC-ITL, in part to educate peers and future collaborators on the process required to transfer datasets between government organizations. Researchers also aim to increase workflow efficiency while protecting data integrity. This report provides an overview of the created Data Lake Ecosystem Workflow by focusing on the six phases required to efficiently transfer large datasets to supercomputing resources located at ERDC-ITL.


Affordances of Data Science in Agriculture, Manufacturing, and Education

Web Services ◽

10.4018/978-1-5225-7501-6.ch052 ◽

2019 ◽

pp. 953-978

Author(s):

Krishnan Umachandran ◽

Debra Sharon Ferdinand-James

Keyword(s):

Big Data ◽

Large Scale ◽

Data Science ◽

Data Generation ◽

Large Scale Data ◽

Big Data Applications ◽

Effective Decision ◽

Effective Decision Making ◽

Text Images ◽

Scale Data

Continued technological advancements of the 21st century afford massive data generation in sectors of our economy, including the domains of agriculture, manufacturing, and education. However, harnessing such large-scale data with modern technologies for effective decision-making appears to be an evolving science that requires knowledge of Big Data management and analytics. Big Data in agriculture, manufacturing, and education take varied forms, such as voluminous text, images, and graphs. Applying Big Data science techniques (e.g., functional algorithms) to extract intelligence from data affords decision makers a quick response to productivity, market resilience, and student enrollment challenges in today's unpredictable markets. This chapter employs data science to explore potential solutions for Big Data applications in the sectors of agriculture, manufacturing, and, to a lesser extent, education, using modern technological tools such as Hadoop, Hive, Sqoop, and MongoDB.


New Frontiers for E-Learning in Education

Optimizing Student Engagement in Online Learning Environments - Advances in Educational Technologies and Instructional Design ◽

10.4018/978-1-5225-3634-5.ch010 ◽

2018 ◽

pp. 220-240

Author(s):

Mohammad Zubair Khan ◽

Yasser M. Alginahi

Keyword(s):

Big Data ◽

Large Scale ◽

Data Repositories ◽

Useful Knowledge ◽

Leading Role ◽

Large Scale Data ◽

E Learning ◽

Base Management ◽

Wide Group ◽

Scale Data

Big Data research is playing a leading role in investigating a wide group of issues emerging in Database, Data Warehousing, and Data Mining research. Analytics research is intended to develop complex procedures running over large-scale data repositories with the objective of extracting useful knowledge hidden in such repositories. One of the most significant application scenarios in which Big Data emerge is, without doubt, scientific computing. Here, researchers and analysts create immense amounts of data every day by means of experiments (e.g., in disciplines like high-energy physics, space science, and bioinformatics). Nevertheless, extracting useful knowledge for decision-making purposes from these enormous, large-scale data repositories is practically impossible for actual Database Management System (DBMS)-inspired analysis tools.


Affordances of Data Science in Agriculture, Manufacturing, and Education

Privacy and Security Policies in Big Data - Advances in Information Security, Privacy, and Ethics ◽

10.4018/978-1-5225-2486-1.ch002 ◽

2017 ◽

pp. 14-40 ◽

Cited By ~ 2

Author(s):

Krishnan Umachandran ◽

Debra Sharon Ferdinand-James

Keyword(s):

Big Data ◽

Large Scale ◽

Data Science ◽

Data Generation ◽

Large Scale Data ◽

Big Data Applications ◽

Effective Decision ◽

Effective Decision Making ◽

Text Images ◽

Scale Data

Continued technological advancements of the 21st century afford massive data generation in sectors of our economy, including the domains of agriculture, manufacturing, and education. However, harnessing such large-scale data with modern technologies for effective decision-making appears to be an evolving science that requires knowledge of Big Data management and analytics. Big Data in agriculture, manufacturing, and education take varied forms, such as voluminous text, images, and graphs. Applying Big Data science techniques (e.g., functional algorithms) to extract intelligence from data affords decision makers a quick response to productivity, market resilience, and student enrollment challenges in today's unpredictable markets. This chapter employs data science to explore potential solutions for Big Data applications in the sectors of agriculture, manufacturing, and, to a lesser extent, education, using modern technological tools such as Hadoop, Hive, Sqoop, and MongoDB.


Scheduling issue for Dynamic Load Balancing of map-reduce in large scale data (Big data)

Journal of Xidian University ◽

10.37896/jxu14.5/445 ◽

2020 ◽

Vol 14 (5) ◽

Keyword(s):

Big Data ◽

Load Balancing ◽

Dynamic Load ◽

Large Scale ◽

Dynamic Load Balancing ◽

Map Reduce ◽

Large Scale Data ◽

Scale Data


On the Effectiveness of Hybrid Canopy with Hoeffding Adaptive Naive Bayes Trees

International Journal of Applied Evolutionary Computation ◽

10.4018/ijaec.2017040102 ◽

2017 ◽

Vol 8 (2) ◽

pp. 30-43

Author(s):

Mrutyunjaya Panda

Keyword(s):

Big Data ◽

Clustering Analysis ◽

Large Scale ◽

Data Sets ◽

Recent Past ◽

Large Scale Data ◽

Huge Data ◽

With Memory ◽

Memory Constraints ◽

Scale Data

Big Data, due to its complicated and diverse nature, poses many challenges for extracting meaningful observations. This calls for smart and efficient algorithms that can deal with the computational complexity and memory constraints arising from their iterative behavior. This issue may be addressed with parallel computing techniques, in which a single machine or multiple machines perform the work simultaneously by dividing the problem into subproblems and assigning private memory to each subproblem. Clustering analysis has been found useful for handling such huge data in the recent past. Although many investigations in Big Data analysis are under way, Canopy and K-Means++ clustering are used here to process large-scale data in a shorter amount of time with no memory constraints. To assess the suitability of the approach, several data sets are considered, ranging from small to very large and covering diverse fields of application. The experimental results indicate that the proposed approach is fast and accurate.
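As an illustration of how Canopy clustering can cheaply pre-partition data before a centroid-based method refines it, here is a minimal sketch. It is not the paper's hybrid algorithm: it uses plain Lloyd iterations seeded with canopy centers rather than K-Means++, and the function names (canopy_clusters, kmeans_from_seeds), the thresholds t1 and t2, and the sample points are assumptions made for this example.

```python
import math


def canopy_clusters(points, t1, t2):
    """Group points into possibly overlapping canopies.

    t1 is the loose threshold (canopy membership) and t2 the tight threshold
    (points this close to a center can no longer start a canopy); t1 > t2.
    """
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining.pop(0)
        members = [center]
        still_remaining = []
        for p in remaining:
            d = math.dist(center, p)
            if d < t1:
                members.append(p)
            if d >= t2:
                still_remaining.append(p)
        remaining = still_remaining
        canopies.append((center, members))
    return canopies


def kmeans_from_seeds(points, seeds, iterations=10):
    """Refine the seed centers with a fixed number of Lloyd (k-means) iterations."""
    centers = [tuple(s) for s in seeds]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # Move each center to the mean of its group; keep it in place if the group is empty.
        centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers
```

In this sketch the canopy pass needs only one distance computation per remaining point per canopy, and the number of canopies it finds fixes the number of clusters that the refinement step then works with.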


Compounding as Abstract Operation in Semantic Space: Investigating relational effects through a large-scale, data-driven computational model

Cognition ◽

10.1016/j.cognition.2017.05.026 ◽

2017 ◽

Vol 166 ◽

pp. 207-224 ◽

Cited By ~ 10

Author(s):

Marco Marelli ◽

Christina L. Gagné ◽

Thomas L. Spalding

Keyword(s):

Computational Model ◽

Large Scale ◽

Semantic Space ◽

Data Driven ◽

Large Scale Data ◽

Abstract Operation ◽

Scale Data
