Credit scoring using system log data in the internet bank

2021 ◽  
Author(s):  
Sunghyon Kyeong ◽  
Daehee Kim ◽  
Jinho Shin

This study is the first to examine whether the performance of credit rating, one of the most important data-based decision-making processes in banking, can be improved by using banking system log data that are extensively accumulated inside the bank for system operation. This study uses the log data recorded by the mobile app system of KakaoBank, a leading internet bank used by more than 14 million people in Korea. After generating candidate variables from KakaoBank's vast log data, we develop a credit scoring model by utilizing variables with high information values. Consequently, the discrimination power of the new model compared to the credit bureau grades improved significantly, by 1.84 percentage points based on the Kolmogorov–Smirnov statistic. Therefore, the results of this study imply that if a bank utilizes the log data that have already been extensively accumulated inside it, decision-making systems, including credit scoring, can be efficiently improved at a low cost.


2021 ◽  
Vol 14 (1) ◽  
pp. 130
Author(s):  
Sunghyon Kyeong ◽  
Daehee Kim ◽  
Jinho Shin

The credit scoring model is one of the most important decision-making tools for the sustainability of banking systems. This study is the first to examine whether it can be improved by using system log data that are stored extensively for system operation. We used the log data recorded by the mobile application system of KakaoBank, a leading internet bank used by more than 14 million people in Korea. After generating candidate variables from KakaoBank's log data, we created a credit scoring model by utilizing variables with high information values and logistic regression, the most common method for developing credit scoring models in financial institutions. To test our hypothesis on the improvement of credit scoring model performance, we performed an independent-sample t-test using the simulation results of repeated model development and performance measurement based on randomly sampled data. Consequently, the discrimination power of the proposed model using logistic regression (a neural network) compared to the credit bureau-based model improved significantly, by 1.84 (2.22) percentage points based on the Kolmogorov–Smirnov statistic. The results of this study suggest that a bank can utilize the log data accumulated inside the bank to improve decision-making systems, including credit scoring, at a low cost.
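As a rough illustration of the two measures the abstract relies on, the sketch below computes the information value of a binned candidate variable and the Kolmogorov–Smirnov statistic between good and bad borrowers. The bin counts are invented for illustration; they are not KakaoBank data.

```python
import math

# Information value (IV) is used to screen candidate variables; the
# Kolmogorov-Smirnov (KS) statistic measures a model's discrimination power.

def information_value(good_counts, bad_counts):
    """IV = sum over bins of (good% - bad%) * ln(good% / bad%)."""
    total_good, total_bad = sum(good_counts), sum(bad_counts)
    iv = 0.0
    for g, b in zip(good_counts, bad_counts):
        pg, pb = g / total_good, b / total_bad
        iv += (pg - pb) * math.log(pg / pb)
    return iv

def ks_statistic(good_counts, bad_counts):
    """Largest gap between the cumulative good and bad distributions."""
    total_good, total_bad = sum(good_counts), sum(bad_counts)
    cg = cb = ks = 0.0
    for g, b in zip(good_counts, bad_counts):
        cg += g / total_good
        cb += b / total_bad
        ks = max(ks, abs(cg - cb))
    return ks

# Five score bins from riskiest to safest (illustrative counts).
good = [100, 200, 400, 800, 1500]
bad = [120, 90, 60, 20, 10]
print(f"IV = {information_value(good, bad):.2f}")   # -> IV = 2.83
print(f"KS = {ks_statistic(good, bad):.3f}")        # -> KS = 0.667
```

A variable with a high IV separates good and bad borrowers well in each bin, which is why the study keeps only high-IV candidates; the KS statistic then compares whole models by the maximum separation their scores achieve.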


Electronics ◽  
2020 ◽  
Vol 9 (2) ◽  
pp. 232 ◽  
Author(s):  
Yitong Ren ◽  
Zhaojun Gu ◽  
Zhi Wang ◽  
Zhihong Tian ◽  
Chunbo Liu ◽  
...  

With the rapid development of the Internet of Things (IoT), combining the IoT with machine learning, Hadoop, and other fields is a current development trend. The Hadoop Distributed File System (HDFS) is one of the core components of Hadoop and is used to process files that are divided into data blocks distributed across a cluster. Anomalies in the distributed log data can cause serious losses. When machine learning algorithms are used for system log anomaly detection, threshold-based classification models output only simple normal-or-abnormal predictions. This paper uses the statistical learning method of the conformity measure to calculate the similarity between test data and past experience. Compared with detection methods based on a static threshold, the conformity measure can dynamically adapt to changing log data. By adjusting the maximum fault tolerance, a system administrator can better manage and monitor the system logs. In addition, the computational efficiency of the statistical learning method for conformity measurement was improved. This paper implements an intranet anomaly detection model based on log analysis and conducts detection experiments on HDFS data sets quickly and efficiently.
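A minimal sketch of the conformity-measure idea: instead of a static threshold, a new observation gets a p-value against past experience. The nonconformity score here (absolute deviation of a per-window log event count from the calibration mean), the history, and the tolerance are all illustrative assumptions, not the paper's actual model.

```python
def conformal_p_value(calibration_scores, test_score):
    """Fraction of past nonconformity scores at least as extreme as the
    test score. A small p-value means the new log window is unlike past
    experience."""
    ge = sum(1 for s in calibration_scores if s >= test_score)
    return (ge + 1) / (len(calibration_scores) + 1)

# Toy nonconformity score: deviation of a window's event count from the
# mean of past windows (a stand-in for a real model's anomaly score).
history = [3, 4, 5, 4, 3, 5, 4, 4, 6, 3]   # past per-window event counts
mean = sum(history) / len(history)
calib = [abs(x - mean) for x in history]

tolerance = 0.1                             # adjustable fault tolerance
for window in (4, 25):                      # typical vs. bursty window
    p = conformal_p_value(calib, abs(window - mean))
    print(window, "anomaly" if p <= tolerance else "normal")
```

Raising the tolerance flags more windows as anomalous; lowering it tolerates more deviation. This is the sense in which the administrator's "maximum fault tolerance" adapts the detector without retuning a raw threshold.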



2019 ◽  
Vol 1 (2) ◽  
pp. 143-153
Author(s):  
Thifal Baraas ◽  
Akbar Juliansyah ◽  
Ahmad Ashril Rizal

Abstract Browsing, or surfing the internet, is one of the most common activities today; both children and adults are internet users. However, many internet users do not realize that the internet can also be a threat, particularly through attacks on network security systems. To detect suspicious activity passing through the network, an IDS (Intrusion Detection System) is needed. When many attacks arrive at once, the IDS cannot handle them accurately; as a result, normal activity on the network can be classified as an attack from hackers, or vice versa. Data mining is a process used to find relationships in data in order to draw conclusions from it. The C4.5 algorithm is one of the algorithms used to build a decision tree. The decision tree method converts a very large set of facts into a decision tree that represents rules, and these rules can easily be understood in natural language. Classifying IDS log data with the C4.5 algorithm can reduce IDS errors in deciding which activities are attacks and which are not. The results show that IDS log data can be classified with the C4.5 algorithm with a model accuracy of 96.371%, which indicates that the model can be used to determine whether an activity is an attack or not.
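As a sketch of the attribute-selection step at the heart of C4.5 — choosing the split with the highest gain ratio — the following uses a toy set of IDS log records. The attributes, values, and labels are invented for illustration; they are not the study's data.

```python
import math

def entropy(labels):
    """Shannon entropy of a list of category labels, in bits."""
    n = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def gain_ratio(records, attr, label_key="label"):
    """C4.5 criterion: information gain of splitting on attr, divided by
    the entropy of the split itself (penalizes many-valued attributes)."""
    base = entropy([r[label_key] for r in records])
    groups = {}
    for r in records:
        groups.setdefault(r[attr], []).append(r[label_key])
    n = len(records)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy([r[attr] for r in records])
    return (base - conditional) / split_info if split_info else 0.0

# Toy IDS log: protocol and packet-rate band vs. whether it was an attack.
logs = [
    {"proto": "tcp", "rate": "high", "label": "attack"},
    {"proto": "tcp", "rate": "high", "label": "attack"},
    {"proto": "udp", "rate": "low",  "label": "normal"},
    {"proto": "tcp", "rate": "low",  "label": "normal"},
    {"proto": "udp", "rate": "high", "label": "attack"},
    {"proto": "udp", "rate": "low",  "label": "normal"},
]
best = max(("proto", "rate"), key=lambda a: gain_ratio(logs, a))
print(best)  # -> rate (the attribute C4.5 would split on first)
```

C4.5 applies this selection recursively to grow the tree, which is why the resulting rules ("if rate is high then attack") read naturally, as the abstract notes.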


2021 ◽  
Vol 6 (1) ◽  
pp. 59
Author(s):  
Edi Priyanto ◽  
Arief Hermawan ◽  
Rianto Rianto ◽  
Donny Avianto

As internet usage grows, more and more information becomes available, presenting challenges for both users and website owners. Users often have difficulty finding products or services relevant to their needs because of the abundance of products and services offered on a website, while owners often find it difficult to deliver information about the right products and services to particular target users. A recommendation system approach that improves personalization on the website is therefore needed: it must provide navigation that adapts to the interests and information needs of the user. This study uses association rules formed from Microsoft web access log data, finding visitor patterns based on frequently visited website pages. In the experiments, the method achieved a precision of 0.896, a recall of 0.058, and an F-measure of 0.104, while the accuracy evaluation rated 3% of recommendations as exact, 87% as acceptable, and 10% as incorrect. This research shows that the association rules method can increase the effectiveness of website personalization in providing relevant recommendations to visitors. Further research can concentrate on improving the method so that website personalization becomes more adaptive.
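A minimal sketch of mining page-to-page association rules from access-log sessions, restricted to single-page antecedents for brevity. The sessions and the support and confidence thresholds are illustrative assumptions, not the study's Microsoft data or settings.

```python
from itertools import combinations

def mine_rules(sessions, min_support=0.3, min_confidence=0.6):
    """Return {(antecedent, consequent): confidence} for page pairs whose
    co-occurrence support and rule confidence clear the thresholds."""
    n = len(sessions)
    counts = {}
    for session in sessions:
        pages = set(session)
        for p in pages:
            counts[(p,)] = counts.get((p,), 0) + 1
        for pair in combinations(sorted(pages), 2):
            counts[pair] = counts.get(pair, 0) + 1
    rules = {}
    for key, c in counts.items():
        if len(key) != 2 or c / n < min_support:
            continue
        a, b = key
        for ante, cons in ((a, b), (b, a)):
            confidence = c / counts[(ante,)]
            if confidence >= min_confidence:
                rules[(ante, cons)] = confidence
    return rules

# Sessions reconstructed from a web access log (page names are illustrative).
sessions = [
    ["home", "products", "support"],
    ["home", "products"],
    ["home", "support"],
    ["products", "support"],
    ["home", "products", "downloads"],
]
rules = mine_rules(sessions)
for (a, b), conf in sorted(rules.items()):
    print(f"{a} -> {b}  conf={conf:.2f}")
```

A personalization layer would then recommend, on page `a`, the consequents of the highest-confidence rules with antecedent `a`; the precision/recall trade-off reported in the abstract follows from how strict these thresholds are.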


Author(s):  
A. V. Butov

Transformation of organizations is a regular and continuous process accompanying the development of all spheres of the economy and society. The article deals with the theory and practice of establishing organizations of a new type, which can reveal the potential of every employee. The author traces the evolution of organizational development over the last 100 years, presents the foundations of F. Laloux's theory of turquoise (teal) organizations, and describes three key discoveries of this theory and the essential characteristics of turquoise organizations, which could crucially change the current system of management and provide a real breakthrough in raising its efficiency. Special attention is paid to the specifics of introducing the turquoise organizational model in Russian and overseas companies. Key elements of the organizational model of well-known foreign turquoise companies working in different fields, such as Buurtzorg, Sun Hydraulics, and FAVI, are given in the article, and the competitive advantages of these companies, which operate by turquoise principles of management and show fast growth in their key lines of activity, are demonstrated. The author studies the experience of establishing turquoise organizations in Russia (the VkusVill company, Sberbank, Modulbank, and the internet bank Tochka), identifies the basic turquoise principles introduced in the management of these organizations, their results, and their development prospects, and analyzes problems connected with the limited use of the turquoise model. The article provides concrete recommendations aimed at introducing turquoise principles into the practice of Russian companies.


Every user of the internet has high expectations of its reliability, efficiency, productivity, and many other qualities, so providing an uninterrupted service is of prime importance. The amount of data, along with the enormous number of residual traces, is increasing rapidly and significantly; as a result, the analysis of log data has profoundly influenced many research domains. Social media is an integral part of the internet, and real-time blogging services like Twitter are widely used because of their inherent ability to depict the social graph, propagate information, and capture the full social dynamics. The content of tweets is of major interest to researchers, as it reflects individuals' experiences and real-time events, and several applications of tweet analysis have been explored. One such application is detecting service outages through the myriad of messages users post about unavailability; simple techniques are enough to extract the key semantics from tweets, since they serve as fast alerts warning of service unavailability. Similarly, outage mailing lists are text-based messages rich in semantic information about the underlying outages, and automatically parsing and processing them with NLP and text mining for service outage detection is a significant research challenge. An extensive study was conducted to explore the research directions and opportunities in log analysis, tweet analysis, and outage mailing list analysis for detecting and predicting service outages. A systematic framework is articulated covering all stages of the analytics, and potential research challenges and paths in these analyses are discussed. We introduce three major data analysis methods for diagnosing the causes of service failures, detecting service failures early, and predicting them.
We analyze syslogs (log data generated by the system) to detect the cause of a failure by automatically learning over millions of logs, and we analyze data from a social networking service (namely, Twitter, together with outage mails) to detect possible service failures by extracting failure-related tweets, which account for less than one percent of all tweets, in real time with high accuracy. The paper is an effort not only to detect outages but also to forecast them using Twitter analysis based on time-series and neural network models. We further propose a log analysis model for the same purpose.
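As a sketch of the "simple techniques" stage for spotting outage reports in a tweet stream, the following uses plain keyword matching before any heavier NLP. The keyword list and example tweets are invented for illustration, not the study's data.

```python
# Illustrative phrases that tend to appear in outage-related tweets.
OUTAGE_TERMS = {"down", "outage", "unavailable", "cannot connect", "error 503"}

def is_outage_report(tweet):
    """Flag a tweet as a possible outage report if it mentions any
    outage-related phrase (case-insensitive substring match)."""
    text = tweet.lower()
    return any(term in text for term in OUTAGE_TERMS)

stream = [
    "Is the payment service down for anyone else?",
    "Loving the new app update!",
    "Error 503 on every page since 10am #outage",
]
alerts = [t for t in stream if is_outage_report(t)]
print(len(alerts))  # -> 2
```

Because failure-related tweets are under one percent of the stream, even this crude filter sharply reduces the volume that the downstream time-series and neural network models must process.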

