2020 ◽  
Author(s):  
Premalatha Jayapaul ◽  
Aswini Balasundaram ◽  
Kavi Priya Dharshini Seturamalingam ◽  
Kavithra Sekar

2020 ◽  
Vol 635 ◽  
pp. A45 ◽  
Author(s):  
A. Castro-Ginard ◽  
C. Jordi ◽  
X. Luri ◽  
J. Álvarez Cid-Fuentes ◽  
L. Casamiquela ◽  
...  

Context. Open clusters are key targets for studies of Galaxy structure and evolution, and of stellar physics. Since Gaia data release 2 (DR2), the discovery of previously undetected clusters has shown that earlier surveys were incomplete. Aims. Our aim is to exploit the Big Data capabilities of machine learning to detect new open clusters in Gaia DR2, and to complete the open cluster sample to enable further studies of the Galactic disc. Methods. We use a machine-learning-based methodology to systematically search the Galactic disc for overdensities in the astrometric space and to identify the open clusters using photometric information. First, we use an unsupervised clustering algorithm, DBSCAN, to blindly search for these overdensities in Gaia DR2 (l, b, ϖ, μα*, μδ); then we use a deep artificial neural network, trained on colour–magnitude diagrams, to identify isochrone patterns in these overdensities and confirm them as open clusters. Results. We find 582 new open clusters distributed along the Galactic disc in the region |b| < 20°. We detect substructure in complex regions, and identify the tidal tails of the disrupting cluster UBC 274, an ∼3 Gyr old cluster located at ∼2 kpc. Conclusions. Adapting this methodology to a Big Data environment allows us to target the search using the physical properties of open clusters instead of being driven by computational limitations. This blind search for open clusters in the Galactic disc increases the number of known open clusters by 45%.
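The density-based search step described above can be sketched with scikit-learn's DBSCAN on synthetic stand-in data. This is a minimal illustration only: the coordinate values, `eps`, and `min_samples` below are invented assumptions, not the paper's tuned parameters or its Gaia DR2 inputs.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(42)

# Synthetic stand-in for the 5-D astrometric space (l, b, parallax, pmra, pmdec):
# two compact "open clusters" embedded in a diffuse uniform field.
cluster_a = rng.normal(loc=[120.0, 2.0, 1.5, -3.0, 1.0], scale=0.05, size=(50, 5))
cluster_b = rng.normal(loc=[180.0, -5.0, 0.8, 4.0, -2.0], scale=0.05, size=(50, 5))
field = rng.uniform(low=[100, -20, 0.1, -10, -10],
                    high=[200, 20, 3.0, 10, 10], size=(500, 5))
X = np.vstack([cluster_a, cluster_b, field])

# Standardise each dimension so a single eps is comparable across
# coordinates with very different physical ranges.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# DBSCAN labels dense groups 0, 1, ... and sparse field stars as -1 (noise).
labels = DBSCAN(eps=0.2, min_samples=10).fit_predict(X_std)
n_found = len(set(labels)) - (1 if -1 in labels else 0)
print(f"overdensities found: {n_found}")  # -> overdensities found: 2
```

In the real pipeline each candidate overdensity would then be passed, as a colour–magnitude diagram, to the neural-network classifier described in the abstract.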


2021 ◽  
Vol 117 (4) ◽  
pp. 3505-3525
Author(s):  
Chen Hongsong ◽  
Zhang Yongpeng ◽  
Cao Yongrui ◽  
Bharat Bhargava

Author(s):  
Fatama Sharf Al-deen ◽  
Fadl Mutaher Ba-Alwi

Due to the rapid development of information technology, Big Data has become one of its most prominent features, with a great impact on other data-driven technologies such as machine learning. K-means is one of the most important machine learning algorithms. It was first developed as a clustering technique for relational databases; the advent of Big Data, however, has strongly affected its performance. Many researchers have therefore proposed approaches to improve K-means accuracy in Big Data environments. In this paper, we present a literature review of the different techniques proposed to adapt the K-means algorithm to Big Data. We compare them according to several criteria, including the proposed algorithm, the database used, the Big Data tools, and the K-means applications. This paper helps researchers to see the most important challenges and trends of the K-means algorithm in the Big Data environment.
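For context on what the surveyed variants modify, a minimal sketch of the baseline Lloyd's K-means algorithm in NumPy follows; the data and parameters are invented for illustration and do not correspond to any specific approach in the review.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's K-means: the baseline that Big Data variants adapt."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points
        # (keeping the old centroid if a cluster happens to be empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if (labels == j).any() else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged
        centroids = new_centroids
    return labels, centroids

# Two well-separated blobs; K-means should recover them exactly.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (100, 2)), rng.normal(5, 0.3, (100, 2))])
labels, centroids = kmeans(X, k=2)
```

The pain point for Big Data is the assignment step, which touches every record on every iteration; most of the adaptations discussed in the review (mini-batch, distributed, or sampling-based K-means) attack exactly that cost.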


2020 ◽  
Vol 9 (4) ◽  
pp. 1411-1419
Author(s):  
Nashwan Dheyaa Zaki ◽  
Nada Yousif Hashim ◽  
Yasmin Makki Mohialden ◽  
Mostafa Abdulghafoor Mohammed ◽  
Tole Sutikno ◽  
...  

The scale of data streaming on social networks such as Twitter is increasing exponentially. Twitter is one of the most important and suitable big data sources for machine learning research in terms of analysis, prediction, knowledge extraction, and opinion mining. People use the Twitter platform daily to express their opinions, a fundamental factor that influences their behaviour. In recent years, the flow of Iraqi-dialect content has increased, especially on the Twitter platform. Sentiment analysis and opinion mining for different dialects have become hot topics in data science research. In this paper, we develop a real-time analytic model for sentiment analysis and opinion mining of Iraqi tweets using Spark streaming, and we also create a dataset for researchers in this field. The Twitter handle Bassam AlRawi is the case study here. The new method is well suited to current machine learning applications and fast online prediction.
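The abstract does not specify the classifier used, so as an illustration of the sentiment-classification step only (independent of Spark streaming and of any Iraqi-dialect resource), here is a toy lexicon-based scorer; the English lexicon below is an invented placeholder, not the paper's dialect lexicon.

```python
# Toy lexicon-based sentiment scorer. The word lists are invented
# placeholders for illustration; a real system would use a curated
# dialect-specific lexicon or a trained model.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "awful", "hate", "poor"}

def sentiment(tweet: str) -> str:
    tokens = tweet.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("I love this great service"))   # -> positive
print(sentiment("awful experience poor support"))  # -> negative
```

In a streaming deployment, a function like this (or a trained model's predict call) would be applied to each micro-batch of tweets as it arrives, which is what makes per-tweet cost so important for real-time prediction.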


Author(s):  
Dharmapriya M S

Abstract: The concept of machine learning emerged in the 1950s as a subfield of artificial intelligence, but saw few significant developments until the 1990s, since when the field has steadily developed and expanded. It will continue to grow, because analysing and processing data becomes more difficult as the number of records and documents increases. Given this growth in data, machine learning focuses on finding the best model for new data while taking all previous data into account, so machine learning research will continue in step with the growth of data itself. This paper surveys the history of machine learning, its methods, its applications, and the research that has been conducted on the topic. Our study aims to give researchers a deeper understanding of machine learning, an area of research that is becoming ever more popular today, and of its applications. Keywords: Machine Learning, Machine Learning Algorithms, Artificial Intelligence, Big Data.

