Parallel clustering of large data set on Hadoop using data mining techniques

Author(s):  
Kaustubh S. Chaturbhuj ◽  
Gauri Chaudhary
2013 ◽  
Vol 5 (1) ◽  
pp. 66-83 ◽  
Author(s):  
Iman Rahimi ◽  
Reza Behmanesh ◽  
Rosnah Mohd. Yusuff

This article evaluates and assesses the efficiency of poultry meat farms, taken as a case study, using a new method. The poultry farm industry is one of the most important sub-sectors. The purpose of the study is to predict and assess the efficiency of poultry farms as decision making units (DMUs). Although several methods have been proposed for this problem, a methodology that discriminates performance more powerfully is still needed. The authors' methodology combines data envelopment analysis with data mining techniques such as artificial neural networks (ANN), decision trees (DT), and cluster analysis (CA). For the case study, data were collected from 22 poultry companies in Iran. Because the data set is small and data mining techniques normally require large data sets, the authors used k-fold cross-validation to validate their model. Assessing the efficiency of each DMU, clustering the DMUs, applying the model, and presenting decision rules results in a precise and accurate optimization technique.
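As a rough illustration of the k-fold cross-validation step mentioned above, the following minimal sketch splits 22 sample indices (one per company) into folds. The function name `k_fold_indices`, the choice of k = 5, and the fixed seed are assumptions made for the example; the abstract does not state the value of k or how the authors implemented the split.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Spread the remainder so fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

# 22 DMUs as in the abstract; k = 5 is an assumed value.
folds = list(k_fold_indices(22, 5))
for train_idx, test_idx in folds:
    print(len(train_idx), "training /", len(test_idx), "held out")
```

Each company is held out exactly once, so every observation contributes to both training and validation, which is the usual reason for preferring this scheme on small data sets.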


Author(s):  
Alfonso Capozzoli ◽  
Gianluca Serale ◽  
Marco Savino Piscitelli ◽  
Daniele Grassi

Author(s):  
Deepti Aggarwal ◽  
Sonu Mittal ◽  
Vikram Bali

Educational institutes are focusing on improving student performance by using data mining techniques. Because the number of students who drop out increases every year, predicting whether a student will complete the course makes it possible to take preventive action beforehand. The primary data set used for modelling was taken from a reputed technical institute in Uttar Pradesh and covers 6,807 students with 20 academic and non-academic attributes. The most relevant attributes are extracted using the CorrelationAttributeEval technique (in WEKA) with the Ranker search method, which ranks the attributes by their evaluation scores. The synthetic minority oversampling technique (SMOTE) filter is applied to deal with the skewed data set. Models are built from eight classifiers and analysed to identify the most appropriate model for classifying whether a student will complete the course or withdraw from admission.
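The idea behind SMOTE can be sketched in a few lines: each synthetic minority sample is placed at a random position on the segment between a minority sample and one of its nearest minority neighbours. The sketch below is a simplified pure-Python version, not the WEKA filter used in the study; the function name `smote`, the parameter defaults, and the toy data are assumptions for illustration.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    Simplified illustration of the SMOTE idea; hypothetical helper,
    not the WEKA implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic

# Toy minority class (assumed data, two numeric attributes per student).
minority = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
new_points = smote(minority, n_new=6)
print(new_points)
```

Because new points lie between existing minority samples, the minority class grows without simply duplicating records, which is what distinguishes this approach from plain random oversampling.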


Author(s):  
Anindita Desarkar ◽  
Ajanta Das

A huge amount of data is generated from healthcare transactions, and these data are complex, voluminous and heterogeneous in nature. This large data set is an ideal store that can be analyzed for knowledge discovery as well as for various future predictions. Data mining is becoming increasingly popular because it offers a set of innovative tools and techniques for handling this kind of data, where traditional methods have limitations. Providing better patient care and reducing healthcare cost are the two major goals of applying data mining in healthcare. This chapter first explores the various types of eHealth data and their characteristics. It then explores various domains in the healthcare sector and shows how data mining plays a major role in them. Finally, it describes a few common data mining techniques and their applications in the eHealth domain.


Author(s):  
SUSHIL VERMA ◽  
R. S. THAKUR ◽  
SHAILESH JALORI

Data mining is used to extract meaningful information and to develop significant relationships among variables stored in large data sets. A few years ago, the information flow in the education field was relatively simple and the application of technology was limited. However, as we progress into a more integrated world where technology has become an integral part of business processes, the transfer of information has become more complicated. Today, one of the biggest challenges that educational institutions face is the explosive growth of educational data and the use of this data to improve the quality of managerial decisions and students' performance. The main objective of higher education institutions is to provide quality education to their students. One way to achieve the highest level of quality in the higher education system is by discovering knowledge for prediction regarding the enrolment of students in a particular course, alienation of the traditional classroom teaching model, detection of unfair means used in online examinations, detection of abnormal values in students' result sheets, and prediction of students' performance. The paper aims to propose the use of data mining techniques to improve the efficiency of higher educational institutions. If data mining techniques such as clustering, decision trees and association are applied to higher education processes, they can help improve students' performance.


2019 ◽  
Vol 32 (4) ◽  
pp. 1523-1538 ◽  
Author(s):  
Sérgio Moro ◽  
Joaquim Esmerado ◽  
Pedro Ramos ◽  
Bráulio Alturas

Purpose: This paper aims to propose a data mining approach to evaluate a conceptual model in tourism, encompassing a large data set characterized by dimensions grounded on existing literature. Design/methodology/approach: The approach is tested using a guest satisfaction model encompassing nine dimensions. A large data set of 84k online reviews and 31 features was collected from TripAdvisor. The review score granted was considered a proxy of guest satisfaction and was defined as the target feature to model. A sequence of data understanding and preparation tasks led to a tuned set of 60k reviews and 29 input features, which were used for training the data mining model. Finally, data-based sensitivity analysis was adopted to understand which dimensions most influence guest satisfaction. Findings: Users' previous experience with the online platform, individual preferences, and hotel prestige were the most relevant dimensions concerning guests' satisfaction. Conversely, characteristics that are homogeneous among the Las Vegas hotels, such as hotel size, were found to be of little relevance to satisfaction. Originality/value: This study intends to set a baseline for an easier adoption of data mining to evaluate conceptual models through a scalable approach, helping to bridge theory and practice, which is especially relevant when dealing with Big Data sources such as social media. Thus, the steps undertaken during the study are detailed to facilitate replication with other models.


2018 ◽  
Vol 7 (2.7) ◽  
pp. 917
Author(s):  
Madhuri Kommineni ◽  
Someswari Perla ◽  
Divya Bharathi Yedla

Data mining is a technique that focuses on large data sets to extract information for prediction and for the discovery of hidden patterns. Data mining is applicable to various areas such as healthcare, insurance, marketing, retail, communication, and agriculture. Agriculture is the backbone of a country's economy and an important source of livelihood. Agriculture mainly depends on climate, topography, soil, and biology. Agricultural mining is a technology that can bring knowledge to agricultural development. Data mining in agriculture plays a role in weather forecasting, yield prediction, soil fertility, fertilizer usage, fruit grading, and plant disease and weed detection. The current study presents different data mining techniques and their role in the context of soil fertility and nutrient analysis.


Author(s):  
Sujata Mulik

The agriculture sector in India faces a serious problem in maximizing crop productivity. More than 60 percent of the crop still depends on climatic factors such as rainfall, temperature, and humidity. This paper discusses the use of various data mining applications in the agriculture sector, where data mining is used to solve a range of problems. One of these is yield prediction, a major problem that remains to be solved based on available data. Data mining techniques are a better choice for this purpose. Different data mining techniques are used and evaluated in agriculture for estimating the future year's crop production. This paper focuses on predicting the crop yield productivity of Kharif and Rabi crops.

