The Effect of Green Software: A Study of Impact Factors on the Correctness of Software

David Gil; Jose Fernández-Alemán; Juan Trujillo; Ginés García-Mateos; Sergio Luján-Mora; Ambrosio Toval

doi:10.3390/su10103471

Sustainability, 2018, Vol 10 (10), pp. 3471. Cited by ~4.

Keyword(s): Data Mining; Decision Trees; Learning Process; Confusion Matrix; Computer Software; Impact Factors; Support Vector; Program Correctness; E Learning; Green Software

Unfortunately, sustainability is rarely taken into account when developing software and hardware systems. Recently, as a contribution to the Earth's sustainability, a new concept has emerged: green software, that is, computer software that can be developed and used efficiently and effectively with minimal or no impact on the environment. Currently, new teaching methods centred on the students' learning process are being developed in the European Higher Education Area. Most of them aim to foster students' interest in course content and to offer personalized feedback. Online judging is a promising method for encouraging students' participation in the e-learning process, although it still needs further research and development before it can be used widely and more efficiently. The large amount of data available in an online judging tool makes it possible to explore some of the most indicative attributes (e.g., running time, memory) for learning programming concepts, techniques and languages. So far, the most widely applied methods for automatically gathering information from judging systems are statistical and, although they provide reasonable correlations, they have not been proven to provide enough information for predicting grades when dealing with large amounts of data. The main novelty of this paper is therefore a data mining approach to predict both program correctness and the grades of the students' practical assignments. For this purpose, powerful data mining techniques from the artificial intelligence domain have been used: logistic regression, decision trees, artificial neural networks and support vector machines, which have been identified as the most suitable techniques for prediction tasks in e-learning domains.
The results achieved an accuracy of around 74%, both in predicting program correctness and in predicting practice grades. The paper also compares these four techniques to determine which achieves the best accuracy in predicting grades, depending on the availability of data and on the taxonomy of the techniques. The decision tree classifier obtained the best confusion matrix, and time and memory efficiency were identified as the most important predictor variables. In view of these results, we conclude that developing green software leads programmers to implement correct software.
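
The abstract reports that a decision tree classifier gave the best confusion matrix and that running time and memory were the strongest predictors of program correctness. The sketch below illustrates that idea with the simplest possible tree, a one-level decision stump that picks the single feature and threshold with the best training accuracy, and then builds the 2×2 confusion matrix. The feature layout, the toy data and all function names are hypothetical; this is not the authors' model or dataset.

```python
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int]  # (feature index, threshold, label predicted when value <= threshold)

def fit_stump(rows: Sequence[Sequence[float]], labels: Sequence[int]) -> Stump:
    """Fit a one-level decision tree by exhaustive search.

    Every feature, every observed value as a threshold, and both label
    orientations are tried; the split with the most correct training
    predictions wins (the first one found is kept on ties).
    """
    best: Stump = (0, rows[0][0], 1)
    best_correct = -1
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            for low_label in (0, 1):
                correct = sum(
                    (low_label if r[f] <= threshold else 1 - low_label) == y
                    for r, y in zip(rows, labels)
                )
                if correct > best_correct:
                    best, best_correct = (f, threshold, low_label), correct
    return best

def stump_predict(stump: Stump, row: Sequence[float]) -> int:
    f, threshold, low_label = stump
    return low_label if row[f] <= threshold else 1 - low_label

def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> List[List[int]]:
    """2x2 matrix [[TN, FP], [FN, TP]]: rows are true labels, columns are predictions."""
    m = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m

def accuracy(m: List[List[int]]) -> float:
    return (m[0][0] + m[1][1]) / sum(map(sum, m))

# Hypothetical toy data: [running time in ms, memory in KB]; label 1 = correct program.
rows = [[120, 900], [95, 1100], [400, 950], [510, 1200], [80, 3000], [450, 2900]]
labels = [1, 1, 0, 0, 1, 0]

stump = fit_stump(rows, labels)
preds = [stump_predict(stump, r) for r in rows]
print(stump, confusion_matrix(labels, preds), accuracy(confusion_matrix(labels, preds)))
# (0, 120, 1) [[3, 0], [0, 3]] 1.0
```

On this toy data the stump splits on running time alone; a full decision tree would apply the same search recursively to each side of the split.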

Educational data mining in moodle data

International Journal of Informatics and Communication Technology (IJ-ICT), 2021, Vol 10 (1), pp. 9. doi:10.11591/ijict.v10i1.pp9-18

Author(s): Sushil Shrestha; Manish Pokharel

Keyword(s): Data Mining; Learning Process; Educational Data Mining; Feature Selection Method; Learning System; Significant Feature; Support Vector; Learning Platform; E Learning; Teaching Learning

The main purpose of this research paper is to analyze Moodle data and identify the most influential features for developing a predictive model. The research applies Boruta, a wrapper-based feature selection method, to select the best predictive features. Data were collected from eighty-one students enrolled in the course Human Computer Interaction (COMP341), offered by the Department of Computer Science and Engineering at Kathmandu University, Nepal, which uses Moodle as its e-learning platform. The dataset contained eight features: Assignment.Click, Chat.Click, File.Click, Forum.Click, System.Click, Url.Click, and Wiki.Click were used as the independent features and Grade as the dependent feature. Five classification algorithms, namely k-nearest neighbours, naïve Bayes, support vector machine (SVM), random forest, and CART decision tree, were applied to the Moodle data. The findings show that SVM achieved the highest accuracy of the algorithms compared, and that File.Click and System.Click were the most significant features. This type of research helps in the early identification of students' performance. The growing popularity of teaching and learning through online learning systems has attracted researchers to the field of Educational Data Mining (EDM). The variety of data generated by online activities can be analyzed to understand students' performance, which helps the overall teaching-learning process. Academics, especially course instructors who use e-learning platforms to deliver course content, and the learners who use these platforms benefit greatly from this research.
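
k-nearest neighbour is one of the five classifiers compared in the study. A minimal sketch of it follows, assuming each student is represented by a vector of click counts; the choice of File.Click and System.Click as the two features, the pass/fail labels standing in for Grade, and all values are hypothetical rather than the study's data.

```python
import math
from collections import Counter
from typing import Sequence, Tuple

Example = Tuple[Sequence[float], str]

def knn_predict(train: Sequence[Example], query: Sequence[float], k: int = 3) -> str:
    """Label `query` by majority vote among its k nearest training examples
    (Euclidean distance). On a tied vote, Counter.most_common keeps the label
    encountered first, i.e. the tied label whose nearest member is closest."""
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (File.Click, System.Click) counts per student -> grade band.
train = [
    ((40, 12), "pass"), ((35, 15), "pass"), ((30, 10), "pass"),
    ((5, 2), "fail"), ((8, 4), "fail"), ((3, 6), "fail"),
]
print(knn_predict(train, (33, 11)))  # pass
print(knn_predict(train, (6, 3)))    # fail
```

Because k-NN relies on raw distances, click counts with very different ranges would normally be scaled before use.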

Design and analysis of an efficient machine learning based hybrid recommendation system with enhanced density-based spatial clustering for digital e-learning applications

Complex & Intelligent Systems, 2021. doi:10.1007/s40747-021-00509-4

Author(s): S. Bhaskaran; Raja Marappan

Keyword(s): Machine Learning; Data Mining; Decision Making; Support Vector Machine; Absolute Error; Support Vector; E Learning; Public Datasets; Hybrid Recommender; New Strategies

A decision-making system is one of the most important tools in data mining. The data mining field has become a forum in which users' interactions, decision-making processes and overall experience must be utilized. Nowadays, e-learning is a progressive, long-term method of providing online education, in contrast to the traditional face-to-face process of education. Through e-learning, an ever-increasing number of learners have profited from different programs. However, the great diversity of students on the internet poses new challenges to conventional one-size-fits-all learning systems, in which a single set of learning resources is assigned to all learners. Well-known recommender systems suffer from large variations in the expected absolute error, long query processing times, and low accuracy in the final recommendation. The main objectives of this research are the design and analysis of a new hybrid personalized recommender based on a transductive support vector machine, evaluated on public machine learning datasets. The learning experience is derived from the learners' habits. This research designs and tests several new strategies to improve the performance of a hybrid recommender. A modified one-source denoising approach is designed to preprocess the learner dataset. A modified anarchic society optimization strategy is designed to improve the performance measurements. An enhanced and generalized sequential pattern strategy is proposed to mine the learners' sequential patterns. An enhanced transductive support vector machine is developed to evaluate the extracted habits and interests. These new strategies analyze the confidential rate of learners and provide the best recommendations to them.
The proposed generalized model is simulated on public machine learning datasets covering movies, music, books, food, merchandise, healthcare, dating, scholarly papers, and Open University learning recommendation. The experimental analysis concludes that the enhanced clustering strategy discovers clusters of arbitrary size. The proposed recommendation strategies achieve significantly better performance than the compared methods in terms of expected absolute error, accuracy, ranking score, recall, and precision. The accuracy on these datasets ranges from 82% to 98%, and the MAE ranges from 5% to 19.2% on the simulated public datasets. The simulation results show that the proposed generalized recommender substantially improves recommendation quality and performance.
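
The abstract credits an enhanced density-based spatial clustering strategy with discovering clusters without fixing their size in advance. The authors' enhancements are not described here, so the sketch below shows only plain DBSCAN, the algorithm such variants build on: a point with at least `min_pts` neighbours within radius `eps` is a core point, clusters grow outward from core points, and points reachable from no core point are labelled noise. The `eps` and `min_pts` values and the toy points are illustrative assumptions.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1

def region_query(points: Sequence[Sequence[float]], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of point i, including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Plain DBSCAN: returns a cluster id (0, 1, ...) for each point, or NOISE (-1)."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not visited yet
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        seeds = list(neighbours)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                seeds.extend(j_neighbours)  # j is also a core point: keep growing
    return labels  # every entry is an int at this point

# Two dense groups of different sizes plus one isolated point (hypothetical data).
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, -1]
```

Unlike k-means, the number of clusters is not an input: it follows from the density parameters.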

SVM and Cross-validation using R Studio

International Journal of Engineering and Advanced Technology - Regular Issue, 2020, Vol 10 (1), pp. 46-54. doi:10.35940/ijeat.a1673.1010120

Keyword(s): Data Mining; Missing Values; Cross Validation; Confusion Matrix; Tourism Industry; Support Vector; Full Dataset; The Matrix; Social Media Platforms; Classification Of Images

Data volumes multiply with each passing day, and it is difficult to extract useful information from such big data. Data mining is used to extract useful information, and it is applied in almost all fields, such as healthcare, marketing and social media platforms. In this paper, the data are loaded and preprocessed by handling missing values. The dataset comes from Airbnb, a lodging and tourism platform. The data are analyzed by plotting Spearman correlations. PCA and the Support Vector Machine (SVM) classification technique are then applied to the dataset. SVM has various applications, including face detection, text and hypertext categorization, image classification and bioinformatics. Because SVM handles high-dimensional input spaces and sparse document vectors and offers regularization parameters, it is appropriate to use here. Cross-validation gives more accurate results: the dataset is divided into folds, so that the resulting test set is similar to the full dataset. The confusion matrix is evaluated, and a grid approach is followed to build it for various seeds and kernels (RBF, polynomial). The aim of this research is to determine which kernel is best for the dataset.
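
The abstract describes dividing the dataset into folds and repeating the evaluation over several seeds and kernels. The original work used R Studio; the sketch below is an illustration in Python of plain k-fold splitting, not the authors' code. `evaluate` is a hypothetical callback standing in for training an SVM with a given kernel on the training indices and returning its accuracy on the test indices.

```python
import random
from typing import Callable, Iterator, List, Tuple

def k_fold_indices(n: int, k: int, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Indices are shuffled once with the given seed, then split into k folds whose
    sizes differ by at most one; each fold serves as the test set exactly once.
    """
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    fold_sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size

def cross_val_accuracy(evaluate: Callable[[List[int], List[int]], float],
                       n: int, k: int, seed: int = 0) -> float:
    """Mean of the per-fold scores returned by the (hypothetical) `evaluate` callback."""
    scores = [evaluate(tr, te) for tr, te in k_fold_indices(n, k, seed)]
    return sum(scores) / len(scores)

# A grid over seeds (and, in a real run, kernels) would simply loop over this call.
for seed in (1, 2, 3):
    print(seed, [len(te) for _, te in k_fold_indices(10, 3, seed)])
```

Changing the seed changes which records land in each fold, which is why results are usually reported over several seeds.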

Application of Data Mining Technology on Surveillance Report Data of HIV/AIDS High-Risk Group in Urumqi from 2009 to 2015

Complexity, 2018, Vol 2018, pp. 1-17. doi:10.1155/2018/9193248

Author(s): Dandan Tang; Man Zhang; Jiabo Xu; Xueliang Zhang; Fang Yang; ...

Keyword(s): Data Mining; High Risk; Diagnostic Accuracy; Random Forests; Prediction Models; Confusion Matrix; Risk Groups; Support Vector; Mining Technology; HIV/AIDS

Objective. Urumqi is one of the key areas of HIV/AIDS infection in Xinjiang and in China. The AIDS epidemic is spreading from high-risk groups to the general population, and the situation is still very serious. The goal of this study was to use four data mining algorithms to establish HIV infection identification models and compare their predictive performance. Method. The sentinel surveillance data for three high-risk groups (injecting drug users (IDU), men who have sex with men (MSM), and female sex workers (FSW)) in Urumqi from 2009 to 2015 included demographic characteristics, sexual behavior, and serological test results. Age, marital status, education level, and other variables were used as input variables, and HIV infection status as the output variable, to establish four prediction models for the three datasets. Confusion matrix, accuracy, sensitivity, specificity, precision, recall, and the area under the receiver operating characteristic (ROC) curve (AUC) were used to evaluate classification performance, and the importance of the predictive variables was analyzed. Results. The final experimental results show that the random forests algorithm obtained the best results, with a diagnostic accuracy of 94.4821% on the MSM dataset, 97.5136% on the FSW dataset, and 94.6375% on the IDU dataset. The k-nearest neighbors algorithm came second, with diagnostic accuracies of 91.5258% on MSM, 96.3083% on FSW, and 90.8287% on IDU, followed by the support vector machine (94.0182%, 98.0369%, and 91.3571%). The decision tree algorithm performed worst of the four, with diagnostic accuracies of 79.1761% on MSM, 87.0283% on FSW, and 74.3879% on IDU.
Conclusions. Data mining technology, as a new method of assisting disease screening and diagnosis, can help medical personnel screen for and diagnose AIDS rapidly from large amounts of information.
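
The study evaluates its classifiers with a confusion matrix and the metrics derived from it. To show how those metrics relate, the sketch below computes them from the four cells of a binary confusion matrix; the class name and the counts are made up for illustration and are not taken from the study. Recall and sensitivity are the same quantity.

```python
from dataclasses import dataclass

@dataclass
class BinaryConfusion:
    tp: int  # infected, predicted infected
    fp: int  # not infected, predicted infected
    fn: int  # infected, predicted not infected
    tn: int  # not infected, predicted not infected

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    def sensitivity(self) -> float:
        """True-positive rate: share of infected cases that were detected."""
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        """True-negative rate: share of uninfected cases correctly cleared."""
        return self.tn / (self.tn + self.fp)

    def precision(self) -> float:
        """Share of positive predictions that were actually infected."""
        return self.tp / (self.tp + self.fp)

    recall = sensitivity  # same formula under a different name

# Hypothetical counts for 100 screened individuals.
cm = BinaryConfusion(tp=40, fp=10, fn=5, tn=45)
print(cm.accuracy(), cm.precision())  # 0.85 0.8
```

Accuracy alone can hide a poor sensitivity when infected cases are rare, which is why studies like this one report the full set of metrics together with AUC.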

DMiner-I: A software tool of data mining and its applications

Robotica, 2002, Vol 20 (5), pp. 499-508. doi:10.1017/s0263574702004307

Author(s): Jie Yang; Chenzhou Ye; Nianyi Chen

Keyword(s): Neural Network; Data Mining; Genetic Algorithm; Pattern Recognition; Support Vector Machine; Knowledge Representation; Decision Trees; Software Tool; Support Vector; Function Models

A software tool for data mining (DMiner-I) is introduced, which integrates pattern recognition (PCA, Fisher, clustering, HyperEnvelop, regression), artificial intelligence (knowledge representation, decision trees), statistical learning (rough set, support vector machine), and computational intelligence (neural network, genetic algorithm, fuzzy systems). It consists of nine function models: pattern recognition, decision trees, association rules, fuzzy rules, neural networks, genetic algorithms, HyperEnvelop, support vector machines and visualization. The principles, algorithms and knowledge representation of some of the data mining function models are described. Nonmonotony in data mining is handled by concept hierarchy and layered mining. The tool is implemented in Visual C++ under Windows 2000. It has been applied satisfactorily to predicting regularities in the formation of ternary intermetallic compounds in alloy systems and to the diagnosis of brain glioma.

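Predictive Modeling of Vehicle Insurance Policy Continuation Status with a Majority Voting Technique Using Data Mining Classification Algorithms (original Indonesian title: Pemodelan Prediksi Status Keberlanjutan Polis Asuransi Kendaraan dengan Teknik Pemilihan Mayoritas Menggunakan Algoritma-Algoritma Klasifikasi Data Mining)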

Prosiding Seminar Nasional Teknoka, 2020, Vol 5, pp. 19-24. doi:10.22236/teknoka.v5i.391

Author(s): Dyah Retno Utari; Arief Wibowo

Keyword(s): Data Mining; Support Vector Machine; Decision Tree; Naive Bayes; Confusion Matrix; Naïve Bayes; Majority Voting; Support Vector; F Measure

Motor vehicle insurance is a type of coverage against losses or risks of damage that may arise from the various potential events that can befall a vehicle. Competition in the insurance business, especially for motor vehicles, demands innovation and strategy to keep the business sustainable. One measure a company can take is to predict the continuation status of vehicle insurance policies by analyzing customers' profile and transaction data. Predicting policyholders' decisions is very important for the company, because it shapes the marketing strategy that influences customers' decisions to renew their insurance policies. This study proposes a model for predicting the continuation status of vehicle insurance policies using majority voting over the classification results of data mining algorithms such as Naive Bayes, Support Vector Machine and Decision Tree. Testing with a confusion matrix shows that the best accuracy obtained was 93.57%, while precision reached 97.20%, recall 95.20%, and F-measure 95.30%. The best evaluation results were produced by the majority voting approach, which outperformed prediction models based on single classifiers.
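
Majority voting, as described in the abstract, combines the class labels predicted by several classifiers (here Naive Bayes, SVM and decision tree) and keeps the most frequent label for each record. The sketch below assumes each classifier has already produced a list of labels; the labels "renew" and "lapse", the prediction lists and the function names are hypothetical. It also includes the F-measure, the harmonic mean of precision and recall.

```python
from collections import Counter
from typing import List, Sequence

def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-classifier predictions (one list per classifier, aligned by
    record) into one label per record by hard majority voting.

    With an odd number of classifiers and two classes there are no ties;
    otherwise Counter.most_common keeps the label seen first, i.e. the vote of
    the earliest classifier in `predictions`.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1); 0.0 if both are zero."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Hypothetical predictions for four policies from three classifiers.
nb = ["renew", "lapse", "renew", "lapse"]
svm = ["renew", "renew", "lapse", "lapse"]
tree = ["lapse", "renew", "renew", "lapse"]
print(majority_vote([nb, svm, tree]))  # ['renew', 'renew', 'renew', 'lapse']
```

Each individual classifier disagrees with the ensemble on at least one policy here, which illustrates how voting can smooth out single-model errors.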

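Performance Analysis of Support Vector Machine in Identifying Bullying Comments on Social Networks (original Indonesian title: Analisis Kinerja Support Vector Machine dalam Mengidentifikasi Komentar Perundungan pada Jejaring Sosial)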

JURNAL MEDIA INFORMATIKA BUDIDARMA, 2021, Vol 5 (2), pp. 475. doi:10.30865/mib.v5i2.2923

Author(s): Ade Clinton Sitepu; Wanayumini Wanayumini; Zakarias Situmorang

Keyword(s): Data Mining; Support Vector Machine; Confusion Matrix; Process Research; Training Data; Support Vector; Media Technology; SVM Classification; SVM Algorithm; Using Data

Cyberbullying is the same as bullying, but it is carried out through media technology. Bullying has become common alongside the development of social media technology in society. Techniques are needed to filter out bullying comments, because they indirectly affect the psychological condition of readers, all the more so when they are aimed at the person concerned. Using data mining techniques, the system is expected to be able to classify information circulating in the community. This research uses Support Vector Machine (SVM) classification because the algorithm performs well at classification. The research uses a dataset of about 1,000 comments. The data are first labeled manually as "bully" or "not bully" and then divided into training data and test data. To test the system's capability, the results are analyzed using a confusion matrix. The results show that the SVM algorithm classified with 87.75% accuracy, 89% precision and 91% recall. On the training data, the SVM algorithm reached an accuracy of 98.3%.
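
The abstract does not describe how the SVM was trained. As a sketch under stated assumptions, the code below trains a linear SVM by full-batch subgradient descent on the regularised hinge loss, with a Pegasos-style step size 1/(λt) and an unregularised bias. It assumes comments have already been converted into numeric feature vectors by some text-vectorisation step (not shown); the 2-D vectors are hypothetical stand-ins, with +1 for "bully" and -1 for "not bully", and the function names are illustrative.

```python
from typing import List, Sequence, Tuple

def train_linear_svm(xs: Sequence[Sequence[float]], ys: Sequence[int],
                     lam: float = 0.1, epochs: int = 500) -> Tuple[List[float], float]:
    """Minimise lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b))) by full-batch
    subgradient descent with step size 1 / (lam * t). Labels must be +1 or -1."""
    n, d = len(xs), len(xs[0])
    w, b = [0.0] * d, 0.0
    for t in range(1, epochs + 1):
        eta = 1.0 / (lam * t)
        grad_w = [lam * wi for wi in w]  # gradient of the regulariser
        grad_b = 0.0
        for x, y in zip(xs, ys):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # hinge loss is active for this sample
                for k in range(d):
                    grad_w[k] -= y * x[k] / n
                grad_b -= y / n
        w = [wi - eta * g for wi, g in zip(w, grad_w)]
        b -= eta * grad_b
    return w, b

def svm_predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Hypothetical 2-D feature vectors: +1 = "bully", -1 = "not bully".
xs = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
ys = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(xs, ys)
print([svm_predict(w, b, x) for x in xs])  # [1, 1, 1, -1, -1, -1]
```

Kernel SVMs, as typically used for text, replace the dot product with a kernel function; the linear case is shown only because it fits in a short, dependency-free sketch.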

Reducing false positives in intrusion detection systems using data-mining techniques utilizing support vector machines, decision trees, and naive Bayes for off-line analysis

SoutheastCon 2016, 2016. doi:10.1109/secon.2016.7506774. Cited by ~21.

Author(s): Kathleen Goeschel

Keyword(s): Data Mining; Support Vector Machines; Intrusion Detection; Decision Trees; Intrusion Detection Systems; Support Vector; Detection Systems; Vector Machines; Using Data; Line Analysis

Attributes of Low Performing Students In E-Learning System Using Clustering Technique

International Journal of Scientific Research in Computer Science Engineering and Information Technology, 2019, pp. 480-485. doi:10.32628/cseit1953158

Author(s): Ebiemi Allen Ekubo

Keyword(s): Data Mining; Learning Process; Learning System; Free Access; Prospective Students; Fast Growing; Clustering Technique; Student Data; The Core; E Learning

Data mining in education is considered one of the relevant and fast-growing areas of data mining. With free access to datasets available online, researchers have continued to analyze data and produce knowledge that has improved the educational sector. While much research is geared towards predicting student results, this paper takes a different approach to gaining knowledge from student data by presenting the attributes of low-performing students. The idea is to group students with low grades and discover the core attributes of this category of students, thereby providing stakeholders with attributes to look out for in current and prospective students. The dataset used in this research was collected from an e-learning system called Kalboard 360. The k-means clustering technique embedded in the WEKA tool was used to group this category of students into two clusters. The knowledge gained from the mining process shows that lower-level students with many absences whose parents do not actively participate in their learning process are the most likely to perform poorly in their studies.
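
The study grouped low-performing students into two clusters with the k-means implementation in WEKA. The sketch below is Lloyd's k-means algorithm in Python, not WEKA's implementation; centroids are seeded with the first k points purely to keep the sketch deterministic. The two features (days absent and a parent-participation score) and their values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def kmeans(points: Sequence[Sequence[float]], k: int,
           max_iter: int = 100) -> Tuple[List[List[float]], List[int]]:
    """Lloyd's algorithm. Centroids start at the first k points (a simple,
    deterministic choice for this sketch). Returns (centroids, cluster per point)."""
    centroids = [list(p) for p in points[:k]]
    assignment = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                          for p in points]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment

# Hypothetical low-performing students: (days absent, parent participation score 0-10).
students = [(12, 1), (10, 2), (1, 8), (2, 9), (11, 1), (0, 10)]
centroids, clusters = kmeans(students, k=2)
print(clusters)  # [0, 0, 1, 1, 0, 1]
```

Inspecting the centroids (here, one with many absences and low parent participation) is how cluster analysis turns groupings into the kind of attribute profile the paper reports.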

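Performance Comparison of Feature Selection Results in Decision-Tree-Based Prediction of Students' Academic Performance (original Indonesian title: Perbandingan Kinerja Hasil Seleksi Fitur pada Prediksi Kinerja Akademik Siswa Berbasis Pohon Keputusan)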

Jurnal Edukasi dan Penelitian Informatika (JEPIN), 2018, Vol 4 (2), pp. 84. doi:10.26418/jp.v4i2.29294

Author(s): Achmad Shoddiq Bayu Asmoro; Wahyu Sakti Gunawan Irianto; Utomo Pujianto

Keyword(s): Data Mining; Feature Selection; Cross Validation; Information Gain; Confusion Matrix; Model Data; E Learning; Correlation Based Feature Selection

E-learning management systems are a form of technological progress in education and have produced large collections of educational data, one of which is data on students' learning activities in the e-learning management system. Much of this educational data has not yet been well explored and can be exploited using data mining techniques. This study compares three different data models: the original data without preprocessing, and data preprocessed with correlation-based feature selection and with Information Gain feature selection, respectively. The data used are students' learning activity data from an e-learning management system. The data are then tested using 10-fold cross validation with the C4.5 method and evaluated using a confusion matrix. Testing with the C4.5 algorithm combined with correlation-based feature selection produced the higher accuracy, 76.92%. Meanwhile, testing the original data without feature selection and the data selected with information gain yielded the same accuracy, 76.19%. This is because the C4.5 algorithm applied without preprocessing and the data preprocessed with information gain both compute gain values to build the decision tree model and produce the same decision tree model; hence the tests yield the same accuracy.
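
The abstract explains that both plain C4.5 and information-gain feature selection compute gain values. The sketch below shows how information gain is computed for a categorical attribute: the entropy of the class labels minus the weighted entropy left after splitting on that attribute. C4.5 itself normally ranks splits by gain ratio (information gain divided by split information), which is not shown. The attribute values, labels and function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence

def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Entropy of the labels minus the weighted entropy after splitting on `values`."""
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [y for x, y in zip(values, labels) if x == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

# Hypothetical e-learning activity attribute versus a pass/fail outcome.
outcome = ["pass", "pass", "fail", "fail"]
forum_activity = ["high", "high", "low", "low"]  # perfectly separates the classes
login_day = ["mon", "tue", "mon", "tue"]         # carries no class information
print(information_gain(forum_activity, outcome))  # 1.0
print(information_gain(login_day, outcome))       # 0.0
```

Attributes with high gain are the ones a gain-based filter keeps and the ones a gain-based tree places near the root, which is consistent with the abstract's explanation of why both approaches produced the same tree.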
