International Journal on Data Science

Classification of Biomedical Literature in Hypertension and Diabetes

International Journal on Data Science ◽

10.18517/ijods.1.2.114-119.2020 ◽

2020 ◽

Vol 1 (2) ◽

pp. 114-119

Author(s):

Nur Aniq Syafiq Rodzuan ◽

Shahreen Kasim ◽

Mohanavali Sithambranathan ◽

Muhammad Zaki Hassan

Keyword(s):

Text Mining ◽

New Technology ◽

Biomedical Literature ◽

Text Documents ◽

Textual Databases ◽

The Difference ◽

Classification Evaluation ◽

Linguistic Approaches ◽

Clear Information

Textual information is easy for humans to understand because it is presented in words and characters. Text mining, the process of extracting non-trivial patterns or knowledge from text documents or textual databases, was introduced to extract this kind of information. This paper performs and compares keyword extraction using statistical and linguistic extraction tools on 120 text documents related to hypertension and diabetes. RStudio, a statistical tool, and TerMine, a linguistic tool, are used to extract the specified keywords from the biomedical literature. Classification with a Naïve Bayes classifier is then carried out to evaluate and compare the performance of the statistical and linguistic approaches. The experimental results show how the two tools differ in keyword extraction.
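As a rough illustration of the evaluation step described above, the following is a minimal sketch of a multinomial Naïve Bayes text classifier with Laplace smoothing. It is not the authors' pipeline: the whitespace tokenizer stands in for the RStudio and TerMine keyword extraction, and the four toy documents, the label names and all function names are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real keyword extraction)."""
    return text.lower().split()


def train_naive_bayes(docs):
    """Count documents per class and term occurrences per class.

    docs is a list of (text, label) pairs.
    """
    class_counts = Counter()
    term_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for tok in tokenize(text):
            term_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, term_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior for the given text."""
    class_counts, term_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # log prior
        total_terms = sum(term_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore terms never seen in training
            # Laplace (add-one) smoothed log likelihood of the term
            score += math.log((term_counts[label][tok] + 1) / (total_terms + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy corpus, invented for illustration only.
TOY_DOCS = [
    ("blood pressure systolic diastolic", "hypertension"),
    ("high blood pressure treatment", "hypertension"),
    ("insulin glucose blood sugar", "diabetes"),
    ("glucose tolerance insulin resistance", "diabetes"),
]

model = train_naive_bayes(TOY_DOCS)
print(predict(model, "systolic pressure"))
print(predict(model, "insulin glucose"))
```

Comparing two extraction tools in this setup would amount to swapping the tokenizer for each tool's keyword list and comparing the resulting classification accuracy.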

Regression Model to Analyse Air Pollutants Over a Coastal Industrial Station Visakhapatnam (India)

International Journal on Data Science ◽

10.18517/ijods.1.2.107-113.2020 ◽

2020 ◽

Vol 1 (2) ◽

pp. 107-113

Author(s):

N.V. Krishna Prasad ◽

M.S.S.R.K.N. Sarma ◽

P. Sasikala ◽

Naga Raju M ◽

N. Madhavi

Keyword(s):

Air Pollution ◽

Particulate Matter ◽

Relative Humidity ◽

Regression Model ◽

Vital Role ◽

Measuring Instruments ◽

The Sustainable Development ◽

Independent Variables ◽

Particulate Matter Concentration ◽

Matter Concentration

The study of particulate matter concentration has gained significance with the increase in air pollution. Because air pollution has many adverse effects on people, measures can be informed by observing trends in PM2.5 (particulate matter) and in the concentrations of pollutants such as SO2, NO2, NO, NOx, CO and NH3, together with relative humidity (RH) and temperature. Although continuous monitoring of air pollution in urban locations has been increasing because of its impact on sustainable development and ecological balance, a regression model remains essential for analysing large data sets. Regression models also play a vital role when data were not observed due to unavoidable circumstances or when the measuring instruments were not working. In this context, an attempt was made to develop a regression model specifically for Visakhapatnam (India), a coastal, urban and industrial station, and to analyse the trends in particulate matter concentration at this station. The model uses PM2.5 as the dependent variable and SO2, NOx, NO2, CO, NH3, temperature (Temp) and relative humidity (RH) as independent variables. The model was tested by estimating PM2.5 from known values of the independent variables, and the observed and estimated PM2.5 values were found to be highly correlated.
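The regression setup described above can be sketched as an ordinary least squares fit followed by a correlation check between observed and estimated values. This is a minimal sketch under stated assumptions, not the paper's model: it uses only two made-up predictors (labelled NO2 and RH here), the data are synthetic and generated from a known linear formula, and the function names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, targets):
    """Least squares via the normal equations; returns [intercept, coef_1, ...]."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def estimate(coefs, row):
    """Estimate the dependent variable for one row of predictors."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


# Synthetic (NO2, RH) rows; PM2.5 is generated from an assumed linear formula.
rows = [(10.0, 60.0), (20.0, 55.0), (15.0, 70.0), (30.0, 40.0), (25.0, 65.0)]
observed = [20.0 + 2.0 * no2 - 0.2 * rh for no2, rh in rows]

coefs = fit_ols(rows, observed)
estimated = [estimate(coefs, r) for r in rows]
r_value = pearson(observed, estimated)
print(coefs, r_value)
```

Because the synthetic targets are exactly linear, the fit recovers the generating coefficients; real monitoring data would leave residuals and a correlation below one.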

Using Big Data Analytics for Decision Making: Analyzing Customer Behavior using Association Rule Mining in a Gold, Silver, and Precious Metal Trading Company in Indonesia

International Journal on Data Science ◽

10.18517/ijods.1.2.57-71.2020 ◽

2020 ◽

Vol 1 (2) ◽

pp. 57-71

Author(s):

Wecka Imam Yudhistyra ◽

Evri Marta Risal ◽

I-soon Raungratanaamporn ◽

Vatanavongs Ratanavaraha

Keyword(s):

Big Data ◽

Precious Metal ◽

Data Analytics ◽

Industrial Revolution ◽

Big Data Analytics ◽

Trading Company ◽

Business Goals ◽

Gold Silver ◽

Key Steps ◽

A Company

Indonesia faces many challenges in the fourth industrial revolution (4IR) era. One of them concerns big data technologies and their implementation, as can be seen from the Indonesia Industry Readiness Index (INI) 4.0. The objective of this manuscript is therefore to implement big data analytics in a gold, silver, and precious metal trading company to support its daily business operations. More specifically, the aim is to discover meaningful patterns and ensure high-quality knowledge discovery from the big data available in a company in Indonesia. This is needed to support Making Indonesia 4.0, the roadmap for implementing industrial digitalization in Indonesia. The methodology combines the CRISP-DM framework with key steps for customer analytics. The result is a list of recommendations that facilitate strategic planning, based on measurable evidence from big data analytics, to achieve the company's business goals.
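The title of this entry names association rule mining. As a hedged illustration of the measures such mining usually reports, the sketch below computes support, confidence and lift for one rule. The transactions are invented for the example and the function names are hypothetical; nothing here is taken from the company data in the paper.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence divided by the baseline support of the consequent."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)


# Toy purchase baskets, invented for illustration only.
transactions = [
    {"gold", "silver"},
    {"gold"},
    {"gold", "silver", "platinum"},
    {"silver"},
    {"gold", "silver"},
]

print(support(transactions, {"gold", "silver"}))
print(confidence(transactions, {"gold"}, {"silver"}))
print(lift(transactions, {"gold"}, {"silver"}))
```

A lift below one, as in this toy data, would suggest that buying the antecedent makes the consequent slightly less likely than its baseline rate.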

Optimization Audicor for Normal and Abnormal Heart Sounds Characteristic

International Journal on Data Science ◽

10.18517/ijods.1.2.99-106.2020 ◽

2020 ◽

Vol 1 (2) ◽

pp. 99-106

Author(s):

Dedi Kurniadi ◽

Surfa Yondri ◽

Albar ◽

Roza Susanti ◽

David Eka Putra ◽

...

Keyword(s):

Signal Processing ◽

Feature Extraction ◽

Data Analysis ◽

Ventricular Septal Defect ◽

Human Body ◽

Septal Defect ◽

Heart Sound ◽

Heart Sounds ◽

Normal Case ◽

Recording Process

Heart sounds carry important information about the condition of the heart. However, signals such as PCG and ECG recorded with Audicor still contain unexpected components or noise introduced during recording, so the Audicor output cannot be used directly to recognize the condition of the heart. This research presents signal processing and data analysis to suppress the noise picked up in the heart sounds during recording. The cleaned heart sounds are then passed to feature extraction using FFT and PCA, which can produce features for both normal and abnormal heart sounds. For the normal case, the data come from healthy volunteers recorded using Audicor. For the abnormal case, the study focuses on data containing Ventricular Septal Defect (VSD), obtained from a partner hospital. As a result, the features of normal and abnormal heart sounds can be separated.
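To make the frequency-domain feature step above concrete, here is a minimal sketch that computes a magnitude spectrum and picks its dominant bin. It uses a naive O(n²) discrete Fourier transform in place of an FFT (the resulting spectrum is the same), omits the PCA step entirely, and runs on a synthetic sine wave rather than a heart sound recording. The function name and signal parameters are assumptions.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Magnitude spectrum for bins 0..n/2 using a naive discrete Fourier transform."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]


# Synthetic test signal: 4 full cycles of a sine wave over 32 samples.
signal = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]

mags = dft_magnitudes(signal)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
print(peak_bin)
```

In a feature-extraction setting, a vector of such magnitudes per recording would be the input to a dimensionality reduction step such as PCA.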

Improvement of Traditional Protection System in the Existing Hybrid Microgrid with Advanced Intelligent Method

International Journal on Data Science ◽

10.18517/ijods.1.2.72-81.2020 ◽

2020 ◽

Vol 1 (2) ◽

pp. 72-81

Author(s):

Pooja Khandare ◽

Sanjay Deokar ◽

Aarti Dixit

Keyword(s):

Operating Time ◽

Protection System ◽

Double Line ◽

Power Sources ◽

Smart System ◽

Hybrid Microgrid ◽

Ground Fault ◽

Current Duration ◽

Developing System ◽

Sustainable Power

Sustainable power sources will become a long-term alternative to the conventional sources available today. Many technical challenges are arising in the developing Indian microgrid system, and protection is one of them. Traditional protection systems use directional overcurrent relays to protect against various faults. These rely on the assumption that current flows in one direction. The same strategy cannot be applied to microgrid protection, because the presence of DGs makes the current flow bidirectional, which creates technical challenges in the microgrid. A new advanced intelligent method, the DWT-differential algorithm (DWT-DA), is proposed over the traditional protection system. Parameters such as PS (plug setting), TS (time setting), CDS (current duration setting) and the relay operating time are tested. An Indian microgrid is taken as a case study, and a smart microgrid protection system is recommended to overcome the challenges identified in the literature studied. The system is designed in MATLAB; a double line-to-ground fault is simulated at different locations and DWT-DA is applied. The percentage reductions obtained in operating time and the other parameters increase the reliability of the microgrid.

Application and Optimization of MIMO Communication in Wide Area Monitoring Systems

International Journal on Data Science ◽

10.18517/ijods.1.2.82-98.2020 ◽

2020 ◽

Vol 1 (2) ◽

pp. 82-98

Author(s):

Abdelmadjid Recioui ◽

Youcef Grainat

Keyword(s):

Communication System ◽

Data Transfer ◽

Multiple Input Multiple Output ◽

System Optimization ◽

Wide Area ◽

Monitoring Systems ◽

Wide Area Monitoring ◽

Input Multiple Output ◽

Area Monitoring ◽

MIMO Technology

Multiple-Input Multiple-Output (MIMO) technology uses multiple antennas at both the transmitter and the receiver to transfer a larger amount of data simultaneously. It is the key technology in fourth- and fifth-generation communication systems. This work considers the use of MIMO technology to improve data transfer in Wide Area Monitoring Systems (WAMS) in terms of completeness, correctness and latency. To enhance the system further, the physical layout of the communication system is optimized. A comparison with state-of-the-art technologies highlights how adopting MIMO would improve data transfer within the smart grid.

Protein Structure Prediction Using Robust Principal Component Analysis and Support Vector Machine

International Journal on Data Science ◽

10.18517/ijods.1.1.14-17.2020 ◽

2020 ◽

Vol 1 (1) ◽

pp. 14-17

Author(s):

Nur Aini Zakaria ◽

Zuraini Ali Shah ◽

Shahreen Kasim

Keyword(s):

Protein Structure ◽

Structure Prediction ◽

Tertiary Structure ◽

Secondary Structure Prediction ◽

Principal Component ◽

Training Dataset ◽

Support Vector ◽

Testing Dataset ◽

Prediction Function ◽

RBF Kernel

Bioinformatics exists to deepen the understanding of biological processes. Protein structure is one of the major challenges in structural bioinformatics. With prior knowledge of the structure, the quality of secondary structure prediction, tertiary structure prediction, and prediction of amino acid function from sequence increases significantly. Recently, the gap between proteins with known sequences and proteins with known structures has increased dramatically. Understanding protein structure is therefore essential to close this gap and make further functional analysis easier. This research applies the RPCA algorithm to extract the essential features from the original high-dimensional input vectors, followed by an SVM with an RBF kernel. The proposed method obtains an accuracy of 84.41% on the training dataset and 89.09% on the testing dataset. The result is then compared with the same method using PCA for feature extraction. The prediction is assessed by analyzing the accuracy and the number of principal components selected. The results show that the combination of RPCA and SVM produces a high-quality classification of protein structure.

Identification of Gene of Melanoma Skin Cancer Using Clustering Algorithms

International Journal on Data Science ◽

10.18517/ijods.1.1.51-56.2020 ◽

2020 ◽

Vol 1 (1) ◽

pp. 51-56

Author(s):

Mohanavali Sithambranathan ◽

Shahreen Kasim ◽

Muhammad Zaki Hassan ◽

Nur Aniq Syafiq Rodzuan

Keyword(s):

Support Vector Machine ◽

Skin Cancer ◽

Human Body ◽

Clustering Algorithms ◽

Large Data ◽

Support Vector ◽

Proper Treatment ◽

Cancer Disease ◽

Redundant Data ◽

Melanoma Skin

Melanoma is the deadliest skin cancer. It can develop in any part of the human body. The disease can be cured if it is diagnosed early and treated properly. In cancer classification, handling large cancer data sets is a problem, because large data sets contain meaningless and redundant data. To overcome this problem, many computational classification approaches have been proposed in the literature. In this work, the clustering process for melanoma is conducted using Support Vector Machine and K-Means. The purpose of this research is to identify genes associated with melanoma skin cancer using these algorithms and to evaluate their accuracy.
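For readers unfamiliar with K-Means, one of the two algorithms named above, the following is a minimal sketch of the standard assign-and-update loop. The two-dimensional points are invented, the centroids are initialised from the first k points so the run is deterministic, and none of this reflects the gene expression data or settings used in the paper.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, iterations=20):
    """Plain K-Means: alternate nearest-centroid assignment and centroid update."""
    centroids = [list(p) for p in points[:k]]  # deterministic initialisation (assumption)
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids


# Toy 2-D points forming two visibly separate groups (illustrative only).
points = [(1.0, 1.1), (5.0, 5.2), (0.9, 1.0), (5.1, 4.9), (1.2, 0.8), (4.8, 5.0)]

assignments, centroids = k_means(points, k=2)
print(assignments)
```

Real use would typically add random restarts and a convergence check rather than a fixed iteration count.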

Evaluate the Performance of SVM Kernel Functions for Multiclass Cancer Classification

International Journal on Data Science ◽

10.18517/ijods.1.1.37-41.2020 ◽

2020 ◽

Vol 1 (1) ◽

pp. 37-41

Author(s):

Noramalina Mohd Hatta ◽

Zuraini Ali Shah ◽

Shahreen Kasim

Keyword(s):

Kernel Function ◽

Supervised Classification ◽

Kernel Functions ◽

Cancer Classification ◽

Polynomial Kernel ◽

Support Vector ◽

Linear Kernel ◽

Straight Line ◽

RBF Kernel ◽

RBF Kernel Function

Multiclass cancer classification is one of the challenging fields in machine learning, a fast-growing technology that learns from examples of human behaviour. Supervised classifiers such as the Support Vector Machine (SVM) classify a dataset using a function known as the kernel function. Kernel functions are reported to pose a problem, especially in selecting the best kernel for a specific dataset and task. In addition, it is reported that the kernel functions are highly unlikely to distribute the data along a straight line. Here, three basic kernel functions, the linear kernel, the polynomial kernel and the Radial Basis Function (RBF) kernel, were tested on selected datasets. The three kernels were tested on different datasets to measure accuracy. For comparison, the study runs SVM classification with each kernel both with and without feature selection, since the two settings give different results and are therefore significant for the study.
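The three kernels compared above have standard textbook forms, sketched below as plain functions. The hyperparameter names and default values (degree, gamma, coef0) follow a common convention and are assumptions, not the settings used in the study.

```python
import math


def linear_kernel(x, y):
    """Linear kernel: the dot product of x and y."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """Polynomial kernel: (gamma * <x, y> + coef0) ** degree."""
    return (gamma * linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel: exp(-gamma * ||x - y||^2)."""
    squared_norm = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_norm)


x, y = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(x, y))
print(polynomial_kernel(x, y, degree=2))
print(rbf_kernel(x, y, gamma=0.5))
```

The linear kernel corresponds to a straight-line (hyperplane) decision boundary in the input space, while the polynomial and RBF kernels allow curved boundaries, which is why kernel choice matters for data that are not linearly separable.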

Classification Breast Cancer Revisited with Machine Learning

International Journal on Data Science ◽

10.18517/ijods.1.1.42-50.2020 ◽

2020 ◽

Vol 1 (1) ◽

pp. 42-50

Author(s):

Hanna Arini Parhusip ◽

Bambang Susanto ◽

Lilik Linawati ◽

Suryasatriya Trihandaru ◽

Yohanes Sardjono ◽

...

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Random Forest ◽

Naive Bayes ◽

Machine Learning Algorithms ◽

Support Vector ◽

Random Forest Algorithm ◽

K Nearest Neighbor ◽

Cancer Data

The article presents a study of several machine learning algorithms applied to breast cancer data with 33 features from 569 samples. The purpose of this research is to find the best algorithm for classifying breast cancer. Because the features may have different scales and widely differing ranges, the data are transformed before classification. The classification methods used are logistic regression, k-nearest neighbor, the Naive Bayes classifier, support vector machine, decision tree and random forest. Both the original and the transformed data are classified with a test-set proportion of 0.3. The SVM and Naive Bayes algorithms show no improvement in accuracy, while random forest gives the best accuracy of all. The test-set proportion is therefore reduced to 0.25, which improves all algorithms on the transformed data. However, the random forest algorithm still gives the best accuracy.
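The preprocessing described above, transforming features and then holding out a test fraction, can be sketched as follows. The abstract does not say which transformation was used, so min-max scaling is an assumption here, as are the function names, the random seed and the placeholder data; only the sample count of 569 and the test fractions 0.3 and 0.25 come from the abstract.

```python
import random


def min_max_scale(rows):
    """Rescale every column to the range [0, 1]; constant columns become 0.0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def train_test_split(rows, labels, test_size, seed=0):
    """Shuffle indices with a fixed seed and hold out a test_size fraction."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [rows[i] for i in train_idx]
    x_test = [rows[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return x_train, x_test, y_train, y_test


# Placeholder data: 569 samples with two features on very different scales.
rows = [[float(i), float(i % 7) * 1000.0] for i in range(569)]
labels = [i % 2 for i in range(569)]

scaled = min_max_scale(rows)
split_30 = train_test_split(scaled, labels, test_size=0.3)
split_25 = train_test_split(scaled, labels, test_size=0.25)
print(len(split_30[1]), len(split_25[1]))
```

Reducing the test fraction from 0.3 to 0.25 leaves more samples for training, which is one plausible reason accuracy could improve, though a single split gives a noisy estimate either way.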

Published By Insight Society
