International Journal on Data Science
Latest Publications


TOTAL DOCUMENTS

12
(FIVE YEARS 0)

H-INDEX

0
(FIVE YEARS 0)

Published By Insight Society

2722-2039

2020 ◽  
Vol 1 (2) ◽  
pp. 114-119
Author(s):  
Nur Aniq Syafiq Rodzuan ◽  
Shahreen Kasim ◽  
Mohanavali Sithambranathan ◽  
Muhammad Zaki Hassan

Textual information is easy for humans to understand because it is presented in words and characters. Text mining was introduced as a technology for extracting this kind of information: it is the process of extracting non-trivial patterns or knowledge from text documents or textual databases. The purpose of this paper is to perform and compare keyword extraction using statistical and linguistic extraction tools on 120 text documents related to hypertension and diabetes. To draw this comparison, RStudio, a statistical-based tool, and TerMine, a linguistic-based tool, were used to extract the specified keywords from the biomedical literature. Classification with a Naïve Bayes classifier is then carried out to evaluate and compare the performance of the statistical and linguistic approaches. The experimental results show how the two tools differ in keyword extraction.
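
To make the statistical side of the abstract above concrete, here is a minimal sketch of statistical keyword extraction. The abstract does not say which statistic was computed in RStudio; TF-IDF is assumed here only because it is a common choice, and the two sample sentences, the tokenizer, and the function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Crude tokenizer (assumption): lowercase, keep alphabetic runs only.
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return cleaned.split()


def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    results = []
    for toks in tokenized:
        tf = Counter(toks)
        scores = {
            t: (count / len(toks)) * math.log(n_docs / df[t])
            for t, count in tf.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results


# Made-up sentences, not taken from the 120 documents in the study.
docs = [
    "hypertension raises blood pressure and blood pressure strains the heart",
    "diabetes raises blood glucose and insulin lowers glucose",
]
keywords = tfidf_keywords(docs, top_k=2)
```

Terms that occur in every document (here "blood", "raises", "and") score zero, which is why a frequent but uninformative word does not surface as a keyword.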


2020 ◽  
Vol 1 (2) ◽  
pp. 107-113
Author(s):  
N.V. Krishna Prasad ◽  
M.S.S.R.K.N. Sarma ◽  
P. Sasikala ◽  
Naga Raju M ◽  
N. Madhavi

The study of particulate matter concentration has gained significance with the increase in air pollution. Because air pollution has many adverse effects on people, mitigation measures can be guided by observing trends in PM2.5 (particulate matter) and in the concentrations of pollutants such as NO2, SO2, NO, NOx, CO and NH3, together with relative humidity (RH) and temperature. Although continuous monitoring of air pollution in urban locations has been increasing because of its impact on sustainable development and ecological balance, a regression model remains essential for analysing large data sets. Regression models also play a vital role when data were not observed because of unavoidable circumstances or when the measuring instruments were not working. In this context, a regression model was developed exclusively for Visakhapatnam (India), a coastal, urban and industrial station, to analyse the trends in particulate matter concentration at this station. The model takes PM2.5 as the dependent variable and SO2, NOx, NO2, CO, NH3, temperature (Temp) and relative humidity (RH) as independent variables. The model was tested by estimating PM2.5 from known values of the independent variables. The observed and estimated PM2.5 values are found to be highly correlated.
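
The workflow described in the abstract above (fit a multiple linear regression, estimate PM2.5, then correlate observed and estimated values) can be sketched as follows. This is an illustration under stated assumptions, not the authors' model: only two predictors are used instead of seven, the numbers are synthetic and generated from a known linear rule, and all function names are hypothetical.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_linear(rows, y):
    """Least-squares fit of y = b0 + b1*x1 + ... via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * t for x, t in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


# Synthetic (NO2, RH) readings; PM2.5 is generated as 5 + 2*NO2 + 0.1*RH.
rows = [(10, 60), (20, 55), (15, 70), (30, 50), (25, 65)]
observed = [5 + 2 * no2 + 0.1 * rh for no2, rh in rows]

coef = fit_linear(rows, observed)
estimated = [predict(coef, r) for r in rows]
r = pearson(observed, estimated)
```

Because the synthetic target is exactly linear in the predictors, the fit recovers the generating coefficients and the correlation is 1; real monitoring data would give a lower value.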


2020 ◽  
Vol 1 (2) ◽  
pp. 57-71
Author(s):  
Wecka Imam Yudhistyra ◽  
Evri Marta Risal ◽  
I-soon Raungratanaamporn ◽  
Vatanavongs Ratanavaraha

Indonesia is facing many challenges in the era of the fourth industrial revolution (4IR). One of them concerns big data technologies and their implementation, as can be seen clearly from the Indonesia Industry Readiness Index (INI) 4.0. The objective of this manuscript is therefore to implement big data analytics in a gold, silver, and precious metal trading company to support its daily business operations. More specifically, the aim is to discover meaningful patterns and ensure high-quality knowledge discovery from the big data available in a company in Indonesia. This is needed to support Making Indonesia 4.0, a roadmap for implementing industrial digitalization in Indonesia. The methodology used for the big data implementation combines the CRISP-DM framework with key steps for customer analytics. The result of this research is a list of recommendations that facilitate strategic planning, based on measurable evidence from big data analytics, to achieve the company's business goals.


2020 ◽  
Vol 1 (2) ◽  
pp. 99-106
Author(s):  
Dedi Kurniadi ◽  
Surfa Yondri ◽  
Albar ◽  
Roza Susanti ◽  
David Eka Putra ◽  
...  

Heart sounds carry important information about the condition of the heart. However, signals such as PCG and ECG recorded with Audicor still contain unexpected components or noise introduced during recording, so the resulting data cannot be used directly to recognize the condition of the heart. This research presents signal processing and data analysis to suppress the noise acquired in heart sounds during recording. The cleaned heart sounds are then passed to feature extraction using FFT and PCA, which can produce features for both normal and abnormal heart sounds. For the normal case, the data were recorded from healthy volunteers using Audicor. For the abnormal case, we focus on data containing Ventricular Septal Defect (VSD), obtained from a partner hospital. As a result, the features of normal and abnormal heart sounds can be separated.
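
As an illustration of the FFT step mentioned in the abstract above, the sketch below computes a magnitude spectrum that could serve as a feature vector. It is not the authors' pipeline: the radix-2 recursion requires a power-of-two length, the test signal is a synthetic pair of sine waves rather than a heart sound, and the denoising and PCA stages are omitted.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def magnitude_features(samples):
    # Keep only the first half of the spectrum; for a real-valued signal
    # the second half mirrors it.
    spectrum = fft(samples)
    return [abs(c) for c in spectrum[: len(samples) // 2]]


# Synthetic signal: a strong tone at bin 3 and a weaker tone at bin 7.
n = 32
signal = [
    math.sin(2 * math.pi * 3 * t / n) + 0.5 * math.sin(2 * math.pi * 7 * t / n)
    for t in range(n)
]
features = magnitude_features(signal)
dominant_bin = max(range(len(features)), key=features.__getitem__)
```

The two tones appear as peaks at bins 3 and 7 of `features`, which is the kind of frequency-domain signature a later PCA or classifier stage could work on.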


2020 ◽  
Vol 1 (2) ◽  
pp. 72-81
Author(s):  
Pooja Khandare ◽  
Sanjay Deokar ◽  
Aarti Dixit

Sustainable power sources are expected to become a long-term alternative to the conventional sources available today. Many technical challenges are arising in the developing Indian microgrid system; protection is one of them. Traditional protection systems use directional overcurrent relays to protect against various faults. These relays rely on the assumption of unidirectional current flow. The same strategy cannot be applied to microgrid protection, because the presence of DGs causes current to flow bidirectionally, which creates technical challenges in the microgrid. A new intelligent method, the DWT-differential algorithm (DWT-DA), is proposed in place of the traditional protection system. Parameters such as PS (plug setting), TS (time setting), CDS (current duration setting) and the operating time of the relay are tested. The Indian microgrid is taken as a case study, and a smart microgrid protection system is recommended to overcome the challenges identified in the literature. The system is designed in MATLAB; a double line-to-ground fault is simulated at different locations and DWT-DA is applied. The percentage reduction obtained in operating time and other parameters results in increased reliability of the microgrid.


2020 ◽  
Vol 1 (2) ◽  
pp. 82-98
Author(s):  
Abdelmadjid Recioui ◽  
Youcef Grainat

Multiple-Input Multiple-Output (MIMO) technology uses multiple antennas at both the transmitter and the receiver to transfer a larger amount of data simultaneously. It is the key technology in fourth- and fifth-generation communication systems. This work considers the use of MIMO technology to enhance data transfer in Wide Area Monitoring Systems (WAMS) in terms of completeness, correctness and latency. To further enhance the system, the physical layout of the communication system is optimized. A comparison with state-of-the-art technologies highlights how adopting MIMO technology would enhance data transfer within the smart grid.


2020 ◽  
Vol 1 (1) ◽  
pp. 14-17
Author(s):  
Nur Aini Zakaria ◽  
Zuraini Ali Shah ◽  
Shahreen Kasim

Bioinformatics exists to further the understanding of biological processes. Protein structure is one of the major challenges in structural bioinformatics. With prior knowledge of the structure, the quality of secondary structure prediction, tertiary structure prediction, and the prediction of amino acid function from sequence increases significantly. Recently, the gap between the number of proteins with known sequence and those with known structure has increased dramatically. It is therefore necessary to understand protein structure to overcome this problem and make further functional analysis easier. This research applies the RPCA algorithm to extract the essential features from the original high-dimensional input vectors, followed by experiments using an SVM with an RBF kernel. The proposed method obtains an accuracy of 84.41% on the training dataset and 89.09% on the testing dataset. The result is then compared with the same method using PCA for feature extraction. The prediction is assessed by analyzing the accuracy and the number of principal components selected. The results show that the combination of RPCA and SVM produces a high-quality classification of protein structure.


2020 ◽  
Vol 1 (1) ◽  
pp. 51-56
Author(s):  
Mohanavali Sithambranathan ◽  
Shahreen Kasim ◽  
Muhammad Zaki Hassan ◽  
Nur Aniq Syafiq Rodzuan

Melanoma is the deadliest skin cancer. It can develop in any part of the human body. The cancer can be cured if it is diagnosed early and treated properly. In cancer classification, handling large cancer data is a problem because such data contain meaningless and redundant entries. To overcome this problem, many computational approaches for classification have been proposed in the literature. In this work, the clustering process for melanoma is conducted using Support Vector Machine and K-Means. The purpose of this research is to identify genes associated with melanoma skin cancer and to evaluate the accuracy obtained using the clustering algorithms.
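
The K-Means step named in the abstract above can be sketched as follows. This is a generic illustration, not the authors' setup: the points are made-up two-dimensional values rather than gene expression data, and taking the first k points as initial centroids is an assumption chosen to keep the run deterministic.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iters=100):
    """Plain K-Means; returns (labels, centroids)."""
    # Deterministic initialization (assumption): the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(max_iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        labels = []
        for p in points:
            j = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[j].append(p)
            labels.append(j)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(col) / len(cluster) for col in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # Assignments no longer change the centroids.
        centroids = new_centroids
    return labels, centroids


# Made-up points forming two well-separated groups.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.1, 4.9), (0.9, 1.1), (4.8, 5.2)]
labels, centroids = kmeans(points, k=2)
```

With these points the algorithm converges after the second pass, grouping the points near (1, 1) under label 0 and those near (5, 5) under label 1.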


2020 ◽  
Vol 1 (1) ◽  
pp. 37-41
Author(s):  
Noramalina Mohd Hatta ◽  
Zuraini Ali Shah ◽  
Shahreen Kasim

Multiclass cancer classification is one of the challenging fields in machine learning, a fast-growing technology that learns from examples of human behaviour. Supervised classifiers such as the Support Vector Machine (SVM) classify a dataset using a function of their own, known as the kernel function. Kernel functions are reported to pose a problem, especially in selecting the best kernel for a specific dataset and task. In addition, it is reported that kernel functions are highly unlikely to distribute the data along a straight line. Here, three basic kernel functions were used and tested with selected datasets: the linear kernel, the polynomial kernel and the Radial Basis Function (RBF) kernel. The three kernels were tested on different datasets to measure accuracy. For comparison, this study runs the SVM classification with and without feature selection, since the two tests give different results and are therefore meaningful to the study.
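
The three kernels compared in the abstract above have standard textbook forms, sketched below. The default values for `degree`, `gamma` and `coef0` are assumptions for illustration; the abstract does not report the settings used in the study.

```python
import math


def dot(x, z):
    return sum(a * b for a, b in zip(x, z))


def linear_kernel(x, z):
    # K(x, z) = x . z
    return dot(x, z)


def polynomial_kernel(x, z, degree=3, gamma=1.0, coef0=1.0):
    # K(x, z) = (gamma * x . z + coef0) ** degree
    return (gamma * dot(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=0.5):
    # K(x, z) = exp(-gamma * ||x - z||^2)
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


x, z = [1.0, 2.0], [3.0, 4.0]
k_linear = linear_kernel(x, z)
k_poly = polynomial_kernel(x, z, degree=2)
k_rbf_same = rbf_kernel(x, x)
k_rbf_far = rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5)
```

The RBF kernel equals 1 for identical inputs and decays towards 0 as the points move apart, while the linear and polynomial kernels grow with the dot product; this difference is what makes kernel choice dataset-dependent.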


2020 ◽  
Vol 1 (1) ◽  
pp. 42-50
Author(s):  
Hanna Arini Parhusip ◽  
Bambang Susanto ◽  
Lilik Linawati ◽  
Suryasatriya Trihandaru ◽  
Yohanes Sardjono ◽  
...  

The article studies several machine learning algorithms applied to breast cancer data with 33 features from 569 samples. The purpose of this research is to find the best algorithm for classifying breast cancer. The features may have different scales and widely different ranges, so the data are transformed before classification. The classification methods used are logistic regression, k-nearest neighbor, Naive Bayes classifier, support vector machine, decision tree and random forest. The original data and the transformed data are classified with a test size of 0.3. The SVM and Naive Bayes algorithms show no improvement in accuracy, and random forest gives the best accuracy of all. The test size is therefore reduced to 0.25, which improves all algorithms on the transformed data. However, random forest still gives the best accuracy.
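
The two preprocessing choices in the abstract above, transforming features to a common scale and holding out a fraction of the samples for testing, can be sketched as follows. The abstract does not name the transformation; min-max scaling is assumed here. Rounding the test count up and the fixed shuffle seed are also assumptions, and the rows are placeholders rather than the breast cancer data.

```python
import math
import random


def min_max_scale(rows):
    """Rescale each column to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [max(col) - min(col) for col in columns]
    return [
        [(v - lo) / span if span else 0.0 for v, lo, span in zip(row, lows, spans)]
        for row in rows
    ]


def holdout_split(rows, test_size, seed=0):
    """Shuffle the rows and hold out a test_size fraction for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(test_size * len(rows))  # rounding up is an assumption
    test = [rows[i] for i in indices[:n_test]]
    train = [rows[i] for i in indices[n_test:]]
    return train, test


scaled = min_max_scale([[1, 10], [2, 20], [3, 30]])

# Placeholder rows standing in for the 569 samples mentioned in the abstract.
samples = [[i] for i in range(569)]
train_30, test_30 = holdout_split(samples, test_size=0.3)
train_25, test_25 = holdout_split(samples, test_size=0.25)
```

Under these assumptions, moving from a test size of 0.3 to 0.25 shifts 28 samples from the test set to the training set (171 versus 143 test samples out of 569).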

