Connected Component and Morphology Based Extraction of Arterial Centerlines of the Heart (CocomoBeach)

2008 ◽  
Author(s):  
Pieter Kitslaar ◽  
Michel Frenay ◽  
Elco Oost ◽  
Jouke Dijkstra ◽  
Berend Stoel ◽  
...  

This document describes a novel scheme for the automated extraction of the central lumen lines of coronary arteries from computed tomography angiography (CTA) data. The scheme first obtains a segmentation of the whole coronary tree and subsequently extracts the centerlines from this segmentation. The first steps of the segmentation algorithm consist of the detection of the aorta and the entire heart region. Next, candidate coronary artery components are detected in the heart region after masking of the cardiac blood pools. Based on their location and geometrical properties, the structures representing the right and left arteries are selected from the candidate list. Starting from the aorta, connections between these structures are made, resulting in a final segmentation of the whole coronary artery tree. A fast-marching level set method combined with a backtracking algorithm is employed to obtain the initial centerlines within this segmentation. For all vessels a curved multiplanar reformatted image (CMPR) is constructed and used to detect the lumen contours. The final centerline is then defined by determining the center of gravity of the detected lumen in the transversal CMPR slices. Within the scope of the MICCAI Challenge “Coronary Artery Tracking 2008”, the coronary tree segmentation and centerline extraction scheme was used to automatically detect a set of centerlines in 24 datasets. For 8 datasets, reference centerlines were available; this training data was used during the development and tuning of the algorithm. Sixteen other datasets were provided as testing data. Evaluation of the proposed methodology was performed through submission of the resulting centerlines to the MICCAI Challenge website.
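
The combination of front propagation from a seed with steepest-descent backtracking can be sketched on a 2-D binary segmentation mask. This is a minimal illustration, not the authors' implementation: a Dijkstra-style grid propagation stands in for the fast-marching level set solver, and all names are illustrative.

```python
import heapq
import numpy as np

def arrival_time(mask, seed):
    """Dijkstra-style approximation of a fast-marching arrival-time map,
    propagated from the seed through the foreground of a binary mask."""
    T = np.full(mask.shape, np.inf)
    T[seed] = 0.0
    heap = [(0.0, seed)]
    while heap:
        t, (r, c) = heapq.heappop(heap)
        if t > T[r, c]:
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < mask.shape[0] and 0 <= cc < mask.shape[1] and mask[rr, cc]:
                if t + 1.0 < T[rr, cc]:
                    T[rr, cc] = t + 1.0
                    heapq.heappush(heap, (t + 1.0, (rr, cc)))
    return T

def backtrack(T, end):
    """Walk from a distal vessel point down the steepest descent of the
    arrival-time map back to the seed, yielding a centerline path."""
    path = [end]
    while T[path[-1]] > 0.0:
        r, c = path[-1]
        nbrs = [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= r + dr < T.shape[0] and 0 <= c + dc < T.shape[1]]
        path.append(min(nbrs, key=lambda p: T[p]))
    return path
```

On a straight 1-pixel-wide corridor, the backtracked path traverses the corridor from the distal end back to the seed.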

2014 ◽  
Vol 21 (1) ◽  
pp. 67-74 ◽  
Author(s):  
Mohamed Marzouk ◽  
Mohamed Alaraby

This paper presents a fuzzy subtractive modelling technique to predict the weight of telecommunication towers, which is used to estimate their respective costs. This is implemented through the utilization of data from previously installed telecommunication towers, considering four input parameters: a) tower height; b) allowed tilt or deflection; c) antenna subjected area loading; and d) wind load. Telecommunication towers are classified according to designated code (TIA-222-F and TIA-222-G standards) and structure type (Self-Supporting Tower (SST) and Roof Top (RT)). As such, four fuzzy subtractive models are developed to represent the four classes. To build the fuzzy models, 90% of the data are utilized and fed to Matlab software as training data. The remaining 10% of the data are utilized to test model performance. A first-order Sugeno-type model is used to optimize performance in predicting tower weights. Errors are estimated using Mean Absolute Percentage Error (MAPE) and Root Mean Square Error (RMSE) for both training and testing data sets. Sensitivity analysis is carried out to validate the model and observe the effect of the clusters' radius on model performance.
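
For reference, the two error measures named above can be computed as follows (a minimal sketch with illustrative names, not the paper's Matlab code):

```python
import math

def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Square Error, in the units of the target variable."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```

For example, tower weights of 100 t and 200 t predicted as 110 t and 180 t give a MAPE of 10% and an RMSE of about 15.8 t.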


The project “Disease Prediction Model” focuses on predicting the type of skin cancer. It deals with constructing a Convolutional Neural Network (CNN) sequential model in order to find the type of a skin cancer, which takes a huge toll on mankind's well-being. Since the development of automated methods greatly increases the accuracy of identifying the type of skin cancer, we use the CNN algorithm to build our model, making use of a sequential model. The dataset considered for this project is collected from NCBI and is well known as the HAM10000 dataset; it consists of a massive amount of information comprising dermatoscopic images of common pigmented skin lesions collected from different sufferers. Once the dataset is collected and cleaned, it is split into training and testing data sets. We used CNN to build our model: using the training data we trained the model, and later, using the testing data, we tested it. Once the model is applied to the testing data, plots are made in order to analyze the relation between the epochs and the loss function, and to analyze accuracy against epochs for both training and testing data.
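
The core operation that each convolutional layer of such a sequential model applies to a dermatoscopic image can be sketched in plain NumPy. This is an illustrative single-channel convolution with ReLU, not the project's actual deep-learning code:

```python
import numpy as np

def conv2d_relu(image, kernel):
    """Valid-mode 2-D convolution (really cross-correlation, as in most
    deep-learning frameworks) followed by a ReLU activation."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return np.maximum(out, 0.0)  # ReLU keeps only positive responses
```

A sequential CNN stacks many such layers (with learned kernels) before a final classification head that outputs the lesion type.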


2021 ◽  
Vol 12 (1) ◽  
pp. 1-11
Author(s):  
Kishore Sugali ◽  
Chris Sprunger ◽  
Venkata N Inukollu

Artificial Intelligence and Machine Learning have been around for a long time. In recent years, there has been a surge in popularity of applications integrating AI and ML technology. As with traditional development, software testing is a critical component of a successful AI/ML application. The development methodology used in AI/ML contrasts significantly with traditional development. In light of these distinctions, various software testing challenges arise. The emphasis of this paper is on the challenge of effectively splitting the data into training and testing data sets. By applying a k-Means clustering strategy to the data set followed by a decision tree, we can significantly increase the likelihood that the training data set represents the domain of the full dataset, and thus avoid training a model that is likely to fail because it has learned only a subset of the full data domain.
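
A minimal sketch of the splitting idea, under simplifying assumptions: a plain k-Means step (the paper's follow-up decision tree is omitted) with illustrative names throughout. The point is to cluster first, then draw the train/test split proportionally from every cluster, so each region of the data domain is represented in the training set.

```python
import numpy as np

def kmeans(data, k, iters=50):
    """Tiny k-Means with deterministic farthest-point initialization."""
    centers = [data[0]]
    for _ in range(1, k):
        d = np.min(np.linalg.norm(data[:, None] - np.array(centers)[None], axis=2), axis=1)
        centers.append(data[np.argmax(d)])  # next center: farthest point so far
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = np.argmin(np.linalg.norm(data[:, None] - centers[None], axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = data[labels == j].mean(axis=0)
    return labels

def cluster_split(data, k=2, train_frac=0.8, seed=0):
    """Split train/test proportionally within each k-Means cluster,
    so the training set covers every region of the data domain."""
    rng = np.random.default_rng(seed)
    labels = kmeans(data, k)
    train_idx, test_idx = [], []
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        rng.shuffle(idx)
        cut = int(round(train_frac * len(idx)))
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    return np.array(train_idx), np.array(test_idx)
```

With two well-separated blobs, each cluster contributes its own 80/20 share to the split, instead of one blob possibly dominating a purely random split.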


Author(s):  
Damian Gola ◽  
Jeanette Erdmann ◽  
Kristi Läll ◽  
Reedik Mägi ◽  
Bertram Müller-Myhsok ◽  
...  

Background: Individual risk prediction based on genome-wide polygenic risk scores (PRSs) using millions of genetic variants has attracted much attention. It is under debate whether PRS models can be applied—without loss of precision—to populations of similar ethnic but different geographic background than the one the scores were trained on. Here, we examine how PRS trained in population-specific but European data sets perform in other European subpopulations in distinguishing between coronary artery disease patients and healthy individuals. Methods: We use data from the UK and Estonian biobanks (UKB, EB) as well as case-control data from the German population (DE) to develop and evaluate PRS in the same and different populations. Results: PRSs have the highest performance in their corresponding population testing data sets, whereas their performance drops significantly if applied to testing data sets from different European populations. Models trained on DE data revealed areas under the curve in independent testing sets in DE: 0.6752, EB: 0.6156, and UKB: 0.5989; trained on EB and tested on EB: 0.6565, DE: 0.5407, and UKB: 0.6043; trained on UKB and tested on UKB: 0.6133, DE: 0.5143, and EB: 0.6049. Conclusions: This result has a direct impact on the clinical usability of risk prediction models using PRS: a population effect must be kept in mind when applying risk estimation models based on additional genetic information, even to individuals from different European populations of the same ethnicity.
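
The area under the curve reported above is equivalent to the probability that a randomly chosen case receives a higher score than a randomly chosen control, which gives a compact way to compute it. This is a generic sketch with made-up labels and scores, not the study's pipeline:

```python
def auc(labels, scores):
    """AUC via the Mann-Whitney U statistic: the fraction of case/control
    pairs ranked correctly, counting ties as half a win."""
    cases = [s for l, s in zip(labels, scores) if l == 1]
    controls = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if c > n else 0.5 if c == n else 0.0
               for c in cases for n in controls)
    return wins / (len(cases) * len(controls))
```

An AUC of about 0.67, as in the best within-population result, means a case outranks a control in roughly two out of three random pairs.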


2020 ◽  
Vol 12 (22) ◽  
pp. 3837
Author(s):  
Kyubyung Kang ◽  
Donghui Chen ◽  
Cheng Peng ◽  
Dan Koo ◽  
Taewook Kang ◽  
...  

Pavement markings play a critical role in reducing crashes and improving safety on public roads. As road pavements age, maintenance work for safety purposes becomes critical. However, inspecting all pavement markings at the right time is very challenging due to the lack of available human resources. This study was conducted to develop an automated condition analysis framework for pavement markings using machine learning technology. The proposed framework consists of three modules: a data processing module, a pavement marking detection module, and a visibility analysis module. The framework was validated through a case study of pavement marking training data sets in the U.S. It was found that the detection model of the framework was very precise, meaning that most of the identified pavement markings were correctly classified. In addition, in the proposed framework, visibility was confirmed as an important factor for driver safety and maintenance, and visibility standards for pavement markings were defined.
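
The precision claim above—most identified markings were correctly classified—can be stated concretely. This is a generic sketch of the metric, not the paper's detector or labels:

```python
def precision_recall(y_true, y_pred):
    """Precision: fraction of detections that are real markings.
    Recall: fraction of real markings that were detected."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fp), tp / (tp + fn)
```

High precision with modest recall would mean few false alarms but some missed markings; a maintenance framework typically needs both to be high.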


2020 ◽  
Vol 10 (1) ◽  
pp. 53
Author(s):  
Ni Kadek Emik Sapitri ◽  
I Putu Eka N. Kencana ◽  
Luh Putu Ida Harini

Myopia is a vision disorder that leaves sufferers unable to see distant objects clearly. The degree of myopia in humans can change, both increasing and decreasing. An increase in the degree of myopia raises the potential for other visual disorders, such as cataracts, retinal detachment, and glaucoma. Therefore, increases in the degree of myopia need to be watched for. Several previous studies considered only the time factor in predicting changes in myopia degree. In fact, changes in myopia degree are also influenced by factors related to individual identity and behavior. This study aims to predict changes in the degree of myopia in humans based on several factors that cause myopia. The study uses data that has been scaled with a fuzzy membership function and then processed with an ANN to predict the changes in myopia degree. With an ANN of 6-2-3 architecture using 80 training data, 20 testing data, and 1 predictive datum, the predicted change in myopia degree is 1.1 diopters in the right eye and 1.2 diopters in the left eye, with an accumulated total of 2.3 diopters, and accuracy values of 87.79%, 78.47%, and 83.21% respectively.
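
The fuzzy scaling step can be illustrated with a standard triangular membership function. The paper does not specify which membership shape it uses, so the triangular form and its parameters here are assumptions for illustration:

```python
def triangular(x, a, b, c):
    """Triangular fuzzy membership: 0 outside [a, c], rising linearly
    to 1 at the peak b and falling back to 0 at c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)
```

Each raw input (e.g., age or screen-time hours) would be mapped through such functions to the [0, 1] range before being fed to the 6-2-3 ANN.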


2020 ◽  
Vol 3 (2) ◽  
Author(s):  
Yoga Religia ◽  
Gatot Tri Pranoto ◽  
Egar Dika Santosa

Normally, most of a bank's wealth is obtained from providing credit loans, so a marketing bank must be able to reduce the risk of non-performing credit loans. The risk of providing loans can be minimized by studying patterns in existing lending data. One technique that can be used to solve this problem is data mining. Data mining makes it possible to find hidden information in large data sets by way of classification. The Random Forest (RF) algorithm is a classification algorithm that can be used to deal with data imbalance problems. The purpose of this study is to discuss the use of the RF algorithm for classification of the South German Credit data. This research is needed because there is currently no previous research that applies the RF algorithm to classify the South German Credit data specifically. Based on the tests that have been done, the optimal performance of the RF classification algorithm on the South German Credit data is achieved with a split of 85% training data and 15% testing data, yielding an accuracy of 78.33%.
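
The ensemble idea behind RF—bootstrap-sample the training data, fit a tree to each sample, and aggregate by majority vote—can be sketched with decision stumps standing in for full trees. All names are illustrative; a real credit-scoring forest would use deeper trees and per-split feature subsampling:

```python
import random

def best_stump(X, y):
    """Exhaustively pick the (feature, threshold, polarity) stump
    with the fewest misclassifications on this sample."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for pol in (1, -1):
                preds = [1 if pol * (row[f] - t) >= 0 else 0 for row in X]
                err = sum(p != label for p, label in zip(preds, y))
                if best is None or err < best[0]:
                    best = (err, f, t, pol)
    return best[1], best[2], best[3]

def fit_forest(X, y, n_trees=25, seed=0):
    """Bagging: each stump is trained on a bootstrap sample of the data."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sample with replacement
        stumps.append(best_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps

def predict(stumps, row):
    """Majority vote over all stumps (1 = good credit, 0 = bad, say)."""
    votes = sum(1 if pol * (row[f] - t) >= 0 else 0 for f, t, pol in stumps)
    return 1 if 2 * votes >= len(stumps) else 0
```

Because each tree sees a resampled version of the data, minority-class examples can appear multiple times in some samples, which is one reason bagging-based methods cope comparatively well with imbalanced data.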


2014 ◽  
Vol 644-650 ◽  
pp. 2009-2012 ◽  
Author(s):  
Hai Tao Zhang ◽  
Bin Jun Wang

In order to solve the low efficiency of KNN- and K-Means-like algorithms in classification, a novel extension distance for intervals is proposed to measure the similarity between testing data and a class domain. The method constructs representatives for data points in less time than traditional methods; these representatives replace the original dataset and serve as the basis of classification. In effect, the construction of a model containing representatives makes classification faster. Experimental results from two benchmark data sets verify the effectiveness and applicability of the proposed work. The model-based method using the extension distance can effectively build data models to represent the whole training data, thus solving the problem of the high cost of classifying new instances.
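
The classical extension distance from extenics, on which such interval-based similarity measures build, quantifies how far a point lies from an interval (negative inside, positive outside). A one-dimensional sketch with illustrative class intervals; the paper's multidimensional variant is not reproduced here:

```python
def extension_distance(x, a, b):
    """Extension distance of x from the interval <a, b>:
    negative inside the interval, zero on the boundary, positive outside."""
    return abs(x - (a + b) / 2.0) - (b - a) / 2.0

def classify(x, class_intervals):
    """Assign x to the class whose representative interval is nearest
    in the extension-distance sense."""
    return min(class_intervals,
               key=lambda name: extension_distance(x, *class_intervals[name]))
```

Classifying against a handful of class intervals is far cheaper than comparing a new instance to every training point, which is the speed-up the abstract describes.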


2013 ◽  
Vol 64 (4) ◽  
pp. 222-229 ◽  
Author(s):  
Xu Zhao ◽  
Yong-Hong Cheng ◽  
Yong-Peng Meng ◽  
Michael G. Danikas

Partial discharge (PD) current is an impulse signal at the nanosecond level, which generates an electromagnetic (EM) wave containing broadband frequency information; the frequency band of the EM signal ranges from MHz up to GHz. Because different PD patterns produce impulse currents of different shapes, they induce EM waves containing different frequency information. Therefore, using features extracted from the frequency domain of the EM signals, PD patterns can be classified effectively. Wavelet or wavelet packet decomposition is well suited to selecting such features. However, if the decomposition level is too shallow to yield enough effective features, the EM signals cannot be grouped into the right pattern. Conversely, although it is easier to find features that distinguish the PD patterns when the decomposition level is deep, many redundant variables arise and it is hard to select features among so many variables. In this paper, a method is presented that selects features over the whole decomposition tree instead of only among the leaf nodes of the tree, because more potential features can be found in the whole tree. With the presented method, it is possible not only to obtain enough features but also to eliminate the redundant variables effectively. In order to validate the method, a large number of EM signals from four PD patterns in a power transformer were acquired as training and testing data for feature selection and classification, and three common classification methods were introduced to classify the PD patterns using the features selected by the method. Most of the classification results are satisfactory, indicating that the proposed method is effective.
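
The whole-tree idea can be sketched with a Haar wavelet packet transform; the paper does not state which wavelet it uses, so Haar is assumed here for brevity. Every node of the full tree—not only the leaves—contributes a candidate energy feature, and the actual selection criterion is omitted:

```python
import numpy as np

def haar_step(x):
    """One Haar analysis step: orthonormal approximation and detail halves."""
    x = np.asarray(x, dtype=float)
    return (x[0::2] + x[1::2]) / np.sqrt(2), (x[0::2] - x[1::2]) / np.sqrt(2)

def wavelet_packet_tree(signal, levels):
    """Full wavelet-packet tree: both approximation and detail nodes
    are decomposed again at every level."""
    tree = {0: [np.asarray(signal, dtype=float)]}
    for lv in range(1, levels + 1):
        tree[lv] = []
        for node in tree[lv - 1]:
            a, d = haar_step(node)
            tree[lv] += [a, d]
    return tree

def node_energies(tree):
    """Candidate features: the energy of every node in the whole tree,
    normalized by the root energy."""
    root = float(np.sum(tree[0][0] ** 2))
    return [float(np.sum(n ** 2)) / root for lv in sorted(tree) for n in tree[lv]]
```

A two-level tree over an 8-sample signal already yields 7 candidate features (1 + 2 + 4 nodes), whereas a leaf-only scheme would offer only 4; this is the larger search space the paper exploits.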


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Muhammed Maruf Ozturk

Over the last decade, researchers have investigated to what extent cross-project defect prediction (CPDP) shows advantages over traditional defect prediction settings. These works do not take the training and testing data for defect prediction from the same project; instead, dissimilar projects are employed. Selecting proper training data plays an important role in the success of CPDP. In this study, a novel clustering method named complexFuzzy is presented for selecting the training data of CPDP. The method determines membership values with the help of metrics that can be considered indicators of complexity. First, CPDP combinations are created on 29 different data sets. Subsequently, complexFuzzy is evaluated by considering the cluster centers of the data sets and comparing performance measures including area under the curve (AUC) and F-measure. The method is superior to five comparison algorithms in terms of the distance of cluster centers and prediction performance.
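
Membership-based clustering of candidate training projects can be illustrated with the standard fuzzy c-means membership formula. This is a stand-in: complexFuzzy derives its memberships from complexity metrics, and the plain distance used here is an assumption for illustration:

```python
def fuzzy_memberships(x, centers, m=2.0):
    """Standard fuzzy c-means membership of point x in each cluster,
    given cluster centers and fuzzifier m > 1. Memberships sum to 1."""
    d = [abs(x - c) for c in centers]
    if any(di == 0.0 for di in d):
        return [1.0 if di == 0.0 else 0.0 for di in d]  # x coincides with a center
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** exp for dk in d) for di in d]
```

A candidate project would then be selected as training data for the clusters in which its membership is highest.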

