A Methodology to Determine the Subset of Heuristics for Hyperheuristics through Metalearning for Solving Graph Coloring and Capacitated Vehicle Routing Problems

Complexity ◽

10.1155/2021/6660572 ◽

2021 ◽

Vol 2021 ◽

pp. 1-22

Author(s):

Lucero Ortiz-Aguilar ◽

Martín Carpio ◽

Alfonso Rojas-Domínguez ◽

Manuel Ornelas-Rodriguez ◽

H. J. Puga-Soberanes ◽

...

Keyword(s):

Vehicle Routing ◽

Graph Coloring ◽

Cross Validation ◽

State Of The Art ◽

Statistical Tests ◽

Statistical Comparison ◽

Offline Learning ◽

Comparison Of The Results ◽

Fold Cross Validation ◽

Capacitated Vehicle

In this work, we focus on the problem of selecting low-level heuristics in a hyperheuristic approach with offline learning, for the solution of instances of different problem domains. The objective is to improve the performance of the offline hyperheuristic approach by identifying equivalence classes in a set of instances of different problems and selecting the best-performing heuristics for each class. The proposed methodology starts from a set of instances of all problems; the generic characteristics of each instance and the performance of the heuristics on it define the feature vectors used to group the instances into classes. Metalearning with statistical tests is then used to select the heuristics for each class. Finally, we used a Naive Bayes classifier to test the set of instances with k-fold cross-validation, and we compared all results statistically with the best-known values. The methodology was tested on the capacitated vehicle routing problem (CVRP) and the graph coloring problem (GCP). The experimental results show that the proposed methodology can improve the performance of the offline hyperheuristic approach, correctly identifying the classes of instances and applying the appropriate heuristics in each case. This conclusion is based on a statistical comparison of the results obtained with the state-of-the-art results for each instance.
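The k-fold cross-validation mentioned in this abstract (and in most of the entries below) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code; the function name `k_fold_indices`, the shuffling seed, and the sample count are assumptions made for the example.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper: every sample lands in exactly one test fold.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    # Spread the remainder so that fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Example: 10 instances split into 5 folds of 2 test instances each.
folds = list(k_fold_indices(n_samples=10, k=5))
for train_idx, test_idx in folds:
    print(sorted(test_idx), len(train_idx))
```

A classifier would be trained on each `train_idx` and evaluated on the matching `test_idx`, and the per-fold scores would then be averaged.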

A novel computational model for predicting potential LncRNA-disease associations based on both direct and indirect features of LncRNA-disease pairs

BMC Bioinformatics ◽

10.1186/s12859-020-03906-7 ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Yubin Xiao ◽

Zheng Xiao ◽

Xiang Feng ◽

Zhiping Chen ◽

Linai Kuang ◽

...

Keyword(s):

Computational Model ◽

Cross Validation ◽

State Of The Art ◽

Prediction Methods ◽

Good Prediction ◽

Average Case ◽

Comparison Results ◽

Disease Associations ◽

Fold Cross Validation

Background: Accumulating evidence has demonstrated that long non-coding RNAs (lncRNAs) are closely associated with human diseases, and identifying the relationships between lncRNAs and diseases is useful for the diagnosis and treatment of diseases. Due to the high cost and time complexity of traditional bio-experiments, more and more computational methods have been proposed in recent years to infer potential lncRNA-disease associations. However, these state-of-the-art prediction methods still have various limitations. Results: In this manuscript, a novel computational model named FVTLDA is proposed to infer potential lncRNA-disease associations. The major novelty of FVTLDA lies in the integration of direct and indirect features related to lncRNA-disease associations, such as the feature vectors of lncRNA-disease pairs and their corresponding association probability fractions, which guarantees that FVTLDA can be used to predict diseases without known related lncRNAs and lncRNAs without known related diseases. Moreover, FVTLDA neither relies solely on known lncRNA-disease associations nor requires any negative samples, which guarantees that it can infer potential lncRNA-disease associations more equitably and effectively than traditional state-of-the-art prediction methods. Additionally, to avoid the limitations of single-model prediction techniques, we combine FVTLDA with Multiple Linear Regression (MLR) and with an Artificial Neural Network (ANN) for data analysis. Simulation results show that FVTLDA with MLR achieves reliable AUCs of 0.8909, 0.8936 and 0.8970 in 5-fold cross-validation (fivefold CV), 10-fold cross-validation (tenfold CV) and leave-one-out cross-validation (LOOCV), respectively, while FVTLDA with ANN achieves reliable AUCs of 0.8766, 0.8830 and 0.8807 in fivefold CV, tenfold CV, and LOOCV, respectively. Furthermore, in case studies of gastric cancer, leukemia and lung cancer, 8, 8 and 8 of the top 10 candidate lncRNAs predicted by FVTLDA with MLR, and 8, 7 and 8 of the top 10 candidate lncRNAs predicted by FVTLDA with ANN, have been verified by recent literature. In a comparison with the representative prediction model KATZLDA, FVTLDA with MLR and FVTLDA with ANN achieve average case study contrast scores of 0.8429 and 0.8515, respectively, both notably higher than the average case study contrast score of 0.6375 achieved by KATZLDA. Conclusion: The simulation results show that FVTLDA has good prediction performance, which makes it a good supplement to future bioinformatics research.
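The AUC values reported here summarize how well predicted scores rank true associations above non-associations. As a rough sketch (not the FVTLDA implementation), the AUC can be computed as the fraction of positive/negative pairs that are ordered correctly, with ties counted as half. The function name `roc_auc` and the toy labels and scores are assumptions for illustration only.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for a known association, 0 otherwise (hypothetical toy encoding).
    scores: predicted association scores, higher means more likely.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example: 4 of the 6 positive/negative pairs are ordered correctly.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.7, 0.3, 0.6]
auc = roc_auc(labels, scores)
print(round(auc, 4))  # 0.6667
```

The pairwise form is quadratic in the number of samples; it is chosen here for readability, and a rank-based computation would scale better on large datasets.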

Predictor Selection for Bacterial Vaginosis Diagnosis Using Decision Tree and Relief Algorithms

Applied Sciences ◽

10.3390/app10093291 ◽

2020 ◽

Vol 10 (9) ◽

pp. 3291

Author(s):

Jesús F. Pérez-Gómez ◽

Juana Canul-Reich ◽

José Hernández-Torruco ◽

Betania Hernández-Ocaña

Keyword(s):

Feature Selection ◽

Decision Tree ◽

Bacterial Vaginosis ◽

Cross Validation ◽

Performance Comparison ◽

Support Vector ◽

Ongoing Research ◽

Selection For ◽

Comparison Of The Results ◽

Fold Cross Validation

Requiring only a few relevant characteristics from patients when diagnosing bacterial vaginosis is highly useful for physicians, as it makes collecting these data less time consuming. The result would be a dataset of patients that can be diagnosed more accurately using only a subset of informative or relevant features rather than the entire set of features. As such, this is a feature selection (FS) problem. In this work, decision tree and Relief algorithms were used as feature selectors. Experiments were conducted on a real dataset for bacterial vaginosis with 396 instances and 252 features/attributes. The dataset was obtained from universities located in Baltimore and Atlanta. The FS algorithms produced feature rankings, from which the top fifteen features formed a new dataset that was used as input for both support vector machine (SVM) and logistic regression (LR) classifiers. For performance evaluation, averages of 30 runs of 10-fold cross-validation were reported, with balanced accuracy, sensitivity, and specificity as performance measures. Performance was compared between using all features and using only the top fifteen. The top-ranked attributes in our rankings were similar to those reported in the literature. This study is part of ongoing research that is investigating a range of feature selection and classification methods.
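The three performance measures named in this abstract follow directly from the confusion matrix of a binary classifier. The sketch below is illustrative only: the function name `diagnostic_metrics` and the toy label vectors are assumptions, and it presumes both classes occur in the true labels.

```python
def diagnostic_metrics(y_true, y_pred):
    """Return (sensitivity, specificity, balanced accuracy) for 0/1 labels.

    Assumes y_true contains at least one positive and one negative case.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    balanced_accuracy = (sensitivity + specificity) / 2
    return sensitivity, specificity, balanced_accuracy


# Toy example: 4 positive and 6 negative cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec, bal_acc = diagnostic_metrics(y_true, y_pred)
print(sens, round(spec, 4), round(bal_acc, 4))  # 0.75 0.6667 0.7083
```

Balanced accuracy averages the two class-wise rates, so it is less affected by class imbalance than plain accuracy.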

Global observation-based climatology of precipitation occurrence and peak intensity

10.5194/egusphere-egu2020-7837 ◽

2020 ◽

Author(s):

Hylke Beck ◽

Seth Westra ◽

Eric Wood

Keyword(s):

Land Surface ◽

Regression Models ◽

Cross Validation ◽

Climate Models ◽

Daily Precipitation ◽

State Of The Art ◽

Coefficient Of Determination ◽

Peak Intensity ◽

Uncertainty Estimates ◽

Fold Cross Validation

We introduce a unique set of global observation-based climatologies of daily precipitation (P) occurrence (related to the lower tail of the P distribution) and peak intensity (related to the upper tail of the P distribution). The climatologies were produced using Random Forest (RF) regression models trained with an unprecedented collection of daily P observations from 93,138 stations worldwide. Five-fold cross-validation was used to evaluate the generalizability of the approach and to quantify uncertainty globally. The RF models were found to provide highly satisfactory performance, yielding cross-validation coefficient of determination (R²) values from 0.74 for the 15-year return-period daily P intensity to 0.86 for the >0.5 mm d⁻¹ daily P occurrence. The performance of the RF models was consistently superior to that of state-of-the-art reanalysis (ERA5) and satellite (IMERG) products. The highest P intensities over land were found along the western equatorial coast of Africa, in India, and along coastal areas of Southeast Asia. Using a 0.5 mm d⁻¹ threshold, P was estimated to occur on 23.2% of days on average over the global land surface (excluding Antarctica). The climatologies including uncertainty estimates will be released as the Precipitation DISTribution (PDIST) dataset via www.gloh2o.org/pdist. We expect the dataset to be useful for numerous purposes, such as the evaluation of climate models, the bias correction of gridded P datasets, and the design of hydraulic structures in poorly gauged regions.
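The coefficient of determination used above to score the regression models compares the residual error of the predictions with the variance of the observations. The following is a small sketch under stated assumptions: the function name `r_squared` and the toy values are hypothetical and are not taken from the PDIST dataset.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes the observations are not all identical (SS_tot > 0).
    """
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Toy example: held-out observations versus model predictions.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(observed, predicted)
print(round(r2, 2))  # 0.98
```

In a cross-validation setting, `predicted` would hold the predictions made for each station while it was in the held-out fold.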

A Novel Computational Model for Predicting Potential LncRNA-Disease Associations based on Both Direct and Indirect Features of LncRNA-Disease Pairs

10.21203/rs.2.18937/v3 ◽

2020 ◽

Author(s):

Yubin Xiao ◽

Zheng Xiao ◽

Xiang Feng ◽

Zhiping Chen ◽

Linai Kuang ◽

...

Keyword(s):

Computational Model ◽

Cross Validation ◽

State Of The Art ◽

Prediction Methods ◽

Good Prediction ◽

Average Case ◽

Comparison Results ◽

Disease Associations ◽

Fold Cross Validation

Background: Accumulating evidence has demonstrated that long non-coding RNAs (lncRNAs) are closely associated with human diseases, and identifying the relationships between lncRNAs and diseases is useful for the diagnosis and treatment of diseases. Due to the high cost and time complexity of traditional bio-experiments, more and more computational methods have been proposed in recent years to infer potential lncRNA-disease associations. However, these state-of-the-art prediction methods still have various limitations. Results: In this manuscript, a novel computational model named FVTLDA is proposed to infer potential lncRNA-disease associations. The major novelty of FVTLDA lies in the integration of direct and indirect features related to lncRNA-disease associations, such as the feature vectors of lncRNA-disease pairs and their corresponding association probability fractions, which guarantees that FVTLDA can be used to predict diseases without known related lncRNAs and lncRNAs without known related diseases. Moreover, FVTLDA neither relies solely on known lncRNA-disease associations nor requires any negative samples, which guarantees that it can infer potential lncRNA-disease associations more equitably and effectively than traditional state-of-the-art prediction methods. Additionally, to avoid the limitations of single-model prediction techniques, we combine FVTLDA with Multiple Linear Regression (MLR) and with an Artificial Neural Network (ANN) for data analysis. Simulation results show that FVTLDA with MLR achieves reliable AUCs of 0.8909, 0.8936 and 0.8970 in 5-fold cross-validation (5-fold CV), 10-fold cross-validation (10-fold CV) and leave-one-out cross-validation (LOOCV), respectively, while FVTLDA with ANN achieves reliable AUCs of 0.8766, 0.8830 and 0.8807 in 5-fold CV, 10-fold CV, and LOOCV, respectively. Furthermore, in case studies of gastric cancer, leukemia and lung cancer, 8, 8 and 8 of the top 10 candidate lncRNAs predicted by FVTLDA with MLR, and 8, 7 and 8 of the top 10 candidate lncRNAs predicted by FVTLDA with ANN, have been verified by recent literature. In a comparison with the representative prediction model KATZLDA, FVTLDA with MLR and FVTLDA with ANN achieve average case study contrast scores of 0.8429 and 0.8515, respectively, both notably higher than the average case study contrast score of 0.6375 achieved by KATZLDA. Conclusion: The simulation results show that FVTLDA has good prediction performance, which makes it a good supplement to future bioinformatics research.

A Machine Learning Method for Detecting the Trace of Seam Carving

Elektronika ir Elektrotechnika ◽

10.5755/j02.eie.29050 ◽

2021 ◽

Author(s):

Zehra Karapinar Senturk ◽

Devrim Akgun

Keyword(s):

Local Binary Pattern ◽

Cross Validation ◽

Detection Method ◽

State Of The Art ◽

Evaluation Process ◽

Support Vector ◽

Blind Detection ◽

Seam Carving ◽

Image Retargeting ◽

Fold Cross Validation

Image retargeting is a manipulation approach for resizing images while keeping image distortion at a low level. Detecting image retargeting is important in image forensics and, in some cases, for checking originality. The aim of this paper is to introduce a new blind detection method for identifying images retargeted by seam carving. For this purpose, a new method based on varying numbers of stripes, the Local Binary Pattern (LBP) transform, and an energy map is introduced. For feature extraction, sub-images in the form of stripes were obtained from the square root of the energy map of the LBP transform, and these were evaluated in terms of several statistical features. The features extracted from both the natural and the seam-carved images were used to train a Support Vector Machine (SVM) as a binary classifier. Experimental results were obtained using four-fold cross-validation to improve the validity of the results during the evaluation process. According to the experiments, the proposed method produces improved accuracies compared with state-of-the-art solutions for detecting image retargeting based on seam carving.

Hybrid Cuckoo Search for the Capacitated Vehicle Routing Problem

Symmetry ◽

10.3390/sym12122088 ◽

2020 ◽

Vol 12 (12) ◽

pp. 2088

Author(s):

Mansour Alssager ◽

Zulaiha Ali Othman ◽

Masri Ayob ◽

Rosmayati Mohemad ◽

Herman Yuliansyah

Keyword(s):

Vehicle Routing ◽

Vehicle Routing Problem ◽

State Of The Art ◽

Cuckoo Search ◽

Disruptive Selection ◽

Selection Strategy ◽

Routing Problem ◽

Capacitated Vehicle Routing Problem ◽

Neighborhood Structures ◽

Capacitated Vehicle

Finding the best solution to the Vehicle Routing Problem (VRP) remains in demand. Cuckoo Search (CS) is a popular metaheuristic based on the reproductive strategy of cuckoo species and has been successfully applied to various optimization problems, including the Capacitated Vehicle Routing Problem (CVRP). Although CS and hybrid CS have been proposed for the CVRP, the performance of CS is far from the state of the art. Therefore, this study proposes a hybrid of CS with the Simulated Annealing (SA) algorithm for the CVRP, consisting of three improvements: the investigation of 12 neighborhood structures, the investigation of three selection strategies, and hybridization with SA. The experiment was conducted using 16 instances of the Augerat benchmark dataset. The results show that 6 of the 12 neighborhood structures performed best and that disruptive selection is the best selection strategy. The experimental results showed that the proposed method could find optimal and near-optimal solutions compared with state-of-the-art algorithms.

A Bi-LSTM Based Ensemble Algorithm for Prediction of Protein Secondary Structure

Applied Sciences ◽

10.3390/app9173538 ◽

2019 ◽

Vol 9 (17) ◽

pp. 3538 ◽

Cited By ~ 3

Author(s):

Hailong Hu ◽

Zhong Li ◽

Arne Elofsson ◽

Shangxin Xie

Keyword(s):

Secondary Structure ◽

Cross Validation ◽

State Of The Art ◽

Protein Secondary Structure ◽

Ensemble Methods ◽

Ensemble Model ◽

Training Process ◽

Independent Test ◽

Test Sets ◽

Fold Cross Validation

The prediction of protein secondary structure continues to be an active area of research in bioinformatics. In this paper, a Bi-LSTM based ensemble model is developed for the prediction of protein secondary structure. The ensemble model with dual loss function consists of five sub-models, which are finally joined by a Bi-LSTM layer. In contrast to existing ensemble methods, which generally train each sub-model and then join them as a whole, this ensemble model and its sub-models can be trained simultaneously, and the performance of each model can be observed and compared during the training process. Three independent test sets (data1199, the 513-protein Cuff & Barton set (CB513), and 203 proteins from the Critical Assessment of protein Structure Prediction (CASP203)) are employed to test the method. On average, the ensemble model achieved 84.3% in Q3 accuracy and 81.9% in segment overlap measure (SOV) score using 10-fold cross-validation. This is an improvement of up to 1% over some state-of-the-art methods for protein secondary structure prediction.

ADVIAN: Alzheimer's Disease VGG-Inspired Attention Network Based on Convolutional Block Attention Module and Multiple Way Data Augmentation

Frontiers in Aging Neuroscience ◽

10.3389/fnagi.2021.687456 ◽

2021 ◽

Vol 13 ◽

Author(s):

Shui-Hua Wang ◽

Qinghua Zhou ◽

Ming Yang ◽

Yu-Dong Zhang

Keyword(s):

Alzheimer's Disease ◽

Cross Validation ◽

Data Augmentation ◽

State Of The Art ◽

Attention Network ◽

Backbone Network ◽

Novel Method ◽

Precision And Accuracy ◽

Fold Cross Validation

Aim: Alzheimer's disease (AD) is a neurodegenerative disease that causes 60–70% of all cases of dementia. This study aims to provide a novel method that can identify AD more accurately. Methods: We first propose a VGG-inspired network (VIN) as the backbone network and investigate the use of attention mechanisms. We then propose an Alzheimer's Disease VGG-Inspired Attention Network (ADVIAN), in which we integrate convolutional block attention modules on a VIN backbone. In addition, 18-way data augmentation is proposed to avoid overfitting. Ten runs of 10-fold cross-validation are carried out to report unbiased performance. Results: The sensitivity and specificity reach 97.65 ± 1.36 and 97.86 ± 1.55, respectively. The precision and accuracy are 97.87 ± 1.53 and 97.76 ± 1.13, respectively. The F1 score, MCC, and FMI are 97.75 ± 1.13, 95.53 ± 2.27, and 97.76 ± 1.13, respectively. The AUC is 0.9852. Conclusion: The proposed ADVIAN gives better results than 11 state-of-the-art methods. In addition, the experimental results demonstrate the effectiveness of 18-way data augmentation.
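The F1 score, Matthews correlation coefficient (MCC), and Fowlkes-Mallows index (FMI) quoted in this abstract are all functions of the four confusion-matrix counts. A minimal sketch follows, using the standard definitions; the function name `summary_scores` and the toy counts are assumptions and do not reproduce the paper's data.

```python
import math


def summary_scores(tp, fp, fn, tn):
    """Return (precision, f1, mcc, fmi) from binary confusion-matrix counts.

    Assumes every denominator is non-zero (each row and column has a count).
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # identical to sensitivity
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    fmi = math.sqrt(precision * recall)  # geometric mean of precision and recall
    return precision, f1, mcc, fmi


# Toy counts (hypothetical): 3 true positives, 2 false positives,
# 1 false negative, 4 true negatives.
precision, f1, mcc, fmi = summary_scores(tp=3, fp=2, fn=1, tn=4)
print(round(precision, 4), round(f1, 4), round(mcc, 4), round(fmi, 4))
```

MCC ranges from -1 to 1 and uses all four counts, which is why it is often reported alongside F1 when classes are imbalanced.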

DNA sequences performs as natural language processing by exploiting deep learning algorithm for the identification of N4-methylcytosine

Scientific Reports ◽

10.1038/s41598-020-80430-x ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Abdul Wahab ◽

Hilal Tayara ◽

Zhenyu Xuan ◽

Kil To Chong

Keyword(s):

Deep Learning ◽

Language Processing ◽

Dna Sequences ◽

Area Under Curve ◽

Cross Validation ◽

Learning Algorithm ◽

State Of The Art ◽

Deep Learning Algorithm ◽

Fold Cross Validation ◽

Genome Dataset

N4-methylcytosine is a biochemical alteration of DNA that affects genetic operations such as gene expression, genomic imprinting, chromosome stability, and cell development without modifying the DNA nucleotides. In the proposed work, a computational model, 4mCNLP-Deep, uses a word embedding approach as a vector formulation and a deep-learning-based CNN algorithm to predict 4mC and non-4mC sites on the C. elegans genome dataset. A range of settings, such as the corpus k-mer size and the number of folds in k-fold cross-validation, was explored in the experiments to obtain the best-performing configuration. 4mCNLP-Deep outperforms the state-of-the-art predictor on five evaluation metrics: accuracy (ACC) of 0.9354, Matthews correlation coefficient (MCC) of 0.8608, specificity (Sp) of 0.8996, sensitivity (Sn) of 0.9563, and area under the curve (AUC) of 0.9731, using a 3-mer corpus with word2vec and 3-fold cross-validation, corresponding to increments of 1.1%, 0.6%, 0.58%, 0.77%, and 4.89%, respectively. Finally, we developed an online webserver, http://nsclbio.jbnu.ac.kr/tools/4mCNLP-Deep/, so that experimental researchers can obtain results easily.
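Treating DNA as text, as this abstract describes, requires splitting each sequence into k-mer "words" before a word embedding such as word2vec can be trained. The sketch below shows one common way to do this, with overlapping windows and a stride of one; whether the authors used exactly this tokenization is an assumption, and the function name `kmer_tokens` is hypothetical.

```python
def kmer_tokens(sequence, k=3):
    """Split a DNA sequence into overlapping k-mers (stride 1).

    Overlapping windows are an assumption for this sketch; the paper may
    tokenize differently.
    """
    sequence = sequence.upper()
    if k < 1 or k > len(sequence):
        raise ValueError("k must be between 1 and the sequence length")
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# Example: a 6-nucleotide sequence yields four 3-mers.
tokens = kmer_tokens("ACGTAC", k=3)
print(tokens)  # ['ACG', 'CGT', 'GTA', 'TAC']
```

Each token list can then be treated as a sentence when training the embedding model.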

A Novel Approach Based on Point Cut Set to Predict Associations of Diseases and LncRNAs

Current Bioinformatics ◽

10.2174/1574893613666181026122045 ◽

2019 ◽

Vol 14 (4) ◽

pp. 333-343 ◽

Cited By ~ 3

Author(s):

Linai Kuang ◽

Haochen Zhao ◽

Lei Wang ◽

Zhanwei Xuan ◽

Tingrui Pei

Keyword(s):

Cross Validation ◽

State Of The Art ◽

Interaction Network ◽

Research Field ◽

Computational Method ◽

Difference Matrix ◽

Art Methods ◽

Disease Associations ◽

Cut Set ◽

Fold Cross Validation

Background: In recent years, growing evidence has indicated that long non-coding RNAs (lncRNAs) play vital roles in a wide range of human diseases and can serve as potential biomarkers and drug targets. Compared with the vast number of lncRNAs being discovered, the relationships between lncRNAs and diseases remain largely unknown. Objective: The prediction of novel and potential associations between lncRNAs and diseases would help dissect the complex mechanisms of disease pathogenesis. Method: In this paper, a new computational method based on Point Cut Set is proposed to predict LncRNA-Disease Associations (PCSLDA) based on known lncRNA-disease associations. Compared with the existing state-of-the-art methods, the major novelty of PCSLDA lies in the incorporation of a distance difference matrix and a point cut set to set the distance correlation coefficient of nodes in the lncRNA-disease interaction network. Hence, PCSLDA can be applied to forecast potential lncRNA-disease associations while requiring only known disease-lncRNA associations. Results: Simulation results show that PCSLDA can significantly outperform previous state-of-the-art methods, with a reliable AUC of 0.8902 in leave-one-out cross-validation and AUCs of 0.7634 and 0.8317 in 5-fold and 10-fold cross-validation, respectively. In addition, 70% of the top 10 predicted cancer-lncRNA associations can be confirmed. Conclusion: It is anticipated that our proposed model can be a great addition to the biomedical research field.
