Evolving Fisher Kernels for Biological Sequence Classification

K.-J. Won; C. Saunders; A. Prügel-Bennett

doi:10.1162/evco_a_00065

Evolving Fisher Kernels for Biological Sequence Classification

Evolutionary Computation ◽

10.1162/evco_a_00065 ◽

2013 ◽

Vol 21 (1) ◽

pp. 83-105 ◽

Cited By ~ 2

Author(s):

K.-J. Won ◽

C. Saunders ◽

A. Prügel-Bennett

Keyword(s):

Sequence Similarity ◽

Generative Models ◽

Complex Model ◽

Generative Model ◽

Support Vector ◽

Homologous Sequence ◽

Sequence Information ◽

Biological Sequence ◽

Domain Specific ◽

Fisher Kernel

Fisher kernels have been successfully applied to many problems in bioinformatics. However, their success depends on the quality of the generative model upon which they are built. For Fisher kernel techniques to be used on novel problems, a mechanism for creating accurate generative models is required. A novel framework is presented for automatically creating domain-specific generative models that can be used to produce Fisher kernels for support vector machines (SVMs) and other kernel methods. The framework enables the capture of prior knowledge and addresses the issue of domain-specific kernels, both of which are currently lacking in many kernel-based methods. To obtain the generative model, genetic algorithms are used to evolve the structure of hidden Markov models (HMMs). A Fisher kernel is subsequently created from the HMM and used in conjunction with an SVM to improve the discriminative power. This paper investigates the effectiveness of the proposed method, named GA-SVM. We show that its performance is comparable to, if not better than, other state-of-the-art methods in classifying secretory protein sequences of malaria. More interestingly, in protein enzyme family classification it showed better results than the sequence-similarity-based approach, without the need for additional homologous sequence information. The experiments clearly demonstrate that GA-SVM is a novel way to find well-performing features from biological sequences that does not require extensive tuning of a complex model.
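
The Fisher-kernel step mentioned in this abstract can be illustrated with a minimal sketch. It does not reproduce the paper's GA-evolved HMM: as a stated assumption, the generative model here is a much simpler i.i.d. multinomial over sequence symbols, the Fisher information matrix is replaced by the identity, and the sum-to-one constraint on the parameters is ignored when taking the gradient. All function names are hypothetical.

```python
from collections import Counter

def fit_multinomial(sequences, alphabet, pseudocount=1.0):
    """Estimate symbol probabilities theta from training sequences (toy generative model)."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq)
    total = sum(counts[a] for a in alphabet) + pseudocount * len(alphabet)
    return {a: (counts[a] + pseudocount) / total for a in alphabet}

def fisher_score(seq, theta):
    """Gradient of log P(seq | theta) with respect to each theta_a.

    For an i.i.d. multinomial, log P = sum_a n_a(seq) * log(theta_a),
    so the unconstrained partial derivative is n_a(seq) / theta_a.
    """
    counts = Counter(seq)
    return [counts[a] / theta[a] for a in sorted(theta)]

def fisher_kernel(x, y, theta):
    """Inner product of Fisher scores (identity used in place of the Fisher information)."""
    ux, uy = fisher_score(x, theta), fisher_score(y, theta)
    return sum(a * b for a, b in zip(ux, uy))

theta = fit_multinomial(["AAB", "ABB"], "AB")
print(fisher_kernel("AAB", "ABB", theta))
```

The resulting kernel values could be passed to any kernel method that accepts a precomputed Gram matrix; an HMM would give a richer score vector (one entry per transition and emission parameter), but the construction is the same.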

Link Prediction through Deep Generative Model

10.1101/247577 ◽

2018 ◽

Author(s):

Xu-Wen Wang ◽

Yize Chen ◽

Yang-Yu Liu

Keyword(s):

Complex Networks ◽

Real World ◽

Link Prediction ◽

Prediction Method ◽

Generative Models ◽

Generative Model ◽

Superior Performance ◽

Prediction Problem ◽

Structural Patterns ◽

Domain Specific

Abstract: Inferring missing links or predicting future ones based on the currently observed network is known as link prediction, which has tremendous real-world applications in biomedicine [1–3], e-commerce [4], social media [5] and criminal intelligence [6]. Numerous methods have been proposed to solve the link prediction problem [7–9]. Yet, many of these existing methods are designed for undirected networks only. Moreover, most methods are based on domain-specific heuristics [10], and hence their performances differ greatly for networks from different domains. Here we developed a new link prediction method based on deep generative models [11] in machine learning. This method does not rely on any domain-specific heuristic and works for general undirected or directed complex networks. Our key idea is to represent the adjacency matrix of a network as an image and then learn hierarchical feature representations of the image by training a deep generative model. Those features correspond to structural patterns in the network at different scales, from small subgraphs to mesoscopic communities [12]. Conceptually, taking into account structural patterns at different scales all together should outperform any domain-specific heuristics that typically focus on structural patterns at a particular scale. Indeed, when applied to various real-world networks from different domains [13–17], our method shows overall superior performance against existing methods. Moreover, it can be easily parallelized by splitting a large network into several small subnetworks and then performing link prediction for each subnetwork in parallel. Our results imply that deep learning techniques can be effectively applied to complex networks and solve the classical link prediction problem with robust and superior performance. Summary: We propose a new link prediction method based on deep generative models.
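
The representation described in this abstract, a network's adjacency matrix treated as an image and split into smaller subnetworks, can be sketched as follows. This covers only the data preparation, not the deep generative model. The splitting scheme (induced subgraphs on consecutive node ranges) and both function names are assumptions for illustration; the abstract does not say how the authors partition the network.

```python
def adjacency_matrix(num_nodes, edges, directed=True):
    """Build a 0/1 adjacency matrix; for directed graphs it is generally not symmetric."""
    A = [[0] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        A[u][v] = 1
        if not directed:
            A[v][u] = 1
    return A

def split_into_blocks(A, block_size):
    """Return induced subgraphs on consecutive node ranges (hypothetical splitting scheme).

    Edges that cross block boundaries are dropped by this simple scheme.
    """
    n = len(A)
    blocks = []
    for start in range(0, n, block_size):
        idx = range(start, min(start + block_size, n))
        blocks.append([[A[i][j] for j in idx] for i in idx])
    return blocks

A = adjacency_matrix(4, [(0, 1), (1, 2), (2, 3)])
for block in split_into_blocks(A, 2):
    print(block)
```

Each block is a small binary "image" that could be processed independently, which is what makes the per-subnetwork step easy to run in parallel.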

IsoDetect: Detection of splice isoforms from third generation long reads based on short feature sequences

Current Bioinformatics ◽

10.2174/1574893615666200316101205 ◽

2020 ◽

Vol 15 ◽

Author(s):

Hongdong Li ◽

Wenjing Zhang ◽

Yuwen Luo ◽

Jianxin Wang

Keyword(s):

Sequence Similarity ◽

Detection Methods ◽

Sequence Information ◽

Third Generation ◽

Sequencing Data ◽

Splice Isoforms ◽

Third Generation Sequencing ◽

Long Reads ◽

Feature Sequence ◽

Generation Sequencing

Aims: Accurately detect isoforms from third generation sequencing data. Background: Transcriptome annotation is the basis for the analysis of gene expression and regulation. The transcriptome annotation of many organisms such as humans is far from complete, due partly to the challenge of identifying isoforms that are produced from the same gene through alternative splicing. Third generation sequencing (TGS) reads provide an unprecedented opportunity for detecting isoforms because their length exceeds that of most isoforms. One limitation of current TGS read-based isoform detection methods is that they are exclusively based on sequence reads, without incorporating the sequence information of known isoforms. Objective: Develop an efficient method for isoform detection. Method: Based on annotated isoforms, we propose a splice isoform detection method called IsoDetect. First, the sequence at each exon-exon junction is extracted from annotated isoforms as the “short feature sequence”, which is used to distinguish different splice isoforms. Second, we align these feature sequences to long reads and divide the long reads into groups that contain the same set of feature sequences, thereby avoiding pair-wise comparison among the large number of long reads. Third, clustering and consensus generation are carried out based on sequence similarity. For the long reads that do not contain any short feature sequence, clustering analysis based on sequence similarity is performed to identify isoforms. Result: Tested on two datasets from Calypte anna and zebra finch, IsoDetect showed higher speed and compelling accuracy compared with four existing methods. Conclusion: IsoDetect is a promising method for isoform detection. Other: This paper was accepted by the CBC2019 conference.
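
The grouping step in the Method section (reads that contain the same set of feature sequences fall into one group) can be sketched as below. As a simplifying assumption, exact substring matching stands in for the alignment of feature sequences to long reads; the function name and the toy data are hypothetical and are not taken from IsoDetect.

```python
from collections import defaultdict

def group_reads_by_features(reads, feature_seqs):
    """Group read IDs by the set of feature sequences each read contains.

    reads:        dict mapping read ID -> read sequence
    feature_seqs: dict mapping feature name -> short junction sequence
    Assumption: exact substring search replaces the alignment step.
    """
    groups = defaultdict(list)
    for read_id, read in reads.items():
        present = frozenset(
            name for name, feat in feature_seqs.items() if feat in read
        )
        groups[present].append(read_id)
    return dict(groups)

features = {"j1": "ACGT", "j2": "TTGA"}
reads = {
    "r1": "GGACGTCC",
    "r2": "ACGTAATTGA",
    "r3": "CCCC",
    "r4": "TTACGTTT",
}
for feats, ids in group_reads_by_features(reads, features).items():
    print(sorted(feats), ids)
```

Reads keyed by the empty set contain no feature sequence; in the abstract's description these are the reads handled by similarity-based clustering instead.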

Equivariant Adversarial Network for Image-to-image Translation

ACM Transactions on Multimedia Computing Communications and Applications ◽

10.1145/3458280 ◽

2021 ◽

Vol 17 (2s) ◽

pp. 1-14

Author(s):

Masoumeh Zareapoor ◽

Jie Yang

Keyword(s):

State Of The Art ◽

Generative Models ◽

Generative Model ◽

Target Domain ◽

Adversarial Network ◽

Proposed Model ◽

Image Translation ◽

Great Performance ◽

Representative Model ◽

The Ideal

Image-to-image translation aims to learn a mapping of images from a source domain to a target domain. However, three main challenges are associated with this problem and need to be dealt with: the lack of paired datasets, multimodality, and diversity. Convolutional neural networks (CNNs), despite their great performance in many computer vision tasks, fail to detect the hierarchy of spatial relationships between different parts of an object and thus do not form the ideal representative model we look for. This article presents a new variation of generative models that aims to remedy this problem. We use a trainable transformer, which explicitly allows the spatial manipulation of data during training. This differentiable module can be added to the convolutional layers in the generative model, and it allows the generated distributions to be freely altered for image-to-image translation. To reap the benefits of the proposed module in the generative model, our architecture incorporates a new loss function to facilitate effective end-to-end generative learning for image-to-image translation. The proposed model is evaluated through comprehensive experiments on image synthesis and image-to-image translation, along with comparisons with several state-of-the-art algorithms.

Multispectral Face Recognition Using Transfer Learning with Adaptation of Domain Specific Units

Sensors ◽

10.3390/s21134520 ◽

2021 ◽

Vol 21 (13) ◽

pp. 4520

Author(s):

Luis Lopes Chambino ◽

José Silvestre Silva ◽

Alexandre Bernardino

Keyword(s):

Neural Network ◽

Transfer Learning ◽

Facial Recognition ◽

Spectral Band ◽

Support Vector ◽

Multispectral Images ◽

Forgery Detection ◽

K Nearest Neighbor ◽

Deep Convolutional Neural Networks ◽

Domain Specific

Facial recognition is a method of identifying or authenticating the identity of people through their faces. Nowadays, facial recognition systems that use multispectral images achieve better results than those that use only visible spectral band images. In this work, a novel architecture for facial recognition that uses multiple deep convolutional neural networks and multispectral images is proposed. A domain-specific transfer-learning methodology applied to a deep neural network pre-trained on RGB images is shown to generalize well to the multispectral domain. We also propose a skin detector module for forgery detection. Several experiments were planned to assess the performance of our methods. First, we evaluate the performance of the forgery detection module using face masks and coverings of different materials. A second study was carried out with the objective of tuning the parameters of our domain-specific transfer-learning methodology, in particular which layers of the pre-trained network should be retrained to obtain good adaptation to multispectral images. A third study was conducted to evaluate the performance of support vector machines (SVM) and k-nearest neighbor classifiers using the embeddings obtained from the trained neural network. Finally, we compare the proposed method with other state-of-the-art approaches. The experimental results show performance improvements in the Tufts and CASIA NIR-VIS 2.0 multispectral databases, with rank-1 scores of 99.7% and 99.8%, respectively.

A One-shot Learning Approach to Image Classification using Genetic Programming

10.26686/wgtn.13150934.v1 ◽

2020 ◽

Author(s):

Harith Al-Sahaf ◽

Mengjie Zhang ◽

M Johnston

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Genetic Programming ◽

Image Classification ◽

Local Binary Patterns ◽

Support Vector ◽

Learning Approach ◽

Data Sets ◽

Domain Specific ◽

International Publishing

In machine learning, it is common to require a large number of instances to train a model for classification. In many cases, it is hard or expensive to acquire a large number of instances. In this paper, we propose a novel genetic programming (GP) based method for the problem of automatic image classification by adopting a one-shot learning approach. The proposed method relies on the combination of GP and Local Binary Patterns (LBP) techniques to detect a predefined number of informative regions that aim at maximising the between-class scatter and minimising the within-class scatter. Moreover, the proposed method uses only two instances of each class to evolve a classifier. To test the effectiveness of the proposed method, four different texture data sets are used and the performance is compared against two other GP-based methods, namely Conventional GP and Two-tier GP. The experiments revealed that the proposed method outperforms these two methods on all the data sets. Moreover, Naïve Bayes, Support Vector Machine, and Decision Trees (J48) methods achieved better performance when using features extracted by the proposed method than when using domain-specific or Two-tier GP extracted features. © Springer International Publishing 2013.

Intrusion Detection with Support Vector Machines and Generative Models

Lecture Notes in Computer Science - Information Security ◽

10.1007/3-540-45811-5_3 ◽

2002 ◽

pp. 32-47 ◽

Cited By ~ 1

Author(s):

John S. Baras ◽

Maben Rabi

Keyword(s):

Support Vector Machines ◽

Intrusion Detection ◽

Generative Models ◽

Support Vector ◽

Vector Machines

Monte Carlo and Reconstruction Membership Inference Attacks against Generative Models

Proceedings on Privacy Enhancing Technologies ◽

10.2478/popets-2019-0067 ◽

2019 ◽

Vol 2019 (4) ◽

pp. 232-249 ◽

Cited By ~ 1

Author(s):

Benjamin Hilprecht ◽

Martin Härterich ◽

Daniel Bernau

Keyword(s):

Data Privacy ◽

Information Leakage ◽

Generative Models ◽

Generative Model ◽

Training Data ◽

Generative Adversarial Networks ◽

Data Sets ◽

Success Rates ◽

Model Quality ◽

Type Formalization

Abstract: We present two information leakage attacks that outperform previous work on membership inference against generative models. The first attack allows membership inference without assumptions on the type of the generative model. Contrary to previous evaluation metrics for generative models, like Kernel Density Estimation, it only considers samples of the model which are close to training data records. The second attack specifically targets Variational Autoencoders, achieving high membership inference accuracy. Furthermore, previous work mostly considers membership inference adversaries who perform single record membership inference. We argue for considering regulatory actors who perform set membership inference to identify the use of specific datasets for training. The attacks are evaluated on two generative model architectures, Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), trained on standard image datasets. Our results show that the two attacks yield success rates superior to previous work on most data sets while at the same time having only very mild assumptions. We envision the two attacks in combination with the membership inference attack type formalization as especially useful, for example to enforce data privacy standards and to automatically assess model quality in machine-learning-as-a-service setups. In practice, our work motivates the use of GANs since they prove less vulnerable against information leakage attacks while producing detailed samples.
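
The idea behind the first attack, scoring a candidate record by the generated samples that fall close to it, can be sketched as a simple Monte Carlo estimate. This is an illustration under assumptions rather than the authors' exact procedure: the Euclidean distance, the threshold `epsilon`, and the function name are all chosen here for the example.

```python
import math

def mc_membership_score(candidate, generated_samples, epsilon):
    """Fraction of generated samples within distance epsilon of the candidate record.

    A higher score suggests the generator places more mass near the candidate,
    which is taken as weak evidence that it was part of the training data.
    Assumptions: Euclidean distance and a fixed threshold epsilon.
    """
    close = sum(
        1 for g in generated_samples if math.dist(g, candidate) <= epsilon
    )
    return close / len(generated_samples)

samples = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (0.0, 0.2)]
print(mc_membership_score((0.0, 0.0), samples, epsilon=0.25))
print(mc_membership_score((10.0, 10.0), samples, epsilon=0.25))
```

Because the estimate only needs samples from the model, it makes no assumption about the model's architecture, which matches the abstract's claim for the first attack.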

Towards Consistent Variational Auto-Encoding (Student Abstract)

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i10.7207 ◽

2020 ◽

Vol 34 (10) ◽

pp. 13869-13870

Author(s):

Yijing Liu ◽

Shuyu Lin ◽

Ronald Clark

Keyword(s):

Original Data ◽

Generative Models ◽

Generative Model ◽

Approximate Inference ◽

Inference Model ◽

Approach To Learning ◽

Successful Approach ◽

Data Point ◽

Reconstructed Data

Variational autoencoders (VAEs) have been a successful approach to learning meaningful representations of data in an unsupervised manner. However, suboptimal representations are often learned because the approximate inference model fails to match the true posterior of the generative model, i.e. an inconsistency exists between the learnt inference and generative models. In this paper, we introduce a novel consistency loss that directly requires the encoding of the reconstructed data point to match the encoding of the original data, leading to better representations. Through experiments on MNIST and Fashion MNIST, we demonstrate the existence of the inconsistency in VAE learning and that our method can effectively reduce such inconsistency.
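
The consistency requirement described in this abstract, that the encoding of the reconstructed data point should match the encoding of the original, can be written as a small sketch. It assumes deterministic encodings (for instance, the encoder's mean vector) compared with a squared Euclidean distance; the paper may use a different discrepancy. `encode` and `decode` are hypothetical callables standing in for trained networks.

```python
def consistency_loss(x, encode, decode):
    """Squared distance between the encoding of x and the encoding of its reconstruction.

    Assumption: encode returns a deterministic code (e.g. the posterior mean).
    """
    z = encode(x)              # encoding of the original data point
    x_rec = decode(z)          # reconstruction
    z_rec = encode(x_rec)      # encoding of the reconstruction
    return sum((a - b) ** 2 for a, b in zip(z, z_rec))

# Toy linear "networks": a decoder that inverts the encoder gives zero loss.
encode = lambda x: [0.5 * v for v in x]
good_decode = lambda z: [2.0 * v for v in z]
bad_decode = lambda z: list(z)

print(consistency_loss([2.0, 4.0], encode, good_decode))
print(consistency_loss([2.0, 4.0], encode, bad_decode))
```

In training, a term like this would be added to the usual VAE objective with some weighting, which is a design choice not specified in the abstract.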

Multivariate information fusion for identifying antifungal peptides with Hilbert-Schmidt Independence Criterion

Current Bioinformatics ◽

10.2174/1574893616666210727161003 ◽

2021 ◽

Vol 16 ◽

Author(s):

Haohao Zhou ◽

Hao Wang ◽

Yijie Ding ◽

Jijun Tang

Keyword(s):

Fungal Infections ◽

Practical Significance ◽

Gaussian Kernel ◽

Support Vector ◽

Sequence Information ◽

Kernel Support Vector Machine ◽

Svm Model ◽

Antifungal Peptides ◽

Independence Criterion ◽

Better Than

Background: Antifungal peptides (AFPs) have been found to be effective against many fungal infections. Objective: However, it is difficult to identify AFPs. Therefore, it is of great practical significance to identify AFPs via machine learning methods (with sequence information). Method: In this study, a Multi-Kernel Support Vector Machine (MKSVM) with the Hilbert-Schmidt Independence Criterion (HSIC) is proposed. Proteins are encoded with five types of features (188-bit, AAC, ASDC, CKSAAP, DPC), and kernels are then constructed using a Gaussian kernel function. HSIC is used to combine the kernels, and a multi-kernel SVM model is built. Results: Our model performed well on three AFP datasets, and its performance is better than or comparable to other state-of-the-art predictive models. Conclusion: Our method will be a useful tool for identifying antifungal peptides.
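
The ingredients named in the Method section (one Gaussian kernel per feature type, HSIC to combine them) can be sketched as follows. The empirical HSIC estimate trace(KHLH)/(n-1)^2 is standard, but how the paper turns HSIC into kernel weights is not stated in the abstract; weighting each kernel by its HSIC with a label kernel is an assumption made here, and all function names are hypothetical.

```python
import math

def gaussian_kernel_matrix(X, gamma):
    """K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
             for y in X] for x in X]

def center(K):
    """Double-center a kernel matrix, i.e. compute H K H with H = I - (1/n) 11^T."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + total for j in range(n)]
            for i in range(n)]

def hsic(K, L):
    """Empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = len(K)
    Kc = center(K)
    return sum(Kc[i][j] * L[j][i] for i in range(n) for j in range(n)) / (n - 1) ** 2

def combine_kernels(kernels, labels):
    """Weight each kernel by its HSIC with a label kernel (assumed weighting scheme)."""
    L = [[1.0 if a == b else 0.0 for b in labels] for a in labels]
    scores = [max(hsic(K, L), 0.0) for K in kernels]
    s = sum(scores)
    weights = [v / s for v in scores] if s > 0 else [1.0 / len(kernels)] * len(kernels)
    n = len(labels)
    combined = [[sum(w * K[i][j] for w, K in zip(weights, kernels))
                 for j in range(n)] for i in range(n)]
    return weights, combined

labels = [0, 0, 1, 1]
informative = gaussian_kernel_matrix([[0.0], [0.1], [5.0], [5.1]], gamma=1.0)
uninformative = gaussian_kernel_matrix([[0.0], [5.0], [0.1], [5.1]], gamma=1.0)
weights, combined = combine_kernels([informative, uninformative], labels)
print(weights)
```

The combined matrix is a non-negative weighted sum of valid kernels, so it is itself a valid kernel and could be supplied to an SVM that accepts a precomputed Gram matrix.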

Improving PSI-BLAST’s Fold Recognition Performance through Combining Consensus Sequences and Support Vector Machine

Interdisciplinary Research and Applications in Bioinformatics, Computational Biology, and Environmental Sciences - Advances in Bioinformatics and Biomedical Engineering ◽

10.4018/978-1-60960-064-8.ch005 ◽

2011 ◽

pp. 51-59

Author(s):

Ren-Xiang Yan ◽

Jing Liu ◽

Yi-Min Tao

Keyword(s):

Support Vector Machine ◽

Sequence Alignment ◽

Recognition Performance ◽

Consensus Sequence ◽

Early Time ◽

Fold Recognition ◽

Support Vector ◽

Sequence Information ◽

Consensus Sequences ◽

Profile Alignment

Profile-profile alignment may be the most sensitive and useful computational resource for identifying remote homologies and recognizing protein folds. However, profile-profile alignment is usually much more complex and slower than sequence-sequence or profile-sequence alignment. The profile, or PSSM (position-specific scoring matrix), represents the mutational variability at each sequence position of a protein as a vector of amino acid substitution frequencies, and is a much richer encoding of a protein sequence. The consensus sequence, which can be considered a simplified profile, was used in early work to improve sequence alignment accuracy. Recently, several studies were carried out to improve PSI-BLAST’s fold recognition performance by using consensus sequence information. There are several ways to compute a consensus sequence. Based on these considerations, in this chapter we propose a method that combines the information of different types of consensus sequences with the assistance of support vector machine learning. Benchmark results suggest that our method can further improve PSI-BLAST’s fold recognition performance.
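
The relationship between a profile and a consensus sequence described above can be illustrated with a minimal sketch. It uses raw per-column residue frequencies from a toy alignment rather than a real PSI-BLAST PSSM, and majority vote is only one of the several ways of computing a consensus that the abstract alludes to. Function names and the example alignment are hypothetical.

```python
from collections import Counter

def column_frequencies(aligned):
    """Per-column residue frequencies of equal-length aligned sequences ('-' = gap)."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in zip(*aligned):
        counts = Counter(c for c in col if c != "-")
        total = sum(counts.values())
        profile.append({aa: n / total for aa, n in counts.items()} if total else {})
    return profile

def consensus(profile):
    """Most frequent residue per column; ties are broken alphabetically."""
    return "".join(
        max(sorted(col), key=col.get) if col else "-" for col in profile
    )

alignment = ["ACDE", "ACDF", "GC-E"]
profile = column_frequencies(alignment)
print(consensus(profile))
```

The profile keeps the full distribution at each position, whereas the consensus keeps only its mode, which is why the consensus can be seen as a simplified profile.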
