Using Local Alignments for Relation Recognition

Journal of Artificial Intelligence Research ◽

10.1613/jair.2964 ◽

2010 ◽

Vol 38 ◽

pp. 1-48 ◽

Cited By ~ 2

Author(s):

S. Katrenko ◽

P. W. Adriaans ◽

M. Van Someren

Keyword(s):

Sequence Similarity ◽

General Relation ◽

Similarity Measures ◽

Semantic Relatedness ◽

Structural Similarity ◽

Learning Task ◽

Semantic Knowledge ◽

Local Alignment ◽

Data Sets ◽

Definition Of

This paper discusses the problem of marrying structural similarity with semantic relatedness for Information Extraction from text. Aiming at accurate recognition of relations, we introduce local alignment kernels and explore various possibilities of using them for this task. We give a definition of a local alignment (LA) kernel based on the Smith-Waterman score as a sequence similarity measure and proceed with a range of possibilities for computing similarity between elements of sequences. We show how distributional similarity measures obtained from unlabeled data can be incorporated into the learning task as semantic knowledge. Our experiments suggest that the LA kernel yields promising results on various biomedical corpora outperforming two baselines by a large margin. Additional series of experiments have been conducted on the data sets of seven general relation types, where the performance of the LA kernel is comparable to the current state-of-the-art results.
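As a rough illustration of the Smith-Waterman score mentioned in the abstract above, the following minimal sketch computes the best local alignment score between two sequences. It is not the authors' LA kernel, which builds on this score; the function names, the linear gap penalty, and the `exact_match` similarity are assumptions made for the example. The `sim` argument marks the place where an element-level similarity, such as a distributional similarity measure, could be plugged in.

```python
def smith_waterman_score(a, b, sim, gap=1.0):
    """Best local alignment score between sequences a and b.

    sim(x, y) scores a pair of elements; gap is a linear gap penalty.
    Both are illustrative choices, not taken from the paper.
    """
    prev = [0.0] * (len(b) + 1)  # previous row of the DP table
    best = 0.0
    for x in a:
        curr = [0.0]  # column 0 is always zero for local alignment
        for j, y in enumerate(b, start=1):
            score = max(
                0.0,                      # start a new local alignment
                prev[j - 1] + sim(x, y),  # align x with y
                prev[j] - gap,            # gap in b
                curr[j - 1] - gap,        # gap in a
            )
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def exact_match(x, y):
    """Hypothetical element similarity: reward identity, penalize mismatch."""
    return 2.0 if x == y else -1.0
```

Because the sequences are only iterated and indexed element by element, the same function works on strings and on lists of tokens, for example `smith_waterman_score(["the", "protein", "binds"], ["protein", "binds", "dna"], exact_match)`.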


Divide and Conquer (DC) BLAST: fast and easy BLAST execution within HPC environments

PeerJ ◽

10.7717/peerj.3486 ◽

2017 ◽

Vol 5 ◽

pp. e3486 ◽

Cited By ~ 3

Author(s):

Won Cheol Yim ◽

John C. Cushman

Keyword(s):

Sequence Analysis ◽

Sequence Similarity ◽

Query Sequence ◽

Divide And Conquer ◽

Local Alignment ◽

Data Sets ◽

Processing Unit ◽

Central Processing ◽

Analysis Tools ◽

Similarity Searches

Bioinformatics is currently faced with very large-scale data sets that lead to computational jobs, especially sequence similarity searches, that can take absurdly long times to run. For example, the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST and BLAST+) suite, which is by far the most widely used tool for rapid similarity searching among nucleic acid or amino acid sequences, is highly central processing unit (CPU) intensive. While the BLAST programs perform searches very rapidly, they have the potential to be accelerated. In recent years, distributed computing environments have become more widely accessible and used due to the increasing availability of high-performance computing (HPC) systems. Therefore, simple solutions for data parallelization are needed to expedite BLAST and other sequence analysis tools. However, existing software for parallel sequence similarity searches often requires extensive computational experience and skill on the part of the user. In order to accelerate BLAST and other sequence analysis tools, Divide and Conquer BLAST (DCBLAST) was developed to perform NCBI BLAST searches within a cluster, grid, or HPC environment by using a query sequence distribution approach. Scaling from one (1) to 256 CPU cores resulted in significant improvements in processing speed. Thus, DCBLAST dramatically accelerates the execution of BLAST searches using a simple, accessible, robust, and parallel approach. DCBLAST works across multiple nodes automatically and it overcomes the speed limitation of single-node BLAST programs. DCBLAST can be used on any HPC system, can take advantage of hundreds of nodes, and has no output limitations. This freely available tool simplifies distributed computation pipelines to facilitate the rapid discovery of sequence similarities between very large data sets.
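The query sequence distribution idea described in the abstract above can be sketched as follows. This is a minimal illustration and not DCBLAST's actual implementation: the function names `read_fasta` and `split_queries` are hypothetical, and the round-robin assignment is an assumption. Each resulting chunk would be written to its own query file and searched as a separate BLAST job.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, seq = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line)  # sequence lines may wrap across several lines
    if header is not None:
        records.append((header, "".join(seq)))
    return records


def split_queries(records, n_chunks):
    """Distribute query records round-robin over at most n_chunks chunks."""
    chunks = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append(record)
    return [chunk for chunk in chunks if chunk]  # drop empty chunks
```

Round-robin assignment keeps the number of queries per chunk balanced; balancing by total sequence length instead would be a reasonable alternative when query lengths vary widely.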


Evaluation of Spatio-Temporal Microsimulation Systems

Data Science and Simulation in Transportation Research - Advances in Data Mining and Database Management ◽

10.4018/978-1-4666-4920-0.ch008 ◽

2014 ◽

pp. 141-166

Author(s):

Christine Kopp ◽

Bruno Kochan ◽

Michael May ◽

Luca Pappalardo ◽

Salvatore Rinzivillo ◽

...

Keyword(s):

Real World ◽

Similarity Measures ◽

Movement Behavior ◽

Data Sets ◽

Evaluation Standard ◽

Mobility Data ◽

Movement Data ◽

Realistic Representation ◽

Spatio Temporal ◽

Definition Of

The increasing expressiveness of spatio-temporal microsimulation systems makes them attractive for a wide range of real world applications. However, the broad field of applications poses new challenges for the quality of microsimulation systems. They are no longer expected to reflect a few selected mobility characteristics but to be a realistic representation of the real world. In consequence, the validation of spatio-temporal microsimulations has to be deepened and, in particular, moved towards a holistic view of movement validation. One advantage here is the easier availability of mobility data sets at present, which enables the validation of many different aspects of movement behavior. However, these data sets bring their own challenges as the data may cover only a part of the observation space, differ in its temporal resolution, or not be representative in all aspects. In addition, the definition of appropriate similarity measures, which capture the various mobility characteristics, is challenging. The goal of this chapter is to pave the way for a novel, better, and more detailed evaluation standard for spatio-temporal microsimulation systems. The chapter collects and structures various aspects that have to be considered for the validation and comparison of movement data. In addition, it assembles the state of the art of existing validation techniques. It concludes with examples of using big data sources for the extraction and validation of movement characteristics, outlining the research challenges that have yet to be conquered.


A Similarity Searching System for Biological Phenotype Images Using Deep Convolutional Encoder-decoder Architecture

Current Bioinformatics ◽

10.2174/1574893614666190204150109 ◽

2019 ◽

Vol 14 (7) ◽

pp. 628-639 ◽

Cited By ~ 10

Author(s):

Bizhi Wu ◽

Hangxiao Zhang ◽

Limei Lin ◽

Huiyuan Wang ◽

Yubang Gao ◽

...

Keyword(s):

Neural Network ◽

Retrieval System ◽

Sequence Similarity ◽

Local Alignment ◽

Similarity Searching ◽

Loss Of Function ◽

Biological Images ◽

The Neural Network ◽

Convolutional Autoencoder ◽

Biological Phenotype

Background: The BLAST (Basic Local Alignment Search Tool) algorithm has been widely used for sequence similarity searching. Analogously, public phenotype images should be efficiently retrievable using biological images as queries in order to identify phenotypes with high similarity. Despite the accumulation of genotype-phenotype-mapping data, a system for searching for similar phenotypes is not available because of the bottleneck of image processing. Objective: In this study, we focus on the identification of similar query phenotypic images by searching the biological phenotype database, including information about loss-of-function and gain-of-function. Methods: We propose a deep convolutional autoencoder architecture to segment the biological phenotypic images and develop a phenotype retrieval system to enable a better understanding of genotype–phenotype correlation. Results: This study shows how a deep convolutional autoencoder architecture can be trained on images from biological phenotypes to achieve state-of-the-art performance in a phenotypic image retrieval system. Conclusion: Taken together, the phenotype analysis system can provide further information on the correlation between genotype and phenotype. Additionally, the neural network model of image segmentation and the phenotype retrieval system are equally suitable for any species that has enough phenotype images to train the neural network.


Structure Unveils Relationships between RNA Virus Polymerases

Viruses ◽

10.3390/v13020313 ◽

2021 ◽

Vol 13 (2) ◽

pp. 313

Author(s):

Heli A. M. Mönttinen ◽

Janne J. Ravantti ◽

Minna M. Poranen

Keyword(s):

Phylogenetic Tree ◽

Rna Viruses ◽

Rna Virus ◽

Sequence Similarity ◽

Protein Structures ◽

Structural Similarity ◽

Functional Differentiation ◽

Comparison Method ◽

Homologous Structure ◽

Biological Entities

RNA viruses are the fastest evolving known biological entities. Consequently, the sequence similarity between homologous viral proteins disappears quickly, limiting the usability of traditional sequence-based phylogenetic methods in the reconstruction of relationships and evolutionary history among RNA viruses. Protein structures, however, typically evolve more slowly than sequences, and structural similarity can still be evident, when no sequence similarity can be detected. Here, we used an automated structural comparison method, homologous structure finder, for comprehensive comparisons of viral RNA-dependent RNA polymerases (RdRps). We identified a common structural core of 231 residues for all the structurally characterized viral RdRps, covering segmented and non-segmented negative-sense, positive-sense, and double-stranded RNA viruses infecting both prokaryotic and eukaryotic hosts. The grouping and branching of the viral RdRps in the structure-based phylogenetic tree follow their functional differentiation. The RdRps using protein primer, RNA primer, or self-priming mechanisms have evolved independently of each other, and the RdRps cluster into two large branches based on the used transcription mechanism. The structure-based distance tree presented here follows the recently established RdRp-based RNA virus classification at genus, subfamily, family, order, class and subphylum ranks. However, the topology of our phylogenetic tree suggests an alternative phylum level organization.


Discriminating between JCPyV and BKPyV in Urinary Virome Data Sets

Viruses ◽

10.3390/v13061041 ◽

2021 ◽

Vol 13 (6) ◽

pp. 1041

Author(s):

Rita Mormando ◽

Alan J. Wolfe ◽

Catherine Putonti

Keyword(s):

Jc Virus ◽

Sequence Similarity ◽

Bk Virus ◽

Data Sets ◽

Metagenomic Sequencing ◽

Significant Sequence Similarity ◽

Sequence Identity ◽

Shotgun Metagenomic Sequencing ◽

Urinary Microbiome ◽

Six Genes

Polyomaviruses are abundant in the human body. The polyomaviruses JC virus (JCPyV) and BK virus (BKPyV) are common viruses in the human urinary tract. Prior studies have estimated that JCPyV infects between 20 and 80% of adults and that BKPyV infects between 65 and 90% of individuals by age 10. However, these two viruses encode the same six genes and share 75% nucleotide sequence identity across their genomes. While prior urinary virome studies have repeatedly reported the presence of JCPyV, we were interested in seeing how JCPyV prevalence compares to BKPyV. We retrieved all publicly available shotgun metagenomic sequencing reads from urinary microbiome and virome studies (n = 165). While one third of the data sets produced hits to JCPyV, upon further investigation we were able to determine that the majority of these were in fact BKPyV. This distinction was made by specifically mining for JCPyV and BKPyV and considering uniform coverage across the genome. This approach provides confidence in taxon calls, even between closely related viruses with significant sequence similarity.


Predicting bacteriophage hosts based on sequences of annotated receptor-binding proteins

Scientific Reports ◽

10.1038/s41598-021-81063-4 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Dimitri Boeckaerts ◽

Michiel Stock ◽

Bjorn Criel ◽

Hans Gerstmans ◽

Bernard De Baets ◽

...

Keyword(s):

Machine Learning ◽

Predictive Model ◽

Receptor Binding ◽

Bacterial Infections ◽

Sequence Data ◽

Sequence Similarity ◽

Area Under The Curve ◽

Local Alignment ◽

Search Tool ◽

Different Levels

Nowadays, bacteriophages are increasingly considered as an alternative treatment for a variety of bacterial infections in cases where classical antibiotics have become ineffective. However, characterizing the host specificity of phages remains a labor- and time-intensive process. In order to alleviate this burden, we have developed a new machine-learning-based pipeline to predict bacteriophage hosts based on annotated receptor-binding protein (RBP) sequence data. We focus on predicting bacterial hosts from the ESKAPE group, Escherichia coli, Salmonella enterica and Clostridium difficile. We compare the performance of our predictive model with that of the widely used Basic Local Alignment Search Tool (BLAST). Our best-performing predictive model reaches Precision-Recall Area Under the Curve (PR-AUC) scores between 73.6 and 93.8% for different levels of sequence similarity in the collected data. Our model reaches a performance comparable to that of BLASTp when sequence similarity in the data is high and starts outperforming BLASTp when sequence similarity drops below 75%. Therefore, our machine learning methods can be especially useful in settings in which sequence similarity to other known sequences is low. Predicting the hosts of novel metagenomic RBP sequences could extend our toolbox to tune the host spectrum of phages or phage tail-like bacteriocins by swapping RBPs.


Quantification of Ligand PET Studies using a Reference Region with a Displaceable Fraction: Application to Occupancy Studies with [11C]-DASB as an Example

Journal of Cerebral Blood Flow & Metabolism ◽

10.1038/jcbfm.2011.108 ◽

2011 ◽

Vol 32 (1) ◽

pp. 70-80 ◽

Cited By ~ 21

Author(s):

Federico E Turkheimer ◽

Sudhakar Selvaraj ◽

Rainer Hinz ◽

Venkatesha Murthy ◽

Zubin Bhagwagar ◽

...

Keyword(s):

Specific Binding ◽

Normal Subjects ◽

Volume Of Distribution ◽

Binding Potential ◽

Accurate Estimation ◽

Data Sets ◽

Reference Region ◽

Reference Tissue ◽

Positron Emission ◽

Definition Of

This paper aims to build novel methodology for the use of a reference region with specific binding for the quantification of brain studies with radioligands and positron emission tomography (PET). In particular: (1) we introduce a definition of binding potential BPD = DVR − 1, where DVR is the volume of distribution relative to a reference tissue that contains ligand in specifically bound form, (2) we validate a numerical methodology, rank-shaping regularization of exponential spectral analysis (RS-ESA), for the calculation of BPD that can cope with a reference region with specifically bound ligand, (3) we demonstrate the use of RS-ESA for the accurate estimation of drug occupancies with the use of correction factors to account for the specific binding in the reference. [11C]-DASB with cerebellum as a reference was chosen as an example to validate the methodology. Two data sets were used: four normal subjects scanned after infusion of citalopram or placebo, and a further six test-retest data sets. In the drug occupancy study, the use of RS-ESA with cerebellar input plus corrections produced estimates of occupancy very close to the ones obtained with plasma input. Test-retest results demonstrated a tight linear relationship between BPD calculated either with plasma or with a reference input and high reproducibility.


Structural and functional analysis of the Na+/H+ exchanger

Biochemical Journal ◽

10.1042/bj20061062 ◽

2007 ◽

Vol 401 (3) ◽

pp. 623-633 ◽

Cited By ~ 165

Author(s):

Emily R. Slepkov ◽

Jan K. Rainey ◽

Brian D. Sykes ◽

Larry Fliegel

Keyword(s):

Intracellular Ph ◽

Sequence Similarity ◽

Structural Data ◽

Structural Similarity ◽

Integral Membrane Protein ◽

Volume Control ◽

Amino Acid Residues ◽

Physiological Processes ◽

High Resolution Structure ◽

Extracellular Sodium

The mammalian NHE (Na+/H+ exchanger) is a ubiquitously expressed integral membrane protein that regulates intracellular pH by removing a proton in exchange for an extracellular sodium ion. Of the nine known isoforms of the mammalian NHEs, the first isoform discovered (NHE1) is the most thoroughly characterized. NHE1 is involved in numerous physiological processes in mammals, including regulation of intracellular pH, cell-volume control, cytoskeletal organization, heart disease and cancer. NHE comprises two domains: an N-terminal membrane domain that functions to transport ions, and a C-terminal cytoplasmic regulatory domain that regulates the activity and mediates cytoskeletal interactions. Although the exact mechanism of transport by NHE1 remains elusive, recent studies have identified amino acid residues that are important for NHE function. In addition, progress has been made regarding the elucidation of the structure of NHEs. Specifically, the structure of a single TM (transmembrane) segment from NHE1 has been solved, and the high-resolution structure of the bacterial Na+/H+ antiporter NhaA has recently been elucidated. In this review we discuss what is known about both functional and structural aspects of NHE1. We relate the known structural data for NHE1 to the NhaA structure, where TM IV of NHE1 shows surprising structural similarity with TM IV of NhaA, despite little primary sequence similarity. Further experiments that will be required to fully understand the mechanism of transport and regulation of the NHE1 protein are discussed.


Indicators of AEI Applied to the Delaware Estuary

The Scientific World JOURNAL ◽

10.1100/tsw.2002.346 ◽

2002 ◽

Vol 2 ◽

pp. 169-189 ◽

Cited By ~ 3

Author(s):

Lawrence W. Barnthouse ◽

Douglas G. Heimbuch ◽

Vaughn C. Anthony ◽

Ray W. Hilborn ◽

Ransom A. Myers

Keyword(s):

Meta Analysis ◽

Juvenile Fish ◽

Fish Populations ◽

Weight Of Evidence ◽

Data Sets ◽

Delaware Estuary ◽

Fish Stocks ◽

Trends Analysis ◽

Definition Of

We evaluated the impacts of entrainment and impingement at the Salem Generating Station on fish populations and communities in the Delaware Estuary. In the absence of an agreed-upon regulatory definition of “adverse environmental impact” (AEI), we developed three independent benchmarks of AEI based on observed or predicted changes that could threaten the sustainability of a population or the integrity of a community. Our benchmarks of AEI included: (1) disruption of the balanced indigenous community of fish in the vicinity of Salem (the “BIC” analysis); (2) a continued downward trend in the abundance of one or more susceptible fish species (the “Trends” analysis); and (3) occurrence of entrainment/impingement mortality sufficient, in combination with fishing mortality, to jeopardize the future sustainability of one or more populations (the “Stock Jeopardy” analysis). The BIC analysis utilized nearly 30 years of species presence/absence data collected in the immediate vicinity of Salem. The Trends analysis examined three independent data sets that document trends in the abundance of juvenile fish throughout the estuary over the past 20 years. The Stock Jeopardy analysis used two different assessment models to quantify potential long-term impacts of entrainment and impingement on susceptible fish populations. For one of these models, the compensatory capacities of the modeled species were quantified through meta-analysis of spawner-recruit data available for several hundred fish stocks. All three analyses indicated that the fish populations and communities of the Delaware Estuary are healthy and show no evidence of an adverse impact due to Salem. Although the specific models and analyses used at Salem are not applicable to every facility, we believe that a weight of evidence approach that evaluates multiple benchmarks of AEI using both retrospective and predictive methods is the best approach for assessing entrainment and impingement impacts at existing facilities.


The theory of conditional invariance. I

Proceedings of the Royal Society of London Series A - Mathematical and Physical Sciences ◽

10.1098/rspa.1968.0124 ◽

1968 ◽

Vol 305 (1482) ◽

pp. 405-427 ◽

Cited By ~ 1

Keyword(s):

Bound States ◽

General Relation ◽

Variable Parameter ◽

Geometric Invariance ◽

Hermitian Conjugate ◽

New Type ◽

Definition Of ◽

Conditional Invariance ◽

Conditional Constant ◽

The Continuum

In order to extend the use of group theoretical arguments to the problem of accidental degeneracy in quantum mechanics, a new type of constant of the motion, known as a conditional constant of the motion, is introduced. Such a quantity, instead of commuting with the Hamiltonian H for the system, satisfies the more general relation HA = A†H, where A† denotes the hermitian conjugate (adjoint) of the conditional constant of the motion A. This expression reduces, if A is hermitian, to the usual definition of a constant of the motion. Otherwise it defines a new type of invariance, and it is this which will be referred to as conditional invariance. A discussion of the difficulties arising from the lack of hermiticity of A, which is of course essential to its definition, is given. In particular it is shown, under fairly general conditions, that the process of introducing a variable parameter in the Hamiltonian enabling it to have simultaneous eigenfunctions with A, gives rise to an eigenvalue equation in this parameter with respect to which A may be chosen to be hermitian. Conditional invariance is contrasted with both dynamical and geometric invariance. It is found to be sometimes replaceable by either of the latter forms of invariance and for such, explicit conditions are given. Some applications of conditional invariance are discussed. These include a study of the crossing of potential energy curves, a new model of symmetry breaking, a possible means of calculating the exact number of bound states for certain potentials and conditions for the existence of bound states near to the continuum.
