A profile-based method for identifying functional divergence of orthologous genes in bacterial genomes

Mapping Intimacies ◽

10.1101/022616 ◽

2015 ◽

Author(s):

Nicole E. Wheeler ◽

Lars Barquist ◽

Robert A. Kingsley ◽

Paul P. Gardner

Keyword(s):

Protein Function ◽

Large Scale ◽

Functional Divergence ◽

Supplementary Information ◽

Nucleotide Polymorphisms ◽

Amino Acid Sequence Level ◽

Single Nucleotide ◽

Genetic Changes ◽

Sequencing Technologies ◽

Bacterial Genomics

Abstract Motivation Next generation sequencing technologies have provided us with a wealth of information on genetic variation, but predicting the functional significance of this variation is a difficult task. While many comparative genomics studies have focused on gene flux and large scale changes, relatively little attention has been paid to quantifying the effects of single nucleotide polymorphisms and indels on protein function, particularly in bacterial genomics. Results We present a hidden Markov model based approach we call delta-bitscore (DBS) for identifying orthologous proteins that have diverged at the amino acid sequence level in a way that is likely to impact biological function. We benchmark this approach with several widely used datasets and apply it to a proof-of-concept study of orthologous proteomes in an investigation of host adaptation in Salmonella enterica. We highlight the value of the method in identifying functional divergence of genes, and suggest that this tool may be a better approach than the commonly used dN/dS metric for identifying functionally significant genetic changes occurring in recently diverged organisms. Availability A program implementing DBS for pairwise genome comparisons is freely available at: https://github.com/UCanCompBio/[email protected], [email protected] Supplementary information Supplementary data are available at BioRxiv online.
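As a rough illustration of the delta-bitscore idea, the sketch below assumes that each ortholog has already been scored against the same profile HMM by an external tool, and that DBS is taken as the reference bitscore minus the query bitscore. The sign convention, gene names, scores, and cutoff are all assumptions made for illustration and are not taken from the authors' program.

```python
from typing import Dict, List, Tuple


def delta_bitscores(ref: Dict[str, float], query: Dict[str, float]) -> Dict[str, float]:
    """Return reference-minus-query bitscores for genes scored in both genomes."""
    shared = ref.keys() & query.keys()
    return {gene: ref[gene] - query[gene] for gene in shared}


def flag_divergent(dbs: Dict[str, float], cutoff: float) -> List[Tuple[str, float]]:
    """List genes whose absolute DBS meets the (hypothetical) cutoff, largest first."""
    hits = [(gene, d) for gene, d in dbs.items() if abs(d) >= cutoff]
    return sorted(hits, key=lambda item: -abs(item[1]))


# Hypothetical bitscores of orthologs against shared profile HMMs.
ref_scores = {"geneA": 250.0, "geneB": 310.5, "geneC": 120.0}
query_scores = {"geneA": 249.0, "geneB": 270.5, "geneD": 99.0}

dbs = delta_bitscores(ref_scores, query_scores)
hits = flag_divergent(dbs, cutoff=10.0)
print(hits)
```

Genes present in only one genome are skipped here because a pairwise difference is undefined for them; a real analysis would need an explicit policy for such cases.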


Quantifying functional impact of non-coding variants with multi-task Bayesian neural network

Bioinformatics ◽

10.1093/bioinformatics/btz767 ◽

2019 ◽

Vol 36 (5) ◽

pp. 1397-1404

Author(s):

Chencheng Xu ◽

Qiao Liu ◽

Jianyu Zhou ◽

Minzhu Xie ◽

Jianxing Feng ◽

...

Keyword(s):

Quantitative Trait Locus ◽

Single Nucleotide Polymorphisms ◽

Quantitative Trait ◽

Large Scale ◽

Supplementary Information ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Functional Impact ◽

Coding Regions ◽

Trait Locus

Abstract Motivation Advances in high-throughput genotyping and sequencing technologies during recent years have revealed essential roles of non-coding regions in gene regulation. Genome-wide association studies (GWAS) suggested that a large proportion of risk variants are located in non-coding regions and remain unexplained by current expression quantitative trait loci catalogs. Interpreting the causal effects of these genetic modifications is crucial but difficult owing to our limited knowledge of how regulatory elements function. Although several computational methods have been designed to prioritize regulatory variants that substantially impact human phenotypes, few of them achieve consistently high performance even when large-scale multi-omic data are integrated. Results We propose a novel multi-task framework based on Bayesian deep neural networks, MtBNN, to quantify the deleterious impact of single nucleotide polymorphisms in non-coding genomic regions. With the high efficiency provided by the multi-task Bayesian framework to integrate information from different sources, MtBNN is capable of extracting features from genomic sequences of large-scale chromatin-profiling data, such as chromatin accessibility and transcription factor binding affinities, and calculating the distribution of the probability that a non-coding variant disrupts regulatory activities. A series of comprehensive experiments show that MtBNN quantifies the functional impact of cis-regulatory variations with high accuracy, including expression quantitative trait loci, DNase I sensitivity quantitative trait loci and functional genetic variants located within ATAC-peaks that affect the accessibility of the corresponding peak, and that it achieves significantly better performance than the existing methods. Moreover, MtBNN has applications in the discovery of potentially causal disease-associated single-nucleotide polymorphisms (SNPs), thus helping fine-map the GWAS SNPs.
Availability and implementation Code can be downloaded from https://github.com/Zoesgithub/MtBNN. Supplementary information Supplementary data are available at Bioinformatics online.


Dementia key gene identification with multi-layered SNP-gene-disease network

Bioinformatics ◽

10.1093/bioinformatics/btaa814 ◽

2020 ◽

Vol 36 (Supplement_2) ◽

pp. i831-i839

Author(s):

Dong-gi Lee ◽

Myungjun Kim ◽

Sang Joon Son ◽

Chang Hyung Hong ◽

Hyunjung Shin

Keyword(s):

Candidate Genes ◽

Learning Algorithm ◽

Search Space ◽

Supplementary Information ◽

Gene Identification ◽

Nucleotide Polymorphisms ◽

Disease Network ◽

Single Nucleotide ◽

Key Genes ◽

Significant Attention

Abstract Motivation Recently, various approaches for diagnosing and treating dementia have received significant attention, especially in identifying key genes that are crucial for dementia. If the mutations of such key genes could be tracked, it would be possible to predict the time of onset of dementia and significantly aid in developing drugs to treat dementia. However, gene finding involves tremendous cost, time and effort. To alleviate these problems, research on utilizing computational biology to decrease the search space of candidate genes is actively conducted. In this study, we propose a framework in which diseases, genes and single-nucleotide polymorphisms are represented by a layered network, and key genes are predicted by a machine learning algorithm. The algorithm utilizes a network-based semi-supervised learning model that can be applied to layered data structures. Results The proposed method was applied to a dataset extracted from public databases related to diseases and genes with data collected from 186 patients. A portion of key genes obtained using the proposed method was verified in silico through PubMed literature, and the remaining genes were left as possible candidate genes. Availability and implementation The code for the framework will be available at http://www.alphaminers.net/. Supplementary information Supplementary data are available at Bioinformatics online.
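To make the idea of network-based semi-supervised learning concrete, here is a minimal sketch of generic label propagation on a small graph. It is not the authors' layered-network algorithm: the node names, the edges, the damping factor `alpha`, and the update rule (a damped average over neighbours) are assumptions chosen for illustration.

```python
from typing import Dict, List


def propagate(adj: Dict[str, List[str]], seeds: Dict[str, float],
              alpha: float = 0.8, iters: int = 100) -> Dict[str, float]:
    """Spread seed labels over an undirected graph by damped neighbour averaging."""
    scores = {node: seeds.get(node, 0.0) for node in adj}
    for _ in range(iters):
        updated = {}
        for node, neighbours in adj.items():
            # Average the current scores of the neighbours (0 for isolated nodes).
            avg = sum(scores[m] for m in neighbours) / len(neighbours) if neighbours else 0.0
            # Mix the neighbour signal with the node's own seed label.
            updated[node] = alpha * avg + (1 - alpha) * seeds.get(node, 0.0)
        scores = updated
    return scores


# Hypothetical disease-gene-SNP graph; only the disease node carries a label.
adj = {
    "disease": ["gene1", "gene2"],
    "gene1": ["disease", "snp1"],
    "gene2": ["disease"],
    "gene3": ["snp1"],
    "snp1": ["gene1", "gene3"],
    "gene4": [],
}
scores = propagate(adj, seeds={"disease": 1.0})
ranked = sorted((n for n in adj if n.startswith("gene")), key=lambda n: -scores[n])
print(ranked)
```

Genes directly linked to the labelled disease node end up with higher scores than genes reachable only through a SNP, and an unconnected gene stays at zero, which is the sense in which such a model narrows the search space of candidates.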


A NOTE ON PHASING LONG GENOMIC REGIONS USING LOCAL HAPLOTYPE PREDICTIONS

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720006002272 ◽

2006 ◽

Vol 04 (03) ◽

pp. 639-647 ◽

Cited By ~ 6

Author(s):

ELEAZAR ESKIN ◽

RODED SHARAN ◽

ERAN HALPERIN

Keyword(s):

Large Scale ◽

Computational Cost ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Novel Approach ◽

Maximum Likelihood Criterion ◽

The Common ◽

Genomic Regions ◽

High Computational Cost ◽

Combining Information

The common approaches for haplotype inference from genotype data are targeted toward phasing short genomic regions. Longer regions are often tackled in a heuristic manner, due to the high computational cost. Here, we describe a novel approach for phasing genotypes over long regions, which is based on combining information from local predictions on short, overlapping regions. The phasing is done in a way that maximizes a natural maximum likelihood criterion. Among other things, this criterion takes into account the physical distance between neighboring single nucleotide polymorphisms. The approach is very efficient, has been applied to several large scale datasets, and has been shown to be successful in two recent benchmarking studies (Zaitlen et al., in press; Marchini et al., in preparation). Our method is publicly available via a webserver at .
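The idea of combining local predictions on overlapping windows can be sketched with a simple greedy stitcher. This is not the maximum likelihood criterion described in the abstract: it merely orients each window's haplotype pair so that it agrees best with the haplotypes built so far. The function name, the window format, and the example data are assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def mismatches(x: Sequence[str], y: Sequence[str]) -> int:
    """Count positions where two equally indexed allele sequences differ."""
    return sum(p != q for p, q in zip(x, y))


def stitch(windows: List[Tuple[int, str, str]]) -> Tuple[str, str]:
    """Merge (start, hap_a, hap_b) windows, sorted by start, into two long haplotypes.

    Assumes every window overlaps the haplotypes assembled so far.
    """
    start0, first_a, first_b = windows[0]
    hap1, hap2 = list(first_a), list(first_b)
    for start, wa, wb in windows[1:]:
        offset = start - start0
        overlap = len(hap1) - offset
        # Cost of keeping the window's orientation versus swapping its two haplotypes.
        same = mismatches(hap1[offset:], wa[:overlap]) + mismatches(hap2[offset:], wb[:overlap])
        flipped = mismatches(hap1[offset:], wb[:overlap]) + mismatches(hap2[offset:], wa[:overlap])
        if flipped < same:
            wa, wb = wb, wa
        hap1.extend(wa[overlap:])
        hap2.extend(wb[overlap:])
    return "".join(hap1), "".join(hap2)


# Two hypothetical local predictions; the second is reported in swapped order.
windows = [(0, "0101", "1010"), (2, "1001", "0110")]
long_haps = stitch(windows)
print(long_haps)
```

A likelihood-based method would weigh each join rather than count mismatches, for example by giving less weight to agreement between SNPs that are physically far apart.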


DNA Melting Analysis for Detection of Single Nucleotide Polymorphisms

Clinical Chemistry ◽

10.1093/clinchem/47.4.635 ◽

2001 ◽

Vol 47 (4) ◽

pp. 635-644 ◽

Cited By ~ 68

Author(s):

Robert H Lipsky ◽

Chiara M Mazzanti ◽

Joseph G Rudolph ◽

Ke Xu ◽

Gopal Vyas ◽

...

Keyword(s):

Single Nucleotide Polymorphisms ◽

Large Scale ◽

Denaturing Gradient Gel Electrophoresis ◽

Dna Melting ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Efficient System ◽

Gradient Gel Electrophoresis ◽

Melting Analysis ◽

Scale Detection

Abstract Background: Several methods for detection of single nucleotide polymorphisms (SNPs; e.g., denaturing gradient gel electrophoresis and denaturing HPLC) are indirectly based on the principle of differential melting of heteroduplex DNA. We present a method for detecting SNPs that is directly based on this principle. Methods: We used a double-stranded DNA-specific fluorescent dye, SYBR Green I (SYBR), in an efficient system (PE 7700 Sequence Detector) in which DNA melting was controlled and monitored in a 96-well plate format. We measured the decrease in fluorescence intensity that accompanied DNA duplex denaturation, evaluating the effects of fragment length, dye concentration, DNA concentration, and sequence context using four naturally occurring polymorphisms (three SNPs and a single-base deletion/insertion). Results: DNA melting analysis (DMA) was used successfully for variant detection, and we also discovered two previously unknown SNPs by this approach. Concentrations of DNA amplicons were readily monitored by SYBR fluorescence, and DNA amplicon concentrations were highly reproducible, with a CV of 2.6%. We readily detected differences in the melting temperature between homoduplex and heteroduplex fragments 15–167 bp in length and differing by only a single nucleotide substitution. Conclusions: The efficiency and sensitivity of DMA make it highly suitable for the large-scale detection of sequence variants.
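A common way to read a melting temperature from a fluorescence curve is to locate the steepest drop in fluorescence as temperature rises. The sketch below does this with finite differences and also computes a coefficient of variation. It is a generic illustration on synthetic data, not the instrument's or the authors' analysis; the function names, the sigmoid curve, and the replicate values are assumptions.

```python
import math
from statistics import mean, stdev
from typing import List


def melting_temperature(temps: List[float], fluor: List[float]) -> float:
    """Estimate Tm as the midpoint of the interval with the largest -dF/dT."""
    best_tm, best_slope = temps[0], float("-inf")
    for i in range(len(temps) - 1):
        slope = -(fluor[i + 1] - fluor[i]) / (temps[i + 1] - temps[i])
        if slope > best_slope:
            best_slope = slope
            best_tm = (temps[i] + temps[i + 1]) / 2
    return best_tm


def coefficient_of_variation(values: List[float]) -> float:
    """Return the sample standard deviation as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)


# Synthetic melt curve: fluorescence falls along a sigmoid centred at 82.25 degrees C.
temps = [70 + 0.5 * i for i in range(41)]
fluor = [1 / (1 + math.exp((t - 82.25) / 0.8)) for t in temps]
tm = melting_temperature(temps, fluor)

# Hypothetical replicate amplicon concentrations (arbitrary units).
cv = coefficient_of_variation([10.0, 10.2, 9.8])
print(tm, round(cv, 2))
```

Comparing such Tm estimates between a homoduplex and a heteroduplex sample is the kind of difference the abstract refers to; the resolution of the estimate is limited by the temperature step of the measurement.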


simuG: a general-purpose genome simulator

Bioinformatics ◽

10.1093/bioinformatics/btz424 ◽

2019 ◽

Vol 35 (21) ◽

pp. 4442-4444 ◽

Cited By ~ 10

Author(s):

Jia-Xing Yue ◽

Gianni Liti

Keyword(s):

Copy Number Variants ◽

General Purpose ◽

Supplementary Information ◽

Nucleotide Polymorphisms ◽

Bioinformatics Analyses ◽

Full Spectrum ◽

Single Nucleotide ◽

Genomic Variants ◽

Wide Range ◽

Simulation Based

Abstract Summary Simulated genomes with pre-defined and random genomic variants can be very useful for benchmarking genomic and bioinformatics analyses. Here we introduce simuG, a lightweight tool for simulating the full spectrum of genomic variants (single nucleotide polymorphisms, insertions/deletions, copy number variants, inversions and translocations) for any organism (including human). The simplicity and versatility of simuG make it a unique general-purpose genome simulator for a wide range of simulation-based applications. Availability and implementation Code in Perl along with user manual and testing data is available at https://github.com/yjx1217/simuG. This software is free for use under the MIT license. Supplementary information Supplementary data are available at Bioinformatics online.
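The simplest case of such a simulation, introducing random SNPs into a reference sequence, can be sketched as follows. This is a toy Python illustration and does not reproduce simuG (which is written in Perl) or its options; the function name, the seed handling, and the example sequence are assumptions.

```python
import random
from typing import List, Tuple


def simulate_snps(seq: str, n_snps: int, seed: int = 0) -> Tuple[str, List[Tuple[int, str, str]]]:
    """Substitute n_snps distinct positions of an uppercase DNA string.

    Returns the mutated sequence and a list of (position, ref, alt) records.
    """
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    positions = sorted(rng.sample(range(len(seq)), n_snps))
    bases = list(seq)
    variants = []
    for pos in positions:
        ref = bases[pos]
        # Choose an alternative base that differs from the reference base.
        alt = rng.choice([b for b in "ACGT" if b != ref])
        bases[pos] = alt
        variants.append((pos, ref, alt))
    return "".join(bases), variants


reference = "ACGTACGTACGTACGTACGT"
mutated, variants = simulate_snps(reference, n_snps=3, seed=42)
print(mutated)
print(variants)
```

Keeping the list of introduced variants alongside the mutated sequence is what makes a simulated genome useful for benchmarking, since a variant caller's output can then be compared against a known truth set.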


TNF-α Polymorphisms in Juvenile Idiopathic Arthritis: Which Potential Clinical Implications?

International Journal of Rheumatology ◽

10.1155/2012/756291 ◽

2012 ◽

Vol 2012 ◽

pp. 1-16 ◽

Cited By ~ 13

Author(s):

A. Scardapane ◽

L. Breda ◽

M. Lucantoni ◽

F. Chiarelli

Keyword(s):

Juvenile Idiopathic Arthritis ◽

Large Scale ◽

Nucleotide Polymorphisms ◽

Necrosis Factor Alpha ◽

Single Nucleotide ◽

Adult Rheumatoid Arthritis ◽

Functional Consequences ◽

Factor Alpha ◽

Necrosis Factor ◽

Important Cytokine

Whether tumor necrosis factor alpha (TNF-α) gene polymorphisms (SNPs) influence disease susceptibility and treatment of patients with juvenile idiopathic arthritis (JIA) is presently uncertain. TNF-α is one of the most important cytokines involved in JIA pathogenesis. Several single nucleotide polymorphisms (SNPs) have been identified within the region of the TNF-α gene, but only a very small minority have proven functional consequences and have been associated with susceptibility to JIA. An association between some TNF-α SNPs and adult rheumatoid arthritis (RA) susceptibility, severity and clinical response to anti-TNF-α treatment has been reported. The most frequently studied TNF-α SNP is located at the −308 position, where a substitution of the G allele with the rare A allele has been found. The presence of the −308A allele is associated with JIA and with a poor prognosis. In addition, the −308G genotype has been associated with a better response to anti-TNF-α therapy in JIA patients, confirming adult data. Psoriatic and oligoarticular arthritis are significantly associated with the −238 SNP only in some studies. Studies considering other SNPs are conflicting and inconclusive. Large scale studies are required to define the contribution of TNF-α gene products to disease pathogenesis and anti-TNF-α therapeutic efficacy in JIA.


A regression framework to uncover pleiotropy in large-scale electronic health record data

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocz084 ◽

2019 ◽

Vol 26 (10) ◽

pp. 1083-1090 ◽

Cited By ~ 2

Author(s):

Ruowang Li ◽

Rui Duan ◽

Rachel L Kember ◽

Daniel J Rader ◽

Scott M Damrauer ◽

...

Keyword(s):

Large Scale ◽

Pleiotropic Effects ◽

Nucleotide Polymorphisms ◽

Reduced Rank Regression ◽

Electronic Health Record Data ◽

Single Nucleotide ◽

Rank Regression ◽

Reduced Rank ◽

Multiple Phenotypes ◽

Regression Framework

Abstract Objective Pleiotropy, where 1 genetic locus affects multiple phenotypes, can offer significant insights in understanding the complex genotype–phenotype relationship. Although individual genotype–phenotype associations have been thoroughly explored, seemingly unrelated phenotypes can be connected genetically through common pleiotropic loci or genes. However, current analyses of pleiotropy have been challenged by both methodologic limitations and a lack of available suitable data sources. Materials and Methods In this study, we propose to utilize a new regression framework, reduced rank regression, to simultaneously analyze multiple phenotypes and genotypes to detect pleiotropic effects. We used large-scale biobank-linked electronic health record data from the Penn Medicine BioBank to select 5 cardiovascular diseases (hypertension, cardiac dysrhythmias, ischemic heart disease, congestive heart failure, and heart valve disorders) and 5 mental disorders (mood disorders; anxiety, phobic and dissociative disorders; alcohol-related disorders; neurological disorders; and delirium dementia) to validate our framework. Results Compared with existing methods, reduced rank regression showed a higher power to distinguish known associated single-nucleotide polymorphisms from random single-nucleotide polymorphisms. In addition, genome-wide gene-based investigation of pleiotropy showed that reduced rank regression was able to identify candidate genetic variants with novel pleiotropic effects compared to existing methods. Conclusion The proposed regression framework offers a new approach to account for the phenotype and genotype correlations when identifying pleiotropic effects. By jointly modeling multiple phenotypes and genotypes, the method has the potential to distinguish confounding from causal genotype and phenotype associations.


Animal-ImputeDB: a comprehensive database with multiple animal reference panels for genotype imputation

Nucleic Acids Research ◽

10.1093/nar/gkz854 ◽

2019 ◽

Vol 48 (D1) ◽

pp. D659-D667 ◽

Cited By ~ 2

Author(s):

Wenqian Yang ◽

Yanbo Yang ◽

Cecheng Zhao ◽

Kun Yang ◽

Dongyang Wang ◽

...

Keyword(s):

Large Scale ◽

Association Studies ◽

Genotype Imputation ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

High Quality ◽

Single Nucleotide ◽

Genome Wide ◽

Whole Genome Resequencing ◽

Missing Genotypes

Abstract Animal-ImputeDB (http://gong_lab.hzau.edu.cn/Animal_ImputeDB/) is a public database with genomic reference panels of 13 animal species for online genotype imputation, genetic variant search, and free download. Genotype imputation is a process of estimating missing genotypes in terms of the haplotypes and genotypes in a reference panel. It can effectively increase the density of single nucleotide polymorphisms (SNPs) and thus can be widely used in large-scale genome-wide association studies (GWASs) using relatively inexpensive and low-density SNP arrays. However, most animals except humans lack high-quality reference panels, which greatly limits the application of genotype imputation in animals. To overcome this limitation, we developed Animal-ImputeDB, which is dedicated to collecting genotype data and whole-genome resequencing data of nonhuman animals from various studies and databases. A computational pipeline was developed to process different types of raw data to construct reference panels. Finally, 13 high-quality reference panels including ∼400 million SNPs from 2265 samples were constructed. In Animal-ImputeDB, an easy-to-use online tool consisting of two popular imputation tools was designed for the purpose of genotype imputation. Collectively, Animal-ImputeDB serves as an important resource for animal genotype imputation and will greatly facilitate research on animal genomic selection and genetic improvement.


Association between a Variant in MicroRNA-646 and the Susceptibility to Hepatocellular Carcinoma in a Large-Scale Population

The Scientific World JOURNAL ◽

10.1155/2014/312704 ◽

2014 ◽

Vol 2014 ◽

pp. 1-7 ◽

Cited By ~ 3

Author(s):

Rui Wang ◽

Jun Zhang ◽

Weiru Jiang ◽

Yanyun Ma ◽

Wenshuai Li ◽

...

Keyword(s):

Hepatocellular Carcinoma ◽

Single Nucleotide Polymorphisms ◽

Protective Effect ◽

Large Scale ◽

Cancer Development ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Scale Population

Background. Single-nucleotide polymorphisms in microRNAs play important roles in oncogenesis and cancer development. Objective. We aim to explore whether miR-646 rs6513497 is associated with the risk of hepatocellular carcinoma. Methods. A total of 997 HCC patients and 993 cancer-free controls were enrolled in this study. Genotyping was performed using the MassARRAY method. Results. Compared with the T allele of rs6513497, the G allele was associated with a significantly decreased risk of HCC (OR = 0.788, 95% CI = 0.631–0.985, P = 0.037); moreover, a more protective effect of the G allele was shown in males (OR = 0.695, 95% CI = 0.539–0.897, P = 0.005 in HCC and OR = 0.739, 95% CI = 0.562–0.972, P = 0.030 in HBV-related HCC), basically in a dominant manner (HCC: OR = 0.681, 95% CI = 0.162–0.896, P = 0.006; HBV-related HCC: OR = 0.715, 95% CI = 0.532–0.962, P = 0.027). Conclusions. Our findings support the view that the miR-646 SNP rs6513497 may contribute to the susceptibility of HCC.
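For readers unfamiliar with the statistics quoted above, the sketch below shows how an unadjusted odds ratio and its Wald 95% confidence interval are computed from a 2x2 table of allele counts. The counts are hypothetical and are not the study's data, and the reported estimates may come from an adjusted model, which this sketch does not attempt to reproduce.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Return (OR, lower, upper) for a 2x2 table.

    a, b: cases with / without the allele; c, d: controls with / without it.
    The interval is the Wald interval on the log odds ratio.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts, chosen only to illustrate the arithmetic.
or_value, ci_low, ci_high = odds_ratio_ci(a=20, b=80, c=30, d=70)
print(round(or_value, 3), round(ci_low, 3), round(ci_high, 3))
```

An odds ratio below 1 whose interval excludes 1 is what supports calling an allele protective at the chosen significance level.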


Retrospective study assessing the association of single nucleotide polymorphisms in VEGFR3 and on-target toxicity in patients with advanced renal-cell carcinoma (RCC) treated with sunitinib.

Journal of Clinical Oncology ◽

10.1200/jco.2014.32.4_suppl.537 ◽

2014 ◽

Vol 32 (4_suppl) ◽

pp. 537-537

Author(s):

Jf. Rodriguez-Moreno ◽

Emilio Esteban ◽

Luis Javier Leandro-García ◽

Daniel E. Castellano ◽

Aranzazu Gonzalez del Alba ◽

...

Keyword(s):

Kinase Inhibitors ◽

Kinase Inhibitor ◽

Nucleotide Polymorphisms ◽

Secondary Effects ◽

Single Nucleotide ◽

Genetic Changes ◽

First Line Therapy ◽

Study Results ◽

Antiangiogenic Drugs ◽

Grade Ii

537 Background: Sunitinib is a tyrosine kinase inhibitor approved as first line therapy of RCC. Some adverse events (AEs) of sunitinib are due to its inhibition of VEGFR and could be considered “on-target” toxicity. They are class effects and have been observed even with the most selective antiangiogenic drugs. We aimed to correlate such secondary effects with germline SNPs in VEGFR3. Methods: In order to define “on-target” toxicity we considered any AEs ≥ Grade II (CTCAE 4.0) recorded in more than 10% of the patients included in the pivotal studies of axitinib and tivozanib (the most selective tyrosine kinase inhibitors targeting VEGF(R)). We assessed associations between polymorphisms in VEGFR3 and on-target toxicity for patients with advanced RCC treated with sunitinib prospectively included in the SUTRENT Study. Results: Polymorphisms in VEGFR3 are associated with less on-target toxicity. These genetic changes probably confer a reduced susceptibility to the action of sunitinib. This could explain, at least partially, the worse outcome shown in this population. Conclusions: Polymorphisms in VEGFR3 are associated with less on-target toxicity. These genetic changes probably confer a reduced susceptibility to the action of sunitinib. This could explain, at least partially, the worse outcome shown in this population.
