LSTrAP-Cloud: A User-Friendly Cloud Computing Pipeline to Infer Coexpression Networks

Qiao Wen Tan; William Goh; Marek Mutwil

doi:10.3390/genes11040428

LSTrAP-Cloud: A User-Friendly Cloud Computing Pipeline to Infer Coexpression Networks

Genes ◽

10.3390/genes11040428 ◽

2020 ◽

Vol 11 (4) ◽

pp. 428 ◽

Cited By ~ 3

Author(s):

Qiao Wen Tan ◽

William Goh ◽

Marek Mutwil

Keyword(s):

Rna Sequencing ◽

Gene Function ◽

Large Scale ◽

Single Gene ◽

Function Prediction ◽

Sequencing Data ◽

Gene Function Prediction ◽

European Nucleotide Archive ◽

User Friendly ◽

Coexpression Networks

As genomes become more and more available, gene function prediction presents itself as one of the major hurdles in our quest to extract meaningful information on the biological processes genes participate in. In order to facilitate gene function prediction, we show how our user-friendly pipeline, the Large-Scale Transcriptomic Analysis Pipeline in Cloud (LSTrAP-Cloud), can be useful in helping biologists make a shortlist of genes involved in a biological process that they might be interested in, by using a single gene of interest as bait. LSTrAP-Cloud is based on Google Colaboratory, and provides user-friendly tools that process and quality-control RNA sequencing data streamed from the European Nucleotide Archive. LSTrAP-Cloud outputs a gene coexpression network that can be used to identify functionally related genes for any organism with a sequenced genome and publicly available RNA sequencing data. Here, we used the biosynthesis pathway of Nicotiana tabacum as a case study to demonstrate how enzymes, transporters, and transcription factors involved in the synthesis, transport, and regulation of nicotine can be identified using our pipeline.
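
The bait-gene idea in this abstract can be illustrated with a minimal sketch. It assumes that coexpression is scored with the Pearson correlation coefficient, which the abstract does not state, and it uses a made-up toy expression matrix; the function names `pearson` and `coexpressed_with` are hypothetical and are not part of LSTrAP-Cloud.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no coexpression signal
    return cov / math.sqrt(var_x * var_y)


def coexpressed_with(bait, expression, top_n=2):
    """Rank every other gene by its correlation with the bait gene."""
    bait_profile = expression[bait]
    scores = [
        (gene, pearson(bait_profile, profile))
        for gene, profile in expression.items()
        if gene != bait
    ]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_n]


# Toy data (assumption): gene -> expression values across four samples.
expression = {
    "bait": [1.0, 2.0, 3.0, 4.0],
    "geneA": [2.0, 4.0, 6.0, 8.0],
    "geneB": [4.0, 3.0, 2.0, 1.0],
    "geneC": [1.0, 3.0, 2.0, 4.0],
}
shortlist = coexpressed_with("bait", expression, top_n=2)
```

Here `shortlist` holds the two genes whose profiles track the bait most closely. A real analysis would work on normalized expression values from many RNA sequencing experiments rather than four toy samples.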

LSTrAP-Cloud: A User-friendly Cloud Computing Pipeline to Infer Co-functional and Regulatory Networks

10.1101/2020.03.11.986794 ◽

2020 ◽

Cited By ~ 1

Author(s):

Qiao Wen Tan ◽

William Goh ◽

Marek Mutwil

Keyword(s):

Rna Sequencing ◽

Gene Function ◽

Regulatory Networks ◽

Large Scale ◽

Function Prediction ◽

Sequencing Data ◽

Gene Function Prediction ◽

Analysis Pipeline ◽

User Friendly

As genomes become more and more available, gene function prediction presents itself as one of the major hurdles in our quest to extract meaningful information on the biological processes genes participate in. In order to facilitate gene function prediction, we show how our user-friendly pipeline, Large-Scale Transcriptomic Analysis Pipeline in Cloud (LSTrAP-Cloud), can be useful in helping biologists make a shortlist of genes that they might be interested in. LSTrAP-Cloud is based on Google Colaboratory and provides user-friendly tools that process and quality-control RNA sequencing data streamed from the European Nucleotide Archive. LSTrAP-Cloud outputs a gene co-expression network that can be used to identify functionally related genes for any organism with a sequenced genome and publicly available RNA sequencing data. Here, we used the biosynthesis pathway of Nicotiana tabacum as a case study to demonstrate how enzymes, transporters and transcription factors involved in the synthesis, transport and regulation of nicotine can be identified using our pipeline.

Ensemble gene function prediction database reveals genes important for complex I formation in Arabidopsis thaliana

10.1101/181396 ◽

2017 ◽

Author(s):

Bjoern Oest Hansen ◽

Etienne H. Meyer ◽

Camilla Ferrari ◽

Neha Vaid ◽

Sara Movahedi ◽

...

Keyword(s):

Arabidopsis Thaliana ◽

Gene Function ◽

Complex I ◽

Prediction Method ◽

Function Prediction ◽

Mitochondrial Complex ◽

Gene Function Prediction ◽

Inference Methods ◽

User Friendly

Despite increasing availability of sequenced genomes, accurate characterization of gene functions is needed to close the genotype-phenotype gap. Recent advances in gene function prediction rely on ensemble approaches that integrate the results from multiple inference methods to produce superior predictions. Yet, these developments remain largely unexplored in plants. We present Neighbor Counting Ensemble, a gene function prediction method which integrates eleven gene co-function networks for Arabidopsis thaliana, and produces more accurate gene function predictions for a larger fraction of genes with unknown function. We used these predictions to identify genes involved in mitochondrial complex I formation, and for five of them we confirmed the predictions experimentally. The ensemble predictions are provided as a user-friendly online database, EnsembleNet, available at http://www.gene2function.de/ensemblenet.html.
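
As a rough illustration of neighbor counting over several networks, the sketch below pools function labels from a gene's neighbors across all networks and ranks the labels by vote count. This is a simplified assumption about how such an ensemble could work, not the authors' Neighbor Counting Ensemble; the function name `neighbor_counting`, the gene identifiers, and the annotations are all hypothetical.

```python
from collections import Counter


def neighbor_counting(gene, networks, annotations):
    """Count function labels among the gene's neighbors, pooled over networks."""
    votes = Counter()
    for network in networks:
        # Each network maps a gene to the list of its neighbors.
        for neighbor in network.get(gene, ()):
            for label in annotations.get(neighbor, ()):
                votes[label] += 1
    # Labels sorted from most to least supported.
    return votes.most_common()


# Toy data (assumption): two co-function networks and known annotations.
networks = [
    {"g1": ["g2", "g3"]},
    {"g1": ["g2", "g4"]},
]
annotations = {
    "g2": ["complex I"],
    "g3": ["photosynthesis"],
    "g4": ["complex I"],
}
ranked = neighbor_counting("g1", networks, annotations)
```

In this toy case the label supported by neighbors in both networks ranks first. A practical method would also need to weight networks and correct for how common each label is.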

Faculty Opinions recommendation of The art of gene function prediction.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1056759.508687 ◽

2006 ◽

Author(s):

Martin Noble

Keyword(s):

Gene Function ◽

Function Prediction ◽

Gene Function Prediction

Faculty Opinions recommendation of Network-Based Gene Function Prediction in Mouse and Other Model Vertebrates Using MouseNet Server.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.727562216.793535246 ◽

2017 ◽

Author(s):

John Hancock

Keyword(s):

Gene Function ◽

Function Prediction ◽

Gene Function Prediction

Gene Function Prediction from Functional Association Networks Using Kernel Partial Least Squares Regression

PLoS ONE ◽

10.1371/journal.pone.0134668 ◽

2015 ◽

Vol 10 (8) ◽

pp. e0134668 ◽

Cited By ~ 12

Author(s):

Sonja Lehtinen ◽

Jon Lees ◽

Jürg Bähler ◽

John Shawe-Taylor ◽

Christine Orengo

Keyword(s):

Least Squares ◽

Partial Least Squares ◽

Gene Function ◽

Partial Least Squares Regression ◽

Function Prediction ◽

Least Squares Regression ◽

Gene Function Prediction ◽

Functional Association ◽

Kernel Partial Least Squares

Using similarity learning to improve network-based gene function prediction

2012 IEEE International Conference on Bioinformatics and Biomedicine ◽

10.1109/bibm.2012.6392663 ◽

2012 ◽

Cited By ~ 1

Author(s):

Ngo Phuong Nhung ◽

Tu Minh Phuong

Keyword(s):

Gene Function ◽

Function Prediction ◽

Similarity Learning ◽

Gene Function Prediction

LSTrAP-Crowd: Prediction of novel components of bacterial ribosomes with crowd-sourced analysis of RNA sequencing data

10.1101/2020.04.20.005249 ◽

2020 ◽

Author(s):

Benedict Hew ◽

Qiao Wen Tan ◽

William Goh ◽

Jonathan Wei Xiong Ng ◽

Kenny Koh ◽

...

Keyword(s):

Gene Expression ◽

Protein Synthesis ◽

Rna Sequencing ◽

Gene Expression Data ◽

Large Scale ◽

Bacterial Resistance ◽

Expression Data ◽

Sequencing Data ◽

Novel Proteins ◽

Novel Antibiotics

Bacterial resistance to antibiotics is a growing problem that is projected to cause more deaths than cancer in 2050. Consequently, novel antibiotics are urgently needed. Since more than half of the available antibiotics target the bacterial ribosomes, proteins that are involved in protein synthesis are thus prime targets for the development of novel antibiotics. However, experimental identification of these potential antibiotic target proteins can be labor-intensive and challenging, as these proteins are likely to be poorly characterized and specific to few bacteria. In order to identify these novel proteins, we established a Large-Scale Transcriptomic Analysis Pipeline in Crowd (LSTrAP-Crowd), where 285 individuals processed 26 terabytes of RNA-sequencing data of the 17 most notorious bacterial pathogens. In total, the crowd processed 26,269 RNA-seq experiments and used the data to construct gene co-expression networks, which were used to identify more than a hundred uncharacterized genes that were transcriptionally associated with protein synthesis. We provide the identity of these genes together with the processed gene expression data. The data can be used to identify other vulnerabilities of bacteria, while our approach demonstrates how the processing of gene expression data can be easily crowdsourced.

Gene Function Prediction and Functional Network: The Role of Gene Ontology

Intelligent Systems Reference Library - Data Mining: Foundations and Intelligent Paradigms ◽

10.1007/978-3-642-23151-3_7 ◽

2012 ◽

pp. 123-162 ◽

Cited By ~ 1

Author(s):

Erliang Zeng ◽

Chris Ding ◽

Kalai Mathee ◽

Lisa Schneper ◽

Giri Narasimhan

Keyword(s):

Gene Ontology ◽

Gene Function ◽

Function Prediction ◽

Functional Network ◽

Gene Function Prediction

A Universal, Genomewide GuideFinder for CRISPR/Cas9 Targeting in Microbial Genomes

mSphere ◽

10.1128/msphere.00086-20 ◽

2020 ◽

Vol 5 (1) ◽

Author(s):

Michelle Spoto ◽

Changhui Guan ◽

Elizabeth Fleming ◽

Julia Oh

Keyword(s):

Gene Function ◽

Large Scale ◽

Essential Gene ◽

Bacterial Species ◽

Bacterial Genome ◽

Model Organisms ◽

Design Parameters ◽

Bacterial Genomes ◽

Wide Range ◽

User Friendly

ABSTRACT The CRISPR/Cas system has significant potential to facilitate gene editing in a variety of bacterial species. CRISPR interference (CRISPRi) and CRISPR activation (CRISPRa) represent modifications of the CRISPR/Cas9 system utilizing a catalytically inactive Cas9 protein for transcription repression and activation, respectively. While CRISPRi and CRISPRa have tremendous potential to systematically investigate gene function in bacteria, few programs are specifically tailored to identify guides in draft bacterial genomes genomewide. Furthermore, few programs offer open-source code with flexible design parameters for bacterial targeting. To address these limitations, we created GuideFinder, a customizable, user-friendly program that can design guides for any annotated bacterial genome. GuideFinder designs guides from NGG protospacer-adjacent motif (PAM) sites for any number of genes by the use of an annotated genome and FASTA file input by the user. Guides are filtered according to user-defined design parameters and removed if they contain any off-target matches. Iteration with lowered parameter thresholds allows the program to design guides for genes that did not produce guides with the more stringent parameters, one of several features unique to GuideFinder. GuideFinder can also identify paired guides for targeting multiplicity, whose validity we tested experimentally. GuideFinder has been tested on a variety of diverse bacterial genomes, finding guides for 95% of genes on average. Moreover, guides designed by the program are functionally useful—focusing on CRISPRi as a potential application—as demonstrated by essential gene knockdown in two staphylococcal species. Through the large-scale generation of guides, this open-access software will improve accessibility to CRISPR/Cas studies of a variety of bacterial species. 
IMPORTANCE With the explosion in our understanding of human and environmental microbial diversity, corresponding efforts to understand gene function in these organisms are strongly needed. CRISPR/Cas9 technology has revolutionized interrogation of gene function in a wide variety of model organisms. Efficient CRISPR guide design is required for systematic gene targeting. However, existing tools are not adapted for the broad needs of microbial targeting, which include extraordinary species and subspecies genetic diversity, the overwhelming majority of which is characterized by draft genomes. In addition, flexibility in guide design parameters is important to consider the wide range of factors that can affect guide efficacy, many of which can be species and strain specific. We designed GuideFinder, a customizable, user-friendly program that addresses the limitations of existing software and that can design guides for any annotated bacterial genome with numerous features that facilitate guide design in a wide variety of microorganisms.
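
The first step described in this abstract, locating candidate guides next to NGG PAM sites, can be sketched as follows. This is a minimal illustration and not GuideFinder's code: it scans the forward strand only, assumes a 20-nucleotide guide, and omits the parameter filtering and off-target checks that the abstract mentions. The function name `find_guides` and the example sequence are hypothetical.

```python
def find_guides(sequence, guide_length=20):
    """Return (start, guide, pam) for every NGG PAM on the forward strand."""
    sequence = sequence.upper()
    guides = []
    # A PAM can only start where a full-length guide fits upstream of it.
    for pam_start in range(guide_length, len(sequence) - 2):
        pam = sequence[pam_start:pam_start + 3]
        if pam[1:] == "GG":  # N can be any base, followed by GG
            guide_start = pam_start - guide_length
            guide = sequence[guide_start:pam_start]
            guides.append((guide_start, guide, pam))
    return guides


# Toy sequence (assumption): a 20-nt protospacer, a TGG PAM, then padding.
sequence = "ACGTACGTACGTACGTACGT" + "TGG" + "AAA"
guides = find_guides(sequence)
```

For the toy sequence, a single guide is found at position 0 with the PAM `TGG`. Covering both strands would require running the same scan on the reverse complement.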
