Developing eThread Pipeline Using SAGA-Pilot Abstraction for Large-Scale Structural Bioinformatics

BioMed Research International ◽

10.1155/2014/348725 ◽

2014 ◽

Vol 2014 ◽

pp. 1-12 ◽

Cited By ~ 4

Author(s):

Anjani Ragothaman ◽

Sairam Chowdary Boddu ◽

Nayong Kim ◽

Wei Feinstein ◽

Michal Brylinski ◽

...

Keyword(s):

Large Scale ◽

Structural Information ◽

Structural Bioinformatics ◽

Genome Wide ◽

Viable Solution ◽

A Genome ◽

Structural Systems Biology ◽

Computational Resources ◽

Genome Scale ◽

Level Parallelism

While most computational annotation approaches are sequence-based, threading methods are becoming increasingly attractive because the structural information they predict can uncover the underlying function. However, threading tools are generally compute-intensive, and the number of protein sequences from even small genomes, such as those of prokaryotes, is large, typically many thousands, which prohibits their application as a genome-wide structural systems biology tool. To leverage its utility, we have developed a pipeline for eThread, a meta-threading protein structure modeling tool, that uses computational resources efficiently and effectively. We employ a pilot-based approach that supports seamless data and task-level parallelism and manages large variation in workload and computational requirements. Our scalable pipeline is deployed on Amazon EC2 and can efficiently select resources based upon task requirements. We present runtime analysis to characterize the computational complexity of eThread and the EC2 infrastructure. Based on the results, we suggest a pathway to an optimized solution with respect to metrics such as time-to-solution or cost-to-solution. Our eThread pipeline can scale to support a large number of sequences and is expected to be a viable solution for genome-scale structural bioinformatics and structure-based annotation, and is particularly amenable to small genomes such as those of prokaryotes. The developed pipeline is easily extensible to other types of distributed cyberinfrastructure.


Structural Systems Biology Evaluation of Metabolic Thermotolerance in Escherichia coli

Science ◽

10.1126/science.1234012 ◽

2013 ◽

Vol 340 (6137) ◽

pp. 1220-1223 ◽

Cited By ~ 80

Author(s):

Roger L. Chang ◽

Kathleen Andrews ◽

Donghyuk Kim ◽

Zhanwen Li ◽

Adam Godzik ◽

...

Keyword(s):

Escherichia Coli ◽

Systems Biology ◽

Structural Information ◽

Protein Structures ◽

Limiting Factors ◽

Scale Model ◽

Structural Systems ◽

A Genome ◽

Structural Systems Biology ◽

Genome Scale

Genome-scale network reconstruction has enabled predictive modeling of metabolism for many systems. Traditionally, protein structural information has not been represented in such reconstructions. Expansion of a genome-scale model of Escherichia coli metabolism by including experimental and predicted protein structures enabled the analysis of protein thermostability in a network context. This analysis allowed the prediction of protein activities that limit network function at superoptimal temperatures and mechanistic interpretations of mutations found in strains adapted to heat. Predicted growth-limiting factors for thermotolerance were validated through nutrient supplementation experiments and defined metabolic sensitivities to heat stress, providing evidence that metabolic enzyme thermostability is rate-limiting at superoptimal temperatures. Inclusion of structural information expanded the content and predictive capability of genome-scale metabolic networks that enable structural systems biology of metabolism.


Arabidopsis Genes Essential for Seedling Viability: Isolation of Insertional Mutants and Molecular Cloning

Genetics ◽

10.1093/genetics/159.4.1765 ◽

2001 ◽

Vol 159 (4) ◽

pp. 1765-1778

Author(s):

Gregory J Budziszewski ◽

Sharon Potter Lewis ◽

Lyn Wegrich Glover ◽

Jennifer Reineke ◽

Gary Jones ◽

...

Keyword(s):

Large Scale ◽

Protein Translocation ◽

Gene Families ◽

Mutant Phenotype ◽

Lethal Mutant ◽

A Genome ◽

Genes Encoding ◽

High Level ◽

Mutant Lines ◽

Genome Scale

We have undertaken a large-scale genetic screen to identify genes with a seedling-lethal mutant phenotype. From screening ~38,000 insertional mutant lines, we identified >500 seedling-lethal mutants, completed cosegregation analysis of the insertion and the lethal phenotype for >200 mutants, molecularly characterized 54 mutants, and provided a detailed description for 22 of them. Most of the seedling-lethal mutants seem to affect chloroplast function because they display altered pigmentation and affect genes encoding proteins predicted to have chloroplast localization. Although a high level of functional redundancy in Arabidopsis might be expected because 65% of genes are members of gene families, we found that 41% of the essential genes found in this study are members of Arabidopsis gene families. In addition, we isolated several interesting classes of mutants and genes. We found three mutants in the recently discovered nonmevalonate isoprenoid biosynthetic pathway and mutants disrupting genes similar to Tic40 and tatC, which are likely to be involved in chloroplast protein translocation. Finally, we directly compared T-DNA and Ac/Ds transposon mutagenesis methods in Arabidopsis on a genome scale. In each population, we found only about one-third of the insertion mutations cosegregated with a mutant phenotype.


A genome-wide screen uncovers multiple roles for mitochondrial nucleoside diphosphate kinase D in inflammasome activation

Science Signaling ◽

10.1126/scisignal.abe0387 ◽

2021 ◽

Vol 14 (694) ◽

pp. eabe0387

Author(s):

Orna Ernst ◽

Jing Sun ◽

Bin Lin ◽

Balaji Banoth ◽

Michael G. Dorrington ◽

...

Keyword(s):

Nucleoside Diphosphate Kinase ◽

Multiple Roles ◽

Metabolic Reprogramming ◽

Inflammasome Activation ◽

Ros Production ◽

Genome Wide ◽

A Genome ◽

Mouse Macrophages ◽

Genome Scale ◽

Mitochondrial Dna Synthesis

Noncanonical inflammasome activation by cytosolic lipopolysaccharide (LPS) is a critical component of the host response to Gram-negative bacteria. Cytosolic LPS recognition in macrophages is preceded by a Toll-like receptor (TLR) priming signal required to induce transcription of inflammasome components and facilitate the metabolic reprogramming that fuels the inflammatory response. Using a genome-scale arrayed siRNA screen to find inflammasome regulators in mouse macrophages, we identified the mitochondrial enzyme nucleoside diphosphate kinase D (NDPK-D) as a regulator of both noncanonical and canonical inflammasomes. NDPK-D was required for both mitochondrial DNA synthesis and cardiolipin exposure on the mitochondrial surface in response to inflammasome priming signals mediated by TLRs, and macrophages deficient in NDPK-D had multiple defects in LPS-induced inflammasome activation. In addition, NDPK-D was required for the recruitment of TNF receptor–associated factor 6 (TRAF6) to mitochondria, which was critical for reactive oxygen species (ROS) production and the metabolic reprogramming that supported the TLR-induced gene program. NDPK-D knockout mice were protected from LPS-induced shock, consistent with decreased ROS production and attenuated glycolytic commitment during priming. Our findings suggest that, in response to microbial challenge, NDPK-D–dependent TRAF6 mitochondrial recruitment triggers an energetic fitness checkpoint required to engage and maintain the transcriptional program necessary for inflammasome activation.


Phenotypic Screen and Transcriptomics Approach Complement Each Other in Functional Genomics of Defensive Stink Gland Physiology

10.21203/rs.3.rs-1117784/v1 ◽

2021 ◽

Author(s):

Sabrina Lehmann ◽

Bibi Atika ◽

Daniela Grossmann ◽

Christian Schmitt-Engel ◽

Nadi Strohlein ◽

...

Keyword(s):

Functional Genomics ◽

Large Scale ◽

Reverse Genetics ◽

Expression Profiles ◽

Forward Genetics ◽

Large Set ◽

Knock Down ◽

Genome Wide ◽

Phenotypic Screen ◽

A Genome

Background: Functional genomics uses unbiased systematic genome-wide gene disruption or analyzes natural variations such as gene expression profiles of different tissues from multicellular organisms to link gene functions to particular phenotypes. Functional genomics approaches are of particular importance to identify large sets of genes that are specifically important for a particular biological process beyond known candidate genes, or when the process has not been studied with genetic methods before. Results: Here, we present a large set of genes whose disruption interferes with the function of the odoriferous defensive stink glands of the red flour beetle Tribolium castaneum. This gene set is the result of a large-scale systematic phenotypic screen using a reverse genetics strategy based on RNA interference applied in a genome-wide forward genetics manner. In this first-pass screen, 130 genes were identified, of which 69 genes could be confirmed to cause knock-down gland phenotypes, which vary from necrotic tissue and irregular reservoir size to irregular color or separation of the secreted gland compounds. The knock-down of 13 genes specifically caused a strong reduction of para-benzoquinones, suggesting a specific function in the synthesis of these toxic compounds. Only 14 of the 69 confirmed gland genes are differentially overexpressed in stink gland tissue and thus could have been detected in a transcriptome-based analysis. Moreover, of the 29 previously transcriptomics-identified genes causing a gland phenotype, only one gene was recognized by this phenotypic screen despite the fact that 13 of them were covered by the screen. Conclusion: Our results indicate the importance of combining diverse and independent methodologies to identify genes necessary for the function of a certain biological tissue, as the different approaches do not deliver redundant results but rather complement each other. The presented phenotypic screen, together with a transcriptomics approach, now provides a set of close to a hundred genes important for odoriferous defensive stink gland physiology in beetles.


A New Genome-to-Genome Comparison Approach for Large-Scale Revisiting of Current Microbial Taxonomy

Microorganisms ◽

10.3390/microorganisms7060161 ◽

2019 ◽

Vol 7 (6) ◽

pp. 161 ◽

Cited By ~ 1

Author(s):

Ming-Hsin Tsai ◽

Yen-Yi Liu ◽

Von-Wun Soo ◽

Chih-Chieh Chen

Keyword(s):

Microbial Diversity ◽

Large Scale ◽

Gene Selection ◽

Marker Gene ◽

Genome Comparison ◽

Marker Genes ◽

Species Classification ◽

Genome Wide ◽

A Genome ◽

Comparison Approach

Microbial diversity has always presented taxonomic challenges. With the popularity of next-generation sequencing technology, more unculturable bacteria have been sequenced, facilitating the discovery of additional new species and complicating current microbial classification. The major challenge is to assign appropriate taxonomic names. Hence, assessing the consistency between taxonomy and genomic relatedness is critical. We proposed and applied a genome comparison approach to a large-scale survey to investigate the distribution of genomic differences among microorganisms. The approach applies a genome-wide criterion, homologous coverage ratio (HCR), for describing the homology between species. The survey included 7861 microbial genomes that excluded plasmids, and 1220 pairs of genera exhibited ambiguous classification. In this study, we also compared the performance of HCR and average nucleotide identity (ANI). The results indicated that HCR and ANI analyses yield comparable results, but a few examples suggested that HCR has a superior clustering effect. In addition, we used the Genome Taxonomy Database (GTDB), the gold standard for taxonomy, to validate our analysis. The GTDB offers 120 ubiquitous single-copy proteins as marker genes for species classification. We determined that the analysis of the GTDB still results in blurred classification boundaries between some genera and that the marker gene-based approach has limitations. Although the choice of marker genes has been quite rigorous, the bias of marker gene selection remains unavoidable. Therefore, methods based on genomic alignment should be considered for species classification in order to avoid the bias of marker gene selection. On the basis of our observations of microbial diversity, microbial classification should be re-examined using genome-wide comparisons.


Advances in metabolic flux analysis toward genome-scale profiling of higher organisms

Bioscience Reports ◽

10.1042/bsr20170224 ◽

2018 ◽

Vol 38 (6) ◽

Cited By ~ 11

Author(s):

Georg Basler ◽

Alisdair R. Fernie ◽

Zoran Nikoloski

Keyword(s):

Large Scale ◽

Metabolic Flux ◽

Cell Types ◽

Flux Analysis ◽

Metabolic Labeling ◽

Modeling Framework ◽

Metabolomics Data ◽

Technological Advances ◽

A Genome ◽

Genome Scale

Methodological and technological advances have recently paved the way for metabolic flux profiling in higher organisms, like plants. However, in comparison with omics technologies, flux profiling has yet to provide comprehensive differential flux maps at genome scale and in different cell types, tissues, and organs. Here we highlight the recent advances in technologies to gather metabolic labeling patterns and flux profiling approaches. We provide an opinion of how recent local flux profiling approaches can be used in conjunction with the constraint-based modeling framework to arrive at genome-scale flux maps. In addition, we point to approaches which use metabolomics data without introduction of label to predict either non-steady state fluxes in a time-series experiment or flux changes in different experimental scenarios. The combination of these developments allows an experimentally feasible approach for flux-based large-scale systems biology studies.


A Comprehensive Survey on the Terpene Synthase Gene Family Provides New Insight into Its Evolutionary Patterns

Genome Biology and Evolution ◽

10.1093/gbe/evz142 ◽

2019 ◽

Vol 11 (8) ◽

pp. 2078-2098 ◽

Cited By ~ 8

Author(s):

Shu-Ye Jiang ◽

Jingjing Jin ◽

Rajani Sarojam ◽

Srinivasan Ramachandran

Keyword(s):

Gene Family ◽

Large Scale ◽

Family Members ◽

Terpene Synthase ◽

Limited Information ◽

Terpene Synthases ◽

Genome Wide ◽

A Genome ◽

Family Expansion ◽

Insight Into

Terpenes are organic compounds and play important roles in plant growth and development as well as in mediating interactions of plants with the environment. Terpene synthases (TPSs) are the key enzymes responsible for the biosynthesis of terpenes. Although some species were employed for the genome-wide identification and characterization of the TPS family, limited information is available regarding the evolution, expansion, and retention mechanisms occurring in this gene family. We performed a genome-wide identification of the TPS family members in 50 sequenced genomes. Additionally, we characterized the TPS family from aromatic spearmint and basil plants using RNA-Seq data. No TPSs were identified in algae genomes, but the remaining plant species encoded various numbers of the family members, ranging from 2 to 79 full-length TPSs. Some species showed lineage-specific expansion of certain subfamilies, which might have contributed toward species or ecotype divergence or environmental adaptation. A large-scale family expansion was observed mainly in dicot and monocot plants, which was accompanied by frequent domain loss. Both tandem and segmental duplication significantly contributed toward family expansion and expression divergence and played important roles in the survival of these expanded genes. Our data provide new insight into the TPS family expansion and evolution and suggest that TPSs might have originated from isoprenyl diphosphate synthase genes.


Large-scale GWAS reveals insights into the genetic architecture of same-sex sexual behavior

Science ◽

10.1126/science.aat7693 ◽

2019 ◽

Vol 365 (6456) ◽

pp. eaat7693 ◽

Cited By ~ 53

Author(s):

Andrea Ganna ◽

Karin J. H. Verweij ◽

Michel G. Nivard ◽

Robert Maier ◽

Robbee Wedow ◽

...

Keyword(s):

Sexual Behavior ◽

Genetic Architecture ◽

Large Scale ◽

Genome Wide Association Study ◽

Same Sex ◽

Genome Wide ◽

A Genome ◽

Number Of Sexual Partners ◽

Opposite Sex ◽

Males And Females

Twin and family studies have shown that same-sex sexual behavior is partly genetically influenced, but previous searches for specific genes involved have been underpowered. We performed a genome-wide association study (GWAS) on 477,522 individuals, revealing five loci significantly associated with same-sex sexual behavior. In aggregate, all tested genetic variants accounted for 8 to 25% of variation in same-sex sexual behavior, only partially overlapped between males and females, and do not allow meaningful prediction of an individual’s sexual behavior. Comparing these GWAS results with those for the proportion of same-sex to total number of sexual partners among nonheterosexuals suggests that there is no single continuum from opposite-sex to same-sex sexual behavior. Overall, our findings provide insights into the genetics underlying same-sex sexual behavior and underscore the complexity of sexuality.


Exploiting regulatory heterogeneity to systematically identify enhancers with high accuracy

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1808833115 ◽

2018 ◽

Vol 116 (3) ◽

pp. 900-908 ◽

Cited By ~ 4

Author(s):

Hamutal Arbel ◽

Sumanta Basu ◽

William W. Fisher ◽

Ann S. Hammonds ◽

Kenneth H. Wan ◽

...

Keyword(s):

Large Scale ◽

Scale Validation ◽

Expression Patterns ◽

Prediction Method ◽

High Accuracy ◽

Genome Wide ◽

Rank List ◽

A Genome ◽

Improved Accuracy ◽

Genome Wide Scan

Identifying functional enhancer elements in metazoan systems is a major challenge. Large-scale validation of enhancers predicted by ENCODE reveals false-positive rates of at least 70%. We used the pregastrula-patterning network of Drosophila melanogaster to demonstrate that loss in accuracy in held-out data results from heterogeneity of functional signatures in enhancer elements. We show that at least two classes of enhancers are active during early Drosophila embryogenesis and that by focusing on a single, relatively homogeneous class of elements, greater than 98% prediction accuracy can be achieved in a balanced, completely held-out test set. The class of well-predicted elements is composed predominantly of enhancers driving multistage segmentation patterns, which we designate segmentation driving enhancers (SDE). Prediction is driven by the DNA occupancy of early developmental transcription factors, with almost no additional power derived from histone modifications. We further show that improved accuracy is not a property of a particular prediction method: after conditioning on the SDE set, naïve Bayes and logistic regression perform as well as more sophisticated tools. Applying this method to a genome-wide scan, we predict 1,640 SDEs that cover 1.6% of the genome. An analysis of 32 SDEs using whole-mount embryonic imaging of stably integrated reporter constructs chosen throughout our prediction rank-list showed that >90% drove expression patterns. We achieved 86.7% precision on a genome-wide scan, with an estimated recall of at least 98%, indicating high accuracy and completeness in annotating this class of functional elements.


Patterns of Metabolite Changes Identified from Large-Scale Gene Perturbations in Arabidopsis Using a Genome-Scale Metabolic Network

PLANT PHYSIOLOGY ◽

10.1104/pp.114.252361 ◽

2015 ◽

Vol 167 (4) ◽

pp. 1685-1698 ◽

Cited By ~ 18

Author(s):

Taehyong Kim ◽

Kate Dreher ◽

Ricardo Nilo-Poyanco ◽

Insuk Lee ◽

Oliver Fiehn ◽

...

Keyword(s):

Metabolic Network ◽

Large Scale ◽

A Genome ◽

Genome Scale ◽

Gene Perturbations
