The Co-regulation Data Harvester for Tetrahymena thermophila: automated high-throughput gene annotation and functional inference in a microbial eukaryote

2017 ◽  
Author(s):  
Lev M. Tsypin ◽  
Aaron P. Turkewitz

Abstract
Identifying co-regulated genes can provide a useful approach for defining pathway-specific machinery in an organism. To be efficient, this approach relies on thorough genome annotation, which is not available for most organisms with sequenced genomes. Studies in Tetrahymena thermophila, the most experimentally accessible ciliate, have generated a rich transcriptomic database covering many well-defined physiological states. Genes that are involved in the same pathway show significant co-regulation, and screens based on gene co-regulation have identified novel factors in specific pathways, for example in membrane trafficking. However, a limitation has been the relatively sparse annotation of the Tetrahymena genome, which makes genome-wide analyses impractical. We have therefore developed an efficient approach to analyze both co-regulation and gene annotation, called the Co-regulation Data Harvester (CDH). The CDH automates the identification of co-regulated genes by accessing the Tetrahymena transcriptome database, determines their orthologs in other organisms via reciprocal BLAST searches, and collates the functional annotations of those orthologs. Inferences drawn from the CDH reproduce and expand upon experimental findings in Tetrahymena. The CDH, which is freely available, is a powerful new tool for analyzing cell biological pathways in Tetrahymena. Moreover, to the extent that genes and pathways are conserved between organisms, the inferences obtained via the CDH should be relevant to, and can be explored in, many other systems.
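
The two core CDH steps described above (screen the transcriptome for genes co-regulated with a query gene, then call orthologs from reciprocal BLAST searches) can be sketched in a few functions. This is not the CDH's code: it assumes Pearson correlation across expression conditions as the co-regulation measure, an arbitrary cutoff of r ≥ 0.9, and BLAST results already reduced to (query, subject, bitscore) tuples. All function names and gene IDs are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0  # flat profile: treat as uncorrelated


def coregulated(query, profiles, min_r=0.9):
    """Genes whose profile correlates with the query gene at r >= min_r, best first."""
    q = profiles[query]
    hits = [(g, pearson(q, p)) for g, p in profiles.items() if g != query]
    return sorted((h for h in hits if h[1] >= min_r), key=lambda h: -h[1])


def best_hits(blast_rows):
    """Map each query to its top-scoring subject from (query, subject, bitscore) rows."""
    best = {}
    for q, s, score in blast_rows:
        if q not in best or score > best[q][1]:
            best[q] = (s, score)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(forward_rows, reverse_rows):
    """Pairs (a, b) where b is a's best forward hit and a is b's best reverse hit."""
    fwd, rev = best_hits(forward_rows), best_hits(reverse_rows)
    return {a: b for a, b in fwd.items() if rev.get(b) == a}


# Toy data (hypothetical): expression across four physiological states.
profiles = {
    "gene_A": [1.0, 2.0, 3.0, 4.0],
    "gene_B": [2.0, 4.1, 6.0, 8.2],
    "gene_C": [4.0, 1.0, 3.0, 0.5],
}
print(coregulated("gene_A", profiles))  # only gene_B passes the cutoff
```

Here `reciprocal_best_hits` keeps a pair only when each gene is the other's top-scoring hit, the usual criterion for calling putative orthologs from reciprocal BLAST searches; the annotations of those orthologs could then be collated per co-regulated gene.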

2021 ◽  
Vol 7 (3) ◽  
pp. eabd9036
Author(s):  
Sara Saez-Atienzar ◽  
Sara Bandres-Ciga ◽  
Rebekah G. Langston ◽  
Jonggeol J. Kim ◽  
Shing Wan Choi ◽  
...  

Despite the considerable progress in unraveling the genetic causes of amyotrophic lateral sclerosis (ALS), we do not fully understand the molecular mechanisms underlying the disease. We analyzed genome-wide data involving 78,500 individuals using a polygenic risk score approach to identify the biological pathways and cell types involved in ALS. This data-driven approach identified multiple aspects of the biology underlying the disease that resolved into broader themes, namely, neuron projection morphogenesis, membrane trafficking, and signal transduction mediated by ribonucleotides. We also found that genomic risk in ALS maps consistently to GABAergic interneurons and oligodendrocytes, as confirmed in human single-nucleus RNA-seq data. Using two-sample Mendelian randomization, we nominated six differentially expressed genes (ATG16L2, ACSL5, MAP1LC3A, MAPKAPK3, PLXNB2, and SCFD1) within the significant pathways as relevant to ALS. We conclude that the disparate genetic etiologies of this fatal neurological disease converge on a smaller number of final common pathways and cell types.
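
The polygenic risk score and two-sample Mendelian randomization named above rest on two standard formulas: a PRS is a sum of per-SNP effect sizes weighted by allele dosage, and the fixed-effect inverse-variance weighted (IVW) MR estimate combines per-SNP outcome and exposure effects. The sketch below illustrates only those formulas, assuming dosages coded 0/1/2 and a pathway represented simply as a list of SNP IDs. It omits clumping, p-value thresholding, and every other step a real pipeline needs, and it is not the authors' analysis; function names and SNP IDs are hypothetical.

```python
def polygenic_risk_score(dosages, weights, pathway_snps=None):
    """Sum of effect size x effect-allele dosage (0, 1 or 2) over scored SNPs.

    If pathway_snps is given, only SNPs in that list contribute, giving a
    pathway-restricted score. SNPs missing from `dosages` count as dosage 0.
    """
    snps = weights if pathway_snps is None else [s for s in pathway_snps if s in weights]
    return sum(weights[s] * dosages.get(s, 0.0) for s in snps)


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted causal estimate from summary statistics.

    beta_IVW = sum(bx * by / se^2) / sum(bx^2 / se^2), one term per genetic instrument.
    """
    terms = list(zip(beta_exposure, beta_outcome, se_outcome))
    num = sum(bx * by / se ** 2 for bx, by, se in terms)
    den = sum(bx ** 2 / se ** 2 for bx, _, se in terms)
    return num / den


# Toy example (hypothetical numbers).
weights = {"rs_a": 0.10, "rs_b": -0.20, "rs_c": 0.50}
person = {"rs_a": 2, "rs_b": 1}
print(polygenic_risk_score(person, weights))            # 0.1*2 - 0.2*1 + 0.5*0
print(polygenic_risk_score(person, weights, ["rs_a"]))  # pathway with one SNP
print(ivw_estimate([0.1, 0.2], [0.05, 0.1], [0.01, 0.02]))  # outcome effect is 0.5 x exposure effect
```

Restricting the score to a pathway's SNPs is what allows a PRS to be associated with specific biological pathways rather than with the genome as a whole.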


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Michael F. Z. Wang ◽  
Madhav Mantri ◽  
Shao-Pei Chou ◽  
Gaetano J. Scuderi ◽  
David W. McKellar ◽  
...  

Abstract
Conventional scRNA-seq expression analyses rely on the availability of a high-quality genome annotation. Yet, as we show here with scRNA-seq experiments and analyses spanning human, mouse, chicken, mole rat, lemur and sea urchin, genome annotations are often incomplete, particularly for organisms that are not routinely studied. To overcome this hurdle, we created a scRNA-seq analysis routine that recovers biologically relevant transcriptional activity beyond the scope of the best available genome annotation by performing scRNA-seq analysis on any region in the genome where transcriptional products are detected. Our tool generates a single-cell expression matrix for all transcriptionally active regions (TARs), performs single-cell TAR expression analysis to identify biologically significant TARs, and then annotates those TARs by gene homology analysis. This procedure uses single-cell expression analysis as a filter to direct annotation efforts toward biologically significant transcripts, thereby uncovering biology that would otherwise remain invisible to scRNA-seq.
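
The abstract does not say how TAR boundaries are defined, so the following is only a simple stand-in: it treats bases whose read depth reaches a threshold as transcriptionally active, merges active runs separated by short gaps, and drops very short regions. The thresholds, the function name `call_tars`, and the per-base coverage input are assumptions for illustration; the published tool may use a different segmentation method.

```python
def call_tars(coverage, min_depth=5, max_gap=10, min_length=50):
    """Return half-open (start, end) intervals of contiguous transcription.

    coverage: per-base read depth along one strand of one chromosome.
    Bases with depth >= min_depth are 'active'; active runs separated by at
    most max_gap inactive bases are merged; regions shorter than min_length
    are dropped.
    """
    runs, start = [], None
    for i, depth in enumerate(coverage):
        if depth >= min_depth and start is None:
            start = i                      # an active run begins
        elif depth < min_depth and start is not None:
            runs.append((start, i))        # the active run ends before base i
            start = None
    if start is not None:
        runs.append((start, len(coverage)))

    merged = []
    for s, e in runs:
        if merged and s - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], e)  # bridge a short gap
        else:
            merged.append((s, e))
    return [(s, e) for s, e in merged if e - s >= min_length]


coverage = [0] * 5 + [10] * 3 + [0] * 2 + [10] * 3 + [0] * 20 + [10] * 2
print(call_tars(coverage, min_depth=5, max_gap=2, min_length=5))  # [(5, 13)]
```

In a full routine, each interval would become one feature (a column of the cell-by-TAR matrix) whose reads are counted per cell barcode, after which expression analysis can prioritize TARs for homology-based annotation.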


PLoS ONE ◽  
2021 ◽  
Vol 16 (12) ◽  
pp. e0260709
Author(s):  
Shaimaa Mahmoud Ahmed ◽  
Alsamman Mahmoud Alsamman ◽  
Abdulqader Jighly ◽  
Mohamed Hassan Mubarak ◽  
Khaled Al-Shamaa ◽  
...  

Soil salinity is a significant abiotic stress that severely limits global crop production. Chickpea (Cicer arietinum L.) is an important grain legume that plays a substantial role in nutritional food security, especially in the developing world. This study used a chickpea population collected from the International Center for Agricultural Research in the Dry Areas (ICARDA) genebank using the focused identification of germplasm strategy. The germplasm comprised 186 genotypes of broad Asian and African origin, which were genotyped with 1856 DArTseq markers. We phenotyped salinity tolerance in the field (Arish, Sinai, Egypt) and in greenhouse hydroponic experiments at 100 mM NaCl. Based on performance in both the hydroponic and field experiments, we identified seven genotypes from Azerbaijan and Pakistan (IGs: 70782, 70430, 70764, 117703, 6057, 8447, and 70249) as potential sources of high salinity tolerance. Multi-trait genome-wide association analysis (mtGWAS) detected one locus on chromosome Ca4 at 10618070 bp associated with salinity tolerance under both hydroponic and field conditions. In addition, we located another locus, specific to the hydroponic system, on chromosome Ca2 at 30537619 bp. Gene annotation analysis showed that rs5825813 lies within the Embryogenesis-associated protein (EMB8-like) gene, while rs5825939 lies within the Ribosomal Protein Large P0 (RPLP0) gene. Using such markers in practical breeding programs can effectively improve the adaptability of current chickpea cultivars to saline soil. Moreover, researchers can use our markers to facilitate the incorporation of new genes into commercial cultivars.
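
At its core, a genome-wide association scan tests each marker for association with the trait. The study used multi-trait GWAS (mtGWAS), which models several traits jointly and normally also accounts for population structure; the sketch below is a much simpler single-trait, single-marker least-squares regression without those corrections, meant only to show the per-marker test. The function name, dosage coding, and toy data are hypothetical.

```python
from math import sqrt


def marker_scan(genotypes, phenotype):
    """Per-marker OLS fit of phenotype = a + b * dosage.

    genotypes: dict marker -> list of allele dosages, one per genotype (line).
    Returns dict marker -> (slope b, t statistic of b). Monomorphic markers
    are skipped because their slope cannot be estimated.
    """
    n = len(phenotype)
    my = sum(phenotype) / n
    results = {}
    for marker, g in genotypes.items():
        mg = sum(g) / n
        sxx = sum((x - mg) ** 2 for x in g)
        if sxx == 0:
            continue
        sxy = sum((x - mg) * (y - my) for x, y in zip(g, phenotype))
        b = sxy / sxx
        a = my - b * mg
        rss = sum((y - a - b * x) ** 2 for x, y in zip(g, phenotype))
        se = sqrt(rss / (n - 2) / sxx)     # standard error of the slope
        results[marker] = (b, b / se if se > 0 else float("inf"))
    return results


# Toy data (hypothetical): six inbred lines, homozygous dosages coded 0/2.
genotypes = {"m1": [0, 0, 2, 2, 0, 2], "m2": [1, 1, 1, 1, 1, 1]}
phenotype = [1.0, 1.2, 3.1, 2.9, 0.8, 3.0]  # e.g., a salinity-tolerance score
res = marker_scan(genotypes, phenotype)
print(sorted(res))  # ['m1']; the monomorphic m2 is skipped
```

A real scan would convert each t statistic to a p-value and apply a genome-wide significance threshold before reporting loci such as those on Ca4 and Ca2.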


2021 ◽  
Author(s):  
Anna Reznichenko ◽  
Viji Nair ◽  
Sean Eddy ◽  
Mark Tomilo ◽  
Timothy Slidel ◽  
...  

Current classification of chronic kidney disease (CKD) into stages based on indirect measures of kidney function, namely estimated glomerular filtration rate and albuminuria, is agnostic to the heterogeneity of underlying etiologies, histopathology, and molecular processes. We used genome-wide transcriptomics from patients' kidney biopsies, which directly reflect kidney biological processes, to stratify patients from three independent CKD cohorts. Unsupervised Self-Organizing Maps (SOM), an artificial neural network algorithm, grouped CKD patients into four novel subgroups (molecular categories) based on the similarity of their kidney transcriptomic profiles. These unbiased molecular categories were present across CKD stages and histopathological diagnoses, highlighting the heterogeneity of conventional clinical subgroups at the molecular level. CKD molecular categories were distinct in terms of biological pathways, transcriptional regulation and associated kidney cell types, indicating that the molecular categorization is founded on biologically meaningful mechanisms. Importantly, our results revealed that not all biological pathways are equally activated in all patients; instead, different pathways could dominate in different subgroups and thereby differentially influence disease progression and outcomes. This first kidney-centric, unbiased categorization of CKD paves the way to an integrated clinical, morphological and molecular diagnosis. This is a key step toward enabling precision medicine for this heterogeneous condition, with the potential to advance biological understanding, clinical management, and drug development, as well as to establish a roadmap for the molecular reclassification of CKD and other complex diseases.
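
A self-organizing map places prototype vectors on a small grid and repeatedly pulls the best-matching prototype, together with its grid neighbors, toward each sample, so that similar samples end up mapped to the same or nearby nodes. The pure-Python sketch below is a generic Kohonen SOM with a linearly decaying learning rate and Gaussian neighborhood width, plus a deliberately simple initialization along the data's bounding-box diagonal. The grid size, schedules, initialization, and toy data are assumptions, not the settings used in the study; real transcriptomic inputs would have thousands of gene dimensions.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the node whose weight vector is closest (squared Euclidean) to x."""
    return min(range(len(weights)),
               key=lambda k: sum((w - xi) ** 2 for w, xi in zip(weights[k], x)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols self-organizing map on a list of equal-length vectors."""
    rng = random.Random(seed)
    n_nodes, dim = rows * cols, len(data[0])
    lo = [min(v[d] for v in data) for d in range(dim)]
    hi = [max(v[d] for v in data) for d in range(dim)]
    # Simple deterministic init: spread nodes evenly along the bounding-box diagonal.
    weights = [[lo[d] + (hi[d] - lo[d]) * k / max(n_nodes - 1, 1) for d in range(dim)]
               for k in range(n_nodes)]
    grid = [(k // cols, k % cols) for k in range(n_nodes)]  # node -> (row, col)
    sigma0 = sigma0 or max(rows, cols) / 2
    steps, t = epochs * len(data), 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = t / steps
            lr = lr0 * (1 - frac)              # learning rate decays linearly
            sigma = sigma0 * (1 - frac) + 0.01  # neighborhood shrinks over time
            b = best_matching_unit(weights, x)
            for k in range(n_nodes):
                d2 = (grid[k][0] - grid[b][0]) ** 2 + (grid[k][1] - grid[b][1]) ** 2
                h = math.exp(-d2 / (2 * sigma ** 2))  # Gaussian neighborhood weight
                weights[k] = [w + lr * h * (xi - w) for w, xi in zip(weights[k], x)]
            t += 1
    return weights


# Toy data (hypothetical): two groups of "patients" with distinct 2-gene profiles.
data = [[0.0, 0.0] for _ in range(5)] + [[10.0, 10.0] for _ in range(5)]
som = train_som(data, rows=2, cols=2)
print(best_matching_unit(som, [0.0, 0.0]), best_matching_unit(som, [10.0, 10.0]))
```

Patients sharing a best-matching node (or a cluster of adjacent nodes) would form one candidate subgroup.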


2017 ◽  
Vol 3 (2) ◽  
pp. 38 ◽  
Author(s):  
Vladislava Milchevskaya ◽  
Grischa Tödt ◽  
Toby James Gibson

Genome-wide expression profiling and genotyping are widely applied in functional genomics research, ranging from stem cell studies to cancer, in drug response studies, and in clinical diagnostics. The Affymetrix GeneChip microarrays represent the most popular platform for such assays. Nevertheless, because knowledge about the genome improves rapidly and continuously, the definitions of many genes and transcripts change, and new genes are discovered. Thus, the original probe information is outdated for a number of Affymetrix platforms and needs to be redefined. It has been demonstrated that accurate probe set definition improves both the coverage of gene expression analysis and its statistical power. We therefore developed a method that incorporates the most recent genome annotations into the annotation of microarray probe sets, using next-generation sequencing tools. Additionally, our method allows project-specific gene annotation models to be built quickly and enables comparison of microarray and RNA-seq data.
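
Conceptually, redefining probe sets means aligning each probe sequence to the current genome and regrouping probes by the gene models they fall in. The sketch below assumes the alignments are already available as coordinates (in practice they might come from a short-read aligner) and applies two illustrative rules: a probe must lie inside exons of exactly one gene, and a gene needs a minimum number of probes to form a probe set. These rules, the threshold, and all names are assumptions, not the published method's criteria.

```python
from collections import defaultdict


def redefine_probe_sets(probes, exons, min_probes=3):
    """Regroup array probes into gene-level probe sets from a current annotation.

    probes: dict probe_id -> (chrom, start, end) genomic alignment of the probe.
    exons:  list of (gene_id, chrom, start, end) exon intervals, half-open.
    A probe is kept only if it lies entirely inside at least one exon and every
    exon containing it belongs to the same gene (probes hitting several genes
    are ambiguous and discarded). Genes with fewer than min_probes are dropped.
    """
    probe_sets = defaultdict(list)
    for pid, (chrom, start, end) in probes.items():
        genes = {g for g, c, s, e in exons if c == chrom and s <= start and end <= e}
        if len(genes) == 1:
            probe_sets[genes.pop()].append(pid)
    return {g: sorted(p) for g, p in probe_sets.items() if len(p) >= min_probes}


# Toy annotation and probe alignments (hypothetical coordinates).
exons = [("G1", "chr1", 0, 100), ("G2", "chr1", 50, 150), ("G3", "chr2", 0, 100)]
probes = {
    "p1": ("chr1", 10, 35),    # inside G1 only
    "p2": ("chr1", 60, 85),    # inside exons of both G1 and G2: ambiguous
    "p3": ("chr1", 110, 135),  # inside G2 only
    "p4": ("chr2", 5, 30),
    "p5": ("chr2", 40, 65),
}
print(redefine_probe_sets(probes, exons, min_probes=2))  # {'G3': ['p4', 'p5']}
```

Swapping in a newer `exons` list is all it takes to rebuild the probe sets against an updated annotation, which is the kind of project-specific model the abstract describes.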


2021 ◽  
Author(s):  
Dennis A Sun ◽  
Nipam H Patel

Abstract
Emerging research organisms enable the study of biology that cannot be addressed using classical “model” organisms. The development of novel data resources can accelerate research in such animals. Here, we present new functional genomic resources for the amphipod crustacean Parhyale hawaiensis, facilitating the exploration of gene regulatory evolution using this emerging research organism. We use Omni-ATAC-Seq, an improved form of the Assay for Transposase-Accessible Chromatin coupled with next-generation sequencing (ATAC-Seq), to identify accessible chromatin genome-wide across a broad time course of Parhyale embryonic development. This time course encompasses many major morphological events, including segmentation, body regionalization, gut morphogenesis, and limb development. In addition, we use short- and long-read RNA-Seq to generate an improved Parhyale genome annotation, enabling deeper classification of identified regulatory elements. We leverage a variety of bioinformatic tools to discover differential accessibility, predict nucleosome positioning, infer transcription factor binding, cluster peaks based on accessibility dynamics, classify biological functions, and correlate gene expression with accessibility. Using a Minos transposase reporter system, we demonstrate the potential to identify novel regulatory elements using this approach, including distal regulatory elements.
This work provides a platform for the identification of novel developmental regulatory elements in Parhyale, and offers a framework for performing such experiments in other emerging research organisms.
Primary Findings
- Omni-ATAC-Seq identifies cis-regulatory elements genome-wide during crustacean embryogenesis
- Combined short- and long-read RNA-Seq improves the Parhyale genome annotation
- ImpulseDE2 analysis identifies dynamically regulated candidate regulatory elements
- NucleoATAC and HINT-ATAC enable inference of nucleosome occupancy and transcription factor binding
- Fuzzy clustering reveals peaks with distinct accessibility and chromatin dynamics
- Integration of accessibility and gene expression reveals possible enhancers and repressors
- Omni-ATAC can identify known and novel regulatory elements
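
Clustering peaks by accessibility dynamics with soft (fuzzy) membership can be illustrated with fuzzy c-means, one common form of fuzzy clustering; the abstract does not name the exact algorithm or package used. The sketch below alternates the two standard fuzzy c-means updates: centers as membership-weighted means, then memberships from relative distances to the centers. The fuzzifier m = 2, the iteration count, and the toy profiles are assumptions.

```python
import random


def fuzzy_cmeans(points, c, m=2.0, iters=100, seed=0, eps=1e-12):
    """Fuzzy c-means. Returns (centers, memberships); each membership row sums to 1."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([r / total for r in row])
    for _ in range(iters):
        # Center update: mean of the points weighted by membership^m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            sw = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / sw
                            for d in range(dim)])
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1)).
        for i in range(n):
            dists = [sum((points[i][d] - ctr[d]) ** 2 for d in range(dim)) ** 0.5 + eps
                     for ctr in centers]  # eps avoids division by zero
            u[i] = [1.0 / sum((dists[k] / dists[j]) ** (2 / (m - 1)) for j in range(c))
                    for k in range(c)]
    return centers, u


# Toy accessibility profiles (hypothetical) over four developmental stages.
peaks = [
    [1.0, 0.8, 0.2, 0.1],  # closes over time
    [0.9, 0.7, 0.3, 0.1],
    [0.1, 0.2, 0.8, 1.0],  # opens over time
    [0.2, 0.3, 0.9, 1.0],
]
centers, u = fuzzy_cmeans(peaks, c=2)
for row in u:
    print([round(x, 2) for x in row])
```

Unlike hard k-means, each peak keeps a graded membership in every cluster, which lets peaks whose dynamics fall between two patterns be recognized as such rather than forced into one group.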


2019 ◽  
Vol 21 (Supplement_6) ◽  
pp. vi61-vi62
Author(s):  
Pia Hoellerbauer ◽  
Megan Kufeld ◽  
Sonali Arora ◽  
Emily Girard ◽  
James Olson ◽  
...  

Abstract
Precision oncology is largely based on the notion that identifying and targeting oncogenic drivers will lead to improved clinical outcomes. However, the promise of precision oncology remains unfulfilled for many cancers, including glioblastoma (GBM), where identification of oncogenic drivers has yet to improve survival rates. Here, we have attempted to systematically identify GBM vulnerabilities by performing genome-wide CRISPR-Cas9 lethality screens in patient-derived GBM stem-like cells (GSCs). In validation studies, we comprehensively retested GSC-specific hits in multiple GSC isolates, which were also genomically profiled (e.g. RNA-seq, exome-seq, CNV), and further integrated these data with CRISPR-Cas9 lethality screens of over 500 human cell lines from the Broad Institute’s CRISPR Avana dataset. As a result, we have begun making GBM dependency predictions and functional associations for top-scoring hits, including: tumor developmental subtype; loss of functional redundancy with other genes/proteins; cancer-specific subnetworks of genes involved in mitochondrial protein turnover and membrane trafficking; and genes of unknown function that are essential for a subset of GBMs. The following examples illustrate these categories. We find that ADAR (Adenosine Deaminase RNA Specific) gene dependency is associated with the mesenchymal GBM subtype. The EFR3A gene, which has roles in maintaining active pools of phosphatidylinositol 4-kinase, appears to be required when expression of its paralog EFR3B is low or absent in tumor cells. The F-box protein-encoding gene FBXO42 appears non-essential in most human cell lines and neural stem cells, but its knockout in sensitive GSCs causes mitotic arrest, mitotic catastrophe, and cell death. Although this work is still in progress, we hope to use these results as a foundation for exploring and illuminating patient-specific molecular vulnerabilities in brain tumors.
The results also underscore the need to integrate functional genetic approaches, in which gene activities are inhibited, into precision oncology paradigms.
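
Lethality screens like those described above are commonly read out by counting each single-guide RNA (sgRNA) before and after a period of growth: guides targeting genes the cells depend on become depleted. The sketch below shows one simple way to turn such counts into gene-level scores (counts-per-million normalization, a pseudocount, per-guide log2 fold change, then the median across a gene's guides). It is a generic illustration under those assumptions, not the scoring method used in this study, which the abstract does not specify; names and counts are hypothetical.

```python
import math
from statistics import median


def gene_scores(guide_to_gene, counts_start, counts_end, pseudocount=0.5):
    """Median log2 fold change of guide abundance (end vs. start) per gene.

    Counts are normalized to counts-per-million within each sample first.
    Strongly negative scores suggest that knocking out the gene reduces fitness.
    """
    def cpm(counts):
        total = sum(counts.values())
        return {g: c * 1e6 / total for g, c in counts.items()}

    start, end = cpm(counts_start), cpm(counts_end)
    per_gene = {}
    for guide, gene in guide_to_gene.items():
        lfc = math.log2((end[guide] + pseudocount) / (start[guide] + pseudocount))
        per_gene.setdefault(gene, []).append(lfc)
    return {gene: median(lfcs) for gene, lfcs in per_gene.items()}


# Toy screen (hypothetical): two guides per gene, equal starting representation.
guide_to_gene = {"g1": "ESS", "g2": "ESS", "g3": "CTRL", "g4": "CTRL"}
start = {"g1": 100, "g2": 100, "g3": 100, "g4": 100}
end = {"g1": 10, "g2": 20, "g3": 200, "g4": 170}
scores = gene_scores(guide_to_gene, start, end)
print({g: round(s, 2) for g, s in scores.items()})  # ESS guides are depleted
```

Taking the median across guides limits the influence of a single ineffective or off-target guide; comparing such scores between GSC isolates and other cell lines is one way GSC-specific dependencies could be prioritized.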

