ExTraMapper: Exon- and Transcript-level mappings for orthologous gene pairs

Mapping Intimacies ◽

10.1101/277723 ◽

2018 ◽

Author(s):

Ferhat Ay ◽

Abhijit Chakraborty ◽

Ramana V. Davuluri

Keyword(s):

Large Scale ◽

Gene Annotation ◽

Orthologous Gene ◽

Transcript Level ◽

Specific Expression ◽

Gene Pairs ◽

Link Type ◽

Mouse Tissues ◽

Gene Level ◽

Human And Mouse

Abstract: Access to large-scale genomics and transcriptomics data from various tissues and cell lines allowed the discovery of widespread alternative splicing events and alternative promoter usage in mammals. However, evolutionary studies primarily focus on gene-level orthology relationships, which obscures the importance of transcript-level diversity. Between human and mouse, gene-level orthology is currently present for nearly 16k protein-coding genes spanning a diverse repertoire of over 200k total transcript isoforms. Here we describe a novel method, ExTraMapper, which leverages sequence conservation between exons of a pair of organisms and identifies a fine-scale orthology mapping at the exon and then transcript level. ExTraMapper identifies more than 250k exon mappings and 30k transcript mappings between human and mouse using only sequence and gene annotation information. We demonstrate that ExTraMapper identifies a larger number of exon and transcript mappings compared to previous methods. Further, it identifies exon fusions, splits, and losses due to splice site mutations, and finds mappings between microexons that were previously missed. By reanalysis of RNA-seq data from 13 matched human and mouse tissues, we show that ExTraMapper improves the correlation of transcript-specific expression levels, suggesting a more accurate mapping of human and mouse transcripts. ExTraMapper also reports better transcript-level mappings compared to Ensembl orthology for the human proto-oncogene BRAF and its mouse ortholog as well as several other example genes with important isoform-specific functions. ExTraMapper is applicable to any pair of organisms that have orthologous gene pairs and is available at https://github.com/ay-lab/ExTraMapper and http://ay-lab-tools.lji.org/extramapper

Identification and Characterization of the Potential Promoter Regions of 1031 Kinds of Human Genes

Genome Research ◽

10.1101/gr.164001 ◽

2001 ◽

Vol 11 (5) ◽

pp. 677-684

Author(s):

Yutaka Suzuki ◽

Tatsuhiko Tsunoda ◽

Jun Sese ◽

Hirotoshi Taira ◽

Junko Mizushima-Sugano ◽

...

Keyword(s):

Large Scale ◽

Expression Patterns ◽

Cpg Islands ◽

Genomic Sequences ◽

Cdna Libraries ◽

Promoter Regions ◽

Specific Expression ◽

Link Type ◽

Potential Promoter ◽

E Boxes

To understand the mechanism of transcriptional regulation, it is essential to identify and characterize the promoter, which is located proximal to the mRNA start site. To identify the promoters from the large volumes of genomic sequences, we used mRNA start sites determined by a large-scale sequencing of the cDNA libraries constructed by the “oligo-capping” method. We aligned the mRNA start sites with the genomic sequences and retrieved adjacent sequences as potential promoter regions (PPRs) for 1031 genes. The PPR sequences were searched to determine the frequencies of major promoter elements. Among 1031 PPRs, 329 (32%) contained TATA boxes, 872 (85%) contained initiators, 999 (97%) contained GC box, and 663 (64%) contained CAAT box. Furthermore, 493 (48%) PPRs were located in CpG islands. This frequency of CpG islands was reduced in TATA+/Inr+ PPRs and in the PPRs of ubiquitously expressed genes. In the PPRs of the CGM2 gene, the DRA gene, and the TM30pl genes, which showed highly colon specific expression patterns, the consensus sequences of E boxes were commonly observed. The PPRs were also useful for exploring promoter SNPs. [The nucleotide sequences described in this paper have been deposited in the DDBJ, EMBL, and GenBank data libraries under accession nos. AU098358–AU100608.]
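
The percentages quoted in this abstract can be re-derived from the reported counts. The following is a minimal sketch of that arithmetic, not code from the paper; the names `TOTAL_PPRS`, `element_counts`, and `percent_of_total` are illustrative assumptions, and only the counts themselves come from the abstract.

```python
# Counts reported in the abstract for the 1031 potential promoter regions (PPRs).
# Variable and function names are hypothetical, chosen for illustration only.
TOTAL_PPRS = 1031
element_counts = {
    "TATA box": 329,
    "initiator": 872,
    "GC box": 999,
    "CAAT box": 663,
    "CpG island": 493,
}


def percent_of_total(count, total=TOTAL_PPRS):
    """Return count as a percentage of total, rounded to a whole number."""
    return round(100 * count / total)


# Recompute each frequency so it can be compared with the figures in the text.
percentages = {name: percent_of_total(n) for name, n in element_counts.items()}

for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```

Rounding to whole numbers reproduces the figures given in the abstract (32%, 85%, 97%, 64%, and 48%).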

Computational approach to identifying universal macrophage biomarker

10.1101/807347 ◽

2019 ◽

Author(s):

Dharanidhar Dang ◽

Sahar Taheri ◽

Soumita Das ◽

Pradipta Ghosh ◽

Lawrence S. Prince ◽

...

Keyword(s):

Expression Patterns ◽

Computational Approach ◽

Biological Data ◽

Specific Expression ◽

Cellular Debris ◽

Mouse Tissues ◽

Gene Expression Dynamics ◽

Reliable Marker ◽

Multiple Species ◽

Human And Mouse

Abstract: Macrophages are a type of white blood cell of the immune system that engulfs and digests cellular debris, cancer cells, and anything else that does not have the type of proteins specific to healthy body cells on its surface. Understanding gene expression dynamics in macrophages is crucial for studying human diseases. Recent advances in high-throughput technologies have enabled the collection of immense amounts of biological data. A reliable marker of macrophages is essential to study their function. Traditional approaches use a number of markers that may have tissue-specific expression patterns. To identify a universal biomarker of macrophages, we used a previously published computational approach called BECC (Boolean Equivalent Correlated Clusters) that was originally used to identify universal cell cycle genes. We performed BECC analysis on a seed gene CD14, a known macrophage marker. FCER1G and TYROBP were among the top candidates which were validated as strong candidates for universal biomarkers for macrophages in human and mouse tissues. To our knowledge, such a finding is the first of its kind.

Contributions to the field: We have developed a computational approach to identify universal biomarkers of different entities in a biological system. We applied this approach to study macrophages and identified universal biomarkers of this particular cell type. FCER1G and TYROBP were among the top candidates which were validated as strong candidates for universal biomarkers for macrophages in human and mouse tissues. The expression patterns of TYROBP and FCER1G are found to be more homogeneous compared to currently used biomarkers such as ITGAM, EMR1 (F4/80), and CD68. Further, we demonstrated that this homogeneity extends to all the tissues currently profiled in the public domain in multiple species including human and mouse. FCER1G and TYROBP expression patterns were also found to be extremely specific to macrophages found in various tissues. They are strongly co-expressed. We believe that these two genes are the most reliable candidate universal biomarkers for macrophages.

Quantification of Chitinase mRNA Levels in Human and Mouse Tissues by Real-Time PCR: Species-Specific Expression of Acidic Mammalian Chitinase in Stomach Tissues

PLoS ONE ◽

10.1371/journal.pone.0067399 ◽

2013 ◽

Vol 8 (6) ◽

pp. e67399 ◽

Cited By ~ 32

Author(s):

Misa Ohno ◽

Yuto Togashi ◽

Kyoko Tsuda ◽

Kazuaki Okawa ◽

Minori Kamaya ◽

...

Keyword(s):

Real Time ◽

Real Time Pcr ◽

Mrna Levels ◽

Specific Expression ◽

Mouse Tissues ◽

Acidic Mammalian Chitinase ◽

Species Specific ◽

Human And Mouse

Association Analysis and Meta-Analysis of Multi-allelic Variants for Large Scale Sequence Data

10.1101/197913 ◽

2017 ◽

Author(s):

Xiaowei Zhan ◽

Sai Chen ◽

Yu Jiang ◽

Mengzhen Liu ◽

William G. Iacono ◽

...

Keyword(s):

Large Scale ◽

Rare Variants ◽

Sequence Data ◽

Meta Analysis ◽

Joint Modeling ◽

Allelic Variants ◽

Association Analyses ◽

Link Type ◽

Gene Level ◽

The Impact

Abstract. Motivation: There is great interest in understanding the impact of rare variants in human diseases using large sequence datasets. In deep sequence datasets of >10,000 samples, ∼10% of the variant sites are observed to be multi-allelic. Many of the multi-allelic variants have been shown to be functional and disease relevant. Proper analysis of multi-allelic variants is critical to the success of a sequencing study, but existing methods do not properly handle multi-allelic variants and can produce highly misleading association results. Results: We propose novel methods to encode multi-allelic sites, conduct single variant and gene-level association analyses, and perform meta-analysis for multi-allelic variants. We evaluated these methods through extensive simulations and the study of a large meta-analysis of ∼18,000 samples on the cigarettes-per-day phenotype. We showed that our joint modeling approach provided an unbiased estimate of genetic effects, greatly improved the power of single variant association tests, and enhanced gene-level tests over existing approaches. Availability: Software packages implementing these methods are available at https://github.com/zhanxw/rvtests and http://genome.sph.umich.edu/wiki/RareMETAL.

AMAW: automated gene annotation for non-model eukaryotic genomes

10.1101/2021.12.07.471566 ◽

2021 ◽

Author(s):

Loïc Meunier ◽

Denis Baurain ◽

Luc Cornet

Keyword(s):

Genome Annotation ◽

Large Scale ◽

Gene Annotation ◽

Supplementary Information ◽

Supplementary Data ◽

Software Suite ◽

Perl Script ◽

Link Type ◽

Eukaryotic Genomes

Abstract. Summary: To support small and large-scale genome annotation projects, we present AMAW (Automated MAKER2 Annotation Wrapper), a program devised to annotate non-model unicellular eukaryotic genomes by automating the acquisition of evidence data (transcripts and proteins) and facilitating the use of MAKER2, a widely adopted software suite for the annotation of eukaryotic genomes. Moreover, AMAW exists as a Singularity container recipe that is easy to deploy on a grid computer, thereby overcoming the tricky installation of MAKER2. Availability: AMAW is released both as a Singularity container recipe and a standalone Perl script (https://bitbucket.org/phylogeno/amaw/). Supplementary information: Supplementary data are available at Bioinformatics online.

The Rhododendron Genome and Chromosomal Organization Provide Insight into Shared Whole-Genome Duplications across the Heath Family (Ericaceae)

Genome Biology and Evolution ◽

10.1093/gbe/evz245 ◽

2019 ◽

Vol 11 (12) ◽

pp. 3353-3371 ◽

Cited By ~ 6

Author(s):

Valerie L Soza ◽

Dale Lindsley ◽

Adam Waalkes ◽

Elizabeth Ramage ◽

Rupali P Patwardhan ◽

...

Keyword(s):

Genome Annotation ◽

Large Scale ◽

De Novo ◽

Gene Annotation ◽

Whole Genome ◽

Genomic Libraries ◽

Gene Pairs ◽

Whole Genome Duplications ◽

Genome Duplications ◽

Large Genus

Abstract: The genus Rhododendron (Ericaceae), which includes horticulturally important plants such as azaleas, is a highly diverse and widely distributed genus of >1,000 species. Here, we report the chromosome-scale de novo assembly and genome annotation of Rhododendron williamsianum as a basis for continued study of this large genus. We created multiple short fragment genomic libraries, which were assembled using ALLPATHS-LG. This was followed by contiguity preserving transposase sequencing (CPT-seq) and fragScaff scaffolding of a large fragment library, which improved the assembly by decreasing the number of scaffolds and increasing scaffold length. Chromosome-scale scaffolding was performed by proximity-guided assembly (LACHESIS) using chromatin conformation capture (Hi-C) data. Chromosome-scale scaffolding was further refined and linkage groups defined by restriction-site associated DNA (RAD) sequencing of the parents and progeny of a genetic cross. The resulting linkage map confirmed the LACHESIS clustering and ordering of scaffolds onto chromosomes and rectified large-scale inversions. Assessments of the R. williamsianum genome assembly and gene annotation estimate them to be 89% and 79% complete, respectively. Predicted coding sequences from genome annotation were used in syntenic analyses and for generating age distributions of synonymous substitutions/site between paralogous gene pairs, which identified whole-genome duplications (WGDs) in R. williamsianum. We then analyzed other publicly available Ericaceae genomes for shared WGDs. Based on our spatial and temporal analyses of paralogous gene pairs, we find evidence for two shared, ancient WGDs in Rhododendron and Vaccinium (cranberry/blueberry) members that predate the Ericaceae family and, in one case, the Ericales order.

recount3: summaries and queries for large-scale RNA-seq expression and splicing

Genome Biology ◽

10.1186/s13059-021-02533-6 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Christopher Wilks ◽

Shijie C. Zheng ◽

Feng Yong Chen ◽

Rone Charles ◽

Brad Solomon ◽

...

Keyword(s):

Rna Sequencing ◽

Large Scale ◽

Rna Seq ◽

Analysis Pipeline ◽

Web Resources ◽

Link Type ◽

Private Data ◽

Exon Junctions ◽

Human And Mouse

Abstract: We present recount3, a resource consisting of over 750,000 publicly available human and mouse RNA sequencing (RNA-seq) samples uniformly processed by our new analysis pipeline. To facilitate access to the data, we provide R/Bioconductor packages as well as complementary web resources. Using these tools, data can be downloaded as study-level summaries or queried for specific exon-exon junctions, genes, samples, or other features. The pipeline can be used to process local and/or private data, allowing results to be directly compared to any study in recount3. Taken together, our tools help biologists maximize the utility of publicly available RNA-seq data, especially to improve their understanding of newly collected data. recount3 is available from http://rna.recount.bio.

Genome-wide identification and characterization of GRAS transcription factors in sacred lotus (Nelumbo nucifera)

PeerJ ◽

10.7717/peerj.2388 ◽

2016 ◽

Vol 4 ◽

pp. e2388 ◽

Cited By ~ 14

Author(s):

Yu Wang ◽

Shenglu Shi ◽

Ying Zhou ◽

Yu Zhou ◽

Jie Yang ◽

...

Keyword(s):

Gene Family ◽

Higher Plants ◽

Orthologous Gene ◽

Gene Families ◽

Nelumbo Nucifera ◽

Specific Gene ◽

Specific Expression ◽

Gene Pairs ◽

Sacred Lotus ◽

Gras Gene

The GRAS gene family is one of the most important plant-specific gene families, which encodes transcriptional regulators and plays an essential role in plant development and physiological processes. The GRAS gene family has been well characterized in many higher plants such as Arabidopsis, rice, Chinese cabbage, tomato and tobacco. In this study, we identified 38 GRAS genes in sacred lotus (Nelumbo nucifera), analyzed their physical and chemical characteristics and performed phylogenetic analysis using the GRAS genes from eight representative plant species to show the evolution of GRAS genes in Planta. In addition, the gene structures and motifs of the sacred lotus GRAS proteins were characterized in detail. Comparative analysis identified 42 orthologous and 9 co-orthologous gene pairs between sacred lotus and Arabidopsis, and 35 orthologous and 22 co-orthologous gene pairs between sacred lotus and rice. Based on publicly available RNA-seq data generated from leaf, petiole, rhizome and root, we found that most of the sacred lotus GRAS genes exhibited a tissue-specific expression pattern. Eight of the ten PAT1-clade GRAS genes, particularly NnuGRAS-05, NnuGRAS-10 and NnuGRAS-25, were preferentially expressed in rhizome and root. In summary, this is the first in silico analysis of the GRAS gene family in sacred lotus, which will provide valuable information for further molecular and biological analyses of this important gene family.

Tissue‐specific expression and post‐transcriptional regulation of the ATPase inhibitory factor 1 (IF1) in human and mouse tissues

The FASEB Journal ◽

10.1096/fj.201800756r ◽

2018 ◽

Vol 33 (2) ◽

pp. 1836-1851 ◽

Cited By ~ 9

Author(s):

Pau B. Esparza-Moltó ◽

Cristina Nuevo-Tapioles ◽

Margarita Chamorro ◽

Laura Nájera ◽

Laura Torresano ◽

...

Keyword(s):

Transcriptional Regulation ◽

Inhibitory Factor ◽

Specific Expression ◽

Tissue Specific ◽

Tissue Specific Expression ◽

Mouse Tissues ◽

Post Transcriptional Regulation ◽

Human And Mouse ◽

Inhibitory Factor 1

Faculty of 1000 evaluation for Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences.

F1000 - Post-publication peer review of the biomedical literature ◽

10.3410/f.726079641.793513319 ◽

2016 ◽

Author(s):

Wolfgang Huber

Keyword(s):

Transcript Level ◽

Rna Seq ◽

Gene Level