scholarly journals QuASAR: Quantitative Allele Specific Analysis of Reads

2014 ◽  
Author(s):  
Chris Harvey ◽  
Gregory A Moyebrailean ◽  
Omar Davis ◽  
Xiaoquan Wen ◽  
Francesca Luca ◽  
...  

Expression quantitative trait loci (eQTL) studies have discovered thousands of genetic variants that regulate gene expression, enabling a better understanding of the functional role of non-coding sequences. However, eQTL studies are costly, requiring large sample sizes and genome-wide genotyping of each sample. In contrast, analysis of allele specific expression (ASE) is becoming a popular approach to detect the effect of genetic variation on gene expression, even within a single individual. This is typically achieved by counting the number of RNA-seq reads matching each allele at heterozygous sites and testing the null hypothesis of a 1:1 allelic ratio. In principle, when genotype information is not readily available it could be inferred from the RNA-seq reads directly. However, there are currently no existing methods that jointly infer genotypes and conduct ASE inference, while considering uncertainty in the genotype calls. We present QuASAR, Quantitative Allele Specific Analysis of Reads, a novel statistical learning method for jointly detecting heterozygous genotypes and inferring ASE. The proposed ASE inference step takes into consideration the uncertainty in the genotype calls while including parameters that model base-call errors in sequencing and allelic over-dispersion. We validated our method with experimental data for which high quality genotypes are available. Results for an additional dataset with multiple replicates at different sequencing depths demonstrate that QuASAR is a powerful tool for ASE analysis when genotypes are not available.

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Asia Mendelevich ◽  
Svetlana Vinogradova ◽  
Saumya Gupta ◽  
Andrey A. Mironov ◽  
Shamil R. Sunyaev ◽  
...  

AbstractA sensitive approach to quantitative analysis of transcriptional regulation in diploid organisms is analysis of allelic imbalance (AI) in RNA sequencing (RNA-seq) data. A near-universal practice in such studies is to prepare and sequence only one library per RNA sample. We present theoretical and experimental evidence that data from a single RNA-seq library is insufficient for reliable quantification of the contribution of technical noise to the observed AI signal; consequently, reliance on one-replicate experimental design can lead to unaccounted-for variation in error rates in allele-specific analysis. We develop a computational approach, Qllelic, that accurately accounts for technical noise by making use of replicate RNA-seq libraries. Testing on new and existing datasets shows that application of Qllelic greatly decreases false positive rate in allele-specific analysis while conserving appropriate signal, and thus greatly improves reproducibility of AI estimates. We explore sources of technical overdispersion in observed AI signal and conclude by discussing design of RNA-seq studies addressing two biologically important questions: quantification of transcriptome-wide AI in one sample, and differential analysis of allele-specific expression between samples.


2017 ◽  
Author(s):  
Peter A. Combs ◽  
Hunter B. Fraser

AbstractSpatial patterning of gene expression is a key process in development—responsible for the incredible diversity of animal body plans—yet how it evolves is still poorly understood. Both cis- and trans-acting changes could accumulate and participate in complex interactions, so to isolate the cis-regulatory component of patterning evolution, we measured allele-specific spatial gene expression patterns inD. melanogaster×D. simulanshybrid embryos. RNA-seq of cryosectioned slices revealed 55 genes with strong spatially varying allele-specific expression, and several hundred more with weaker but significant spatial divergence. For example, we found thathunchback (hb), a major regulator of developmental patterning, had reduced expression specifically in the anterior tip ofD. simulansembryos. Mathematical modeling ofhbcis-regulation suggested that a mutation in a Bicoid binding site was responsible, which we verified using CRISPR-Cas9 genome editing. In sum, even comparing morphologically near-identical species we identified a substantial amount of spatial variation in gene expression, suggesting that development is robust to many such changes, but also that natural selection may have ample raw material for evolving new body plans via cis-regulatory divergence.


2021 ◽  
Vol 12 ◽  
Author(s):  
Frédéric Jehl ◽  
Fabien Degalez ◽  
Maria Bernard ◽  
Frédéric Lecerf ◽  
Laetitia Lagoutte ◽  
...  

In addition to their common usages to study gene expression, RNA-seq data accumulated over the last 10 years are a yet-unexploited resource of SNPs in numerous individuals from different populations. SNP detection by RNA-seq is particularly interesting for livestock species since whole genome sequencing is expensive and exome sequencing tools are unavailable. These SNPs detected in expressed regions can be used to characterize variants affecting protein functions, and to study cis-regulated genes by analyzing allele-specific expression (ASE) in the tissue of interest. However, gene expression can be highly variable, and filters for SNP detection using the popular GATK toolkit are not yet standardized, making SNP detection and genotype calling by RNA-seq a challenging endeavor. We compared SNP calling results using GATK suggested filters, on two chicken populations for which both RNA-seq and DNA-seq data were available for the same samples of the same tissue. We showed, in expressed regions, a RNA-seq precision of 91% (SNPs detected by RNA-seq and shared by DNA-seq) and we characterized the remaining 9% of SNPs. We then studied the genotype (GT) obtained by RNA-seq and the impact of two factors (GT call-rate and read number per GT) on the concordance of GT with DNA-seq; we proposed thresholds for them leading to a 95% concordance. Applying these thresholds to 767 multi-tissue RNA-seq of 382 birds of 11 chicken populations, we found 9.5 M SNPs in total, of which ∼550,000 SNPs per tissue and population with a reliable GT (call rate ≥ 50%) and among them, ∼340,000 with a MAF ≥ 10%. We showed that such RNA-seq data from one tissue can be used to (i) detect SNPs with a strong predicted impact on proteins, despite their scarcity in each population (16,307 SIFT deleterious missenses and 590 stop-gained), (ii) study, on a large scale, cis-regulations of gene expression, with ∼81% of protein-coding and 68% of long non-coding genes (TPM ≥ 1) that can be analyzed for ASE, and with ∼29% of them that were cis-regulated, and (iii) analyze population genetic using such SNPs located in expressed regions. This work shows that RNA-seq data can be used with good confidence to detect SNPs and associated GT within various populations and used them for different analyses as GTEx studies.


2018 ◽  
Author(s):  
Sahar V. Mozaffari ◽  
Michelle M. Stein ◽  
Kevin M. Magnaye ◽  
Dan L. Nicolae ◽  
Carole Ober

AbstractGenomic imprinting is the phenomena that leads to silencing of one copy of a gene inherited from a specific parent. Mutations in imprinted regions have been involved in diseases showing parent of origin effects. Identifying genes with evidence of parent of origin expression patterns in family studies allows the detection of more subtle imprinting. Here, we use allele specific expression in lymphoblastoid cell lines from 306 Hutterites related in a single pedigree to provide formal evidence for parent of origin effects. We take advantage of phased genotype data to assign parent of origin to RNA-seq reads in individuals with gene expression data. Our approach identified known imprinted genes, two putative novel imprinted genes, and 14 genes with asymmetrical parent of origin gene expression. We used gene expression in peripheral blood leukocytes (PBL) to validate our findings, and then confirmed imprinting control regions (ICRs) using DNA methylation levels in the PBLs.Author SummaryLarge scale gene expression studies have identified known and novel imprinted genes through allele specific expression without knowing the parental origins of each allele. Here, we take advantage of phased genotype data to assign parent of origin to RNA-seq reads in 306 individuals with gene expression data. We identified known imprinted genes as well as two novel imprinted genes in lymphoblastoid cell line gene expression. We used gene expression in PBLs to validate our findings, and DNA methylation levels in PBLs to confirm previously characterized imprinting control regions that could regulate these imprinted genes.


2019 ◽  
Author(s):  
Mazdak Salavati ◽  
Stephen J. Bush ◽  
Sergio Palma-Vera ◽  
Mary E. B. McCulloch ◽  
David A. Hume ◽  
...  

AbstractPervasive allelic variation at both gene and single nucleotide level (SNV) between individuals is commonly associated with complex traits in humans and animals. Allele-specific expression (ASE) analysis, using RNA-Seq, can provide a detailed annotation of allelic imbalance and infer the existence of cis-acting transcriptional regulation. However, variant detection in RNA-Seq data is compromised by biased mapping of reads to the reference DNA sequence. In this manuscript we describe an unbiased standardised computational pipeline for allele-specific expression analysis using RNA-Seq data, which we have adapted and developed using tools available under open licence. The analysis pipeline we present is designed to minimise reference bias while providing accurate profiling of allele-specific expression across tissues and cell types. Using this methodology, we were able to profile pervasive allelic imbalance across tissues and cell types, at both the gene and SNV level, in Texel x Scottish Blackface sheep, using the sheep gene expression atlas dataset. ASE profiles were pervasive in each sheep and across all tissue types investigated. However, ASE profiles shared across tissues were limited and instead they tended to be highly tissue-specific. These tissue-specific ASE profiles may underlie the expression of economically important traits and could be utilized as weighted SNVs, for example, to improve the accuracy of genomic selection in breeding programmes for sheep. An additional benefit of the pipeline is that it does not require parental genotypes and can therefore be applied to other RNA-Seq datasets for livestock, including those available on the Functional Annotation of Animal Genomes (FAANG) data portal. This study is the first global characterisation of moderate to extreme ASE in tissues and cell types from sheep. We have applied a robust methodology for ASE profiling, to provide both a novel analysis of the multi-dimensional sheep gene expression atlas dataset, and a foundation for identifying the regulatory and expressed elements of the genome that are driving complex traits in livestock.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
M. Joseph Tomlinson ◽  
Shawn W. Polson ◽  
Jing Qiu ◽  
Juniper A. Lake ◽  
William Lee ◽  
...  

AbstractDifferential abundance of allelic transcripts in a diploid organism, commonly referred to as allele specific expression (ASE), is a biologically significant phenomenon and can be examined using single nucleotide polymorphisms (SNPs) from RNA-seq. Quantifying ASE aids in our ability to identify and understand cis-regulatory mechanisms that influence gene expression, and thereby assist in identifying causal mutations. This study examines ASE in breast muscle, abdominal fat, and liver of commercial broiler chickens using variants called from a large sub-set of the samples (n = 68). ASE analysis was performed using a custom software called VCF ASE Detection Tool (VADT), which detects ASE of biallelic SNPs using a binomial test. On average ~ 174,000 SNPs in each tissue passed our filtering criteria and were considered informative, of which ~ 24,000 (~ 14%) showed ASE. Of all ASE SNPs, only 3.7% exhibited ASE in all three tissues, with ~ 83% showing ASE specific to a single tissue. When ASE genes (genes containing ASE SNPs) were compared between tissues, the overlap among all three tissues increased to 20.1%. Our results indicate that ASE genes show tissue-specific enrichment patterns, but all three tissues showed enrichment for pathways involved in translation.


Genetics ◽  
2013 ◽  
Vol 195 (3) ◽  
pp. 1157-1166 ◽  
Author(s):  
Sandrine Lagarrigue ◽  
Lisa Martin ◽  
Farhad Hormozdiari ◽  
Pierre-François Roux ◽  
Calvin Pan ◽  
...  

2018 ◽  
Vol 115 (47) ◽  
pp. E11081-E11090 ◽  
Author(s):  
Ryan A. York ◽  
Chinar Patil ◽  
Kawther Abdilleh ◽  
Zachary V. Johnson ◽  
Matthew A. Conte ◽  
...  

Many behaviors are associated with heritable genetic variation [Kendler and Greenspan (2006) Am J Psychiatry 163:1683–1694]. Genetic mapping has revealed genomic regions or, in a few cases, specific genes explaining part of this variation [Bendesky and Bargmann (2011) Nat Rev Gen 12:809–820]. However, the genetic basis of behavioral evolution remains unclear. Here we investigate the evolution of an innate extended phenotype, bower building, among cichlid fishes of Lake Malawi. Males build bowers of two types, pits or castles, to attract females for mating. We performed comparative genome-wide analyses of 20 bower-building species and found that these phenotypes have evolved multiple times with thousands of genetic variants strongly associated with this behavior, suggesting a polygenic architecture. Remarkably, F1 hybrids of a pit-digging and a castle-building species perform sequential construction of first a pit and then a castle bower. Analysis of brain gene expression in these hybrids showed that genes near behavior-associated variants display behavior-dependent allele-specific expression with preferential expression of the pit-digging species allele during pit digging and of the castle-building species allele during castle building. These genes are highly enriched for functions related to neurodevelopment and neural plasticity. Our results suggest that natural behaviors are associated with complex genetic architectures that alter behavior via cis-regulatory differences whose effects on gene expression are specific to the behavior itself.


Sign in / Sign up

Export Citation Format

Share Document