Distinct sequence features underlie microdeletions and gross deletions in the human genome

2021 ◽  
Author(s):  
Mengling Qi ◽  
Peter D. Stenson ◽  
Edward V. Ball ◽  
John A. Tainer ◽  
Albino Bacolla ◽  
...  
2017 ◽  
Author(s):  
Akshay Kakumanu ◽  
Silvia Velasco ◽  
Esteban Mazzoni ◽  
Shaun Mahony

Abstract Genomic loci with regulatory potential can be identified and annotated with various properties. For example, genomic sites may be annotated as being bound by a given transcription factor (TF) in one or more cell types. The same sites may be further labeled as being proximal or distal to known promoters. Given such a collection of labeled sites, it is natural to ask what sequence features are associated with each annotation label. However, discovering such label-specific sequence features is often confounded by overlaps between annotation labels; e.g. if regulatory sites specific to a given cell type are also more likely to be promoter-proximal, it is difficult to assess whether motifs identified in that set of sites are associated with the cell type or associated with promoters. In order to meet this challenge, we developed SeqUnwinder, a principled approach to deconvolving interpretable discriminative sequence features associated with overlapping annotation labels. We demonstrate the novel analysis abilities of SeqUnwinder using three examples. Firstly, we show SeqUnwinder’s ability to unravel sequence features associated with the dynamic binding behavior of TFs during motor neuron programming from features associated with chromatin state in the initial embryonic stem cells. Secondly, we characterize distinct sequence properties of multi-condition and cell-specific TF binding sites after controlling for uneven associations with promoter proximity. Finally, we demonstrate the scalability of SeqUnwinder to discover cell-specific sequence features from over one hundred thousand genomic loci that display DNase I hypersensitivity in one or more ENCODE cell lines. Availability: https://github.com/seqcode/sequnwinder


2018 ◽  
Author(s):  
Peter A Noble ◽  
Alexander E. Pozhitkov

Abstract Our previous study found that more than 500 transcripts increased significantly in abundance in the zebrafish and mouse several hours to days postmortem relative to live controls. Because the current literature suggests that most mRNAs are post-transcriptionally regulated in stressful conditions, we reasoned that the postmortem transcripts must contain sequence features (3 to 9 mers) that are distinct from those in the rest of the transcriptome, specifically binding sites for proteins and/or non-coding RNAs involved in regulation. Our new study identified 5117 and 2245 over-represented sequence features in the mouse and zebrafish, respectively. Some of these features were disproportionately distributed along the transcripts, with high densities in the 3′-UTR region of the zebrafish (0.3 mers/nt) and the ORFs of the mouse (0.6 mers/nt). Yet the highest density (2.3 mers/nt) occurred in the ORFs of 11 mouse transcripts that lacked UTRs. Our results suggest that these transcripts might serve as ‘molecular sponges’ that sequester RNA binding proteins and/or microRNAs, increasing the stability and gene expression of other transcripts. In addition, some features were identified as binding sites for Rbfox and Hud proteins, which are also involved in increasing transcript stability and gene expression. Hence, our results are consistent with the hypothesis that transcripts involved in responding to extreme stress have sequence features that make them different from the rest of the transcriptome, which presumably has implications for post-transcriptional regulation in disease, starvation, and cancer. Abbreviations: UTR, untranslated regions; ORFs, open reading frames; OP, overabundant transcript pool; CP, control transcript pool; FP, false positive; RBP, RNA binding proteins; ncRNA, non-coding RNA; miRNA, microRNA.
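The idea of over-represented 3 to 9 mers and of feature density in mers/nt can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the pseudocount, and the simple frequency-ratio threshold are all assumptions chosen for illustration, and a real analysis would use a proper statistical test with false-positive control.

```python
from collections import Counter


def count_kmers(seqs, k_min=3, k_max=9):
    """Count every k-mer with k_min <= k <= k_max across a list of sequences."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for k in range(k_min, k_max + 1):
            for i in range(len(seq) - k + 1):
                counts[seq[i:i + k]] += 1
    return counts


def overrepresented(target_seqs, control_seqs, k_min=3, k_max=9,
                    min_ratio=2.0, pseudo=1.0):
    """Return {k-mer: ratio} for k-mers enriched in the target pool.

    The ratio compares per-nucleotide frequencies in the two pools.
    The pseudocount and the min_ratio cutoff are illustrative assumptions.
    """
    target = count_kmers(target_seqs, k_min, k_max)
    control = count_kmers(control_seqs, k_min, k_max)
    t_len = max(sum(len(s) for s in target_seqs), 1)
    c_len = max(sum(len(s) for s in control_seqs), 1)
    enriched = {}
    for kmer, t_count in target.items():
        ratio = ((t_count + pseudo) / t_len) / ((control[kmer] + pseudo) / c_len)
        if ratio >= min_ratio:
            enriched[kmer] = ratio
    return enriched


def feature_density(seq, features):
    """Feature occurrences per nucleotide (mers/nt), counting overlaps.

    Several features of different lengths can start at the same position,
    so the density can exceed 1.
    """
    seq = seq.upper()
    if not seq:
        return 0.0
    hits = sum(seq.startswith(f, i) for i in range(len(seq)) for f in features)
    return hits / len(seq)
```

Under these assumptions, `overrepresented` would be called with the overabundant and control transcript pools, and `feature_density` would then be applied to a region such as a UTR or an ORF using the enriched k-mers as the feature set.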


2019 ◽  
Vol 63 (6) ◽  
pp. 757-771 ◽  
Author(s):  
Claire Francastel ◽  
Frédérique Magdinier

Abstract Despite the tremendous progress made in recent years in assembling the human genome, tandemly repeated DNA elements remain poorly characterized. These sequences account for the vast majority of methylated sites in the human genome and their methylated state is necessary for this repetitive DNA to function properly and to maintain genome integrity. Furthermore, recent advances highlight the emerging role of these sequences in regulating the functions of the human genome and its variability during evolution, among individuals, or in disease susceptibility. In addition, a number of inherited rare diseases are directly linked to the alteration of some of these repetitive DNA sequences, either through changes in the organization or size of the tandem repeat arrays or through mutations in genes encoding chromatin modifiers involved in the epigenetic regulation of these elements. Although largely overlooked so far in the functional annotation of the human genome, satellite elements play key roles in its architectural and topological organization. This includes functions as boundary elements delimiting functional domains or in the assembly of repressive nuclear compartments, with local or distal impact on gene expression. Thus, the consideration of satellite repeat organization and the associated epigenetic landmarks, including DNA methylation (DNAme), will become unavoidable in the near future to fully decipher human phenotypes and associated diseases.

