Statistical modelling of bacterial promoter sequences for regulatory motif discovery with the help of transcriptome data: application to Listeria monocytogenes

Mapping Intimacies

10.1101/723346

2019

Author(s):

Ibrahim Sultan

Vincent Fromion

Sophie Schbath

Pierre Nicolas

Keyword(s):

Listeria monocytogenes

DNA Sequences

Motif Discovery

De Novo

Expression Profiles

Monte Carlo Algorithm

Transcriptome Data

Transcription Start

Data Set

Transcription Start Sites

Abstract: Automatic de novo identification of the main regulons of a bacterium from genome and transcriptome data remains a challenge. To address this task, we propose a statistical model of promoter DNA sequences that can use information on the exact positions of transcription start sites and on condition-dependent expression profiles. The two main novelties are allowing overlaps between motif occurrences and incorporating covariates that summarise expression profiles (e.g. coordinates in projection spaces or hierarchical clustering trees). All parameters are estimated with a dedicated trans-dimensional Markov chain Monte Carlo algorithm that adjusts simultaneously, for many motifs and many expression covariates, the width and palindromic properties of the corresponding position-weight matrices, the number of parameters describing position with respect to the transcription start site, and the choice of relevant expression covariates. A dataset of transcription start sites and expression profiles available for Listeria monocytogenes is analysed. The results validate the approach and provide a new global view of the transcription regulatory network of this important model food-borne pathogen. A previously unreported motif that may play an important role in the regulation of growth was found in the promoter regions of ribosomal protein genes.
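Both abstracts rest on position-weight matrices (PWMs). As a rough Python illustration of what a PWM does, and not of the authors' trans-dimensional MCMC model (which additionally handles motif width, palindromic constraints, position relative to the TSS and expression covariates), the sketch below builds a log-odds matrix from a few aligned sites and scores every window of a sequence on both strands. The uniform background, the pseudocount and the example sites are assumptions made for the illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def counts_from_sites(sites):
    """Per-position nucleotide counts for equal-length aligned sites."""
    width = len(sites[0])
    return [Counter(site[j] for site in sites) for j in range(width)]


def pwm_from_counts(counts, pseudocount=0.5, background=0.25):
    """Log2-odds matrix against a uniform background (an assumption here)."""
    matrix = []
    for column in counts:
        total = sum(column[b] for b in ALPHABET) + len(ALPHABET) * pseudocount
        matrix.append({b: math.log2((column[b] + pseudocount) / total / background)
                       for b in ALPHABET})
    return matrix


def reverse_complement(seq):
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def scan(seq, matrix):
    """Score every window on both strands; returns (start, strand, score)."""
    width = len(matrix)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        for strand, site in (("+", window), ("-", reverse_complement(window))):
            score = sum(matrix[j][base] for j, base in enumerate(site))
            hits.append((i, strand, score))
    return hits


# Toy sites, chosen only for illustration.
pwm = pwm_from_counts(counts_from_sites(["TTGACA", "TTGACT", "TTGATA"]))
best = max(scan("GGGTTGACAGGG", pwm), key=lambda hit: hit[2])
print(best[:2])  # → (3, '+')
```

Scoring the reverse complement as well makes the scan strand-agnostic; a palindromic PWM, as mentioned in the abstract, would give the same score to a site and to its reverse complement.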

Statistical modelling of bacterial promoter sequences for regulatory motif discovery with the help of transcriptome data: application to Listeria monocytogenes

Journal of The Royal Society Interface

10.1098/rsif.2020.0600

2020

Vol 17 (171)

pp. 20200600

Author(s):

Ibrahim Sultan

Vincent Fromion

Sophie Schbath

Pierre Nicolas

Keyword(s):

Listeria monocytogenes

DNA Sequences

Motif Discovery

De Novo

Expression Profiles

Monte Carlo Algorithm

Transcriptome Data

Ribosomal Protein Genes

Transcription Start

Transcription Start Sites

Automatic de novo identification of the main regulons of a bacterium from genome and transcriptome data remains a challenge. To address this task, we propose a statistical model that can use information on the exact positions of transcription start sites and on condition-dependent expression profiles. The central idea of this model is to improve the probabilistic representation of promoter DNA sequences by incorporating covariates summarizing expression profiles (e.g. coordinates in projection spaces or hierarchical clustering trees). A dedicated trans-dimensional Markov chain Monte Carlo algorithm adjusts the width and palindromic properties of the corresponding position-weight matrices and the number of parameters describing the exact position relative to the transcription start site, and chooses the expression covariates relevant for each motif. All parameters are estimated simultaneously, for many motifs and many expression covariates. The method is applied to a dataset of transcription start sites and expression profiles available for Listeria monocytogenes. The results validate the approach and provide a new global view of the transcription regulatory network of this important pathogen. Remarkably, a previously unreported motif is found in the promoter regions of ribosomal protein genes, suggesting a role in the regulation of growth.

MINDFUL: A Method to Identify Novel and Diverse Signals with Fast, Unsupervised Learning

10.1101/805820

2019

Author(s):

Mallika Parulekar

Leelavati Narlikar

Keyword(s):

Motif Discovery

De Novo

CpG Islands

Categorical Variables

Model Parameters

Fast Method

Transcription Start

Transcription Start Sites

Optimal Value

Small Set

Abstract: With rapid advances in experimental methods that map transcription start sites (TSSs) at high resolution, there is a need to characterize the sequence diversity of TSS neighborhoods. Most current techniques scan for previously discovered elements, such as the TATA box, the INR motif and CpG islands, to categorize promoters into different classes. Reliance on such elements hinders the discovery of novel elements. On the other hand, methods that apply standard motif discovery to find de novo promoter elements are limited by the fact that a motif is picked up only if it is over-represented in the dataset, so an element that appears in only a small set of promoters can be missed. We previously developed a clustering-based approach that uses no prior knowledge of elements to solve this problem [1]. That method uses Gibbs sampling to learn the model parameters, but it is untenable on large datasets. Here we propose a new, fast method called MINDFUL that uses a greedy k-means-like approach to cluster promoters aligned by their TSSs into diverse classes, while also learning the optimal value of k. It is general enough to be used for any data consisting of categorical variables and is not restricted to DNA.
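The abstract describes MINDFUL only as a greedy, k-means-like clustering of TSS-aligned promoters. The Python sketch below is a generic stand-in under stated assumptions, not the published algorithm. Each cluster is summarised by position-specific nucleotide frequencies, every sequence is reassigned to the cluster under which it has the highest log-likelihood, and the profiles are re-estimated until no assignment changes. Unlike MINDFUL, it keeps k fixed, set through hypothetical seed indices, rather than learning it.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def profile(seqs, pseudocount=1.0):
    """Position-specific nucleotide frequencies for a group of aligned sequences."""
    width = len(seqs[0])
    total = len(seqs) + pseudocount * len(ALPHABET)
    prof = []
    for j in range(width):
        counts = Counter(s[j] for s in seqs)
        prof.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return prof


def log_likelihood(seq, prof):
    """Log-probability of a sequence under one cluster's independent-position model."""
    return sum(math.log(prof[j][base]) for j, base in enumerate(seq))


def cluster(seqs, seeds, max_iter=20):
    """Hard-assignment (k-means-like) clustering with k = len(seeds).

    `seeds` lists the index of one initial member per cluster (a hypothetical
    initialisation); this sketch does not learn k.
    """
    profiles = [profile([seqs[i]]) for i in seeds]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each sequence joins its most likely cluster.
        new_labels = [max(range(len(profiles)),
                          key=lambda c: log_likelihood(s, profiles[c]))
                      for s in seqs]
        if new_labels == labels:
            break  # converged: no sequence changed cluster
        labels = new_labels
        # Update step: re-estimate each non-empty cluster's profile.
        profiles = [profile([s for s, lab in zip(seqs, labels) if lab == c])
                    if c in labels else profiles[c]
                    for c in range(len(profiles))]
    return labels


promoters = ["TATAAT", "TATAAA", "TATGAT", "GCGCGC", "GCGCGG", "CCGCGC"]
print(cluster(promoters, seeds=[0, 3]))  # → [0, 0, 0, 1, 1, 1]
```

Because every position is treated as an independent categorical variable, the same code would run on any fixed-length categorical strings, which matches the abstract's remark that the approach is not restricted to DNA.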

SAT-446 Transcriptome Comparison of a Natural T3 Regulated Process and a Treatment with T3

Journal of the Endocrine Society

10.1210/jendso/bvaa046.512

2020

Vol 4 (Supplement_1)

Author(s):

Nicolas Buisine

Muriel Rigolet

Evelyne Duvernois-Berthet, Master

Laurent M Sachs

Keyword(s):

Gene Expression

Binding Sites

Target Genes

De Novo

Interaction Analysis

Expression Profiles

Differentially Expressed

Chromatin Interaction

Transcription Start

Transcription Start Sites

Abstract: Thyroid hormones (TH) act mainly on the expression of the genome. It is well accepted that differential gene expression following treatment with TH recapitulates the expression profiles taking place during natural processes induced by these hormones. Although both processes seem to correlate at the scale of a few target genes, this has not been addressed in a systematic manner. Our objectives were, first, to compare transcriptome variations after TH treatment with transcriptome variations during a TH-controlled natural process and, second, to evaluate the proportion of the direct TH response. Gene expression at genome scale (the transcriptome) is measured by sequencing all messenger RNAs (RNA-seq). The direct response was sorted out by linking the transcription start sites of target genes, identified by RNA-PET analysis, with TR binding sites mapped by chromatin interaction analysis by paired-end tag sequencing (ChIA-PET). Indeed, ChIA-PET maps not only TR binding sites but also the physical interactions between them and the transcription start sites of regulated genes. Our model is one of the striking developmental processes orchestrated by TH: amphibian metamorphosis. Tadpole transformation is marked by dramatic changes, including de novo morphogenesis (limb), tissue remodelling (brain, intestine...) and organ resorption through apoptosis (tail). These changes involve cascades of gene regulation initiated by TH and their receptors. Because metamorphosis has close and interesting parallels with the perinatal period in mammals (including humans), it is an attractive model for analyzing the functions and mechanisms of action of TH in a physiological context. Here, we focused on one Xenopus tropicalis organ, the tail fin skin, which disappears through cell death. We compared natural metamorphosis with 24 h of 10 nM triiodothyronine (T3) treatment.
We observed several differences between T3 treatment and natural development. First, the genes regulated by T3 correspond to only a proportion of the genes differentially expressed during metamorphosis. Second, T3-response genes start to be regulated well before tail regression (several days). Finally, T3 direct target genes represent a few percent of all the genes differentially expressed during tail regression. In conclusion, comparing the transcriptomes of natural and induced metamorphosis allows us to reach a more precise understanding of TH action.
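The proportions reported above come down to overlaps between gene sets. Here is a minimal Python sketch that assumes each analysis (natural-development RNA-seq, T3-treatment RNA-seq, ChIA-PET linking of TSSs to TR binding sites) has already been reduced to a set of gene identifiers. The identifiers, and the rule that a "direct" target is a T3-responsive gene linked to a TR site, are illustrative assumptions rather than the authors' exact criteria.

```python
def t3_response_summary(natural_de, t3_de, tr_linked):
    """Overlap summary for three sets of (hypothetical) gene identifiers.

    natural_de: genes differentially expressed during natural tail regression
    t3_de:      genes differentially expressed after T3 treatment
    tr_linked:  genes whose TSS is linked to a TR binding site by ChIA-PET
    """
    direct = t3_de & tr_linked  # T3-responsive and physically linked to a TR site
    n = len(natural_de)
    return {
        "t3_share_of_natural": len(t3_de & natural_de) / n,
        "direct_share_of_natural": len(direct & natural_de) / n,
        "direct_targets": sorted(direct),
    }


# Toy gene sets, for illustration only.
natural = {f"g{i}" for i in range(1, 11)}
summary = t3_response_summary(natural, {"g1", "g2", "g3", "g11"}, {"g1", "g11", "g12"})
print(summary)
# → {'t3_share_of_natural': 0.3, 'direct_share_of_natural': 0.1, 'direct_targets': ['g1', 'g11']}
```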

A Clustering-Based Algorithm for De Novo Motif Discovery in DNA Sequences

2017 24th National and 2nd International Iranian Conference on Biomedical Engineering (ICBME)

10.1109/icbme.2017.8430242

2017

Author(s):

Mohammad Haghir Ebrahim-Abadi

Emad Fatemizadeh

Keyword(s):

DNA Sequences

Motif Discovery

De Novo

De Novo Motif Discovery

A Clustering Approach for Motif Discovery in ChIP-Seq Dataset

Entropy

10.3390/e21080802

2019

Vol 21 (8)

pp. 802

Author(s):

Chun-xiao Sun

Yu Yang

Hua Wang

Wen-hu Wang

Keyword(s):

DNA Sequences

Motif Discovery

Simulated Data

Data Set

Genome Wide

A Genome

Wide Scale

Clustering Approach

AP Clustering

Generation Sequencing

Chromatin immunoprecipitation combined with next-generation sequencing (ChIP-Seq) technology has enabled the identification of transcription factor binding sites (TFBSs) on a genome-wide scale. To discover TFBSs effectively and efficiently in the thousand or more DNA sequences produced by a ChIP-Seq data set, we propose a new algorithm named AP-ChIP. First, we set two thresholds based on probabilistic analysis to construct and then filter the cluster subsets. Then, we apply Affinity Propagation (AP) clustering to the candidate cluster subsets to find potential motifs. Experimental results on simulated data show that AP-ChIP makes nearly accurate predictions of TFBSs in a reasonable time. The validity of AP-ChIP is also tested on a real ChIP-Seq data set.
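To show what the Affinity Propagation step itself does, here is a small pure-Python implementation of the standard responsibility/availability message-passing updates, applied to a handful of k-mers. Several choices are assumptions made for the illustration and are not taken from AP-ChIP: similarity is negative Hamming distance, the preference is the median pairwise similarity, and the damping factor is 0.5. The paper's probabilistic thresholds and cluster-subset construction are not reproduced.

```python
import statistics


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def affinity_propagation(items, similarity, preference, damping=0.5, iterations=200):
    """Affinity propagation on a dense similarity matrix.

    Returns (exemplar indices, label per item), where each label is the index
    of the exemplar the item is assigned to.
    """
    n = len(items)
    S = [[preference if i == k else similarity(items[i], items[k])
          for k in range(n)] for i in range(n)]
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(vals[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        for k in range(n):
            pos = [max(0.0, R[j][k]) for j in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        return [], [None] * n
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
              for i in range(n)]
    return exemplars, labels


def neg_hamming(a, b):
    return -hamming(a, b)


kmers = ["TTGACA", "TTGACT", "TTGATA", "TATAAT", "TATAAA", "TATGAT"]  # toy k-mers
median_sim = statistics.median(neg_hamming(a, b) for a in kmers for b in kmers if a != b)
print(affinity_propagation(kmers, neg_hamming, preference=median_sim))
```

The preference controls how many exemplars emerge: values closer to zero favour more, smaller clusters. This is one reason AP suits motif finding, where the number of distinct site variants is not known in advance.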

Comprehensive Clinical and Molecular Characterization of KRASG12C-Mutant Colorectal Cancer

JCO Precision Oncology

10.1200/po.20.00256

2021

pp. 613-621

Author(s):

Jason T. Henry

Oluwadara Coker

Saikat Chowdhury

John Paul Shen

Van K. Morris

...

Keyword(s):

Colorectal Cancer

Clinical Trials

De Novo

Expression Profiles

Single Gene

Gene Expression Profiles

Progression Free Survival

Cancer Center

Data Set

External Data

PURPOSE KRAS p.G12C mutations occur in approximately 3% of metastatic colorectal cancers (mCRC). Recently, two allosteric inhibitors of KRAS p.G12C have demonstrated activity in early-phase clinical trials. There are no robust studies examining the behavior of this newly targetable population. METHODS We queried the MD Anderson Cancer Center data set for patients with colorectal cancer who harbored KRAS p.G12C mutations between January 2003 and September 2019. Patients were analyzed for clinical characteristics, overall survival (OS), and progression-free survival (PFS) and compared against patients with KRAS nonG12C mutations. Next, we analyzed several internal and external data sets to assess immune signatures, gene expression profiles, hypermethylation, co-occurring mutations, and proteomics. RESULTS Among the 4,632 patients with comprehensive molecular profiling, 134 (2.9%) were found to have KRAS p.G12C mutations. An additional 53 patients with single-gene sequencing were included in the clinical data but excluded from the prevalence analysis, giving 187 patients in total. Sixty-five patients had de novo metastatic disease and received a median of two lines of chemotherapy without surgical intervention. For the first three lines of chemotherapy, the median PFS was 6.4 months (n = 65; 95% CI, 5.0 to 7.4 months), 3.9 months (n = 47; 95% CI, 2.9 to 5.9 months), and 3.0 months (n = 21; 95% CI, 2.0 to 3.4 months), respectively. KRAS p.G12C showed higher rates of basal EGFR activation than KRAS nonG12C. Compared with an internal cohort of KRAS nonG12C patients, KRAS p.G12C patients had worse OS. CONCLUSION PFS is poor for patients with KRAS p.G12C metastatic colorectal cancer. OS was worse in KRAS p.G12C than in KRAS nonG12C patients. Our data highlight the innate resistance to chemotherapy in KRAS p.G12C patients and serve as a historical comparator for future clinical trials.

DNA Microarrays of the Complex Human Cytomegalovirus Genome: Profiling Kinetic Class with Drug Sensitivity of Viral Gene Expression

Journal of Virology

10.1128/jvi.73.7.5757-5766.1999

1999

Vol 73 (7)

pp. 5757-5766

Cited By ~ 184

Author(s):

James Chambers

Ana Angulo

Dhammika Amaratunga

Hongqing Guo

Ying Jiang

...

Keyword(s):

Gene Expression

Human Cytomegalovirus

DNA Sequences

De Novo

Expression Profiles

Viral Gene Expression

Viral Gene

Viral DNA

Viral Protein Synthesis

Entire Genome

ABSTRACT We describe, for the first time, the generation of a viral DNA chip for simultaneous expression measurements of nearly all known open reading frames (ORFs) in the largest member of the herpesvirus family, human cytomegalovirus (HCMV). In this study, an HCMV chip was fabricated and used to characterize the temporal class of viral gene expression. The viral chip is composed of microarrays of viral DNA prepared by robotic deposition of oligonucleotides on glass for ORFs in the HCMV genome. Viral gene expression was monitored by hybridization to the oligonucleotide microarrays with fluorescently labelled cDNAs prepared from mock-infected or infected human foreskin fibroblast cells. By using cycloheximide and ganciclovir to block de novo viral protein synthesis and viral DNA replication, respectively, the kinetic classes of array elements were classified. The expression profiles of known ORFs and many previously uncharacterized ORFs provided a temporal map of immediate-early (α), early (β), early-late (γ1), and late (γ2) genes in the entire genome of HCMV. Sequence compositional analysis of the 5′ noncoding DNA sequences of the temporal classes, performed by using algorithms that automatically search for defined and recurring motifs in unaligned sequences, indicated the presence of potential regulatory motifs for β, γ1, and γ2 genes. In summary, these fabricated microarrays of viral DNA allow rapid and parallel analysis of gene expression at the whole viral genome level. The viral chip approach coupled with global biochemical and genetic strategies should greatly speed the functional analysis of established as well as newly discovered large viral genomes.
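The abstract does not name the motif-finding algorithms applied to the 5′ noncoding sequences of each temporal class. As a much simpler illustration of the underlying idea, over-representation of short words in one class relative to the others, the Python sketch below counts how many upstream sequences contain each k-mer and ranks k-mers by a pseudocount-smoothed presence ratio. The toy sequences, k = 5 and the scoring rule are all assumptions, and this is not the method used in the study.

```python
from collections import Counter


def kmer_presence(seqs, k):
    """For each k-mer, count how many sequences contain it at least once."""
    counts = Counter()
    for s in seqs:
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return counts


def enriched_kmers(foreground, background, k, top=5, pseudocount=1.0):
    """Rank foreground k-mers by smoothed presence ratio (foreground / background)."""
    fg = kmer_presence(foreground, k)
    bg = kmer_presence(background, k)

    def ratio(kmer):
        f = (fg[kmer] + pseudocount) / (len(foreground) + 2 * pseudocount)
        b = (bg[kmer] + pseudocount) / (len(background) + 2 * pseudocount)
        return f / b

    return sorted(fg, key=ratio, reverse=True)[:top]


late_class = ["AAGGTACCA", "CCGGTACTT", "TTGGTACGG"]   # toy upstream sequences
other_classes = ["AAAAAAAAA", "CCCCCCCCC", "TTTTTTTTT"]
print(enriched_kmers(late_class, other_classes, k=5, top=1))  # → ['GGTAC']
```

Counting presence per sequence rather than total occurrences keeps a single repeat-rich sequence from dominating the ranking.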

Global analysis of transcription start sites in the new ovine reference genome (Oar rambouillet v1.0)

10.1101/2020.07.06.189480

2020

Cited By ~ 2

Author(s):

Mazdak Salavati

Alex Caulton

Richard Clark

Iveta Gazova

Timothy P. L. Smith

...

Keyword(s):

Reference Genome

Expression Profiles

Global Analysis

Integrative Genomics

Sequencing Data

Transcription Start

Promoter Regions

Tissue Samples

Transcription Start Sites

Transcript Regulation

Abstract: The overall aim of the Ovine FAANG project is to provide a comprehensive annotation of the new, highly contiguous sheep reference genome sequence (Oar rambouillet v1.0). Mapping transcription start sites (TSS) is a key first step in understanding transcript regulation and diversity. Using 56 tissue samples collected from the reference ewe Benz2616, we performed a global analysis of TSS and TSS-enhancer clusters using Cap Analysis Gene Expression (CAGE) sequencing. CAGE measures RNA expression by 5’ cap-trapping and was specifically designed to characterize TSS within promoters at single-nucleotide resolution. We adapted an analysis pipeline that uses TagDust2 for clean-up and trimming, Bowtie2 for mapping, CAGEfightR for clustering and the Integrative Genomics Viewer (IGV) for visualization. Mapping of CAGE tags indicated that the expression levels of CAGE tag clusters varied across tissues. Expression profiles across tissues were validated using corresponding polyA+ mRNA-Seq data from the same samples. After removal of CAGE tags with fewer than 10 read counts, 39.3% of TSS overlapped with the 5’ ends of 31,113 transcripts previously annotated by NCBI (out of a total of 56,308 in the NCBI annotation). For the remaining 25,195 NCBI-annotated transcripts, no TSS meeting stringent criteria were identified. A further 14.7% of TSS mapped to within 50 bp of annotated promoter regions. Intersecting these predicted TSS regions with annotated promoter regions (±50 bp) revealed that 46% of the predicted TSS were ‘novel’ and previously un-annotated. Using whole-genome bisulphite sequencing data from the same tissues, we determined that a proportion of these ‘novel’ TSS were hypomethylated (32.2%), indicating that they are likely to be reproducible rather than ‘noise’. This global analysis of TSS in sheep will significantly enhance the annotation of gene models in the new ovine reference assembly.
Our analyses provide one of the highest resolution annotations of transcript regulation and diversity in a livestock species to date.
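The read-count filter and the ±50 bp overlap test reported above can be sketched in Python under simplifying assumptions. Each CAGE tag cluster is reduced to a hypothetical `TagCluster` record (chromosome, strand, dominant TSS position, read count). Clusters with fewer than 10 reads are dropped, and a TSS counts as annotated when an annotated 5′ end lies within 50 bp on the same chromosome and strand. This is not the authors' TagDust2/Bowtie2/CAGEfightR pipeline; it only mirrors the threshold logic. The reported transcript counts are internally consistent: 31,113 + 25,195 = 56,308.

```python
import bisect
from dataclasses import dataclass


@dataclass
class TagCluster:
    """Hypothetical minimal record for one CAGE tag cluster."""
    chrom: str
    strand: str
    pos: int    # dominant TSS coordinate
    reads: int  # supporting read count


def filter_clusters(clusters, min_reads=10):
    """Drop clusters supported by fewer than `min_reads` reads."""
    return [c for c in clusters if c.reads >= min_reads]


def classify(clusters, annotated_5p, window=50):
    """Label each cluster 'annotated' if an annotated 5' end lies within
    `window` bp on the same chromosome and strand, otherwise 'novel'."""
    index = {}
    for chrom, strand, pos in annotated_5p:
        index.setdefault((chrom, strand), []).append(pos)
    for positions in index.values():
        positions.sort()
    labels = []
    for c in clusters:
        positions = index.get((c.chrom, c.strand), [])
        # First annotated position >= pos - window; a hit if it is also <= pos + window.
        i = bisect.bisect_left(positions, c.pos - window)
        hit = i < len(positions) and positions[i] <= c.pos + window
        labels.append("annotated" if hit else "novel")
    return labels


clusters = [TagCluster("chr1", "+", 1000, 25), TagCluster("chr1", "+", 5000, 12),
            TagCluster("chr1", "-", 1000, 30), TagCluster("chr2", "+", 300, 4)]
kept = filter_clusters(clusters)  # the 4-read cluster on chr2 is removed
print(classify(kept, [("chr1", "+", 1040), ("chr2", "+", 300)]))
# → ['annotated', 'novel', 'novel']
```

Sorting the annotated positions once and using binary search keeps each lookup logarithmic, which matters at genome scale with tens of thousands of transcripts.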

refTSS: A Reference Data Set for Human and Mouse Transcription Start Sites

Journal of Molecular Biology

10.1016/j.jmb.2019.04.045

2019

Vol 431 (13)

pp. 2407-2422

Cited By ~ 10

Author(s):

Imad Abugessaisa

Shuhei Noguchi

Akira Hasegawa

Atsushi Kondo

Hideya Kawaji

...

Keyword(s):

Reference Data ◽

Transcription Start ◽

Data Set ◽

Transcription Start Sites ◽

Human And Mouse

PairMotif+: A Fast and Effective Algorithm for De Novo Motif Discovery in DNA sequences

International Journal of Biological Sciences

10.7150/ijbs.5786

2013

Vol 9 (4)

pp. 412-424

Cited By ~ 6

Author(s):

Qiang Yu

Hongwei Huo

Yipu Zhang

Hongzhi Guo

Haitao Guo

Keyword(s):

DNA Sequences

Motif Discovery

De Novo

Effective Algorithm

De Novo Motif Discovery