CpG-island-based annotation and analysis of human housekeeping genes

Author(s):  
Le Zhang ◽  
Zichun Dai ◽  
Jun Yu ◽  
Ming Xiao

Abstract By reviewing previous CpG-related studies, we consider that the transcriptional regulation of about half of the human genes, mostly housekeeping (HK) genes, involves CpG islands (CGIs), their methylation states, CpG spacing and other chromosomal parameters. However, the precise CGI definition and positioning of CGIs within gene structures, as well as specific CGI-associated regulatory mechanisms, all remain to be explained at individual gene and gene-family levels, together with consideration of species and lineage specificity. Although previous studies have already classified CGIs into high-CpG (HCGI), intermediate-CpG (ICGI) and low-CpG (LCGI) classes based on CpG density variation, the correlation between CGI density and gene expression regulation, such as co-regulation by CGIs and the TATA box in HK genes, remains to be elucidated. First, this study introduces a problem-solving protocol for human-genome annotation, which is based on a combination of GTEx, JBLA and Gene Ontology (GO) analysis. Next, we discuss why CGI-associated genes are most likely regulated by HCGI and tend to be HK genes; the HCGI/TATA± and LCGI/TATA± combinations show different GO enrichment, whereas the ICGI/TATA± combination is less characteristic based on GO enrichment analysis. Finally, we demonstrate that the Hadoop MapReduce-based MR-JBLA algorithm is more efficient than the original JBLA in k-mer counting and CGI-associated gene analysis.
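
The k-mer counting step mentioned in this abstract can be illustrated with a minimal single-machine sketch written in a map/reduce style. This is not the JBLA or MR-JBLA algorithm, whose details the abstract does not give; the function names, the chunking scheme and the `chunk_size` parameter are assumptions made for illustration only. The helper `cpg_obs_exp` uses the widely used observed/expected CpG ratio (number of CpG times sequence length, divided by the product of the C and G counts); the abstract does not state which density measure or thresholds separate HCGI, ICGI and LCGI, so none are applied here.

```python
from collections import Counter
from functools import reduce


def map_kmers(chunk: str, k: int) -> Counter:
    """Map step (illustrative): count every k-mer inside one sequence chunk."""
    chunk = chunk.upper()
    return Counter(chunk[i:i + k] for i in range(len(chunk) - k + 1))


def reduce_counts(a: Counter, b: Counter) -> Counter:
    """Reduce step (illustrative): merge two partial k-mer count tables."""
    return a + b


def count_kmers(sequence: str, k: int, chunk_size: int = 1000) -> Counter:
    """Count k-mers by mapping over chunks and reducing the partial counts.

    Each chunk is extended by k - 1 bases so that every k-mer start
    position falls in exactly one chunk (nothing is lost or counted twice).
    """
    chunks = [
        sequence[start:start + chunk_size + k - 1]
        for start in range(0, len(sequence), chunk_size)
    ]
    return reduce(reduce_counts, (map_kmers(c, k) for c in chunks), Counter())


def cpg_obs_exp(sequence: str) -> float:
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G)."""
    seq = sequence.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)
```

In a real Hadoop job the map and reduce steps would run on separate workers over genome-scale input; here they are ordinary function calls, which keeps the counting logic easy to check on short sequences.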

2017 ◽  
Author(s):  
Amin Mahpour ◽  
Dominic Smiraglia ◽  
Benjamin S. Scruggs ◽  
Irwin H. Gelman ◽  
Toru Ouchi

Abstract How TATA-less promoters such as those within CpG islands (CGI) control gene expression is still a subject of active research. Here, we have identified the “CGCG element”, a ten-base pair motif with a consensus sequence of TCTCGCGAGA present in a group of promoter-associated CGIs of ribosomal protein and housekeeping genes. This element is evolutionarily conserved in vertebrates, found in DNase-accessible regions and employs RNA polymerase 2 to activate gene expression. Through extensive analysis of several endogenous promoters, we demonstrate that this element activates bidirectional transcription through divergent start sites. Methylation of this element abrogates the associated promoter activity. When coincident with a TATA-box, directional transcription remains CGCG-dependent. Because the CGCG element is sufficient to drive transcription, we propose that its unmethylated form functions as a core promoter of TATA-less CGI-associated promoters.
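
As a small illustration of the motif described above, the sketch below scans a DNA string for exact copies of the consensus TCTCGCGAGA. It is a simplification and not the authors' method: a consensus sequence normally tolerates mismatches, whereas this example only finds exact matches, and the function names are hypothetical. The consensus is its own reverse complement, so an exact match on one strand is also a match on the other.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_motif(sequence: str, motif: str = "TCTCGCGAGA") -> list[int]:
    """Return 0-based start positions of exact, possibly overlapping matches."""
    seq, motif = sequence.upper(), motif.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        # Advance by one base so overlapping occurrences are also reported.
        start = seq.find(motif, start + 1)
    return positions
```

Because the motif is palindromic in the reverse-complement sense, scanning a single strand is enough for exact matching; a non-palindromic motif would also need a scan with its reverse complement.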


2020 ◽  
Vol 31 (7-8) ◽  
pp. 240-251
Author(s):  
Saki Aoto ◽  
Mayu Fushimi ◽  
Kei Yura ◽  
Kohji Okamura

Abstract While CpG dinucleotides are significantly reduced compared to other dinucleotides in mammalian genomes, they can congregate and form CpG islands, which localize around the 5ʹ regions of genes, where they function as promoters. CpG-island promoters are generally unmethylated and are often found in housekeeping genes. However, their nucleotide sequences and existence per se are not conserved between humans and mice, which may be due to evolutionary gain and loss of the regulatory regions. In this study, human and rhesus monkey genomes, with moderately conserved sequences, were compared at base resolution. Using transcription start site data, we first validated our methods’ ability to identify orthologous promoters and indicated a limitation of using the 5ʹ ends of curated gene models, such as NCBI RefSeq, as their transcription start sites. We found that, in addition to deamination mutations, insertions and deletions of bases, repeats, and long fragments contributed to the mutations of CpG dinucleotides. We also observed that the G + C content tended to change in CpG-poor environments, while CpG content was altered in G + C-rich environments. While loss of CpG islands can be caused by gradual decreases in CpG sites, gain of these islands appears to require two distinct nucleotide-altering steps. Taken together, our findings provide novel insights into the process of acquisition and diversification of CpG-island promoters in vertebrates.


Genes ◽  
2021 ◽  
Vol 12 (6) ◽  
pp. 853
Author(s):  
Siti Aisyah Faten Mohamed Sa’dom ◽  
Sweta Raikundalia ◽  
Shaharum Shamsuddin ◽  
Wei Cun See Too ◽  
Ling Ling Few

Choline kinase (CK) is the enzyme catalyzing the first reaction in the CDP-choline pathway for the biosynthesis of phosphatidylcholine. Higher expression of the α isozyme of CK has been implicated in carcinogenesis, and inhibition or downregulation of CKα (CHKA) is a promising anticancer approach. This study aimed to investigate the regulation of CKα expression by DNA methylation of the CpG islands found on the promoter of this gene in MCF-7 cells. Four CpG islands were predicted in the 2000 bp promoter region of the ckα (chka) gene. Six CpG island deletion mutants were constructed using the PCR site-directed mutagenesis method and cloned into pGL4.10 vectors for promoter activity assays. Deletion of the CpG4C region located between –225 and –56 significantly increased the promoter activity by 4-fold, indicating the presence of an important repressive transcription factor binding site. The promoter activity of the methylated full-length promoter was significantly lower, by 16-fold, than that of the methylated CpG4C deletion mutant. The results show that DNA methylation of CpG4C promotes the binding of the transcription factor that suppresses the promoter activity. Electrophoretic mobility shift assay analysis showed that cytosine methylation at the MZF1 binding site in CpG4C increased the binding of putative MZF1 in nuclear extract. In conclusion, the results suggest that DNA methylation decreased the promoter activity by promoting the binding of the putative MZF1 transcription factor at the CpG4C region of the ckα gene promoter.


Genome ◽  
2021 ◽  
pp. 1-11
Author(s):  
Qi Chen ◽  
Wei Wang ◽  
Sameer Khanal ◽  
Jinlei Han ◽  
Mi Zhang ◽  
...  

Cotton (Gossypium L.) is the most important fiber crop worldwide. Here, transcriptome analysis was conducted on developing fibers of a G. mustelinum introgression line, IL9, and its recurrent parent, PD94042, at 17 and 21 days post-anthesis (dpa). Differentially expressed genes (DEGs) of PD94042 and IL9 were identified. Gene Ontology (GO) enrichment analysis showed that the annotated DEGs were rich in two main biological processes and two main molecular functions. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis likewise showed that the annotated DEGs were mainly enriched in metabolic pathways and biosynthesis of secondary metabolites. In total, 52 DEGs were selected as candidate genes based on comparison of the DEGs and GO function annotation information. Quantitative real-time PCR (RT-qPCR) analysis results for 12 randomly selected DEGs were consistent with transcriptome analysis. SNP identification based on G. mustelinum chromatin segment introgression showed that 394 SNPs were identified in 268 DEGs, and two genes with known functions were identified within fiber strength quantitative trait loci (QTL) regions or near the confidence intervals. We identified 52 key genes potentially related to high fiber strength in a G. mustelinum introgression line and provided significant insights into the study of cotton fiber quality improvement.


1993 ◽  
Vol 13 (9) ◽  
pp. 5538-5548
Author(s):  
Y C Choi ◽  
C B Chae

In contrast to many other genes containing a CpG island, the testis-specific H2B (TH2B) histone gene exhibits tissue-specific methylation patterns in correlation with gene activity. Characterization of the methylation patterns within a 20-kb segment containing the TH2A and TH2B genes in comparison with that in a somatic histone cluster revealed that: (i) the germ cell-specific unmethylated domain of the TH2A and TH2B genes is defined as a small region surrounding the CpG islands of the TH2A and TH2B genes and (ii) somatic histone genes are unmethylated in both liver and germ cells, like other genes containing CpG islands, whereas flanking sequences are methylated. Transfection of in vitro-methylated TH2B, somatic H2B, and mouse metallothionein I constructs into F9 embryonal carcinoma cells revealed that the CpG islands of the TH2A and TH2B genes were demethylated like those of the somatic H2A and H2B genes and the metallothionein I gene. The demethylation of those CpG islands became significantly inefficient at a high number of integrated copies and a high density of methylated CpG dinucleotides. In contrast, three sites in the somatic histone cluster, of which two sites are located in the long terminal repeat of an endogenous retrovirus-like sequence, were efficiently demethylated even at a high copy number and a high density of methylated CpG dinucleotides. These results suggest two possible mechanisms for demethylation in F9 cells and methylation of CpG islands of the TH2A and TH2B genes at the postblastula stage during embryogenesis.


2021 ◽  
Author(s):  
Hang Zhang ◽  
Wenhan Zhou ◽  
Xiaoyi Yang ◽  
Shuzhan Wen ◽  
Baicheng Zhao ◽  
...  

Abstract Background PTEN is a multifunctional tumor suppressor gene mutated at high frequency in a variety of cancers. However, its expression across cancers, correlated genes, survival prognosis, and regulatory pathways are not completely described. Here, we aimed to conduct a comprehensive analysis from the above perspectives in order to provide a reference for clinical application. Methods We studied the expression levels in cancers by using data from the TCGA and GTEx databases. Expression box plots were obtained from the UALCAN database. Mutation analysis was performed on the cBioPortal website. Correlated genes were obtained from the GEPIA website. A protein network was constructed, and KEGG and GO enrichment analyses were performed, on the STRING database. Prognostic analysis was performed on the Kaplan-Meier Plotter website. We also performed transcription factor prediction on the PROMO database and analyzed RNA-RNA association and RNA-protein interaction on the RNAup Web server and RPISeq. The gene 3D structure, protein sequence and conserved domain were obtained from NCBI. Results PTEN was underexpressed in all cancers we studied. It was closely related to the clinical stage of tumors, suggesting PTEN may be involved in cancer development and progression. Mutations of PTEN were present in a variety of cancers, most of which were truncation mutations and missense mutations. In several cancers (KIRC, LUAD, THYM, UCEC, Gastric Cancer, Liver Cancer, Lung Cancer, Breast Cancer), patients with low expression of PTEN had a shorter OS time and poorer OS prognosis. Low expression of PTEN was associated with worse RFS in certain cancers (TGCT, UCEC, LIHC, LUAD, THCA), suggesting that the expression of PTEN was related to the clinical prognosis. Our study identified genes correlated with PTEN and performed GO enrichment analysis on 100 PTEN-related genes obtained from the GEPIA website.
Conclusions Understanding the PTEN gene and exploring its related regulatory pathways in depth may provide insight for the discovery of tumor-specific biomarkers and potential clinical therapeutic targets.


2021 ◽  
Author(s):  
Jiaheng Xie ◽  
Yuan Cao ◽  
Zhechen Zhu ◽  
Shujie Ruan ◽  
Ming Wang ◽  
...  

Abstract Background Melanoma is a highly aggressive kind of cancer with very poor prognosis. The B-raf inhibitor vemurafenib has yielded substantial clinical benefits. Nevertheless, drug resistance has hampered scientists' efforts towards successful melanoma treatment. In this study, we used data derived from the GEO database to analyze the effect of vemurafenib treatment on vemurafenib-sensitive cell lines. GEO datasets GSE42872 (cohort1), GSE127988 (cohort2), GSE110054 (cohort3) were included in the analysis. Results We found 25 common differentially expressed genes (DEGs) in the 3 datasets, including 10 upregulated genes and 15 downregulated genes after vemurafenib application. Analysis using the web tool TIMER showed significant correlation of the upregulated genes with immune infiltration level in skin cell melanoma. GO enrichment analysis showed that after vemurafenib treatment, all datasets showed downregulation in DNA replication and cell cycle arrest. Meanwhile, genes related to neuro-generation, extracellular matrix and cell-cell adhesion were significantly enriched in all three datasets. KEGG analysis showed that pathways such as the P53, PI3K-Akt, and Rap signaling pathways were enriched in DEGs after vemurafenib administration. Conclusions The candidate DEGs and pathways identified may not only reveal the cellular sensitivity to vemurafenib treatment, but also give rise to a better understanding of the mechanism of cancer cell cycle arrest and cellular resistance towards vemurafenib targeted therapy.


1989 ◽  
Vol 9 (6) ◽  
pp. 2322-2331
Author(s):  
A H Beggs ◽  
B R Migeon

Part of the higher-order structure of chromatin is achieved by constraining DNA in loops ranging in size from 30 to 100 kilobase pairs; these loops have been implicated in defining functional domains and replicons and possibly in facilitating transcription. Because the human active and inactive X chromosomes differ in transcriptional activity and replication, we looked for differences in their chromatin loop structures. Since the islands of CpG-rich DNA at the 5' ends of X-linked housekeeping genes are the regions where functional differences in DNA methylation and nuclease sensitivity are found, we looked for scaffold association of these sequences after extraction of histones with lithium diiodosalicylate. Specifically, we examined the 5' CpG islands within the hypoxanthine phosphoribosyltransferase, glucose 6-phosphate dehydrogenase, P3, GdX, phosphoglycerate kinase type 1, and alpha-galactosidase loci in human lymphoblasts obtained from individuals with 1 to 4 X chromosomes. Although we detected no scaffold-associated regions near these genes, we found several such regions at the ornithine transcarbamylase and blood clotting factor IX loci. Our results suggest that the CpG islands are excluded from the nuclear scaffold and that even though transcriptionally active, housekeeping genes are less likely than X-linked tissue-specific genes to be scaffold associated. In all cases, the pattern of scaffold association was the same for loci on active and inactive X chromosomes.


2020 ◽  
Author(s):  
Vijayakrishna Kolur ◽  
Basavaraj Vastrad ◽  
Chanabasayya Vastrad ◽  
Iranna Kotturshetti ◽  
Anandkumar Tengli

Abstract Background Coronary artery disease (CAD) is one of the most common disorders in the cardiovascular system. This study aims to explore potential signaling pathways and important biomarkers that drive CAD development. Methods The CAD GEO dataset GSE113079 was used to screen differentially expressed genes (DEGs). Pathway and Gene Ontology (GO) enrichment analyses of the DEGs were performed using ToppGene. We screened hub and target genes from protein-protein interaction (PPI) networks, a target gene - miRNA regulatory network and a target gene - TF regulatory network, using Cytoscape software. Validations of hub genes were performed to evaluate their potential prognostic and diagnostic value for CAD. Results A total of 1,036 DEGs were captured according to the screening criteria (525 upregulated genes and 511 downregulated genes). Pathway and GO enrichment analysis of the DEGs revealed that these up- and downregulated genes are mainly enriched in thyronamine and iodothyronamine metabolism, cytokine-cytokine receptor interaction, nervous system process, cell cycle and nuclear membrane. Hub genes were validated to identify potential prognostic biomarkers, diagnostic biomarkers and novel therapeutic targets for CAD. Conclusions In summary, our findings reveal pivotal gene expression signatures and signaling pathways in the progression of CAD. CAPN13, ACTBL2, ERBB3, GATA4, GNB4, NOTCH2, EXOSC10, RNF2, PSMA1 and PRKAA1 might contribute to the progression of CAD and could have potential as biomarkers or therapeutic targets for CAD.

