CUBAP: an interactive web portal for analyzing codon usage biases across populations

Nucleic Acids Research ◽

10.1093/nar/gkaa863 ◽

2020 ◽

Vol 48 (19) ◽

pp. 11030-11039

Author(s):

Matthew W Hodgman ◽

Justin B Miller ◽

Taylor E Meurs ◽

John S K Kauwe

Keyword(s):

Codon Usage ◽

Synonymous Codon ◽

Association Studies ◽

East Asian ◽

Nucleotide Composition ◽

Synonymous Codon Usage ◽

Genome Wide Association Studies ◽

Genome Wide ◽

African Populations ◽

Place Of Origin

Abstract Synonymous codon usage significantly impacts translational and transcriptional efficiency, gene expression, the secondary structure of both mRNA and proteins, and has been implicated in various diseases. However, population-specific differences in codon usage biases remain largely unexplored. Here, we present a web server, https://cubap.byu.edu, to facilitate analyses of codon usage biases across populations (CUBAP). Using the 1000 Genomes Project, we calculated and visually depict population-specific differences in codon frequencies, codon aversion, identical codon pairing, co-tRNA codon pairing, ramp sequences, and nucleotide composition in 17,634 genes. We found that codon pairing significantly differs between populations in 35.8% of genes, allowing us to successfully predict the place of origin for African and East Asian individuals with 98.8% and 100% accuracy, respectively. We also used CUBAP to identify a significant bias toward decreased CTG pairing in the immunity related GTPase M (IRGM) gene in East Asian and African populations, which may contribute to the decreased association of rs10065172 with Crohn's disease in those populations. CUBAP facilitates in-depth gene-specific and codon-specific visualization that will aid in analyzing candidate genes identified in genome-wide association studies, identifying functional implications of synonymous variants, predicting population-specific impacts of synonymous variants and categorizing genetic biases unique to certain populations.
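
As a rough illustration of two of the measures named in this abstract, the sketch below counts codon frequencies and a simplified form of identical codon pairing for a single coding sequence. It is not the CUBAP implementation: the function names, the toy sequence, and the `window` parameter are assumptions made here for illustration, and "pairing" is taken to mean that the same codon recurs within a fixed number of codons downstream.

```python
from collections import Counter


def split_codons(cds):
    """Split a coding sequence into complete, in-frame codons."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def codon_frequencies(cds):
    """Return each codon's share of all codons in the sequence."""
    codons = split_codons(cds)
    counts = Counter(codons)
    total = len(codons)
    return {codon: n / total for codon, n in counts.items()}


def identical_pairing_counts(cds, window=9):
    """Count, per codon, the positions where the same codon recurs
    within `window` codons downstream (window size is an assumption)."""
    codons = split_codons(cds)
    pairs = Counter()
    for i, codon in enumerate(codons):
        if codon in codons[i + 1:i + 1 + window]:
            pairs[codon] += 1
    return pairs


# Hypothetical toy sequence: ATG CTG CTG AAA CTG TAA
toy_cds = "ATGCTGCTGAAACTGTAA"
freqs = codon_frequencies(toy_cds)
pairing = identical_pairing_counts(toy_cds, window=2)
```

Comparing such per-gene counts between groups of individuals is the kind of analysis the abstract describes, although the statistical testing used there is not reproduced here.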

Codon Usage Bias Covaries With Expression Breadth and the Rate of Synonymous Evolution in Humans, but This Is Not Evidence for Selection

Genetics ◽

10.1093/genetics/159.3.1191 ◽

2001 ◽

Vol 159 (3) ◽

pp. 1191-1199

Author(s):

Araxi O Urrutia ◽

Laurence D Hurst

Keyword(s):

Codon Usage ◽

Codon Bias ◽

Synonymous Codon ◽

Nucleotide Composition ◽

Synonymous Codon Usage ◽

Synonymous Substitutions ◽

Numerous Species ◽

Nucleotide Content ◽

Expression Breadth ◽

Human Genes

Abstract In numerous species, from bacteria to Drosophila, evidence suggests that selection acts even on synonymous codon usage: codon bias is greater in more abundantly expressed genes, the rate of synonymous evolution is lower in genes with greater codon bias, and there is consistency between genes in the same species in which codons are preferred. In contrast, in mammals, while nonequal use of alternative codons is observed, the bias is attributed to the background variance in nucleotide concentrations, reflected in the similar nucleotide composition of flanking noncoding and exonic third sites. However, a systematic examination of the covariants of codon usage controlling for background nucleotide content has yet to be performed. Here we present a new method to measure codon bias that corrects for background nucleotide content and apply this to 2396 human genes. Nearly all (99%) exhibit a higher amount of codon bias than expected by chance. The patterns associated with selectively driven codon bias are weakly recovered: Broadly expressed genes have a higher level of bias than do tissue-specific genes, the bias is higher for genes with lower rates of synonymous substitutions, and certain codons are repeatedly preferred. However, while these patterns are suggestive, the first two patterns appear to be methodological artifacts. The last pattern reflects in part biases in usage of nucleotide pairs. We conclude that we find no evidence for selection on codon usage in humans.

Increasing Sample Diversity in Psychiatric Genetics – Introducing a new Cohort of Patients with Schizophrenia and Controls from Vietnam – Results from a Pilot Study

10.1101/2021.04.21.21255615 ◽

2021 ◽

Author(s):

VT Nguyen ◽

A Braun ◽

J Kraft ◽

TMT Ta ◽

GM Panagiotaropoulou ◽

...

Keyword(s):

Pilot Study ◽

Data Collection ◽

Predictive Power ◽

Association Studies ◽

East Asian ◽

Genetic Research ◽

European Ancestry ◽

Risk Scores ◽

Genome Wide Association Studies ◽

Genome Wide

Abstract Objectives: Genome-Wide Association Studies (GWAS) of Schizophrenia (SCZ) have provided new biological insights; however, most cohorts are of European ancestry. As a result, derived polygenic risk scores (PRS) show decreased predictive power when applied to populations of different ancestries. We aimed to assess the feasibility of a large-scale data collection in Hanoi, Vietnam, contribute to international efforts to diversify ancestry in SCZ genetic research and examine the transferability of SCZ-PRS to individuals of Vietnamese Kinh ancestry. Methods: In a pilot study, 368 individuals (including 190 SCZ cases) were recruited at the Hanoi Medical University’s associated psychiatric hospitals and outpatient facilities. Data collection included sociodemographic data, baseline clinical data, clinical interviews assessing symptom severity and genome-wide SNP genotyping. SCZ-PRS were generated using different training data sets: i) European, ii) East-Asian and iii) trans-ancestry GWAS summary statistics from the latest SCZ GWAS meta-analysis. Results: SCZ-PRS significantly predicted case status in Vietnamese individuals using mixed-ancestry (R² liability = 4.9%, p = 6.83 × 10⁻⁸), East-Asian (R² liability = 4.5%, p = 2.73 × 10⁻⁷) and European (R² liability = 3.8%, p = 1.79 × 10⁻⁶) discovery samples. Discussion: Our results corroborate previous findings of reduced PRS predictive power across populations, highlighting the importance of ancestral diversity in GWA studies.

Genome-Wide Association Study of Renal Function Traits: Results from the Japan Multi-Institutional Collaborative Cohort Study

American Journal of Nephrology ◽

10.1159/000488946 ◽

2018 ◽

Vol 47 (5) ◽

pp. 304-316 ◽

Cited By ~ 5

Author(s):

Asahi Hishida ◽

Masahiro Nakatochi ◽

Masato Akiyama ◽

Yoichiro Kamatani ◽

Takeshi Nishiyama ◽

...

Keyword(s):

Renal Function ◽

Japanese Population ◽

Gene Locus ◽

Association Studies ◽

East Asian ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genetic Loci ◽

Asian Populations ◽

Genome Wide

Background: Chronic kidney disease (CKD) is a rapidly growing, worldwide public health problem. Recent advances in genome-wide association studies (GWAS) revealed several genetic loci associated with renal function traits worldwide. Methods: We investigated the association of genetic factors with the levels of serum creatinine (SCr) and the estimated glomerular filtration rate (eGFR) in Japanese population-based cohorts, analyzing the GWAS imputed data with 11,221 subjects and 12,617,569 variants, and replicated the findings with 148,829 hospital-based Japanese subjects. Results: In the discovery phase, 28 variants within 4 loci (chromosome [chr] 2 with 8 variants including rs3770636 in the LDL receptor related protein 2 gene locus, chr 5 with 2 variants including rs270184, chr 17 with 15 variants including rs3785837 in the BCAS3 gene locus, and chr 18 with 3 variants including rs74183647 in the nuclear factor of activated T-cells 1 gene locus) reached the suggestive level of p < 1 × 10⁻⁶ in association with eGFR and SCr, and 2 variants on chr 4 (including rs78351985 in the microsomal triglyceride transfer protein gene locus) fulfilled the suggestive level in association with the risk of CKD. In the replication phase, 25 variants within 3 loci (chr 2 with 7 variants, chr 17 with 15 variants and chr 18 with 3 variants) in association with eGFR and SCr, and 2 variants on chr 4 associated with the risk of CKD became nominally statistically significant after Bonferroni correction, among which 15 variants on chr 17 and 3 variants on chr 18 reached genome-wide significance of p < 5 × 10⁻⁸ in the combined study meta-analysis. The associations of the loci on chr 2 and 18 with eGFR and SCr as well as that on chr 4 with CKD risk have not been previously reported in the Japanese and East Asian populations.
Conclusion: Although the present GWAS of renal function traits included the largest sample of Japanese participants to date, we did not identify novel loci for renal traits. However, we identified the novel associations of the genetic loci on chr 2, 4, and 18 with renal function traits in the Japanese population, suggesting these are transethnic loci. Further investigations of these associations are expected to further validate our findings for the potential establishment of personalized prevention of renal disease in the Japanese and East Asian populations.

Identification of two genes as novel susceptibility loci for type 2 diabetes mellitus in Japanese

European Heart Journal ◽

10.1093/ehjci/ehaa946.2813 ◽

2020 ◽

Vol 41 (Supplement_2) ◽

Author(s):

M Oguri ◽

K Kato ◽

H Horibe ◽

T Fujimaki ◽

J Sakuma ◽

...

Keyword(s):

Diabetes Mellitus ◽

Association Studies ◽

East Asian ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Susceptibility Loci ◽

Asian Populations ◽

Genome Wide ◽

Minor Alleles

Abstract Background: The heritability of Type 2 diabetes mellitus (T2DM) has been estimated to be 50% to 60%. Although genome-wide association studies identified >120 loci that confer susceptibility to T2DM, these studies were commonly conducted in a cross-sectional manner. Purpose: The purpose of the study was to identify genetic variants that confer susceptibility to T2DM in Japanese. We have now performed longitudinal exome-wide association studies (EWASs) to identify novel loci for T2DM by examining temporal changes in fasting plasma glucose (FPG) level, blood hemoglobin A1c (HbA1c) content, and the prevalence of T2DM. Methods: Longitudinal EWASs (mean follow-up period, 5 years) were performed with Illumina Human Exome-12 v1.2 DNA Analysis BeadChip or Infinium Exome-24 v1.0 BeadChip arrays and with 6,022 Japanese (755 subjects with T2DM, 5,267 controls). The relation of genotypes of 24,579 SNPs that passed quality control to FPG level, blood HbA1c content, or the prevalence of T2DM was examined with the generalized estimating equation (GEE). To compensate for multiple comparisons of genotypes with each of the three parameters, we applied Bonferroni's correction for statistical significance of association. Results: Longitudinal EWASs (GEE with adjustment for age, sex, body mass index, and smoking) revealed that rs6414624 of EVC (P<2.0×10⁻¹⁶ for T2DM, P=9.1×10⁻¹¹ for FPG), rs78338345 of GGA3 (P<2.0×10⁻¹⁶ for T2DM, P=4.3×10⁻⁹ for FPG), rs10490775 of PTPRG (P<2.0×10⁻¹⁶ for T2DM, P=3.3×10⁻⁷ for FPG), and rs61739510 of GLT6D1 (P<2.0×10⁻¹⁶ for T2DM, P=5.8×10⁻⁷ for FPG) were significantly associated with the prevalence of T2DM and FPG levels; and rs11558471 in SLC30A8 with FPG level (P=1.8×10⁻⁸) and blood HbA1c content (P=1.2×10⁻⁷).
After examination of the relation of identified SNPs to FPG level and blood HbA1c content, linkage disequilibrium of the SNPs, and results of the previous genome-wide association studies, we identified rs6414624 of EVC and rs78338345 of GGA3 as novel susceptibility loci for T2DM. For the identified SNPs (rs6414624 and rs78338345), FPG level, blood HbA1c content, and the prevalence of T2DM were significantly lower in homozygotes with the minor alleles than in homozygotes with the major alleles or heterozygotes. These results suggest that the minor alleles of rs6414624 and rs78338345 are protective against T2DM in Japanese. According to allele frequency data from the 1000 Genomes Project database, the minor G allele of rs78338345 of GGA3 is specifically distributed in East Asia. This suggests that the minor allele frequency may have increased in East Asian populations after the split of East Asian and non-East Asian populations. Conclusion: We have newly identified EVC and GGA3 as susceptibility loci for T2DM in Japanese. Determination of genotypes for these SNPs at these loci may prove informative for assessment of the genetic risk for T2DM in Japanese. Funding Acknowledgement: Type of funding source: None

Genome-wide association studies identify susceptibility loci for epithelial ovarian cancer in east Asian women

Gynecologic Oncology ◽

10.1016/j.ygyno.2019.02.023 ◽

2019 ◽

Vol 153 (2) ◽

pp. 343-355 ◽

Cited By ~ 9

Author(s):

Kate Lawrenson ◽

Fengju Song ◽

Dennis J. Hazelett ◽

Siddhartha P. Kar ◽

Jonathan Tyrer ◽

...

Keyword(s):

Ovarian Cancer ◽

Epithelial Ovarian Cancer ◽

Association Studies ◽

East Asian ◽

Genome Wide Association ◽

Asian Women ◽

Genome Wide Association Studies ◽

Susceptibility Loci ◽

Genome Wide

Synonymous Codon Usage Analysis of Thirty Two Mycobacteriophage Genomes

Advances in Bioinformatics ◽

10.1155/2009/316936 ◽

2009 ◽

Vol 2009 ◽

pp. 1-11 ◽

Cited By ~ 15

Author(s):

Sameer Hassan ◽

Vasantha Mahalingam ◽

Vanaja Kumar

Keyword(s):

Codon Usage ◽

Synonymous Codon ◽

Nucleotide Composition ◽

Synonymous Codon Usage ◽

Compositional Bias ◽

Trna Genes ◽

Translation Efficiency ◽

Multivariate Statistical ◽

Strong Negative Correlation ◽

Highly Expressed Genes

Synonymous codon usage of protein coding genes of thirty two completely sequenced mycobacteriophage genomes was studied using multivariate statistical analysis. Compositional bias is identified as one of the major factors influencing codon usage. Codons ending with either C or G are preferred in highly expressed genes, among which C-ending codons are highly preferred over G-ending codons. A strong negative correlation between effective number of codons (Nc) and GC3s content was also observed, showing that the codon usage was affected by gene nucleotide composition. Translational selection is also identified to play a role in shaping the codon usage, operating at the level of translational accuracy. A high level of heterogeneity is seen among and between the genomes. Length of genes is also identified to influence the codon usage in 11 out of 32 phage genomes. Mycobacteriophage Cooper is identified to be the highly biased genome with better translation efficiency comparing well with the host specific tRNA genes.
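
For readers unfamiliar with GC3s, the sketch below computes it for one coding sequence under the usual definition: the fraction of G or C at the third position of codons that have synonymous alternatives, so Met (ATG), Trp (TGG) and stop codons are excluded. The function name and toy sequence are assumptions for illustration and this is not the code used in the study; the effective number of codons (Nc) is not shown.

```python
# Codons without synonymous alternatives in the standard genetic code,
# plus the three stop codons; these are excluded from GC3s.
NON_SYNONYMOUS = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def gc3s(cds):
    """Return the G+C fraction at third positions of synonymous codons,
    or None when the sequence has no such codons."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    informative = [
        c for c in codons
        if c not in NON_SYNONYMOUS and set(c) <= set("ACGT")
    ]
    if not informative:
        return None
    gc_third = sum(1 for c in informative if c[2] in "GC")
    return gc_third / len(informative)


# Hypothetical toy sequence: ATG GCC AAA GGG TTT TAA
toy_value = gc3s("ATGGCCAAAGGGTTTTAA")
```

Computing this value per gene alongside Nc would allow the kind of correlation reported in the abstract to be examined.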

Genome-wide Meta-analysis of Alcohol Use Disorder in East Asians

10.1101/2021.09.17.21263732 ◽

2021 ◽

Author(s):

Hang Zhou ◽

Rasmon Kalayasiri ◽

Yan Sun ◽

Yaira Z. Nuñez ◽

Hong-Wen Deng ◽

...

Keyword(s):

Alcohol Use ◽

Alcohol Use Disorder ◽

Association Studies ◽

Meta Analysis ◽

East Asian ◽

Han Chinese ◽

Polygenic Risk Score ◽

Genome Wide Association Studies ◽

East Asians ◽

Genome Wide

Abstract BACKGROUND: Alcohol use disorder (AUD) is a leading cause of death and disability worldwide. Genome-wide association studies (GWAS) have identified ∼30 AUD risk genes in European populations, but many fewer in East Asians. METHODS: We conducted GWAS and genome-wide meta-analysis of AUD in 13,551 subjects with East Asian ancestry, using published summary data and newly genotyped data from four cohorts: 1) electronic health record (EHR)-diagnosed AUD in the Million Veteran Program (MVP) sample; 2) DSM-IV diagnosed alcohol dependence (AD) in a Han Chinese-GSA (array) cohort; 3) AD in a Han Chinese-Cyto (array) cohort; and 4) two AD datasets in a Thai cohort. The MVP and Thai samples included newly genotyped subjects from ongoing recruitment. In total, 2,254 cases and 11,297 controls were analyzed. An AUD polygenic risk score was analyzed in an independent sample with 4,464 East Asians (Kaiser Permanente data from dbGaP). Phenotypes from survey data and ICD-9-CM diagnoses were tested for association with the AUD PRS. RESULTS: Two risk loci were detected: the well-known functional variant rs1229984 in ADH1B and rs3782886 in BRAP (near the ALDH2 gene locus) are the lead variants. AUD PRS was significantly associated with days per week of alcohol consumption (beta = 0.43, se = 0.067, p = 2.47 × 10⁻¹⁰) and nominally associated with pack years of smoking (beta = 0.09, se = 0.05, p = 4.52 × 10⁻²) and ever vs. never smoking (beta = 0.06, se = 0.02, p = 1.14 × 10⁻²). CONCLUSIONS: This is the largest GWAS of AUD in East Asians to date. Building on previous findings, we were able to analyze pleiotropy, but did not identify any new risk regions, underscoring the importance of recruiting additional East Asian subjects for alcohol GWAS.

Identification of type 2 diabetes loci in 433,540 East Asian individuals

10.1101/685172 ◽

2019 ◽

Cited By ~ 3

Author(s):

Cassandra N Spracklen ◽

Momoko Horikoshi ◽

Young Jin Kim ◽

Kuang Lin ◽

Fiona Bragg ◽

...

Keyword(s):

Type 2 Diabetes ◽

Association Studies ◽

Meta Analysis ◽

East Asian ◽

Genome Wide Association ◽

European Ancestry ◽

Genome Wide Association Studies ◽

Genome Wide ◽

European Populations

SUMMARY: Meta-analyses of genome-wide association studies (GWAS) have identified >240 loci associated with type 2 diabetes (T2D); however, most loci have been identified in analyses of European-ancestry individuals. To examine T2D risk in East Asian individuals, we meta-analyzed GWAS data in 77,418 cases and 356,122 controls. In the main analysis, we identified 298 distinct association signals at 178 loci, and across T2D association models with and without consideration of body mass index and sex, we identified 56 loci newly implicated in T2D predisposition. Common variants associated with T2D in both East Asian and European populations exhibited strongly correlated effect sizes. New associations include signals in/near GDAP1, PTF1A, SIX3, ALDH2, a microRNA cluster, and genes that affect muscle and adipose differentiation. At another locus, eQTLs at two overlapping T2D signals act through two genes, NKX6-3 and ANK1, in different tissues. Association studies in diverse populations identify additional loci and elucidate disease genes, biology, and pathways.
Type 2 diabetes (T2D) is a common metabolic disease primarily caused by insufficient insulin production and/or secretion by the pancreatic β cells and insulin resistance in peripheral tissues [1]. Most genetic loci associated with T2D have been identified in populations of European (EUR) ancestry, including a recent meta-analysis of genome-wide association studies (GWAS) of nearly 900,000 individuals of European ancestry that identified >240 loci influencing the risk of T2D [2]. Differences in allele frequency between ancestries affect the power to detect associations within a population, particularly among variants rare or monomorphic in one population but more frequent in another [3,4]. Although smaller than studies in European populations, a recent T2D meta-analysis in almost 200,000 Japanese individuals identified 28 additional loci [4].
The relative contributions of different pathways to the pathophysiology of T2D may also differ between ancestry groups. For example, in East Asian (EAS) populations, T2D prevalence is greater than in European populations among people of similar body mass index (BMI) or waist circumference [5]. We performed the largest meta-analysis of East Asian individuals to identify new genetic associations and provide insight into T2D pathogenesis.

Nucleotide composition and synonymous codon usage of open reading frames in Norovirus GII.4 variants

Journal of Biomolecular Structure and Dynamics ◽

10.1080/07391102.2019.1689171 ◽

2019 ◽

Vol 38 (16) ◽

pp. 4764-4773

Author(s):

Wei Dan ◽

Yan Jin ◽

Zizhong Tang ◽

Yongmin Li ◽

Huipeng Yao

Keyword(s):

Codon Usage ◽

Synonymous Codon ◽

Nucleotide Composition ◽

Synonymous Codon Usage ◽

Open Reading Frames ◽

Norovirus Gii ◽

Reading Frames

Genome-Wide Analysis of the Synonymous Codon Usage Patterns in Riemerella anatipestifer

International Journal of Molecular Sciences ◽

10.3390/ijms17081304 ◽

2016 ◽

Vol 17 (8) ◽

pp. 1304 ◽

Cited By ~ 10

Author(s):

Jibin Liu ◽

Dekang Zhu ◽

Guangpeng Ma ◽

Mafeng Liu ◽

Mingshu Wang ◽

...

Keyword(s):

Codon Usage ◽

Synonymous Codon ◽

Synonymous Codon Usage ◽

Genome Wide Analysis ◽

Riemerella Anatipestifer ◽

Genome Wide ◽

Usage Patterns
