Identification of protein coding regions in the human genome by quadratic discriminant analysis

1997 ◽  
Vol 94 (2) ◽  
pp. 565-568 ◽  
Author(s):  
M. Q. Zhang
2020 ◽  
Author(s):  
Anyou Wang ◽  
Rong Hai

Eukaryotic genomes gradually gain noncoding regions as they evolve, and the human genome actively transcribes >90% of its noncoding regions [1], suggesting that these regions are critical to the evolved human genome. Yet <1% of them have been functionally characterized [2], leaving most of the human genome in the dark. Here we systematically decode endogenous lncRNAs located in unannotated regions of the human genome and decipher a distinctive functional regime of lncRNAs hidden in massive RNA-seq data. LncRNAs are distributed divergently across chromosomes, independent of protein-coding regions. Their transcription rarely initiates at promoters through polymerase II, but mostly at enhancers. Yet conventional enhancer activators (e.g. H3K4me1) account for only a small proportion of lncRNA activation, suggesting that unknown alternative mechanisms initiate the majority of lncRNAs. Meanwhile, lncRNA self-regulation also contributes notably to lncRNA activation. LncRNAs trans-regulate broad bioprocesses, including transcription and RNA processing, cell cycle, respiration, response to stress, chromatin organization, post-translational modification, and development. Overall, lncRNAs govern their own regime, distinct from that of proteins.


2019 ◽  
Author(s):  
Jimin Pei ◽  
Lisa Kinch ◽  
Nick V. Grishin

The human genome harbors a variety of genetic variations. Single-nucleotide changes that alter amino acids in protein-coding regions are one of the major causes of human phenotypic variation and diseases. These single-amino acid variations (SAVs) are routinely found in whole genome and exome sequencing. Evaluating the functional impact of such genomic alterations is crucial for diagnosis of genetic disorders. We developed DeepSAV, a deep-learning convolutional neural network to differentiate disease-causing and benign SAVs based on a variety of protein sequence, structural and functional properties. Our method outperforms most stand-alone programs and has predictive power similar to that of some of the best available. We transformed DeepSAV scores of rare SAVs observed in the general population into a mutation severity measure of protein-coding genes. This measure reflects a gene’s tolerance to deleterious missense mutations and serves as a useful tool to study gene-disease associations. Genes implicated in cancer, autism, and viral interaction are found by this measure to be intolerant to mutations, while genes associated with a number of other diseases are scored as tolerant. Among known disease-associated genes, those that are mutation-intolerant are likely to function in development and signal transduction pathways, while those that are mutation-tolerant tend to encode metabolic and mitochondrial proteins.


PLoS ONE ◽  
2010 ◽  
Vol 5 (1) ◽  
pp. e8949 ◽  
Author(s):  
Danny A. Bitton ◽  
Duncan L. Smith ◽  
Yvonne Connolly ◽  
Paul J. Scutt ◽  
Crispin J. Miller

2021 ◽  
Author(s):  
Noah Dukler ◽  
Mehreen R Mughal ◽  
Ritika Ramani ◽  
Yi-Fei Huang ◽  
Adam Siepel

Genome sequencing of tens of thousands of human individuals has recently enabled the measurement of large selective effects for mutations to protein-coding genes. Here we describe a new method, called ExtRaINSIGHT, for measuring similar selective effects at individual sites in noncoding as well as in coding regions of the human genome. ExtRaINSIGHT estimates the prevalence of strong purifying selection, or "ultraselection" (λs), as the fractional depletion of rare single-nucleotide variants (minor allele frequency <0.1%) in a target set of genomic sites relative to matched sites that are putatively neutrally evolving, in a manner that controls for local variation and neighbor-dependence in mutation rate. We show using simulations that, above an appropriate threshold, λs is closely related to the average site-specific selection coefficient against heterozygous point mutations, as predicted at mutation-selection balance. Applying ExtRaINSIGHT to 71,702 whole genome sequences from gnomAD v3, we find particularly strong evidence of ultraselection in evolutionarily ancient miRNAs and neuronal protein-coding genes, as well as at splice sites. Moreover, our estimated selection coefficient against heterozygous amino-acid replacements across the genome (at 1.4%) is substantially larger than previous estimates based on smaller sample sizes. By contrast, we find weak evidence of ultraselection in other noncoding RNAs and transcription factor binding sites, and only modest evidence in ultraconserved elements and human accelerated regions. We estimate that ~0.3-0.5% of the human genome is ultraselected, with one third to one half of ultraselected sites falling in coding regions. These estimates suggest ~0.3-0.4 lethal or nearly lethal de novo mutations per potential human zygote, together with ~2 de novo mutations that are more weakly deleterious. Overall, our study sheds new light on the genome-wide distribution of fitness effects for new point mutations by combining deep new sequencing data sets and classical theory from population genetics.
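The "fractional depletion" idea in the abstract above can be illustrated with a naive ratio estimator. This is only a sketch under strong assumptions: it compares raw rare-variant rates and does not model the mutation-rate and neighbor-dependence controls that the abstract says ExtRaINSIGHT applies. The function name, its parameters, and the counts in the usage note are hypothetical and are not taken from the paper.

```python
def fractional_depletion(target_rare, target_sites, neutral_rare, neutral_sites):
    """Naive depletion estimate: 1 - (target rare-variant rate / neutral rate).

    Illustrative stand-in only; NOT the ExtRaINSIGHT likelihood model.
    All arguments are plain counts supplied by the caller.
    """
    if target_sites <= 0 or neutral_sites <= 0:
        raise ValueError("site counts must be positive")
    if neutral_rare <= 0:
        raise ValueError("neutral set must contain at least one rare variant")
    target_rate = target_rare / target_sites      # rare variants per target site
    neutral_rate = neutral_rare / neutral_sites   # rare variants per neutral site
    return 1.0 - target_rate / neutral_rate


if __name__ == "__main__":
    # Made-up counts: 70 rare variants in 1,000 target sites versus
    # 100 rare variants in 1,000 matched neutral sites.
    print(fractional_depletion(70, 1000, 100, 1000))
```

A value near 0 would indicate no depletion relative to the neutral sites, and a value near 1 would indicate that almost no rare variants are tolerated in the target set.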


2015 ◽  
Author(s):  
Danesh Saleheen ◽  
Pradeep Natarajan ◽  
Wei Zhao ◽  
Asif Rasheed ◽  
Sumeet Khetarpal ◽  
...  

A major goal of biomedicine is to understand the function of every gene in the human genome. Null mutations can disrupt both copies of a given gene in humans, and phenotypic analysis of such 'human knockouts' can provide insight into gene function. To date, comprehensive analysis of genes knocked out in humans has been limited by the fact that null mutations are infrequent in the general population, so observing an individual homozygous null for a given gene is exceedingly rare. However, consanguineous unions are more likely to result in offspring who carry homozygous null mutations. In Pakistan, consanguinity rates are notably high. Here, we sequenced the protein-coding regions of 7,078 adult participants living in Pakistan and performed phenotypic analysis to identify homozygous null individuals and to understand consequences of complete gene disruption in humans. We enumerated 36,850 rare (<1% minor allele frequency) null mutations. These homozygous null mutations led to complete inactivation of 961 genes in at least one participant. Homozygosity for null mutations at APOC3 was associated with absent plasma apolipoprotein C-III levels; at PLA2G7, with absent enzymatic activity of soluble lipoprotein-associated phospholipase A2; at CYP2F1, with higher plasma interleukin-8 concentrations; and at either A3GALT2 or NRG4, with markedly reduced plasma insulin C-peptide concentrations. After physiologic challenge with oral fat, APOC3 knockouts displayed marked blunting of the usual post-prandial rise in plasma triglycerides compared to wild-type family members. These observations provide a roadmap to understand the consequences of complete disruption of a large fraction of genes in the human genome.
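Why consanguinity raises the chance of observing homozygous null individuals can be shown with the standard population-genetics expression for homozygote frequency under inbreeding, q² + F·q·(1 − q), where q is the allele frequency and F the inbreeding coefficient. The sketch below is a textbook illustration, not an analysis from the study; the allele frequency used is hypothetical, and F = 1/16 is the expected value for offspring of first cousins.

```python
def homozygote_frequency(q, inbreeding_f=0.0):
    """Expected frequency of homozygotes for an allele of frequency q.

    Uses the standard formula q^2 + F*q*(1 - q); F = 0 reduces to
    Hardy-Weinberg expectations.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("allele frequency must be in [0, 1]")
    if not 0.0 <= inbreeding_f <= 1.0:
        raise ValueError("inbreeding coefficient must be in [0, 1]")
    return q * q + inbreeding_f * q * (1.0 - q)


if __name__ == "__main__":
    q = 0.001  # hypothetical rare null allele frequency
    outbred = homozygote_frequency(q)
    first_cousin = homozygote_frequency(q, inbreeding_f=1 / 16)
    print(f"outbred: {outbred:.3e}")
    print(f"first-cousin offspring: {first_cousin:.3e}")
    print(f"fold increase: {first_cousin / outbred:.1f}")
```

For rare alleles the F·q·(1 − q) term dominates q², which is why the relative enrichment of homozygotes grows as the allele becomes rarer.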


BMC Genomics ◽  
2013 ◽  
Vol 14 (1) ◽  
pp. 141 ◽  
Author(s):  
Jainab Khatun ◽  
Yanbao Yu ◽  
John A Wrobel ◽  
Brian A Risk ◽  
Harsha P Gunawardena ◽  
...  

2016 ◽  
Vol 44 (4) ◽  
pp. 1073-1078 ◽  
Author(s):  
Rogerio Alves de Almeida ◽  
Marcin G. Fraczek ◽  
Steven Parker ◽  
Daniela Delneri ◽  
Raymond T. O'Keefe

Many human diseases have been attributed to mutations in the protein-coding regions of the human genome. The protein-coding portion of the human genome, however, is very small compared with the non-coding portion of the genome. As such, there are a disproportionate number of diseases attributed to the coding compared with the non-coding portion of the genome. It is now clear that the non-coding portion of the genome produces many functional non-coding RNAs and these RNAs are slowly being linked to human diseases. Here we discuss examples where mutations in classical non-coding RNAs have been attributed to human disease and identify the future potential for the non-coding portion of the genome in disease biology.


2017 ◽  
Author(s):  
James M. Havrilla ◽  
Brent S. Pedersen ◽  
Ryan M. Layer ◽  
Aaron R. Quinlan

Deep catalogs of genetic variation collected from many thousands of humans enable the detection of intraspecies constraint by revealing coding regions with a scarcity of variation. While existing techniques summarize constraint for entire genes, single metrics cannot capture the fine-scale variability in constraint within each protein-coding gene. To provide greater resolution, we have created a detailed map of constrained coding regions (CCRs) in the human genome by leveraging coding variation observed among 123,136 humans from the Genome Aggregation Database (gnomAD). The most constrained coding regions in our map are enriched for both pathogenic variants in ClinVar and de novo mutations underlying developmental disorders. CCRs also reveal protein domain families under high constraint, suggest unannotated or incomplete protein domains, and facilitate the prioritization of previously unseen variation in studies of disease. Finally, a subset of CCRs with the highest constraint likely exists within genes that cause yet unobserved human phenotypes owing to strong purifying selection.
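The notion of "coding regions with a scarcity of variation" can be pictured as the variant-free stretches between observed variants. The sketch below is a simplified illustration only: the published CCR map involves further modeling that the abstract does not describe, and the function name, the half-open coordinate convention, and the positions in the usage note are assumptions made here.

```python
def variant_free_intervals(start, end, variant_positions, min_length=1):
    """Return half-open (s, e) intervals within [start, end) that hold no variant.

    Illustrative only: positions are integers on one coding interval, and
    intervals shorter than min_length are dropped.
    """
    intervals = []
    cursor = start  # first position not yet assigned to an interval
    for pos in sorted(set(variant_positions)):
        if pos < start or pos >= end:
            continue  # ignore variants outside the region of interest
        if pos - cursor >= min_length:
            intervals.append((cursor, pos))
        cursor = pos + 1  # resume just past the variant
    if end - cursor >= min_length:
        intervals.append((cursor, end))
    return intervals


if __name__ == "__main__":
    # Hypothetical 10-bp coding interval with variants at positions 3 and 7.
    print(variant_free_intervals(0, 10, [3, 7]))
```

Longer variant-free intervals would be candidates for stronger constraint, though a real analysis would also need to account for factors such as sequencing coverage and local mutation rate.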

