Variation benchmark datasets: update, criteria, quality and applications

Database ◽

10.1093/database/baz117 ◽

2020 ◽

Vol 2020 ◽

Cited By ~ 2

Author(s):

Anasua Sarkar ◽

Yang Yang ◽

Mauno Vihinen

Keyword(s):

RNA Splicing ◽

Binding Free Energy ◽

Regulatory Elements ◽

Structural Level ◽

Coding Region ◽

Method Performance ◽

Energy Disorder ◽

Protein Property ◽

Benchmark Datasets ◽

DNA Regulatory Elements

Abstract Development of new computational methods and testing their performance has to be carried out using experimental data. Only in comparison to existing knowledge can method performance be assessed. For that purpose, benchmark datasets with known and verified outcome are needed. High-quality benchmark datasets are valuable and may be difficult, laborious and time consuming to generate. VariBench and VariSNP are the two existing databases for sharing variation benchmark datasets used mainly for variation interpretation. They have been used for training and benchmarking predictors for various types of variations and their effects. VariBench was updated with 419 new datasets from 109 papers containing altogether 329 014 152 variants; however, there is plenty of redundancy between the datasets. VariBench is freely available at http://structure.bmc.lu.se/VariBench/. The contents of the datasets vary depending on information in the original source. The available datasets have been categorized into 20 groups and subgroups. There are datasets for insertions and deletions, substitutions in coding and non-coding region, structure mapped, synonymous and benign variants. Effect-specific datasets include DNA regulatory elements, RNA splicing, and protein property for aggregation, binding free energy, disorder and stability. Then there are several datasets for molecule-specific and disease-specific applications, as well as one dataset for variation phenotype effects. Variants are often described at three molecular levels (DNA, RNA and protein) and sometimes also at the protein structural level including relevant cross references and variant descriptions. The updated VariBench facilitates development and testing of new methods and comparison of obtained performances to previously published methods. 
We compared the performance of the pathogenicity/tolerance predictor PON-P2 to several benchmark studies and show that such comparisons are feasible and useful; however, there may be limitations due to a lack of provided details and shared data. Database URL: http://structure.bmc.lu.se/VariBench

Variation Benchmark Datasets: Update, Criteria, Quality and Applications

10.1101/634766 ◽

2019 ◽

Cited By ~ 2

Author(s):

Anasua Sarkar ◽

Yang Yang ◽

Mauno Vihinen

Keyword(s):

RNA Splicing ◽

Binding Free Energy ◽

Regulatory Elements ◽

Coding Region ◽

Method Performance ◽

New Methods ◽

Energy Disorder ◽

Protein Property ◽

Benchmark Datasets ◽

DNA Regulatory Elements

ABSTRACT Development of new computational methods and testing their performance has to be done on experimental data. Only in comparison to existing knowledge can method performance be assessed. For that purpose, benchmark datasets with known and verified outcome are needed. High-quality benchmark datasets are valuable and may be difficult, laborious and time consuming to generate. VariBench and VariSNP are the two existing databases for sharing variation benchmark datasets. They have been used for training and benchmarking predictors for various types of variations and their effects. There are 419 new datasets from 109 papers containing altogether 329003373 variants; however there is plenty of redundancy between the datasets. VariBench is freely available at http://structure.bmc.lu.se/VariBench/. The contents of the datasets vary depending on information in the original source. The available datasets have been categorized into 20 groups and subgroups. There are datasets for insertions and deletions, substitutions in coding and non-coding region, structure mapped, synonymous and benign variants. Effect-specific datasets include DNA regulatory elements, RNA splicing, and protein property predictions for aggregation, binding free energy, disorder and stability. Then there are several datasets for molecule-specific and disease-specific applications, as well as one dataset for variation phenotype effects. Variants are often described at three molecular levels (DNA, RNA and protein) and sometimes also at the protein structural level including relevant cross references and variant descriptions. The updated VariBench facilitates development and testing of new methods and comparison of obtained performance to previously published methods. We compared the performance of the pathogenicity/tolerance predictor PON-P2 to several benchmark studies, and showed that such comparisons are feasible and useful, however, there may be limitations due to lack of provided details and shared data.
AUTHOR SUMMARY A prediction method performance can only be assessed in comparison to existing knowledge. For that purpose benchmark datasets with known and verified outcome are needed. High-quality benchmark datasets are valuable and may be difficult, laborious and time consuming to generate. We collected variation datasets from literature, website and databases. There are 419 separate new datasets, which however contain plenty of redundancy. VariBench is freely available at http://structure.bmc.lu.se/VariBench/. There are datasets for insertions and deletions, substitutions in coding and non-coding region, structure mapped, synonymous and benign variants. Effect-specific datasets include DNA regulatory elements, RNA splicing, and protein property predictions for aggregation, binding free energy, disorder and stability. Then there are several datasets for molecule-specific and disease-specific applications, as well as one dataset for variation phenotype effects. The updated VariBench facilitates development and testing of new methods and comparison of obtained performance to previously published methods. We compared the performance of the pathogenicity/tolerance predictor PON-P2 to several benchmark studies and showed that such comparisons are possible and useful when the details of studies and the datasets are shared.

Structure and Regulation of the Salivary Gland Secretion Protein Gene Sgs-1 of Drosophila melanogaster

Genetics ◽

10.1093/genetics/153.2.753 ◽

1999 ◽

Vol 153 (2) ◽

pp. 753-762

Author(s):

Günther E Roth ◽

Sigrid Wattler ◽

Hartmut Bornschein ◽

Michael Lehmann ◽

Günter Korge

Keyword(s):

Drosophila Melanogaster ◽

Tandem Repeats ◽

Transcriptional Start Site ◽

Regulatory Elements ◽

Coding Region ◽

Band Shift ◽

Salivary Gland Secretion ◽

Enhancer Binding Protein ◽

Factor Secretion ◽

Third Instar Larvae

Abstract The Drosophila melanogaster gene Sgs-1 belongs to the secretion protein genes, which are coordinately expressed in salivary glands of third instar larvae. Earlier analysis had implied that Sgs-1 is located at the 25B2-3 puff. We cloned Sgs-1 from a YAC covering 25B2-3. Despite using a variety of vectors and Escherichia coli strains, subcloning from the YAC led to deletions within the Sgs-1 coding region. Analysis of clonable and unclonable sequences revealed that Sgs-1 mainly consists of 48-bp tandem repeats encoding a threonine-rich protein. The Sgs-1 inserts from single λ clones are heterogeneous in length, indicating that repeats are eliminated. By analyzing the expression of Sgs-1/lacZ fusions in transgenic flies, cis-regulatory elements of Sgs-1 were mapped to lie within 1 kb upstream of the transcriptional start site. Band shift assays revealed binding sites for the transcription factor fork head (FKH) and the factor secretion enhancer binding protein 3 (SEBP3) at positions that are functionally relevant. FKH and SEBP3 have been shown previously to be involved in the regulation of Sgs-3 and Sgs-4. Comparison of the levels of steady state RNA and of the transcription rates for Sgs-1 and Sgs-1/lacZ reporter genes indicates that Sgs-1 RNA is 100-fold more stable than Sgs-1/lacZ RNA. This has implications for the model of how Sgs transcripts accumulate in late third instar larvae.

Assessing the regulatory potential of transposable elements using chromatin accessibility profiles of maize transposons

Genetics ◽

10.1093/genetics/iyaa003 ◽

2020 ◽

Vol 217 (1) ◽

Author(s):

Jaclyn M Noshay ◽

Alexandre P Marand ◽

Sarah N Anderson ◽

Peng Zhou ◽

Maria Katherine Mejia Guerra ◽

...

Keyword(s):

Transposable Elements ◽

Allelic Variation ◽

Regulatory Elements ◽

Chromatin Accessibility ◽

Transcriptional Responses ◽

Maize Genotypes ◽

Show Evidence ◽

Regulatory Potential ◽

DNA Regulatory Elements ◽

Accessible Chromatin

Abstract Transposable elements (TEs) have the potential to create regulatory variation both through the disruption of existing DNA regulatory elements and through the creation of novel DNA regulatory elements. In a species with a large genome, such as maize, many TEs interspersed with genes create opportunities for significant allelic variation due to TE presence/absence polymorphisms among individuals. We used information on putative regulatory elements in combination with knowledge about TE polymorphisms in maize to identify TE insertions that interrupt existing accessible chromatin regions (ACRs) in B73 as well as examples of polymorphic TEs that contain ACRs among four inbred lines of maize including B73, Mo17, W22, and PH207. The TE insertions in three other assembled maize genomes (Mo17, W22, or PH207) that interrupt ACRs that are present in the B73 genome can trigger changes to the chromatin, suggesting the potential for both genetic and epigenetic influences of these insertions. Nearly 20% of the ACRs located over 2 kb from the nearest gene are located within an annotated TE. These are regions of unmethylated DNA that show evidence for functional importance similar to ACRs that are not present within TEs. Using a large panel of maize genotypes, we tested if there is an association between the presence of TE insertions that interrupt, or carry, an ACR and the expression of nearby genes. While most TE polymorphisms are not associated with expression for nearby genes, the TEs that carry ACRs exhibit enrichment for being associated with higher expression of nearby genes, suggesting that these TEs may contribute novel regulatory elements. These analyses highlight the potential for a subset of TEs to rewire transcriptional responses in eukaryotic genomes.

Missense and silent tau gene mutations cause frontotemporal dementia with parkinsonism-chromosome 17 type, by affecting multiple alternative RNA splicing regulatory elements

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.96.10.5598 ◽

1999 ◽

Vol 96 (10) ◽

pp. 5598-5603 ◽

Cited By ~ 334

Author(s):

I. D'Souza ◽

P. Poorkaj ◽

M. Hong ◽

D. Nochlin ◽

V. M.-Y. Lee ◽

...

Keyword(s):

Frontotemporal Dementia ◽

RNA Splicing ◽

Gene Mutations ◽

Regulatory Elements ◽

Chromosome 17 ◽

Splicing Regulatory Elements ◽

Multiple Alternative

Considerations when investigating lncRNA function in vivo

eLife ◽

10.7554/elife.03058 ◽

2014 ◽

Vol 3 ◽

Cited By ~ 222

Author(s):

Andrew R Bassett ◽

Asifa Akhtar ◽

Denise P Barlow ◽

Adrian P Bird ◽

Neil Brockdorff ◽

...

Keyword(s):

Normal Cell ◽

Regulatory Elements ◽

Genomic Locus ◽

Cellular Processes ◽

Cell Processes ◽

Non Coding RNAs ◽

Per Se ◽

DNA Regulatory Elements

Although a small number of the vast array of animal long non-coding RNAs (lncRNAs) have known effects on cellular processes examined in vitro, the extent of their contributions to normal cell processes throughout development, differentiation and disease for the most part remains less clear. Phenotypes arising from deletion of an entire genomic locus cannot be unequivocally attributed either to the loss of the lncRNA per se or to the associated loss of other overlapping DNA regulatory elements. The distinction between cis- or trans-effects is also often problematic. We discuss the advantages and challenges associated with the current techniques for studying the in vivo function of lncRNAs in the light of different models of lncRNA molecular mechanism, and reflect on the design of experiments to mutate lncRNA loci. These considerations should assist in the further investigation of these transcriptional products of the genome.

A Common Motif within the Negative Regulatory Regions of Multiple Factors Inhibits Their Transcriptional Synergy

Molecular and Cellular Biology ◽

10.1128/mcb.20.16.6040-6050.2000 ◽

2000 ◽

Vol 20 (16) ◽

pp. 6040-6050 ◽

Cited By ~ 154

Author(s):

Jorge A. Iñiguez-Lluhí ◽

David Pearce

Keyword(s):

Binding Sites ◽

Regulatory Elements ◽

General Mechanism ◽

Proper Function ◽

Protein Motif ◽

Regulatory Regions ◽

Response Elements ◽

Common Motif ◽

Single Response ◽

DNA Regulatory Elements

ABSTRACT DNA regulatory elements frequently harbor multiple recognition sites for several transcriptional activators. The response mounted from such compound response elements is often more pronounced than the simple sum of effects observed at single binding sites. The determinants of such transcriptional synergy and its control, however, are poorly understood. Through a genetic approach, we have uncovered a novel protein motif that limits the transcriptional synergy of multiple DNA-binding regulators. Disruption of these conserved synergy control motifs (SC motifs) selectively increases activity at compound, but not single, response elements. Although isolated SC motifs do not regulate transcription when tethered to DNA, their transfer to an activator lacking them is sufficient to impose limits on synergy. Mechanistic analysis of the two SC motifs found in the glucocorticoid receptor N-terminal region reveals that they function irrespective of the arrangement of the receptor binding sites or their distance from the transcription start site. Proper function, however, requires the receptor's ligand-binding domain and an engaged dimer interface. Notably, the motifs are not functional in yeast and do not alter the effect of p160 coactivators, suggesting that they require other nonconserved components to operate. Many activators across multiple classes harbor seemingly unrelated negative regulatory regions. The presence of SC motifs within them, however, suggests a common function and identifies SC motifs as critical elements of a general mechanism to modulate higher-order interactions among transcriptional regulators.

A scalable platform for the development of cell-type-specific viral drivers

eLife ◽

10.7554/elife.48089 ◽

2019 ◽

Vol 8 ◽

Cited By ~ 12

Author(s):

Sinisa Hrvatin ◽

Christopher P Tzeng ◽

M Aurel Nagy ◽

Hume Stroud ◽

Charalampia Koutsioumpa ◽

...

Keyword(s):

Gene Expression ◽

Heterologous Gene Expression ◽

High Specificity ◽

Cell Types ◽

Regulatory Elements ◽

Cell Type ◽

Cell Type Specificity ◽

Cell Type Specific ◽

The Many ◽

DNA Regulatory Elements

Enhancers are the primary DNA regulatory elements that confer cell type specificity of gene expression. Recent studies characterizing individual enhancers have revealed their potential to direct heterologous gene expression in a highly cell-type-specific manner. However, it has not yet been possible to systematically identify and test the function of enhancers for each of the many cell types in an organism. We have developed PESCA, a scalable and generalizable method that leverages ATAC- and single-cell RNA-sequencing protocols, to characterize cell-type-specific enhancers that should enable genetic access and perturbation of gene function across mammalian cell types. Focusing on the highly heterogeneous mammalian cerebral cortex, we apply PESCA to find enhancers and generate viral reagents capable of accessing and manipulating a subset of somatostatin-expressing cortical interneurons with high specificity. This study demonstrates the utility of this platform for developing new cell-type-specific viral reagents, with significant implications for both basic and translational research.

The gene for the heat-shock protein 70 of Euplotes focardii, an Antarctic psychrophilic ciliate

Antarctic Science ◽

10.1017/s0954102004001774 ◽

2004 ◽

Vol 16 (1) ◽

pp. 23-28 ◽

Cited By ~ 20

Author(s):

ANTONIETTA LA TERZA ◽

CRISTINA MICELI ◽

PIERANGELO LUPORINI

Keyword(s):

Thermal Stress ◽

Heat Shock ◽

Heat Shock Protein ◽

Heat Shock Protein 70 ◽

Regulatory Region ◽

Regulatory Elements ◽

Specific Response ◽

Hsp70 Gene ◽

Sequence Motifs ◽

Coding Region

In the Antarctic ciliate, Euplotes focardii, the heat-shock protein 70 (Hsp70) gene does not show any appreciable activation by a thermal stress. Yet, it is activated to appreciable transcriptional levels by oxidative and chemical stresses, thus implying that it evolved a mechanism of selective, stress-specific response. A basic step in investigating this mechanism is the determination of the complete nucleotide sequence of the E. focardii Hsp70 gene. This gene contains a coding region specific for an Hsp70 protein that carries unique amino acid substitutions of potential significance for cold adaptation, and a 5' regulatory region that includes sequence motifs denoting two distinct types of stress-inducible promoters, known as “Heat Shock Elements” (HSE) and “Stress Response Elements” (StRE). From the study of the interactions of these regulatory elements with their specific transactivator factors we expect to shed light on the adaptive modifications that prevent the Hsp70 gene of E. focardii from responding to thermal stress while being responsive to other stresses.

A 2.0 Mb microdeletion in proximal chromosome 14q12, involving regulatory elements of FOXG1, with the coding region of FOXG1 being unaffected, results in severe developmental delay, microcephaly, and hypoplasia of the corpus callosum

European Journal of Medical Genetics ◽

10.1016/j.ejmg.2013.05.012 ◽

2013 ◽

Vol 56 (9) ◽

pp. 526-528 ◽

Cited By ~ 11

Author(s):

Masaki Takagi ◽

Goro Sasaki ◽

Toshikatsu Mitsui ◽

Misa Honda ◽

Yoko Tanaka ◽

...

Keyword(s):

Corpus Callosum ◽

Developmental Delay ◽

Regulatory Elements ◽

Coding Region

Dysregulated Transcriptional Control in Prostate Cancer

International Journal of Molecular Sciences ◽

10.3390/ijms20122883 ◽

2019 ◽

Vol 20 (12) ◽

pp. 2883 ◽

Cited By ~ 8

Author(s):

Simon J. Baumgart ◽

Ekaterina Nevedomskaya ◽

Bernard Haendler

Keyword(s):

Prostate Cancer ◽

Drug Targets ◽

Transcriptional Control ◽

Regulatory Elements ◽

Protein Coding ◽

Coding Regions ◽

Super Enhancer ◽

Position Coding ◽

Transcription Dysregulation ◽

DNA Regulatory Elements

Recent advances in whole-genome and transcriptome sequencing of prostate cancer at different stages indicate that a large number of mutations found in tumors are present in non-protein coding regions of the genome and lead to dysregulated gene expression. Single nucleotide variations and small mutations affecting the recruitment of transcription factor complexes to DNA regulatory elements are observed in an increasing number of cases. Genomic rearrangements may position coding regions under the novel control of regulatory elements, as exemplified by the TMPRSS2-ERG fusion and the amplified enhancer identified upstream of the androgen receptor (AR) gene. Super-enhancers are increasingly found to play important roles in aberrant oncogenic transcription. Several players involved in these processes are currently being evaluated as drug targets and may represent new vulnerabilities that can be exploited for prostate cancer treatment. They include factors involved in enhancer and super-enhancer function such as bromodomain proteins and cyclin-dependent kinases. In addition, non-coding RNAs with an important gene regulatory role are being explored. The rapid progress made in understanding the influence of the non-coding part of the genome and of transcription dysregulation in prostate cancer could pave the way for the identification of novel treatment paradigms for the benefit of patients.
