Evaluating molecular modeling tools for thermal stability using an independently generated dataset

Mapping Intimacies
10.1101/856732
2019
Cited By ~ 1

Author(s): Peishan Huang, Simon K. S. Chu, Henrique N. Frizzo, Morgan P. Connolly, Ryan W. Caster, ...

Keyword(s): Thermal Stability, Soluble Protein, Pearson Correlation, Published Data, Data Sets, Modeling Tools, Data Set, Computational Tools, Physical Features, New Algorithms

Abstract. Engineering proteins to enhance thermal stability is a widely used approach for creating industrially relevant biocatalysts. Computational tools that guide these engineering efforts remain an active area of research, with new data sets and algorithms under development. To aid these efforts, we report an expansion of our previously published data set of β-glucosidase mutants to include measurements of TM and ΔΔG, complementing the previously reported measurements of T50 and kinetic constants (kcat and KM). For a set of 51 mutants, we found that T50 and TM are moderately correlated, with a Pearson correlation coefficient (PCC) of 0.58, indicating that the two methods capture different physical features. The stability predictions of five computational tools were also evaluated on the 51-mutant data set, and none were found to be strong predictors of the observed changes in T50, TM, or ΔΔG. Furthermore, we examined the ability of the five algorithms to predict the production of isolatable soluble protein, which revealed that Rosetta ΔΔG, ELASPIC, and DeepDDG can predict whether a mutant can be produced and isolated as a soluble protein. These results highlight the need for new algorithms that predict modest, yet important, changes in thermal stability, as well as a new use for current algorithms: prescreening designs for the production of soluble mutants.
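The PCC reported above quantifies how linearly two paired measurements co-vary across mutants. A minimal Python sketch of the computation follows; the function name `pearson` and the per-mutant values are hypothetical placeholders, not data from the study.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient (PCC) of paired samples x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples of at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations (proportional to the covariance)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (proportional to each variance)
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("PCC is undefined when a sample has zero variance")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-mutant shifts relative to wild type (deg C); illustrative only.
delta_t50 = [0.5, -1.2, 2.0, -0.3, 1.1]
delta_tm = [0.8, -0.4, 1.5, -1.0, 0.2]
print(f"PCC = {pearson(delta_t50, delta_tm):.2f}")
```

Because the PCC captures only linear association, a moderate value is compatible with two assays that track partly different physical processes.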

Relative testis size and mating systems in anurans: large testis in multiple-male mating in foam-nesting frogs

Animal Biology
10.1163/157075511x570312
2011
Vol 61 (2)
pp. 225-238
Cited By ~ 15

Author(s): Wen Bo Liao, Zhi Ping Mi, Cai Quan Zhou, Ling Jin, Xian Han, ...

Keyword(s): Sperm Competition, Published Data, Male Mating, Data Sets, Testis Size, Data Set, Monogamous Species, Large Testis, Testes Size, Testis Mass

Abstract. Comparative studies of relative testes size in animals show that promiscuous species have relatively larger testes than monogamous species. In many animals, sperm competition favours the evolution of larger ejaculates, which require bigger testes. Here, we present data on relative testis mass for 17 Chinese species, including 3 polyandrous species. We analyzed relative testis mass within the Chinese data set and after combining those data with published data sets on Japanese and African frogs. We found that polyandrous foam-nesting species have relatively large testes, suggesting that sperm competition was an important factor affecting the evolution of relative testes size. For 4 polyandrous species, testes mass is positively correlated with the intensity (males/mating) but not with the risk (frequency of polyandrous matings) of sperm competition.

Children with 5′-end NF1 gene mutations are more likely to have glioma

Neurology Genetics
10.1212/nxg.0000000000000192
2017
Vol 3 (5)
pp. e192
Cited By ~ 12

Author(s): Corina Anastasaki, Stephanie M. Morris, Feng Gao, David H. Gutmann

Keyword(s): Gene Mutation, Statistical Significance, Gene Mutations, Neurofibromatosis Type, Published Data, Data Sets, Nonsense Mutations, Data Set, Nf1 Gene, The Relationship

Objective: To ascertain the relationship between the germline NF1 gene mutation and glioma development in patients with neurofibromatosis type 1 (NF1). Methods: The relationship between the type and location of the germline NF1 mutation and the presence of a glioma was analyzed in 37 participants with a clinical diagnosis of NF1 from one institution (Washington University School of Medicine [WUSM]). Odds ratios (ORs) were calculated using both unadjusted and weighted analyses of this data set in combination with 4 previously published data sets. Results: While no statistically significant association was observed between the location or type of the NF1 mutation and glioma in the WUSM cohort, power calculations revealed that a sample size of 307 participants would be required to determine the predictive value of the position or type of the NF1 gene mutation. When our data set was combined with 4 previously published data sets (n = 310), children with glioma were found to be more likely to harbor 5′-end gene mutations (OR = 2; p = 0.006). Moreover, although not clinically predictive owing to insufficient sensitivity and specificity, this association with glioma was stronger for participants with 5′-end truncating (OR = 2.32; p = 0.005) or 5′-end nonsense (OR = 3.93; p = 0.005) mutations relative to those without glioma. Conclusions: Individuals with NF1 and glioma are more likely to harbor nonsense mutations in the 5′ end of the NF1 gene, suggesting that the NF1 mutation may be one predictive factor for glioma in this at-risk population.
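The odds ratios above compare the odds of carrying a 5′-end mutation in participants with versus without glioma. The sketch below computes an unadjusted OR with a Woolf (log-scale) 95% confidence interval from a 2×2 table. It is a hypothetical illustration: the function name and counts are made up, and it does not reproduce the weighted analysis described in the Methods.

```python
import math


def odds_ratio(a: int, b: int, c: int, d: int) -> tuple[float, float, float]:
    """Unadjusted odds ratio and Woolf 95% confidence interval for a 2x2 table.

    Rows are glioma / no glioma; columns are 5'-end mutation / other mutation:
        a = glioma, 5'-end      b = glioma, other
        c = no glioma, 5'-end   d = no glioma, other
    """
    if min(a, b, c, d) == 0:
        raise ValueError("a zero cell makes log(OR) undefined; apply a continuity correction")
    or_value = (a * d) / (b * c)
    # Standard error of log(OR) under Woolf's method
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = 1.96  # two-sided 95% normal quantile
    lower = math.exp(math.log(or_value) - z * se_log)
    upper = math.exp(math.log(or_value) + z * se_log)
    return or_value, lower, upper


# Hypothetical counts for illustration only (not the study's data).
print(odds_ratio(30, 20, 25, 35))
```

An OR above 1 means the 5′-end mutation is more common among participants with glioma; a confidence interval that excludes 1 corresponds to a two-sided p below 0.05 for this unadjusted test.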

Recommendations for bacterial ribosome profiling experiments based on bioinformatic evaluation of published data

Journal of Biological Chemistry
10.1074/jbc.ra119.012161
2020
Vol 295 (27)
pp. 8999-9011
Cited By ~ 2

Author(s): Alina Glaub, Christopher Huptas, Klaus Neuhaus, Zachary Ardern

Keyword(s): Ribosome Profiling, Published Data, Data Sets, Drug Induced, Data Set, Protein Coding, Bacterial Ribosome, Translation Start, Selection Step, Basic Characteristics

Ribosome profiling (RIBO-Seq) has improved our understanding of bacterial translation, including finding many unannotated genes. However, protocols for RIBO-Seq and corresponding data analysis are not yet standardized. Here, we analyzed 48 RIBO-Seq samples from nine studies of Escherichia coli K12 grown in lysogeny broth medium, focusing particularly on the size-selection step. We show that for conventional expression analysis, a size range between 22 and 30 nucleotides is sufficient to obtain protein-coding fragments, which has the advantage of removing many unwanted rRNA and tRNA reads. More specific analyses may require longer reads and a corresponding improvement in rRNA/tRNA depletion. There is no consensus about the appropriate sequencing depth for RIBO-Seq experiments in prokaryotes, and studies vary significantly in total read number. Our analysis suggests that 20 million reads that do not map to rRNA/tRNA are required for global detection of translated annotated genes. We also highlight the influence of drug-induced ribosome stalling, which causes bias at translation start sites. The resulting accumulation of reads at the start site may be especially useful for detecting weakly expressed genes. As different methods suit different questions, it may not be possible to produce a “one-size-fits-all” ribosome profiling data set. Therefore, experiments should be carefully designed in light of the scientific questions of interest. We propose some basic characteristics that should be reported with any new RIBO-Seq data sets. Careful attention to the factors discussed should improve prokaryotic gene detection and the comparability of ribosome profiling data sets.
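The 22–30 nt size range above can also be applied in silico as a read-length filter. A minimal sketch, assuming reads are already adapter-trimmed and held as plain strings; the function names are hypothetical, and a real pipeline would stream FASTQ records and separately remove reads mapping to rRNA/tRNA.

```python
from collections import Counter
from typing import Iterable


def size_select(reads: Iterable[str], min_len: int = 22, max_len: int = 30) -> list[str]:
    """Keep footprint reads whose length lies in [min_len, max_len], inclusive."""
    return [read for read in reads if min_len <= len(read) <= max_len]


def length_histogram(reads: Iterable[str]) -> Counter:
    """Count reads per length, e.g. to inspect the footprint size distribution."""
    return Counter(len(read) for read in reads)


# Toy reads of length 18, 22, 26, 30 and 34 (hypothetical).
toy_reads = ["A" * n for n in (18, 22, 26, 30, 34)]
kept = size_select(toy_reads)
print(sorted(length_histogram(kept)))  # -> [22, 26, 30]
```

Exposing `min_len` and `max_len` as parameters reflects the point that more specific analyses may need a wider window than conventional expression analysis.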

A generic model for the context-aware representation and federation of educational datasets: Experience from the dataTEL challenge

Knowledge Management & E-Learning: An International Journal
10.34105/j.kmel.2017.09.009
2017
pp. 143-159

Keyword(s): Large Scale, Generic Model, Data Sets, Context Aware, Data Set, Common Information, Common Information Model, E Learning, Current Systems, New Algorithms

Research on online interactions during learning situations, aimed at better understanding users' practices and providing them with quality-oriented features, resources and services, is attracting a large community. As a result, sharing educational data sets that capture users' interactions with e-learning systems has become a hot topic. However, current systems that aggregate social and usage data about their users suffer from a series of weaknesses. In particular, they lack a common information model that would allow interaction data to be exchanged at a large scale. To tackle this issue, we propose in this paper a generic model able to federate heterogeneous context metadata and to facilitate their sharing and reuse. This framework has been successfully applied to several data sets provided by the research community, and thus gives access to a large data set that could help researchers increase the efficiency of existing learning analytics techniques and promote the research and development of new algorithms and services on top of these data.

Nature of intrafractional and interfractional prostate motion during stereotactic radiation.

Journal of Clinical Oncology
10.1200/jco.2016.34.2_suppl.152
2016
Vol 34 (2_suppl)
pp. 152-152

Author(s): Karthikeyan Perumal, Mahadev Potharaju

Keyword(s): Treatment Time, Published Data, Data Sets, Data Set, Stereotactic Radiation, X Ray, Total Treatment Time, Prostate Motion, Predictable Pattern, Total Treatment

152 Background: To characterize intrafraction and interfraction prostate motion, as tracked by X-ray images of implanted gold fiducials, during stereotactic radiotherapy with CyberKnife. Published data have analysed linear and angular intrafraction and interfraction prostate motion across patients; we sought to quantify the same within each patient. Methods: Twenty-five patients with localized prostate cancer treated with CyberKnife radiosurgery between January 2013 and August 2015 were studied retrospectively. Each data set consists of the deviations derived from X-ray images obtained between two consecutive couch motions. Results: A total of 3926 data sets were included in the analysis. A total of 210 non-coplanar fields were used per fraction. The mean total treatment time for all fields per fraction was 36.13 minutes. Overall, the detected and corrected linear movements were within ±10.1 mm (Right: mean 1.1±0.4 mm; Left: mean 1.0±0.6 mm; Superior: mean 0.7±0.3 mm; Inferior: mean 1.6±0.6 mm; Anterior: mean 1.6±0.7 mm; Posterior: mean 0.5±0.3 mm; maximum (max) movement: Right max 9.9±6.4 mm, Left max 7.1±3.4 mm, Superior max 8.6±5.4 mm, Inferior max 10.1±8.5 mm, Anterior max 9.2±6.5 mm, Posterior max 8.4±2.9 mm), and angular movements were within ±6.7 deg in all directions (Right Angle: mean 0.6±0.3 deg; Left Angle: mean 0.6±0.3 deg; Head Up (H-U): mean 1.3±0.6 deg; Head Down (H-D): mean 1.4±0.6 deg; Counter-Clockwise (CCW): mean 0.7±0.3 deg; Clockwise (CW): mean 0.5±0.3 deg; max rotation: Right angle max 2.4±2 deg, Left angle max 2.7±2 deg, H-U max 10.2±3.5 deg, H-D max 6.7±4.8 deg, CCW max 4±2.9 deg, CW max 2.8±2.4 deg). Interfraction prostate motion changed unpredictably in each patient. However, a unique observation is that intrafraction prostate motion follows a predictable pattern within a patient. Change in linear or angular intrafraction prostate motion in any direction is not erratic. Conclusions: Linear and rotational intrafraction prostate motion in any direction has a predictable pattern, and any change is gradual rather than erratic. The motion shows a secular trend over the course of treatment.

The use of genetic programming to develop a predictor of swash excursion on sandy beaches

Natural Hazards and Earth System Science
10.5194/nhess-18-599-2018
2018
Vol 18 (2)
pp. 599-611
Cited By ~ 14

Author(s): Marinella Passarella, Evan B. Goldstein, Sandro De Muro, Giovanni Coco

Keyword(s): Genetic Programming, Sandy Beaches, Coastal Hazards, Coastal Processes, Published Data, Data Sets, Prediction Errors, Data Set, Physical Insight, Insight Into

Abstract. We use genetic programming (GP), a type of machine learning (ML) approach, to predict total and infragravity swash excursion using previously published data sets that have been used extensively in swash prediction studies. Data from three previously published works, covering a range of new conditions, are added to this data set to extend the range of measured swash conditions. Using this newly compiled data set, we demonstrate that an ML approach can reduce prediction errors compared to well-established parameterizations and may therefore improve coastal hazard assessment (e.g. coastal inundation). Predictors obtained using GP can also be physically sound and replicate the functionality and dependencies of previously published formulas. Overall, we show that ML techniques are capable of both improving predictability (compared to classical regression approaches) and providing physical insight into coastal processes.

Bayesian Inference for Correlations in the Presence of Measurement Error and Estimation Uncertainty

Collabra Psychology
10.1525/collabra.78
2017
Vol 3 (1)
Cited By ~ 7

Author(s): Dora Matzke, Alexander Ly, Ravi Selker, Wouter D. Weeda, Benjamin Scheibehenne, ...

Keyword(s): Measurement Error, Bayesian Modeling, Pearson Correlation, Psychological Research, Uncertain Parameter, Parameter Estimates, Data Sets, Data Set, Bayesian Hierarchical, Estimation Uncertainty

Whenever parameter estimates are uncertain or observations are contaminated by measurement error, the Pearson correlation coefficient can severely underestimate the true strength of an association. Various approaches exist for inferring the correlation in the presence of estimation uncertainty and measurement error, but none are routinely applied in psychological research. Here we focus on a Bayesian hierarchical model proposed by Behseta, Berdyyeva, Olson, and Kass (2009) that allows researchers to infer the underlying correlation between error-contaminated observations. We show that this approach may also be applied to obtain the underlying correlation between uncertain parameter estimates as well as the correlation between uncertain parameter estimates and noisy observations. We illustrate the Bayesian modeling of correlations with two empirical data sets; in each data set, we first infer the posterior distribution of the underlying correlation and then compute Bayes factors to quantify the evidence that the data provide for the presence of an association.
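The underestimation described above is the classical attenuation effect: with independent measurement error, the observed correlation tends toward the true correlation multiplied by the square root of the product of the two measures' reliabilities (true-score variance divided by observed variance). The simulation below is a hypothetical Python sketch using only the standard library; it illustrates the problem that motivates the hierarchical model and is not an implementation of the Behseta et al. (2009) model. The values ρ = 0.8, unit noise variance and n = 20 000 are arbitrary choices.

```python
import math
import random


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of paired samples x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(rho: float = 0.8, noise_sd: float = 1.0, n: int = 20_000,
             seed: int = 1) -> tuple[float, float]:
    """Return (r_latent, r_observed) for latent scores correlated at rho."""
    rng = random.Random(seed)
    latent_x, latent_y, obs_x, obs_y = [], [], [], []
    for _ in range(n):
        # Unit-variance latent pair with population correlation rho
        x = rng.gauss(0.0, 1.0)
        y = rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        latent_x.append(x)
        latent_y.append(y)
        # Observations contaminated by independent measurement error
        obs_x.append(x + rng.gauss(0.0, noise_sd))
        obs_y.append(y + rng.gauss(0.0, noise_sd))
    return pearson(latent_x, latent_y), pearson(obs_x, obs_y)


r_latent, r_observed = simulate()
# Attenuation: r_observed ~ rho * sqrt(rel_x * rel_y) = 0.8 * sqrt(0.5 * 0.5) = 0.4
print(f"latent r = {r_latent:.2f}, observed r = {r_observed:.2f}")
```

Here each reliability is 1 / (1 + 1) = 0.5, so the observed correlation is roughly half the latent one even with a large sample; collecting more data does not remove the bias, which is why a model of the error process is needed.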

Analysis of copy number variations at 15 schizophrenia-associated loci

The British Journal of Psychiatry
10.1192/bjp.bp.113.131052
2014
Vol 204 (2)
pp. 108-114
Cited By ~ 234

Author(s): Elliott Rees, James T. R. Walters, Lyudmila Georgieva, Anthony R. Isles, Kimberly D. Chambert, ...

Keyword(s): Copy Number, Copy Number Variants, Copy Number Variations, High Rate, Published Data, Data Sets, Deleterious Mutations, Data Set, Data Analyses, Susceptibility Factors

Background: A number of copy number variants (CNVs) have been suggested as susceptibility factors for schizophrenia. For some of these the data remain equivocal, and the frequency in individuals with schizophrenia is uncertain. Aims: To determine the contribution of CNVs at 15 schizophrenia-associated loci (a) using a large new data-set of patients with schizophrenia (n = 6882) and controls (n = 6316), and (b) combining our results with those from previous studies. Method: We used Illumina microarrays to analyse our data. Analyses were restricted to 520 766 probes common to all arrays used in the different data-sets. Results: We found higher rates in participants with schizophrenia than in controls for 13 of the 15 previously implicated CNVs. Six were nominally significantly associated (P < 0.05) in this new data-set: deletions at 1q21.1, NRXN1, 15q11.2 and 22q11.2 and duplications at 16p11.2 and the Angelman/Prader–Willi Syndrome (AS/PWS) region. All eight AS/PWS duplications in patients were of maternal origin. When combined with published data, 11 of the 15 loci showed highly significant evidence for association with schizophrenia (P < 4.1 × 10⁻⁴). Conclusions: We strengthen the support for the majority of the previously implicated CNVs in schizophrenia. About 2.5% of patients with schizophrenia and 0.9% of controls carry a large, detectable CNV at one of these loci. Routine CNV screening may be clinically appropriate given the high rate of known deleterious mutations in the disorder and the comorbidity associated with these heritable mutations.

Pre-clustering data sets using cluster4x improves the signal-to-noise ratio of high-throughput crystallography drug-screening analysis

Acta Crystallographica Section D Structural Biology
10.1107/s2059798320012619
2020
Vol 76 (11)
pp. 1134-1144
Cited By ~ 2

Author(s): Helen M. Ginn

Keyword(s): Drug Targets, Signal To Noise Ratio, Published Data, Data Sets, Data Set, X Ray Crystallography, Fragment Screening, High Profile, Clustering Data, Interactive Graphical User Interface

Drug and fragment screening at X-ray crystallography beamlines has been a huge success. However, it is inevitable that more high-profile biological drug targets will be identified for which high-quality, highly homogeneous crystal systems cannot be found. With increasing heterogeneity in crystal systems, current multi-data-set methods become ever less sensitive to bound ligands. To ease the bottleneck of finding a well-behaved crystal system, data sets can be pre-clustered with cluster4x after data collection, separating them into smaller partitions to restore the sensitivity of multi-data-set methods. Here, the software cluster4x is introduced for this purpose and validated against published data sets using PanDDA, showing an improved total signal from existing ligands and identifying new hits in both highly heterogeneous and less heterogeneous multi-data sets. cluster4x provides the researcher with an interactive graphical user interface with which to explore multi-data-set experiments.

Phylogenetic position of Pelusios williamsi and a critique of current GenBank procedures (Reptilia: Testudines: Pelomedusidae)

Amphibia-Reptilia
10.1163/156853812x627204
2012
Vol 33 (1)
pp. 150-154
Cited By ~ 4

Author(s): Uwe Fritz, Mario Vargas-Ramírez, Pavel Široký

Keyword(s): Natural History, Genetic Divergence, Best Practice, Nuclear Data, Phylogenetic Position, Published Data, Data Sets, Natural History Museums, Data Set, History Museums

We re-examine the phylogenetic position of Pelusios williamsi by merging new sequences with an earlier published data set of all Pelusios species, except the possibly extinct P. seychellensis, and the nine previously identified lineages of the closely allied genus Pelomedusa (2054 bp mtDNA, 2025 bp nDNA). Furthermore, we include new sequences of Pelusios broadleyi, P. castanoides, P. gabonensis and P. marani. Individual and combined analyses of the mitochondrial and nuclear data sets indicate that P. williamsi is sister to P. castanoides, as predicted by morphology. This provides evidence for the misidentification of GenBank sequences allegedly representing P. williamsi. Such mislabelled GenBank sequences contribute to continued confusion, because only the original submitter can revise their identification; an impractical procedure impeding the rectification of obvious mistakes. We recommend implementing another option for revising taxonomic identifications, paralleling the century-old best practice of natural history museums for new determinations of specimens. Within P. broadleyi, P. gabonensis and P. marani, there is only shallow genetic divergence, while some phylogeographic structuring is present in the wide-ranging species P. castaneus and P. castanoides.
