HaploBlocker: Creation of subgroup specific haplotype blocks and libraries

2018 ◽  
Author(s):  
Torsten Pook ◽  
Martin Schlather ◽  
Gustavo de los Campos ◽  
Manfred Mayer ◽  
Chris Carolin Schoen ◽  
...  

Abstract
The concept of haplotype blocks has been shown to be useful in genetics. Fields of application range from the detection of regions under positive selection to statistical methods that make use of dimension reduction. We propose a novel approach (“HaploBlocker”) for defining and inferring haplotype blocks that focuses on linkage instead of the commonly used population-wide measures of linkage disequilibrium. We define a haplotype block as a sequence of genetic markers that has a predefined minimum frequency in the population; only haplotypes with a similar sequence of markers are considered to carry that block, effectively screening a dataset for group-wise identity-by-descent. From these haplotype blocks we construct a haplotype library that represents a large proportion of genetic variability with a limited number of blocks. Our method is implemented in the associated R package HaploBlocker and provides the flexibility not only to optimize the structure of the obtained haplotype library for subsequent analyses, but also to handle datasets of different marker density and genetic diversity. By using haplotype blocks instead of SNPs, local epistatic interactions can be modelled naturally, and the reduced number of parameters enables a wide variety of new methods for further genomic analyses such as genomic prediction and the detection of selection signatures. We illustrate our methodology with a dataset comprising 501 doubled haploid lines of a European maize landrace genotyped at 501’124 SNPs. With the suggested approach, we identified 2’991 haplotype blocks with an average length of 2’685 SNPs that together represent 94% of the dataset.
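The block definition above (a marker sequence with a predefined minimum frequency in the population) can be illustrated with a minimal Python sketch. This is a deliberate simplification of HaploBlocker's actual algorithm: a naive window scan over a binary haplotype matrix; the function name and toy data are hypothetical.

```python
import numpy as np

def candidate_blocks(haplo, window, min_freq):
    """Illustrative scan for haplotype blocks: marker windows in which
    some marker sequence is shared by at least a min_freq fraction of
    haplotypes. haplo: (n_haplotypes, n_markers) integer matrix."""
    n, m = haplo.shape
    blocks = []
    for start in range(m - window + 1):
        seqs = [tuple(row) for row in haplo[:, start:start + window]]
        for seq in set(seqs):
            freq = seqs.count(seq) / n
            if freq >= min_freq:
                blocks.append((start, start + window, seq, freq))
    return blocks

# Toy data: 4 haplotypes, 6 markers; the first three share markers 0-3.
haplo = np.array([
    [1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1, 1],
])
blocks = candidate_blocks(haplo, window=4, min_freq=0.5)
```

A real implementation additionally merges overlapping windows, allows near-identical sequences, and prunes the result into a compact library; none of that is shown here.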

2014 ◽  
Vol 17 (4) ◽  
Author(s):  
Raymond K. Walters ◽  
Charles Laurin ◽  
Gitta H. Lubke

Epistasis is a growing area of research in genome-wide studies, but the differences between alternative definitions of epistasis remain a source of confusion for many researchers. One problem is that models for epistasis are presented in a number of formats, some of which have difficult-to-interpret parameters. In addition, the relation between the different models is rarely explained. Existing software for testing epistatic interactions between single-nucleotide polymorphisms (SNPs) does not provide the flexibility to compare the available model parameterizations. For that reason we have developed an R package for investigating epistatic and penetrance models, EpiPen, to aid users who wish to easily compare, interpret, and utilize models for two-locus epistatic interactions. EpiPen facilitates research on SNP-SNP interactions by allowing the R user to easily convert between common parametric forms for two-locus interactions, generate data for simulation studies, and perform power analyses for the selected model with a continuous or dichotomous phenotype. The usefulness of the package for model interpretation and power analysis is illustrated using data on rheumatoid arthritis.
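The conversion between parameterizations that the abstract describes can be illustrated for one common case: decomposing a 3x3 table of two-locus genotype means into additive, dominance and epistatic effects. This is a generic F-infinity-style sketch, not EpiPen's interface; the coding scheme and function name are our own.

```python
import numpy as np

# F-infinity style codes per genotype (0, 1, 2 copies of the allele):
add = {0: -1.0, 1: 0.0, 2: 1.0}   # additive code
dom = {0: 0.0, 1: 1.0, 2: 0.0}    # dominance code

def decompose(means):
    """means[i][j]: phenotypic mean for genotype (i copies at locus 1,
    j copies at locus 2). Returns the 9 model parameters
    (mu, a1, d1, a2, d2, aa, ad, da, dd); the 9x9 design is invertible,
    so least squares recovers them exactly."""
    X, y = [], []
    for i in range(3):
        for j in range(3):
            a1, d1, a2, d2 = add[i], dom[i], add[j], dom[j]
            X.append([1, a1, d1, a2, d2, a1*a2, a1*d2, d1*a2, d1*d2])
            y.append(means[i][j])
    coef, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
    return coef

# Purely additive toy table: mean = i + j, so only mu, a1, a2 are nonzero.
means = [[i + j for j in range(3)] for i in range(3)]
mu, a1, d1, a2, d2, aa, ad, da, dd = decompose(means)
```

Other parameterizations amount to choosing different codes for the design matrix, which is exactly the kind of conversion the package automates.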


Author(s):  
RASHI VOHRA ◽  
BRAJESH PATEL

A foremost negative impact of technological advancement is an exponential increase in security threats, which has created tremendous demand for effective electronic security. The principles of any security mechanism are confidentiality, authentication, integrity, non-repudiation, access control and availability. Cryptography is an essential aspect of secure communications. Many chaotic cryptosystems have been developed as a result of the interesting phenomenological relationship between the two fields of chaos and cryptography. In this paper, an overview of cryptography, optimization algorithms and chaos theory is provided, and a novel approach for encryption and decryption based on chaos and optimization algorithms is discussed. The basic idea is to encrypt and decrypt information using a genetic algorithm, with a pseudorandom sequence generated by a chaotic map serving as the key for the genetic algorithm's encryption operations. This yields desirable cryptographic properties, as a change in the key produces an undesired result on the receiver side. The suggested approach complements standard algorithmic procedures, providing security solutions with novel features.
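The chaotic-map component can be illustrated with the logistic map, a standard choice for generating pseudorandom keystreams. This sketch omits the genetic-algorithm stage entirely, uses parameter values we chose ourselves, and is not secure for real use; it only shows how a chaotic trajectory yields a symmetric keystream.

```python
def logistic_keystream(x0, r, n):
    """Generate n pseudorandom bytes from the logistic map
    x_{k+1} = r * x_k * (1 - x_k); 0 < x0 < 1, r near 4 (chaotic regime)."""
    x, out = x0, []
    for _ in range(n):
        x = r * x * (1 - x)
        out.append(int(x * 256) % 256)   # quantize the trajectory to a byte
    return bytes(out)

def xor_cipher(data, x0=0.3141, r=3.9999):
    """Symmetric: the same (x0, r) key both encrypts and decrypts,
    because XOR with the identical keystream is self-inverse."""
    ks = logistic_keystream(x0, r, len(data))
    return bytes(b ^ k for b, k in zip(data, ks))

msg = b"chaotic cipher demo"
ct = xor_cipher(msg)   # encrypt
pt = xor_cipher(ct)    # decrypt with the same key
```

The sensitivity to initial conditions means a receiver with even a slightly different (x0, r) obtains a completely different keystream, which is the property the abstract relies on.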


2020 ◽  
pp. 001857872091834
Author(s):  
Diana Altshuler ◽  
Kenny Yu ◽  
John Papadopoulos ◽  
Arash Dabestani

Purpose: The intent of this article is to evaluate a novel approach, using rapid cycle analytics and real world evidence, to optimize and improve the medication evaluation process to help the formulary decision making process, while reducing time for clinicians. Summary: The Pharmacy and Therapeutics (P&T) Committee within each health system is responsible for evaluating medication requests for formulary addition. Members of the pharmacy staff prepare the drug monograph or a medication use evaluation (MUE) and allocate precious clinical resources to review patient charts to assess efficacy and value. We explored a novel approach to evaluate the value of our intravenous acetaminophen (IV APAP) formulary admittance. This new methodology, called rapid cycle analytics, can assist hospitals in meeting and/or exceeding the minimum criteria of formulary maintenance as defined by the Joint Commission Standards. In this particular study, we assessed the effectiveness of IV APAP in total hip arthroplasty (THA) and total knee arthroplasty (TKA) procedures. We assessed the correlation to same-stay opioid utilization, average length of inpatient stay and post anesthesia care unit (PACU) time. Conclusion: We were able to explore and improve our organization’s approach in evaluating medications by partnering with an external analytics expert to help organize and normalize our data in a more robust, yet time efficient manner. Additionally, we were able to use a significantly larger external data set as a point of reference. Being able to perform this detailed analytical exercise for thousands of encounters internally and using a data warehouse of over 130 million patients as a point of reference in a short time has improved the depth of our assessment, as well as reducing valuable clinical resources allocated to MUEs to allow for more direct patient care. 
This clinically grounded, real-world, data-rich analytics model is the necessary foundation for using Artificial or Augmented Intelligence (AI) to make real-time formulary and drug selection decisions.


SAGE Open ◽  
2020 ◽  
Vol 10 (2) ◽  
pp. 215824402093193
Author(s):  
Wing Shing Lee

To achieve a more effective teaching method, an experimental study using a learning-from-mistakes approach was conducted. A novel two-step approach was adopted from the organizational learning literature. The first step established the notion of psychological safety in students, and the second step called for students to discuss the mistakes they had made. Two classes of freshman university students studying basic accounting participated in this study. One class was assigned as the treatment group, and the other as the control group. Students’ performance was measured on three separate occasions: a pretreatment test, a midterm examination, and a final examination as the posttest. Results showed that students from the treatment group outperformed those in the control group on the latter two occasions, whereas both groups scored similarly on the pretreatment test. It is thus concluded that the suggested approach may ultimately help students learn more effectively.


2019 ◽  
Vol 35 (24) ◽  
pp. 5146-5154 ◽  
Author(s):  
Joanna Zyla ◽  
Michal Marczyk ◽  
Teresa Domaszewska ◽  
Stefan H E Kaufmann ◽  
Joanna Polanska ◽  
...  

Abstract
Motivation: Analysis of gene set (GS) enrichment is an essential part of functional omics studies. Here, we complement the established evaluation metrics of GS enrichment algorithms with a novel approach to assess the practical reproducibility of scientific results obtained from GS enrichment tests when applied to related data from different studies.
Results: We evaluated eight established algorithms and one novel algorithm for reproducibility, sensitivity, prioritization, false positive rate and computational time. In addition to the eight established algorithms, we included Coincident Extreme Ranks in Numerical Observations (CERNO), a flexible and fast algorithm based on modified Fisher P-value integration. Using real-world datasets, we demonstrate that CERNO is robust to ranking metrics, as well as to sample and GS size. CERNO had the highest reproducibility while remaining sensitive, specific and fast. In the overall ranking, Pathway Analysis with Down-weighting of Overlapping Genes, CERNO and over-representation analysis performed best, while CERNO and GeneSetTest scored high in terms of reproducibility.
Availability and implementation: The tmod package implementing the CERNO algorithm is available from CRAN (cran.r-project.org/web/packages/tmod/index.html) and an online implementation can be found at http://tmod.online/. The datasets analyzed in this study are widely available in the KEGGdzPathwaysGEO and KEGGandMetacoreDzPathwaysGEO R packages and the GEO repository.
Supplementary information: Supplementary data are available at Bioinformatics online.
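The modified Fisher P-value integration behind CERNO admits a compact sketch: the ranks of a gene set's members in an ordered gene list are combined as F = -2 Σ ln(R_i/N), which follows a chi-squared distribution with 2n degrees of freedom under the null. This is our reading of the published statistic, not the tmod implementation; since the degrees of freedom are always even, the survival function has a closed form and no external library is needed.

```python
import math

def chi2_sf_even_df(x, df):
    """Chi-squared survival function for even df (closed form):
    P(X > x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!"""
    h = x / 2.0
    return math.exp(-h) * sum(h**k / math.factorial(k) for k in range(df // 2))

def cerno_p(gene_ranks, total_genes):
    """CERNO-style test: combine the ranks R_i of a gene set's members
    within an ordered list of total_genes genes."""
    f = -2.0 * sum(math.log(r / total_genes) for r in gene_ranks)
    return chi2_sf_even_df(f, df=2 * len(gene_ranks))

# A gene set concentrated near the top of a 1000-gene ranking is far
# more significant than one scattered across the list.
p_top = cerno_p([1, 5, 12, 30], total_genes=1000)
p_spread = cerno_p([130, 380, 620, 900], total_genes=1000)
```

Because only ranks enter the statistic, the test is insensitive to the scale of the underlying ranking metric, consistent with the robustness the abstract reports.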


2016 ◽  
Author(s):  
Damian Brzyski ◽  
Christine B. Peterson ◽  
Piotr Sobczyk ◽  
Emmanuel J. Candès ◽  
Malgorzata Bogdan ◽  
...  

Abstract
With the rise of both the number and the complexity of traits of interest, control of the false discovery rate (FDR) in genetic association studies has become an increasingly appealing and accepted target for multiple comparison adjustment. While a number of robust FDR-controlling strategies exist, the nature of this error rate is intimately tied to the precise way in which discoveries are counted, and the performance of FDR-controlling procedures is satisfactory only if there is a one-to-one correspondence between what scientists describe as unique discoveries and the number of rejected hypotheses. The presence of linkage disequilibrium between markers in genome-wide association studies (GWAS) often leads researchers to interpret the signal associated with multiple neighboring SNPs as indicating the existence of a single genomic locus with possible influence on the phenotype. This a posteriori aggregation of rejected hypotheses results in inflation of the relevant FDR. We propose a novel approach to FDR control that is based on pre-screening to identify the level of resolution of distinct hypotheses. We show how FDR-controlling strategies can be adapted to account for this initial selection, both with theoretical results and with simulations that mimic the dependence structure to be expected in GWAS. We demonstrate that our approach is versatile and useful when the data are analyzed using both single-marker tests and multivariate regression. We provide an R package that allows practitioners to apply our procedure to standard GWAS-format data, and illustrate its performance on lipid traits in the NFBC66 cohort study.
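The counting problem the abstract describes can be made concrete with a toy sketch: testing at the locus level rather than the SNP level changes what a "discovery" is. Representing each locus by its minimum p-value, as below, is a crude simplification (the paper develops properly calibrated procedures); the Benjamini-Hochberg step is standard.

```python
def benjamini_hochberg(pvals, q=0.05):
    """Return indices of hypotheses rejected by the BH step-up rule:
    reject the k smallest p-values, where k is the largest rank with
    p_(k) <= q * k / m."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            k = rank
    return set(order[:k])

# Toy example: 3 loci, each tagged by several correlated SNPs.
loci = {
    "locus1": [1e-6, 2e-6, 5e-6],   # true signal, redundant SNP hits
    "locus2": [3e-5, 8e-5],
    "locus3": [0.4, 0.7, 0.9],      # null locus
}
# Locus-level counting: one representative p-value per locus, so each
# rejection corresponds to one scientific discovery.
locus_p = {name: min(ps) for name, ps in loci.items()}
names = list(locus_p)
rejected = {names[i] for i in benjamini_hochberg([locus_p[n] for n in names])}
```

Counting at the SNP level instead would report five rejections for two underlying signals, which is exactly the a posteriori aggregation that inflates the relevant FDR.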


2021 ◽  
pp. 1-9
Author(s):  
Moritz Marbach

Abstract Imputing missing values is an important preprocessing step in data analysis, but the literature offers little guidance on how to choose between imputation models. This letter suggests adopting the imputation model that generates a density of imputed values most similar to those of the observed values for an incomplete variable after balancing all other covariates. We recommend stable balancing weights as a practical approach to balance covariates whose distribution is expected to differ if the values are not missing completely at random. After balancing, discrepancy statistics can be used to compare the density of imputed and observed values. We illustrate the application of the suggested approach using simulated and real-world survey data from the American National Election Study, comparing popular imputation approaches including random forests, hot-deck, predictive mean matching, and multivariate normal imputation. An R package implementing the suggested approach accompanies this letter.
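The discrepancy-comparison step of the suggested approach can be sketched with a two-sample Kolmogorov-Smirnov statistic between imputed and observed values. This sketch skips the balancing step entirely (stable balancing weights would reweight the observed sample first) and uses hypothetical data; it only illustrates how a discrepancy statistic ranks imputation models.

```python
def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap between
    the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    grid = sorted(set(a) | set(b))
    def ecdf(s, x):
        return sum(v <= x for v in s) / len(s)
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in grid)

observed = [2.1, 2.4, 2.9, 3.0, 3.3, 3.8]
# Imputations of the missing entries from two hypothetical models:
imputed_good = [2.3, 2.8, 3.1, 3.5]    # density similar to the observed
imputed_bad = [7.0, 7.4, 8.1, 9.2]     # density clearly off
d_good = ks_statistic(observed, imputed_good)
d_bad = ks_statistic(observed, imputed_bad)
```

Under the letter's criterion, the model with the smaller discrepancy (here, the first) would be preferred.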


F1000Research ◽  
2022 ◽  
Vol 9 ◽  
pp. 1159
Author(s):  
Qian (Vicky) Wu ◽  
Wei Sun ◽  
Li Hsu

Gene expression data have been used to infer gene-gene networks (GGN), where an edge between two genes implies the conditional dependence of these two genes given all the other genes. Such gene-gene networks are often referred to as gene regulatory networks, since they may reveal expression regulation. Most existing methods for identifying GGN employ penalized regression with an L1 (lasso), L2 (ridge), or elastic net penalty, which spans the range of the L1 to L2 penalties. However, for high-dimensional gene expression data, a penalty that spans the range of the L0 and L1 penalties, such as the log penalty, is often needed for variable selection consistency. Thus, we develop a novel method that employs the log penalty within the framework of an earlier network identification method, space (Sparse PArtial Correlation Estimation), and implement it in an R package, space-log. We show that space-log is computationally efficient (source code implemented in C) and has good performance compared with other methods, particularly for networks with hubs. Space-log is open source and available at GitHub, https://github.com/wuqian77/SpaceLog
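The intuition for why a log penalty bridges L0 and L1 can be shown numerically. Unlike the L1 penalty, whose marginal cost is constant, the concave log penalty charges much less for growing an already-large coefficient than for moving a coefficient away from zero, so large (true) edges are shrunk less. The penalty form lam * log(|beta| + tau) below is one common formulation, not necessarily space-log's exact definition.

```python
import math

def log_penalty(beta, lam, tau):
    """Log penalty lam * log(|beta| + tau); concave in |beta|."""
    return lam * math.log(abs(beta) + tau)

def l1_penalty(beta, lam):
    """Lasso penalty for comparison; its marginal cost is constant."""
    return lam * abs(beta)

# Marginal cost of one unit of coefficient, near zero vs far from zero:
near_zero = log_penalty(1.0, lam=1.0, tau=0.1) - log_penalty(0.0, lam=1.0, tau=0.1)
far_away = log_penalty(10.0, lam=1.0, tau=0.1) - log_penalty(9.0, lam=1.0, tau=0.1)
# For L1 these two marginal costs would be identical (both equal lam).
```

This differential shrinkage is what helps with hub nodes, whose many genuine edges would otherwise be over-penalized by a uniform L1 charge.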


2020 ◽  
Author(s):  
Brandon Monier ◽  
Terry M. Casstevens ◽  
Edward S. Buckler

Abstract
The need for efficient tools and applications for analyzing genomic diversity is essential for any genetics research program. One such tool, TASSEL (Trait Analysis by aSSociation, Evolution and Linkage), provides many core methods for genomic analyses. Despite its efficiency, TASSEL has limited means for reproducible research and for interacting with other analytical tools. Here we present an R package, rTASSEL, a front end that connects to a variety of highly used TASSEL methods and analytical tools. The goal of this package is to create a unified scripting workflow that exploits the analytical prowess of TASSEL in conjunction with R’s popular data handling and parsing capabilities, without requiring the user to switch between these two environments.


Symmetry ◽  
2021 ◽  
Vol 13 (9) ◽  
pp. 1711
Author(s):  
Antonio Profico ◽  
Carlotta Zeppilli ◽  
Ileana Micarelli ◽  
Alessandro Mondanaro ◽  
Pasquale Raia ◽  
...  

In biological anthropology, parameters relating to cross-sectional geometry are calculated in paired long bones to evaluate the degree of lateralization of anatomy and, by inference, of function. Here, we describe a novel approach, newly added to the morphomap R package, to assess the lateralization of the distribution of cortical bone along the entire diaphysis. The sample comprises paired long bones belonging to 51 individuals (10 females and 41 males) from The New Mexico Decedent Image Database with known biological profiles and occupational and loading histories. Both males and females show a pattern of right lateralization. In addition, males are more lateralized than females, whereas there is no significant association between lateralization and occupation or loading history. Body weight, height and long-bone length are the major factors driving the emergence of asymmetry in the humerus, while, interestingly, the degree of lateralization decreases in the oldest individuals.
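Lateralization of a paired cross-sectional property is commonly quantified with a percent directional asymmetry index; a minimal sketch follows. This is a standard index from the cross-sectional geometry literature, not necessarily the exact formula morphomap applies along the diaphysis, and the input values are hypothetical.

```python
def percent_asymmetry(right, left):
    """Percent directional asymmetry: 100 * (R - L) / mean(R, L).
    Positive values indicate right-biased (larger right-side) values
    of the cross-sectional property; negative values, left bias."""
    return 100.0 * (right - left) / ((right + left) / 2.0)

# Hypothetical cortical-area values (mm^2) at one diaphyseal section:
da = percent_asymmetry(right=620.0, left=585.0)
```

Evaluating such an index at every section along the diaphysis, rather than at a single midshaft slice, is the whole-diaphysis extension the abstract describes.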

