Epi-Gene: An R-Package for Easy Pan-Genome Analysis

2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Furqan Awan ◽  
Muhammad Muddassir Ali ◽  
Muhammad Hamid ◽  
Muhammad Huzair Awan ◽  
Muhammad Hassan Mushtaq ◽  
...  

The main aim of this study was to develop a set of functions that can analyze genomic data using less time and memory. Epi-gene is presented as a solution to the problems of handling large sequence files and long computation times. It requires less time and fewer programming skills to work with a large number of genomes. In the current study, some features of the Epi-gene R package are described and illustrated using a dataset of 14 Aeromonas hydrophila genomes. Joining, relabeling, and conversion functions are included in the package to handle FASTA-formatted sequences. Various Epi-gene functions were used to calculate the subsets of core genes, accessory genes, and unique genes. Heat maps and phylogenetic genome trees were also constructed. The whole procedure was completed in less than 30 minutes. The package works only on Windows operating systems. Functions from other packages available in the R computing environment, such as dplyr and ggtree, were also used.
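The split into core, accessory, and unique genes can be illustrated with a minimal Python sketch. It is not the Epi-gene API. The function name `partition_pangenome`, the genome labels, and the gene names are hypothetical toy data, and the sketch assumes that gene families have already been identified per genome.

```python
from typing import Dict, Set


def partition_pangenome(genomes: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Split gene families into core, accessory, and unique subsets.

    core      : present in every genome
    unique    : present in exactly one genome
    accessory : present in more than one genome but not in all
    """
    n_genomes = len(genomes)
    counts: Dict[str, int] = {}
    for genes in genomes.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1

    core = {g for g, c in counts.items() if c == n_genomes}
    # With a single genome every gene is core, so "unique" needs n > 1.
    unique = {g for g, c in counts.items() if c == 1 and n_genomes > 1}
    accessory = set(counts) - core - unique
    return {"core": core, "accessory": accessory, "unique": unique}


# Hypothetical toy input: three genomes with a handful of gene names.
example = {
    "genome_A": {"gyrB", "rpoD", "aerA", "hlyA"},
    "genome_B": {"gyrB", "rpoD", "aerA"},
    "genome_C": {"gyrB", "rpoD", "ahh1"},
}
result = partition_pangenome(example)
```

In this toy input, the two genes shared by all three genomes form the core, the gene shared by two genomes is accessory, and the two genes seen once are unique.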

2020 ◽  
Vol 79 (Suppl 1) ◽  
pp. 1405.1-1406
Author(s):  
F. Morton ◽  
J. Nijjar ◽  
C. Goodyear ◽  
D. Porter

Background: The American College of Rheumatology (ACR) and the European League Against Rheumatism (EULAR) individually and collaboratively have produced/recommended diagnostic classification, response and functional status criteria for a range of different rheumatic diseases. While there are a number of different resources available for performing these calculations individually, currently there are no tools available that we are aware of to easily calculate these values for whole patient cohorts. Objectives: To develop a new software tool, which will enable both data analysts and also researchers and clinicians without programming skills to calculate ACR/EULAR related measures for a number of different rheumatic diseases. Methods: Criteria that had been developed by ACR and/or EULAR that had been approved for the diagnostic classification, measurement of treatment response and functional status in patients with rheumatoid arthritis were identified. Methods were created using the R programming language to allow the calculation of these criteria, which were incorporated into an R package. Additionally, an R/Shiny web application was developed to enable the calculations to be performed via a web browser using data presented as CSV or Microsoft Excel files. Results: acreular is a freely available, open source R package (downloadable from https://github.com/fragla/acreular) that facilitates the calculation of ACR/EULAR related RA measures for whole patient cohorts. Measures, such as the ACR/EULAR (2010) RA classification criteria, can be determined using precalculated values for each component (small/large joint counts, duration in days, normal/abnormal acute-phase reactants, negative/low/high serology classification) or by providing “raw” data (small/large joint counts, onset/assessment dates, ESR/CRP and CCP/RF laboratory values). Other measures, including EULAR response and ACR20/50/70 response, can also be calculated by providing the required information.
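How the precalculated components combine into a 2010 ACR/EULAR classification score can be sketched in Python as follows. This is an illustration of the published point system, not the acreular API. The function names are hypothetical, and the sketch assumes the eligibility prerequisites of the criteria (such as definite clinical synovitis not better explained by another disease) have already been checked.

```python
def joint_score(small: int, large: int) -> int:
    """Joint-involvement points; the highest applicable category wins."""
    if small >= 1 and small + large > 10:
        return 5  # more than 10 joints, at least one of them small
    if small >= 4:
        return 3  # 4-10 small joints
    if small >= 1:
        return 2  # 1-3 small joints
    if large >= 2:
        return 1  # 2-10 large joints
    return 0      # at most one large joint


SEROLOGY_POINTS = {"negative": 0, "low": 2, "high": 3}


def acr_eular_2010_score(small: int, large: int, duration_days: int,
                         abnormal_apr: bool, serology: str) -> int:
    """Sum the four domains of the 2010 classification criteria."""
    score = joint_score(small, large)
    score += SEROLOGY_POINTS[serology]       # RF / ACPA category
    score += 1 if abnormal_apr else 0        # abnormal CRP or ESR
    score += 1 if duration_days >= 42 else 0  # symptoms for >= 6 weeks
    return score


def is_definite_ra(score: int) -> bool:
    """A total of 6 or more (out of 10) classifies as definite RA."""
    return score >= 6


# Hypothetical patient: 5 small joints, 1 large joint, 60 days of symptoms,
# abnormal acute-phase reactants, low-positive serology.
patient_score = acr_eular_2010_score(5, 1, 60, True, "low")
```

Applying the function row by row to a table of patients gives the cohort-level calculation the abstract describes.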
The accompanying web application is included as part of the R package but is also externally hosted at https://fragla.shinyapps.io/shiny-acreular. This enables researchers and clinicians without any programming skills to easily calculate these measures by uploading either a Microsoft Excel or CSV file containing their data. Furthermore, the web application allows the incorporation of additional study covariates, enabling the automatic calculation of multigroup comparative statistics and the visualisation of the data through a number of different plots, both of which can be downloaded. Figure 1. The Data tab following the upload of data. Criteria are calculated by selecting the appropriate checkbox. Figure 2. A density plot of DAS28 scores grouped by ACR/EULAR 2010 RA classification. Statistical analysis has been performed and shows a significant difference in DAS28 score between the two groups. Conclusion: The acreular R package facilitates the easy calculation of ACR/EULAR RA related disease measures for whole patient cohorts. Calculations can be performed either from within R or by using the accompanying web application, which also enables the graphical visualisation of data and the calculation of comparative statistics. We plan to further develop the package by adding additional RA related criteria and by adding ACR/EULAR related measures for other rheumatic disorders. Disclosure of Interests: Fraser Morton: None declared, Jagtar Nijjar Shareholder of: GlaxoSmithKline plc, Consultant of: Janssen Pharmaceuticals UK, Employee of: GlaxoSmithKline plc, Paid instructor for: Janssen Pharmaceuticals UK, Speakers bureau: Janssen Pharmaceuticals UK, AbbVie, Carl Goodyear: None declared, Duncan Porter: None declared


2021 ◽  
Author(s):  
Florian Privé ◽  
Bjarni J. Vilhjálmsson ◽  
Timothy S. H. Mak

Abstract We present lassosum2, a new version of the polygenic score method lassosum, which we re-implement in the R package bigsnpr. This new version uses the exact same input data as LDpred2 and is also very fast, which means that it can be run with almost no extra coding or computational time when already running LDpred2. It can also be more robust than LDpred2, e.g. in the case of a large GWAS sample size misspecification. Therefore, lassosum2 is complementary to LDpred2.


Author(s):  
Samara F. Kiihl ◽  
Maria Jose Martinez-Garrido ◽  
Arce Domingo-Relloso ◽  
Jose Bermudez ◽  
Maria Tellez-Plaza

Abstract Accurately measuring epigenetic marks such as 5-methylcytosine (5-mC) and 5-hydroxymethylcytosine (5-hmC) at the single-nucleotide level requires combining data from DNA processing methods including traditional (BS), oxidative (oxBS) or Tet-Assisted (TAB) bisulfite conversion. We introduce the R package MLML2R, which provides maximum likelihood estimates (MLE) of 5-mC and 5-hmC proportions. While all other available R packages provide 5-mC and 5-hmC MLEs only for the oxBS+BS combination, MLML2R also provides MLEs for TAB combinations. For combinations of any two of the methods, we derived the pool-adjacent-violators algorithm (PAVA) exact constrained MLE in analytical form. For the three-method combination, we implemented both the iterative method by Qu et al. [Qu, J., M. Zhou, Q. Song, E. E. Hong and A. D. Smith (2013): “MLML: consistent simultaneous estimates of DNA methylation and hydroxymethylation,” Bioinformatics, 29, 2645–2646.] and a novel non-iterative approximation using Lagrange multipliers. The newly proposed non-iterative solutions greatly decrease computational time, a common bottleneck when processing high-throughput data. The MLML2R package is flexible, as it takes as input both preprocessed intensities from Infinium Methylation arrays and counts from Next Generation Sequencing technologies. The MLML2R package is freely available at https://CRAN.R-project.org/package=MLML2R.
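The two-method case can be illustrated with a short Python sketch for the BS + oxBS combination. It is not the MLML2R API. It assumes binomial counts at a single site, that BS reads report 5-mC plus 5-hmC, and that oxBS reads report 5-mC only. The function name and the counts are hypothetical. When the raw proportions violate the order constraint, pooling the two samples gives the constrained estimate, which is the pool-adjacent-violators step for two groups.

```python
from typing import Tuple


def estimate_mc_hmc(bs_meth: int, bs_total: int,
                    oxbs_meth: int, oxbs_total: int) -> Tuple[float, float]:
    """Return (p_5mC, p_5hmC) estimated from BS and oxBS counts at one site."""
    p_bs = bs_meth / bs_total      # estimates 5-mC + 5-hmC
    p_ox = oxbs_meth / oxbs_total  # estimates 5-mC alone

    if p_ox <= p_bs:
        # Constraint satisfied: the unconstrained estimates are the MLE.
        return p_ox, p_bs - p_ox

    # Constraint violated (would imply negative 5-hmC): pool both samples,
    # which forces p_5mC == p_5mC + p_5hmC, i.e. p_5hmC = 0.
    pooled = (bs_meth + oxbs_meth) / (bs_total + oxbs_total)
    return pooled, 0.0


# Hypothetical counts: 80/100 methylated reads in BS, 60/100 in oxBS.
p_mc, p_hmc = estimate_mc_hmc(80, 100, 60, 100)
# Hypothetical violating site: oxBS proportion exceeds the BS proportion.
p_mc_v, p_hmc_v = estimate_mc_hmc(50, 100, 60, 100)
```

Because the estimate is a closed-form expression per site, it can be applied across many sites without iteration, which is the source of the speed-up the abstract mentions.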


2019 ◽  
Vol 35 (24) ◽  
pp. 5146-5154 ◽  
Author(s):  
Joanna Zyla ◽  
Michal Marczyk ◽  
Teresa Domaszewska ◽  
Stefan H E Kaufmann ◽  
Joanna Polanska ◽  
...  

Abstract Motivation: Analysis of gene set (GS) enrichment is an essential part of functional omics studies. Here, we complement the established evaluation metrics of GS enrichment algorithms with a novel approach to assess the practical reproducibility of scientific results obtained from GS enrichment tests when applied to related data from different studies. Results: We evaluated eight established and one novel algorithm for reproducibility, sensitivity, prioritization, false positive rate and computational time. In addition to eight established algorithms, we also included Coincident Extreme Ranks in Numerical Observations (CERNO), a flexible and fast algorithm based on modified Fisher P-value integration. Using real-world datasets, we demonstrate that CERNO is robust to ranking metrics, as well as sample and GS size. CERNO had the highest reproducibility while remaining sensitive, specific and fast. In the overall ranking, Pathway Analysis with Down-weighting of Overlapping Genes, CERNO and over-representation analysis performed best, while CERNO and GeneSetTest scored high in terms of reproducibility. Availability and implementation: The tmod package implementing the CERNO algorithm is available from CRAN (cran.r-project.org/web/packages/tmod/index.html) and an online implementation can be found at http://tmod.online/. The datasets analyzed in this study are widely available in the KEGGdzPathwaysGEO and KEGGandMetacoreDzPathwaysGEO R packages and the GEO repository. Supplementary information: Supplementary data are available at Bioinformatics online.
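A rank-based Fisher-style test of the kind CERNO is described as can be sketched in Python. The abstract only states that CERNO uses modified Fisher P-value integration. The specific form assumed here, F = -2 Σ ln(r_i / N) over the ranks r_i of the gene-set members among N ranked genes, compared with a chi-square distribution on 2k degrees of freedom, is an assumption of this sketch. The function names are hypothetical and are not the tmod API.

```python
import math
from typing import Sequence


def cerno_statistic(ranks: Sequence[int], n_genes: int) -> float:
    """Fisher-style statistic: each rank / N is treated like a p-value."""
    return -2.0 * sum(math.log(r / n_genes) for r in ranks)


def chi2_sf_even_df(x: float, k: int) -> float:
    """Survival function of a chi-square variable with 2k degrees of freedom.

    For even degrees of freedom the tail has the closed form
    exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!.
    """
    half = x / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total


def cerno_p_value(ranks: Sequence[int], n_genes: int) -> float:
    """P-value for a gene set given the ranks of its members."""
    return chi2_sf_even_df(cerno_statistic(ranks, n_genes), len(ranks))


# Hypothetical gene sets in a ranked list of 1000 genes.
p_top = cerno_p_value([1, 2, 3], 1000)           # members near the top
p_bottom = cerno_p_value([900, 950, 990], 1000)  # members near the bottom
```

Because the statistic needs only the ranks of the member genes, any ranking metric can supply the input list.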


2018 ◽  
Vol 7 (8) ◽  
pp. 293 ◽  
Author(s):  
Binbin Lu ◽  
Huabo Sun ◽  
Paul Harris ◽  
Miaozhong Xu ◽  
Martin Charlton

In this study, we introduce the R package shp2graph, which provides tools to convert a spatial network into an ‘igraph’ graph of the igraph R package. This conversion greatly empowers a spatial network study, as the vast array of graph analytical tools provided in igraph are then readily available to the network analysis, together with the inherent advantages of being within the R statistical computing environment and its vast array of statistical functions. Through three urban road network case studies, the calculation of road network distances with shp2graph and with igraph is demonstrated through four key stages: (i) confirming the connectivity of a spatial network; (ii) integrating points/locations with a network; (iii) converting a network into a graph; and (iv) calculating network distances (and travel times). Throughout, the required R commands are given to provide a useful tutorial on the use of shp2graph.
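Stages (iii) and (iv) can be illustrated with a small Python sketch that uses neither shp2graph nor igraph. It assumes a road network given as straight segments between coordinate pairs in a projected coordinate system, turns them into a weighted graph, and computes a shortest network distance with Dijkstra's algorithm. The function names and coordinates are hypothetical.

```python
import heapq
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Point = Tuple[float, float]


def build_graph(segments: Iterable[Tuple[Point, Point]]) -> Dict[Point, Dict[Point, float]]:
    """Convert line segments into an undirected graph weighted by length."""
    graph: Dict[Point, Dict[Point, float]] = defaultdict(dict)
    for a, b in segments:
        length = math.dist(a, b)  # Euclidean length of the segment
        graph[a][b] = length
        graph[b][a] = length
    return graph


def network_distance(graph: Dict[Point, Dict[Point, float]],
                     source: Point, target: Point) -> float:
    """Shortest path length along the network (Dijkstra); inf if unreachable."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best.get(node, math.inf):
            continue  # stale queue entry
        for neighbour, weight in graph.get(node, {}).items():
            candidate = dist + weight
            if candidate < best.get(neighbour, math.inf):
                best[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return math.inf


# Hypothetical road segments.
roads = [
    ((0.0, 0.0), (3.0, 0.0)),
    ((3.0, 0.0), (3.0, 4.0)),
    ((0.0, 0.0), (0.0, 10.0)),
    ((0.0, 10.0), (3.0, 4.0)),
    ((50.0, 50.0), (51.0, 50.0)),  # disconnected from the rest
]
graph = build_graph(roads)
d_network = network_distance(graph, (0.0, 0.0), (3.0, 4.0))
d_unreachable = network_distance(graph, (0.0, 0.0), (50.0, 50.0))
```

The network distance here exceeds the straight-line distance between the same two points, and an infinite result flags a pair of nodes that are not connected, which relates to stage (i).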


2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Luiz H. Moro Rosso ◽  
Andre F. de Borja Reis ◽  
Adrian A. Correndo ◽  
Ignacio A. Ciampitti

Abstract Objectives: This data article aims to introduce the “XPolaris” R-package, designed to facilitate access to detailed soil data at any geographical location within the contiguous United States (CONUS). Without requiring advanced R-programming skills, XPolaris enables users to convert raster data from the POLARIS database into traditional spreadsheet format [i.e., Comma-Separated Values (CSV)] for further data analyses. Data description: The core of this publication is a code-tutorial envisioned to assist users in retrieving soil raster data within the CONUS. All data is sourced from the POLARIS database, a 30-m probabilistic map of soil series and different soil properties [Chaney et al. Geoderma 274:54, 2016, Chaney et al. Water Resour Res 55:2916, 2019]. POLARIS represents an optimization of the Soil Survey Geographic (SSURGO) database, circumventing issues of spatial disaggregation, harmonizing, and filling spatial gaps. POLARIS was constructed using a machine learning algorithm, the Disaggregation and Harmonisation of Soil Map Units Through Resampled Classification Trees (DSMART-HPC) [Odgers et al. Geoderma 214:91, 2014]. Although the data is easily accessible in a raster format, retrieving large amounts of data can be time-consuming or require advanced programming skills.
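The idea of turning raster values at point locations into spreadsheet rows can be sketched in Python. This is not the XPolaris API and it does not read POLARIS files. It assumes a north-up raster held in memory as a list of rows, with a known upper-left corner and square cells. All names and values are hypothetical.

```python
import csv
import io
from typing import List, Optional, Sequence, Tuple


def sample_raster(grid: List[List[float]], x_min: float, y_max: float,
                  cell_size: float, lon: float, lat: float) -> Optional[float]:
    """Return the cell value under (lon, lat), or None if outside the grid."""
    col = int((lon - x_min) // cell_size)  # columns count eastward
    row = int((y_max - lat) // cell_size)  # rows count southward from the top
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None


def points_to_csv(grid: List[List[float]], x_min: float, y_max: float,
                  cell_size: float,
                  points: Sequence[Tuple[str, float, float]]) -> str:
    """Sample every point and return the result as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "lon", "lat", "value"])
    for point_id, lon, lat in points:
        value = sample_raster(grid, x_min, y_max, cell_size, lon, lat)
        writer.writerow([point_id, lon, lat, value])
    return buffer.getvalue()


# Hypothetical 2 x 2 raster with 1-degree cells and made-up values.
toy_grid = [[10.0, 20.0],
            [30.0, 40.0]]
csv_text = points_to_csv(toy_grid, -100.0, 40.0, 1.0,
                         [("site_1", -99.5, 39.5), ("site_2", -98.5, 38.5)])
```

Points that fall outside the raster are written with an empty value, so the output keeps one row per requested location.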


2017 ◽  
Author(s):  
Trevor Meiss ◽  
Ling-Hong Hung ◽  
Yuguang Xiong ◽  
Eric Sobie ◽  
Ka Yee Yeung

Abstract Computational workflows typically consist of many tools that are usually distributed as compiled binaries or source code. Each of these software tools typically depends on other installed software, and performance could potentially vary due to versions, updates, and operating systems. We show here that the analysis of mRNA-seq data can depend on the computing environment, and we demonstrate that software containers represent practical solutions that ensure the reproducibility of RNAseq data analyses.


Author(s):  
Sumit Kaur ◽  
R.K Bansal

Superpixel segmentation has been shown to be a useful preprocessing step in many computer vision applications. The purpose of superpixels is to reduce redundancy in the image and increase the efficiency of the subsequent processing task. This has led to a variety of algorithms for computing superpixel segmentations, each with individual strengths and weaknesses. Many methods for computing superpixels have already been presented. A drawback of most of these methods is their high computational complexity and hence long computation time. The k-means-based SLIC method performs better than the other methods when evaluated on parameters such as under-segmentation error and boundary recall.
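The k-means character of SLIC comes from assigning each pixel to the nearest cluster center under a distance that mixes color and position. The Python sketch below shows that distance as it is commonly stated for SLIC, D = sqrt(d_c^2 + (d_s / S)^2 m^2), where S is the grid interval and m is the compactness weight. The function names and values are hypothetical, and the sketch covers only the assignment step, not a full segmentation.

```python
import math
from typing import Sequence, Tuple

Feature = Tuple[float, float, float, float, float]  # (L, a, b, x, y)


def slic_distance(pixel: Feature, center: Feature,
                  grid_interval: float, compactness: float) -> float:
    """Combined color/spatial distance used in the SLIC assignment step."""
    d_color = math.dist(pixel[:3], center[:3])  # distance in CIELAB space
    d_space = math.dist(pixel[3:], center[3:])  # distance in the image plane
    return math.sqrt(d_color ** 2
                     + (d_space / grid_interval) ** 2 * compactness ** 2)


def nearest_center(pixel: Feature, centers: Sequence[Feature],
                   grid_interval: float, compactness: float) -> int:
    """Index of the cluster center closest to the pixel."""
    distances = [slic_distance(pixel, c, grid_interval, compactness)
                 for c in centers]
    return distances.index(min(distances))


# Hypothetical pixel and two cluster centers.
pixel = (50.0, 0.0, 0.0, 0.0, 0.0)
centers = [(50.0, 0.0, 0.0, 3.0, 4.0),    # same color, 5 pixels away
           (90.0, 10.0, 10.0, 1.0, 0.0)]  # very different color, 1 pixel away
label = nearest_center(pixel, centers, grid_interval=10.0, compactness=10.0)
```

A larger compactness value gives more weight to spatial proximity and produces more regular superpixels. A smaller value lets superpixels follow color boundaries more closely.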


2016 ◽  
Author(s):  
Neeraj Bokde ◽  
Kishore Kulat

This paper discusses PredictTestbench, an R package that provides a testbench for comparing prediction methods. The package compares a proposed time series prediction method with default methods such as autoregressive integrated moving average (ARIMA) and Pattern Sequence based Forecasting (PSF). The testbench is not limited to these methods: it allows the user to add methods to, or remove them from, the existing set in the study. By default, the testbench compares different imputation methods using the error metrics RMSE, MAE, or MAPE, and it allows the user to add new error metrics as required. The simplicity of the package and the significant reduction in effort and time compared with the state-of-the-art procedure are valuable advantages. The aim of the testbench is to reduce the coding effort, the experiments on output visualization, and the time needed for the different steps involved in such a study. This paper explains the use of all functions in the PredictTestbench package through examples.
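The three error metrics named above, and the idea of comparing several methods against the same held-out values, can be sketched in Python. This is not the PredictTestbench API. The function names, the method labels, and the numbers are hypothetical.

```python
import math
from typing import Callable, Dict, Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error; assumes no actual value is zero."""
    return 100.0 * sum(abs((a - p) / a)
                       for a, p in zip(actual, predicted)) / len(actual)


def compare_methods(actual: Sequence[float],
                    forecasts: Dict[str, Sequence[float]],
                    metric: Callable[[Sequence[float], Sequence[float]], float]
                    ) -> Dict[str, float]:
    """Score every method's forecast with one metric."""
    return {name: metric(actual, predicted)
            for name, predicted in forecasts.items()}


# Hypothetical held-out values and forecasts from two methods.
actual = [100.0, 200.0, 300.0]
forecasts = {
    "method_a": [110.0, 190.0, 300.0],
    "method_b": [130.0, 170.0, 330.0],
}
scores = compare_methods(actual, forecasts, rmse)
best_method = min(scores, key=scores.get)
```

Passing the metric as a function mirrors the extensibility the paper describes: a new error metric only needs the same two-argument signature.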


2020 ◽  
Author(s):  
George G. Vega Yon ◽  
Duncan C. Thomas ◽  
John Morrison ◽  
Huaiyu Mi ◽  
Paul D. Thomas ◽  
...  

Abstract Motivation: Gene function annotation is important for a variety of downstream analyses of genetic data. Yet experimental characterization of function remains costly and slow, making computational prediction an important endeavor. In this paper we use a probabilistic evolutionary model built upon phylogenetic trees and experimental Gene Ontology functional annotations that allows automated prediction of function for unannotated genes. Results: We have developed a computationally efficient model of evolution of gene annotations using phylogenies based on a Bayesian framework using Markov Chain Monte Carlo for parameter estimation. Unlike previous approaches, our method is able to estimate parameters over many different phylogenetic trees and functions. The resulting parameters agree with biological intuition, such as the increased probability of function change following gene duplication. The method performs well on leave-one-out validation, and we further validated some of the predictions in the experimental scientific literature. Availability: Our method has been implemented as an R package and it is available online at https://github.com/USCBiostats/aphylo. Code needed to reproduce the tables and figures can be found in https://github.com/USCbiostats/aphylo-simulations. Author summary: Understanding the individual role that genes play in life is a key issue in biomedical sciences. While information regarding gene functions is continuously growing, the number of genes with unknown biological purpose is even greater. Because of this, scientists have dedicated much of their time to building and designing tools that automatically infer gene functions. In this paper, we present yet another attempt to do so.
While very simple, our model of gene-function evolution has some key features that have the potential to generate an impact in the field: (a) compared to other methods, ours is highly scalable, which means that it is possible to simultaneously analyze hundreds of what are known as gene-families, comprising thousands of genes, (b) it supports our biological intuition, as our model’s data-driven results coherently agree with what theory dictates regarding how gene-functions evolved, (c) notwithstanding its simplicity, the model’s prediction accuracy is comparable to other more complex alternatives, and (d) perhaps most importantly, it can be used to both support new annotations and to suggest areas in which existing annotations show inconsistencies that may indicate errors or controversies.
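The kind of likelihood calculation that underlies a model of function gain and loss on a phylogeny can be sketched in Python with Felsenstein's pruning algorithm for a binary trait. This is a simplified illustration, not the aphylo implementation. It assumes a single pair of per-branch gain and loss probabilities, fully observed leaf annotations, and no distinction between duplication and speciation events. All names and values are hypothetical.

```python
from typing import Dict, List


def transition(parent: int, child: int, gain: float, loss: float) -> float:
    """P(child state | parent state) along one branch (1 = has function)."""
    if parent == 0:
        return gain if child == 1 else 1.0 - gain
    return loss if child == 0 else 1.0 - loss


def conditional(node: str, tree: Dict[str, List[str]],
                leaf_states: Dict[str, int],
                gain: float, loss: float) -> List[float]:
    """Return [P(leaf data below node | node = 0), P(... | node = 1)]."""
    if node in leaf_states:
        observed = leaf_states[node]
        return [1.0 if observed == state else 0.0 for state in (0, 1)]
    result = [1.0, 1.0]
    for child in tree[node]:
        child_cond = conditional(child, tree, leaf_states, gain, loss)
        for state in (0, 1):
            # Sum over the child's possible states, weighted by the branch.
            result[state] *= sum(
                transition(state, c, gain, loss) * child_cond[c]
                for c in (0, 1))
    return result


def tree_likelihood(root: str, tree: Dict[str, List[str]],
                    leaf_states: Dict[str, int],
                    gain: float, loss: float, root_prob: float) -> float:
    """Likelihood of the leaf annotations; root_prob = P(root has function)."""
    cond = conditional(root, tree, leaf_states, gain, loss)
    return (1.0 - root_prob) * cond[0] + root_prob * cond[1]


# Hypothetical two-leaf tree in which both genes are annotated with the function.
toy_tree = {"root": ["gene_A", "gene_B"]}
likelihood = tree_likelihood("root", toy_tree, {"gene_A": 1, "gene_B": 1},
                             gain=0.1, loss=0.2, root_prob=0.5)
```

A parameter-estimation scheme such as the Markov Chain Monte Carlo approach mentioned in the abstract would evaluate a likelihood of this kind repeatedly for different parameter values. Each evaluation visits every node once, so the cost of one evaluation grows linearly with the size of the tree.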

