Epi-Gene: An R-Package for Easy Pan-Genome Analysis

BioMed Research International ◽

10.1155/2021/5585586 ◽

2021 ◽

Vol 2021 ◽

pp. 1-8

Author(s):

Furqan Awan ◽

Muhammad Muddassir Ali ◽

Muhammad Hamid ◽

Muhammad Huzair Awan ◽

Muhammad Hassan Mushtaq ◽

...

Keyword(s):

Operating Systems ◽

R Package ◽

Computational Time ◽

Computing Environment ◽

Heat Maps ◽

Time Consumption ◽

Sequence File ◽

Gene Functions ◽

Programming Skills ◽

Core Genes

The main aim of this study was to develop a set of functions that can analyze genomic data with less time and memory consumption. Epi-gene is presented as a solution to the problems of handling large sequence files and long computational times. It requires less time and fewer programming skills to work with a large number of genomes. In the current study, some features of the Epi-gene R package are described and illustrated using a dataset of 14 Aeromonas hydrophila genomes. Joining, relabeling, and conversion functions are included in the package to handle FASTA-formatted sequences. Various Epi-gene functions were used to calculate the subsets of core, accessory, and unique genes. Heat maps and phylogenetic genome trees were also constructed. The whole procedure was completed in less than 30 minutes. The package works only on Windows operating systems. It also uses functions from other packages available in the R computing environment, such as dplyr and ggtree.
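The split into core, accessory, and unique genes can be illustrated with a short sketch. It is written in Python rather than R and does not use any Epi-gene function; the input format (a mapping from genome name to the set of gene families it carries) and the strain and gene names are hypothetical. It assumes the usual definitions: core genes occur in every genome, unique genes in exactly one, and accessory genes in more than one but not all.

```python
from collections import Counter

def partition_pangenome(presence):
    """Split gene families into core, accessory, and unique sets.

    `presence` maps a genome name to the set of gene families found in it
    (a hypothetical input format, not Epi-gene's own).
    """
    n_genomes = len(presence)
    # Count in how many genomes each gene family occurs.
    counts = Counter(gene for genes in presence.values() for gene in genes)
    core = {g for g, c in counts.items() if c == n_genomes}
    # With a single genome every gene is core, so exclude core genes here.
    unique = {g for g, c in counts.items() if c == 1 and g not in core}
    accessory = set(counts) - core - unique
    return core, accessory, unique

# Hypothetical toy data: three genomes and a handful of gene families.
genomes = {
    "strain_A": {"gyrB", "rpoD", "aerA", "hlyA"},
    "strain_B": {"gyrB", "rpoD", "aerA"},
    "strain_C": {"gyrB", "rpoD", "ast"},
}
core, accessory, unique = partition_pangenome(genomes)
print("core:", sorted(core))
print("accessory:", sorted(accessory))
print("unique:", sorted(unique))
```

Counting occurrences once and then thresholding keeps the work linear in the total number of gene entries, which is the kind of saving that matters when many genomes are compared.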


AB0210 ACREULAR: AN R PACKAGE FOR THE CALCULATION AND VISUALISATION OF ACR/EULAR RELATED RHEUMATOID ARTHRITIS MEASURES

Annals of the Rheumatic Diseases ◽

10.1136/annrheumdis-2020-eular.2326 ◽

2020 ◽

Vol 79 (Suppl 1) ◽

pp. 1405.1-1406

Author(s):

F. Morton ◽

J. Nijjar ◽

C. Goodyear ◽

D. Porter

Keyword(s):

Rheumatoid Arthritis ◽

Functional Status ◽

Rheumatic Diseases ◽

Web Application ◽

R Package ◽

Diagnostic Classification ◽

Microsoft Excel ◽

Link Type ◽

Large Joint ◽

Programming Skills

Background: The American College of Rheumatology (ACR) and the European League Against Rheumatism (EULAR) individually and collaboratively have produced/recommended diagnostic classification, response and functional status criteria for a range of different rheumatic diseases. While there are a number of different resources available for performing these calculations individually, currently there are no tools available that we are aware of to easily calculate these values for whole patient cohorts. Objectives: To develop a new software tool, which will enable both data analysts and also researchers and clinicians without programming skills to calculate ACR/EULAR related measures for a number of different rheumatic diseases. Methods: Criteria that had been developed by ACR and/or EULAR that had been approved for the diagnostic classification, measurement of treatment response and functional status in patients with rheumatoid arthritis were identified. Methods were created using the R programming language to allow the calculation of these criteria, which were incorporated into an R package. Additionally, an R/Shiny web application was developed to enable the calculations to be performed via a web browser using data presented as CSV or Microsoft Excel files. Results: acreular is a freely available, open source R package (downloadable from https://github.com/fragla/acreular) that facilitates the calculation of ACR/EULAR related RA measures for whole patient cohorts. Measures, such as the ACR/EULAR (2010) RA classification criteria, can be determined using precalculated values for each component (small/large joint counts, duration in days, normal/abnormal acute-phase reactants, negative/low/high serology classification) or by providing “raw” data (small/large joint counts, onset/assessment dates, ESR/CRP and CCP/RF laboratory values). Other measures, including EULAR response and ACR20/50/70 response, can also be calculated by providing the required information.
The accompanying web application is included as part of the R package but is also externally hosted at https://fragla.shinyapps.io/shiny-acreular. This enables researchers and clinicians without any programming skills to easily calculate these measures by uploading either a Microsoft Excel or CSV file containing their data. Furthermore, the web application allows the incorporation of additional study covariates, enabling the automatic calculation of multigroup comparative statistics and the visualisation of the data through a number of different plots, both of which can be downloaded. Figure 1. The Data tab following the upload of data. Criteria are calculated by selecting the appropriate checkbox. Figure 2. A density plot of DAS28 scores grouped by ACR/EULAR 2010 RA classification. Statistical analysis has been performed and shows a significant difference in DAS28 score between the two groups. Conclusion: The acreular R package facilitates the easy calculation of ACR/EULAR RA related disease measures for whole patient cohorts. Calculations can be performed either from within R or by using the accompanying web application, which also enables the graphical visualisation of data and the calculation of comparative statistics. We plan to further develop the package by adding additional RA related criteria and by adding ACR/EULAR related measures for other rheumatic disorders. Disclosure of Interests: Fraser Morton: None declared, Jagtar Nijjar Shareholder of: GlaxoSmithKline plc, Consultant of: Janssen Pharmaceuticals UK, Employee of: GlaxoSmithKline plc, Paid instructor for: Janssen Pharmaceuticals UK, Speakers bureau: Janssen Pharmaceuticals UK, AbbVie, Carl Goodyear: None declared, Duncan Porter: None declared
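As an illustration of scoring from the precalculated components listed in the abstract, the following sketch computes the ACR/EULAR (2010) classification score. It is written in Python and is not the acreular API; the function and parameter names are hypothetical. The point values follow the published 2010 criteria as commonly tabulated (joint involvement 0 to 5, serology 0 to 3, acute-phase reactants 0 to 1, symptom duration of at least six weeks 0 to 1, with a total of 6 or more classifying as definite RA), and six weeks is taken to be 42 days.

```python
# Points for the serology component (negative / low-positive / high-positive).
SEROLOGY_POINTS = {"negative": 0, "low": 2, "high": 3}

def joint_points(large_joints, small_joints):
    """Joint-involvement points from counts of involved large and small joints."""
    total = large_joints + small_joints
    if small_joints >= 1 and total > 10:
        return 5  # more than 10 joints, at least one of them small
    if small_joints >= 4:
        return 3  # 4-10 small joints
    if small_joints >= 1:
        return 2  # 1-3 small joints
    if large_joints >= 2:
        return 1  # 2-10 large joints
    return 0      # at most one large joint

def ra_2010_score(large_joints, small_joints, duration_days, apr_abnormal, serology):
    """Total score (0-10) from the four precalculated components."""
    return (
        joint_points(large_joints, small_joints)
        + SEROLOGY_POINTS[serology]
        + (1 if apr_abnormal else 0)
        + (1 if duration_days >= 42 else 0)  # six weeks, assumed to be 42 days
    )

def is_definite_ra(score):
    """A score of 6 or more meets the classification threshold."""
    return score >= 6

# Hypothetical patient: 2 large and 5 small joints, 60 days of symptoms,
# abnormal acute-phase reactants, low-positive serology.
score = ra_2010_score(2, 5, 60, True, "low")
print(score, is_definite_ra(score))
```

Applying such a function row by row to a table of patients is what turns a single-patient calculator into a whole-cohort tool.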


lassosum2: an updated version complementing LDpred2

10.1101/2021.03.29.437510 ◽

2021 ◽

Author(s):

Florian Privé ◽

Bjarni J. Vilhjálmsson ◽

Timothy S. H. Mak

Keyword(s):

Sample Size ◽

Input Data ◽

R Package ◽

Computational Time ◽

Polygenic Score ◽

Score Method

Abstract. We present lassosum2, a new version of the polygenic score method lassosum, which we re-implement in the R package bigsnpr. This new version uses the exact same input data as LDpred2 and is also very fast, which means that it can be run with almost no extra coding or computational time when already running LDpred2. It can also be more robust than LDpred2, e.g. in the case of a large GWAS sample size misspecification. Therefore, lassosum2 is complementary to LDpred2.


MLML2R: an R package for maximum likelihood estimation of DNA methylation and hydroxymethylation proportions

Statistical Applications in Genetics and Molecular Biology ◽

10.1515/sagmb-2018-0031 ◽

2019 ◽

Vol 18 (1) ◽

Cited By ~ 2

Author(s):

Samara F. Kiihl ◽

Maria Jose Martinez-Garrido ◽

Arce Domingo-Relloso ◽

Jose Bermudez ◽

Maria Tellez-Plaza

Keyword(s):

Dna Methylation ◽

Maximum Likelihood ◽

Likelihood Estimation ◽

Analytical Form ◽

R Package ◽

Maximum Likelihood Estimates ◽

Computational Time ◽

Iterative Approximation ◽

Sequencing Technologies ◽

Combining Data

Abstract. Accurately measuring epigenetic marks such as 5-methylcytosine (5-mC) and 5-hydroxymethylcytosine (5-hmC) at the single-nucleotide level requires combining data from DNA processing methods including traditional (BS), oxidative (oxBS) or Tet-Assisted (TAB) bisulfite conversion. We introduce the R package MLML2R, which provides maximum likelihood estimates (MLE) of 5-mC and 5-hmC proportions. While all other available R packages provide 5-mC and 5-hmC MLEs only for the oxBS+BS combination, MLML2R also provides MLE for TAB combinations. For combinations of any two of the methods, we derived the pool-adjacent-violators algorithm (PAVA) exact constrained MLE in analytical form. For the three-method combination, we implemented both the iterative method by Qu et al. [Qu, J., M. Zhou, Q. Song, E. E. Hong and A. D. Smith (2013): “Mlml: consistent simultaneous estimates of dna methylation and hydroxymethylation,” Bioinformatics, 29, 2645–2646.] and a novel non-iterative approximation using Lagrange multipliers. The newly proposed non-iterative solutions greatly decrease computational time, a common bottleneck when processing high-throughput data. The MLML2R package is flexible, as it takes as input both preprocessed intensities from Infinium Methylation arrays and counts from Next Generation Sequencing technologies. The MLML2R package is freely available at https://CRAN.R-project.org/package=MLML2R.
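For the two-method oxBS+BS case, a constrained estimate of this kind can be sketched as follows. This is a Python illustration under stated assumptions, not the MLML2R code: it treats the BS and oxBS read counts as independent binomials, takes the BS methylated fraction as an estimate of 5-mC plus 5-hmC and the oxBS methylated fraction as an estimate of 5-mC, and, when the two fractions violate the ordering implied by a non-negative 5-hmC proportion, pools them in the pool-adjacent-violators manner. The function and argument names are hypothetical.

```python
def mle_bs_oxbs(bs_meth, bs_unmeth, ox_meth, ox_unmeth):
    """Constrained estimates of (5-mC, 5-hmC, unmodified C) proportions.

    bs_meth / bs_unmeth: methylated and unmethylated counts from BS
    ox_meth / ox_unmeth: methylated and unmethylated counts from oxBS
    """
    n_bs = bs_meth + bs_unmeth
    n_ox = ox_meth + ox_unmeth
    p_bs = bs_meth / n_bs  # estimates 5-mC + 5-hmC
    p_ox = ox_meth / n_ox  # estimates 5-mC alone
    if p_ox > p_bs:
        # Ordering violated (would imply negative 5-hmC): pool the two
        # binomial samples, weighting each by its number of reads.
        p_bs = p_ox = (bs_meth + ox_meth) / (n_bs + n_ox)
    p_mc = p_ox
    p_hmc = p_bs - p_ox
    p_c = 1.0 - p_bs
    return p_mc, p_hmc, p_c

# Hypothetical counts at one CpG site.
print(mle_bs_oxbs(80, 20, 30, 70))   # ordering respected
print(mle_bs_oxbs(10, 90, 30, 70))   # ordering violated, estimates pooled
```

Because the result is a closed-form expression, it needs no iteration per site, which is where the time saving over an iterative solver would come from.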


Gene set enrichment for reproducible science: comparison of CERNO and eight other algorithms

Bioinformatics ◽

10.1093/bioinformatics/btz447 ◽

2019 ◽

Vol 35 (24) ◽

pp. 5146-5154 ◽

Cited By ~ 19

Author(s):

Joanna Zyla ◽

Michal Marczyk ◽

Teresa Domaszewska ◽

Stefan H E Kaufmann ◽

Joanna Polanska ◽

...

Keyword(s):

False Positive Rate ◽

R Package ◽

Supplementary Information ◽

Computational Time ◽

P Value ◽

Gene Set ◽

Related Data ◽

Novel Approach ◽

Positive Rate ◽

Real World Datasets

Abstract. Motivation: Analysis of gene set (GS) enrichment is an essential part of functional omics studies. Here, we complement the established evaluation metrics of GS enrichment algorithms with a novel approach to assess the practical reproducibility of scientific results obtained from GS enrichment tests when applied to related data from different studies. Results: We evaluated eight established and one novel algorithm for reproducibility, sensitivity, prioritization, false positive rate and computational time. In addition to eight established algorithms, we also included Coincident Extreme Ranks in Numerical Observations (CERNO), a flexible and fast algorithm based on modified Fisher P-value integration. Using real-world datasets, we demonstrate that CERNO is robust to ranking metrics, as well as sample and GS size. CERNO had the highest reproducibility while remaining sensitive, specific and fast. In the overall ranking, Pathway Analysis with Down-weighting of Overlapping Genes, CERNO and over-representation analysis performed best, while CERNO and GeneSetTest scored high in terms of reproducibility. Availability and implementation: The tmod package implementing the CERNO algorithm is available from CRAN (cran.r-project.org/web/packages/tmod/index.html) and an online implementation can be found at http://tmod.online/. The datasets analyzed in this study are widely available in the KEGGdzPathwaysGEO, KEGGandMetacoreDzPathwaysGEO R package and GEO repository. Supplementary information: Supplementary data are available at Bioinformatics online.
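The idea of a rank-based modification of Fisher's P-value integration can be sketched as follows. This Python example reflects the CERNO statistic as it is usually described (the ranks of the gene-set members, divided by the total number of ranked genes, take the place of P-values in Fisher's method); it is an assumption-laden illustration, not the tmod implementation, and the function name and toy gene list are hypothetical. The chi-square tail probability uses the closed form that holds for an even number of degrees of freedom, so no external library is needed.

```python
import math

def cerno_test(ranked_genes, gene_set):
    """Return (statistic, p_value) for one gene set.

    ranked_genes: gene identifiers ordered from most to least interesting.
    gene_set: collection of gene identifiers to test for enrichment.
    """
    n = len(ranked_genes)
    members = set(gene_set)
    ranks = [i for i, g in enumerate(ranked_genes, start=1) if g in members]
    k = len(ranks)
    if k == 0:
        raise ValueError("no gene of the set appears in the ranked list")
    # Fisher-style combination with rank / n in place of a P-value.
    stat = -2.0 * sum(math.log(r / n) for r in ranks)
    # Upper tail of a chi-square with 2k degrees of freedom:
    # exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    half = stat / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return stat, math.exp(-half) * total

# Hypothetical ranked list of ten genes; the set sits near the top.
ranked = ["g%d" % i for i in range(1, 11)]
print(cerno_test(ranked, {"g1", "g2", "g3"}))
```

Because the statistic depends only on ranks, any ordering of the genes can be used as input, which is consistent with the robustness to ranking metrics reported in the abstract.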


Shp2graph: Tools to Convert a Spatial Network into an Igraph Graph in R

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi7080293 ◽

2018 ◽

Vol 7 (8) ◽

pp. 293 ◽

Cited By ~ 8

Author(s):

Binbin Lu ◽

Huabo Sun ◽

Paul Harris ◽

Miaozhong Xu ◽

Martin Charlton

Keyword(s):

Network Analysis ◽

Road Network ◽

R Package ◽

Statistical Computing ◽

Spatial Network ◽

Computing Environment ◽

Vast Array ◽

Urban Road ◽

Analytical Tools ◽

Statistical Functions

In this study, we introduce the R package shp2graph, which provides tools to convert a spatial network into an ‘igraph’ graph of the igraph R package. This conversion greatly empowers a spatial network study, as the vast array of graph analytical tools provided in igraph are then readily available to the network analysis, together with the inherent advantages of being within the R statistical computing environment and its vast array of statistical functions. Through three urban road network case studies, the calculation of road network distances with shp2graph and with igraph is demonstrated through four key stages: (i) confirming the connectivity of a spatial network; (ii) integrating points/locations with a network; (iii) converting a network into a graph; and (iv) calculating network distances (and travel times). Throughout, the required R commands are given to provide a useful tutorial on the use of shp2graph.
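Stages (iii) and (iv) can be illustrated with a minimal sketch. It is written in Python with the standard library only and does not use shp2graph or igraph; the function names and the toy road coordinates are hypothetical. Each road polyline becomes a weighted edge between its two endpoints, with the polyline length as the weight, and the network distance is then found with Dijkstra's algorithm. It assumes planar coordinates and roads that already meet exactly at shared endpoints.

```python
import heapq
import math
from collections import defaultdict

def build_graph(polylines):
    """Turn polylines (lists of (x, y) points) into an undirected weighted graph."""
    graph = defaultdict(dict)
    for line in polylines:
        length = sum(math.dist(a, b) for a, b in zip(line, line[1:]))
        start, end = line[0], line[-1]
        # If two roads join the same pair of endpoints, keep the shorter one.
        if length < graph[start].get(end, math.inf):
            graph[start][end] = length
            graph[end][start] = length
    return graph

def network_distance(graph, source, target):
    """Shortest path length between two nodes (Dijkstra); inf if unreachable."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == target:
            return d
        if d > best.get(node, math.inf):
            continue  # stale queue entry
        for neighbour, weight in graph[node].items():
            candidate = d + weight
            if candidate < best.get(neighbour, math.inf):
                best[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return math.inf

# Hypothetical road segments in planar coordinates.
roads = [
    [(0, 0), (3, 0)],
    [(3, 0), (3, 4)],
    [(0, 0), (1, 1), (3, 4)],
]
graph = build_graph(roads)
print(network_distance(graph, (0, 0), (3, 4)))
```

An unreachable target yields an infinite distance, which is one simple way to detect the connectivity problems that stage (i) is meant to rule out.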


XPolaris: an R-package to retrieve United States soil data at 30-meter resolution

BMC Research Notes ◽

10.1186/s13104-021-05729-y ◽

2021 ◽

Vol 14 (1) ◽

Author(s):

Luiz H. Moro Rosso ◽

Andre F. de Borja Reis ◽

Adrian A. Correndo ◽

Ignacio A. Ciampitti

Keyword(s):

United States ◽

Learning Algorithm ◽

Geographical Location ◽

R Package ◽

Soil Survey ◽

Soil Series ◽

Raster Data ◽

The Core ◽

Programming Skills ◽

R Programming

Abstract. Objectives: This data article aims to introduce the “XPolaris” R-package, designed to facilitate access to detailed soil data at any geographical location within the contiguous United States (CONUS). Without the need for advanced R-programming skills, XPolaris enables users to convert raster data from the POLARIS database into traditional spreadsheet format [i.e., Comma-Separated Values (CSV)] for further data analyses. Data description: The core of this publication is a code tutorial envisioned to assist users in retrieving soil raster data within the CONUS. All data is sourced from the POLARIS database, a 30-m probabilistic map of soil series and different soil properties [Chaney et al. Geoderma 274:54, 2016, Chaney et al. Water Resour Res 55:2916, 2019]. POLARIS represents an optimization of the Soil Survey Geographic (SSURGO) database, circumventing issues of spatial disaggregation, harmonizing, and filling spatial gaps. POLARIS was constructed using a machine learning algorithm, the Disaggregation and Harmonisation of Soil Map Units Through Resampled Classification Trees (DSMART-HPC) [Odgers et al. Geoderma 214:91, 2014]. Although the data is easily accessible in a raster format, retrieving large amounts of data can be time-consuming or require advanced programming skills.


Software solutions for reproducible RNA-seq workflows

10.1101/099028 ◽

2017 ◽

Cited By ~ 1

Author(s):

Trevor Meiss ◽

Ling-Hong Hung ◽

Yuguang Xiong ◽

Eric Sobie ◽

Ka Yee Yeung

Keyword(s):

Operating Systems ◽

Source Code ◽

Software Tools ◽

Rna Seq ◽

Computing Environment ◽

Data Analyses ◽

Rnaseq Data ◽

And Performance

Abstract. Computational workflows typically consist of many tools that are usually distributed as compiled binaries or source code. Each of these software tools typically depends on other installed software, and performance could potentially vary due to versions, updates, and operating systems. We show here that the analysis of mRNA-seq data can depend on the computing environment, and we demonstrate that software containers represent practical solutions that ensure the reproducibility of RNAseq data analyses.


COMPARATIVE ANALYSIS OF SUPERPIXEL SEGMENTATION METHODS

International Journal of Engineering Technologies and Management Research ◽

10.29121/ijetmr.v5.i3.2018.172 ◽

2020 ◽

Vol 5 (3) ◽

pp. 1-9

Author(s):

Sumit Kaur ◽

R.K Bansal

Keyword(s):

Computer Vision ◽

Comparative Analysis ◽

Computational Complexity ◽

Point Of View ◽

Computational Time ◽

Time Consumption ◽

Superpixel Segmentation ◽

Segmentation Methods ◽

Computer Vision Applications ◽

High Computational Complexity

Superpixel segmentation has proven to be a useful preprocessing step in many computer vision applications. Its purpose is to reduce redundancy in the image and to increase the efficiency of the next processing task. This has led to a variety of algorithms for computing superpixel segmentations, each with individual strengths and weaknesses. A drawback of most of these methods is their high computational complexity and hence high computational time consumption. The k-means-based SLIC method performs better than the other methods when evaluated on parameters such as undersegmentation error and boundary recall.


Demonstration of an open source platform for reproducible comparison of predictive models

10.7287/peerj.preprints.2251 ◽

2016 ◽

Author(s):

Neeraj Bokde ◽

Kishore Kulat

Keyword(s):

Moving Average ◽

Time Series Prediction ◽

Prediction Method ◽

R Package ◽

Autoregressive Integrated Moving Average ◽

Imputation Methods ◽

Time Consumption ◽

Error Metrics ◽

Pattern Sequence ◽

State Of Art

This paper discusses PredictTestbench, an R package that provides a testbench for comparing prediction methods. The package compares a proposed time series prediction method with default methods such as Autoregressive integrated moving average (ARIMA) and Pattern Sequence based Forecasting (PSF). The testbench is not limited to these methods: it allows the user to add methods to, or remove methods from, those already in the study. By default, the testbench compares different imputation methods using the error metrics RMSE, MAE or MAPE, and it lets the user add new error metrics as required. The simplicity of the package and the significant reduction in effort and time compared with the state-of-the-art procedure are valuable advantages. The aim of the testbench is to reduce the effort spent on coding, on experiments with output visualization, and on the different steps involved in such a study. This paper explains the use of all functions in the PredictTestbench package with demonstrative examples.
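The comparison by error metric described above can be sketched in a few lines. This is a Python illustration, not the PredictTestbench API; the function names, method names, and numbers are hypothetical. The metrics use their standard definitions, with MAPE expressed in percent and assuming that no observed value is zero. Passing the metric as a function mirrors the idea of letting the user plug in a new error metric.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (assumes no zero in `actual`)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def compare_methods(actual, forecasts, metric=rmse):
    """Score every method's forecast with the chosen metric (lower is better)."""
    return {name: metric(actual, predicted) for name, predicted in forecasts.items()}

# Hypothetical observations and forecasts from two methods.
actual = [100.0, 200.0, 400.0]
forecasts = {
    "method_a": [110.0, 190.0, 400.0],
    "method_b": [130.0, 170.0, 370.0],
}
for metric in (rmse, mae, mape):
    print(metric.__name__, compare_methods(actual, forecasts, metric))
```

Any callable that takes the observed and predicted sequences and returns a number can be passed as `metric`, so adding a new error measure does not require changing `compare_methods`.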


On the automatic annotation of gene functions using observational data and phylogenetic trees

10.1101/2020.05.14.095687 ◽

2020 ◽

Author(s):

George G. Vega Yon ◽

Duncan C. Thomas ◽

John Morrison ◽

Huaiyu Mi ◽

Paul D. Thomas ◽

...

Keyword(s):

Gene Function ◽

Phylogenetic Trees ◽

Evolutionary Model ◽

Computational Prediction ◽

Gene Families ◽

R Package ◽

Biomedical Sciences ◽

Computationally Efficient ◽

Link Type ◽

Gene Functions

Abstract. Motivation: Gene function annotation is important for a variety of downstream analyses of genetic data. Yet experimental characterization of function remains costly and slow, making computational prediction an important endeavor. In this paper we use a probabilistic evolutionary model built upon phylogenetic trees and experimental Gene Ontology functional annotations that allows automated prediction of function for unannotated genes. Results: We have developed a computationally efficient model of evolution of gene annotations using phylogenies based on a Bayesian framework using Markov Chain Monte Carlo for parameter estimation. Unlike previous approaches, our method is able to estimate parameters over many different phylogenetic trees and functions. The resulting parameters agree with biological intuition, such as the increased probability of function change following gene duplication. The method performs well on leave-one-out validation, and we further validated some of the predictions in the experimental scientific literature. Availability: Our method has been implemented as an R package and it is available online at https://github.com/USCBiostats/aphylo. Code needed to reproduce the tables and figures can be found in https://github.com/USCbiostats/aphylo-simulations. Author summary: Understanding the individual role that genes play in life is a key issue in biomedical sciences. While information regarding gene functions is continuously growing, the number of genes with unknown biological purpose is yet greater. Because of this, scientists have dedicated much of their time to building and designing tools that automatically infer gene functions. In this paper, we present yet another attempt to do so.
While very simple, our model of gene-function evolution has some key features that have the potential to generate an impact in the field: (a) compared to other methods, ours is highly scalable, which means that it is possible to simultaneously analyze hundreds of what are known as gene families, comprising thousands of genes, (b) it supports our biological intuition, as our model’s data-driven results coherently agree with what theory dictates regarding how gene functions evolved, (c) notwithstanding its simplicity, the model’s prediction accuracy is comparable to other more complex alternatives, and (d) perhaps most importantly, it can be used both to support new annotations and to suggest areas in which existing annotations show inconsistencies that may indicate errors or controversies.
