adductomicsR: A package for detection and quantification of protein adducts from mass spectra of tryptic digests

10.1101/463331 ◽

2018 ◽

Author(s):

Josie Hayes ◽

William M. B. Edmands ◽

Yukiko Yano ◽

Hasmik Grigoryan ◽

Courtney Schiffman ◽

...

Keyword(s):

Mass Spectra ◽

High Resolution Mass Spectrometry ◽

Internal Standard ◽

R Package ◽

Protein Adducts ◽

Supplementary Information ◽

Link Type ◽

Protein Digests ◽

Modified Peptides ◽

Time Drift

Abstract:

Summary: Liquid chromatography-high resolution mass spectrometry (LC-HRMS) has been used to establish a method, referred to as ‘adductomics’, for characterisation of putative protein adducts at selected loci in human serum albumin (HSA). Applications of this method have been limited by the lack of software for untargeted analysis of modified peptides in protein digests. Here we present adductomicsR, an open-source R package for processing LC-HRMS data from analysis of adducted HSA peptides. The software interrogates mass spectra to correct for retention-time drift, and to discover and quantify putative adducts along with those for a housekeeping peptide and internal standard.

Availability and implementation: adductomicsR is written in R and publicly available at https://github.com/JosieLHayes/adductomicsR, which includes a vignette with example data.

Supplementary information: mzXML files for the vignette and test dataset are available in an associated data package adductData (https://github.com/JosieLHayes/adductData).

Contact: [email protected]

Section: APPLICATIONS NOTE

hypeR: An R Package for Geneset Enrichment Workflows

10.1101/656637 ◽

2019 ◽

Cited By ~ 1

Author(s):

Anthony Federico ◽

Stefano Monti

Keyword(s):

High Throughput Sequencing ◽

R Package ◽

Supplementary Information ◽

Sequencing Data ◽

Wide Audience ◽

Popular Method ◽

Link Type ◽

High Throughput Sequencing Data ◽

One Stop ◽

Recent Version

Abstract:

Summary: Geneset enrichment is a popular method for annotating high-throughput sequencing data. Existing tools fall short in providing the flexibility to tackle the varied challenges researchers face in such analyses, particularly when analyzing many signatures across multiple experiments. We present a comprehensive R package for geneset enrichment workflows that offers multiple enrichment, visualization, and sharing methods in addition to novel features such as hierarchical geneset analysis and built-in markdown reporting. hypeR is a one-stop solution to performing geneset enrichment for a wide audience and range of use cases.

Availability and implementation: The most recent version of the package is available at https://github.com/montilab/hypeR.

Supplementary information: Comprehensive documentation and tutorials are available at https://montilab.github.io/hypeR-docs.
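
The abstract does not say which enrichment statistic is used, so the following is only a minimal sketch of one common choice, the hypergeometric over-representation test. It is written in Python for illustration rather than with the package's R interface; the function name `hypergeom_enrichment_pvalue` and its parameters are hypothetical, and both input sets are assumed to be drawn from the same background of `background_size` genes.

```python
from math import comb


def hypergeom_enrichment_pvalue(signature, geneset, background_size):
    """Upper-tail hypergeometric p-value for the overlap of two gene sets.

    Hypothetical helper, not part of any package named in the text.
    Assumes `signature` and `geneset` are both subsets of a background
    containing `background_size` genes.
    """
    signature, geneset = set(signature), set(geneset)
    overlap = len(signature & geneset)
    n = len(signature)        # number of genes drawn (the signature)
    K = len(geneset)          # number of "successes" in the background
    N = background_size       # total number of background genes

    # P(X >= overlap) when drawing n genes without replacement.
    total = comb(N, n)
    tail = sum(
        comb(K, k) * comb(N - K, n - k)
        for k in range(overlap, min(n, K) + 1)
    )
    return tail / total
```

A small p-value suggests the signature overlaps the geneset more than expected by chance; when many genesets are tested, the p-values would normally be corrected for multiple testing.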

CluMSID: an R package for similarity-based clustering of tandem mass spectra to aid feature annotation in metabolomics

Bioinformatics ◽

10.1093/bioinformatics/btz005 ◽

2019 ◽

Vol 35 (17) ◽

pp. 3196-3198 ◽

Cited By ~ 5

Author(s):

Tobias Depke ◽

Raimo Franke ◽

Mark Brönstrup

Keyword(s):

Mass Spectra ◽

Neutral Loss ◽

Metabolite Identification ◽

R Package ◽

Supplementary Information ◽

Tandem Mass ◽

Compound Identification ◽

Feature Identification ◽

Tandem Mass Spectra ◽

Interactive Visualizations

Abstract:

Summary: Compound identification is one of the most eminent challenges in the untargeted analysis of complex mixtures of small molecules by mass spectrometry. Similarity of tandem mass spectra can provide valuable information on putative structural similarities between known and unknown analytes and hence aids feature identification in the bioanalytical sciences. We have developed CluMSID (Clustering of MS2 spectra for metabolite identification), an R package that enables researchers to make use of tandem mass spectra and neutral loss pattern similarities as a part of their metabolite annotation workflow. CluMSID offers functions for all analysis steps from import of raw data to data mining by unsupervised multivariate methods along with respective (interactive) visualizations. A detailed tutorial with example data is provided as supplementary information.

Availability and implementation: CluMSID is available as R package from https://github.com/tdepke/CluMSID/ and from https://bioconductor.org/packages/CluMSID/.

Supplementary information: Supplementary data are available at Bioinformatics online.
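
To make the idea of spectral similarity concrete, here is a minimal Python sketch that scores two tandem mass spectra by cosine similarity after binning peaks by m/z. The abstract does not state which similarity measure the package uses, so cosine similarity, the bin width, and the function names `bin_spectrum` and `cosine_similarity` are all assumptions chosen for illustration.

```python
import math


def bin_spectrum(peaks, bin_width=0.01):
    """Sum peak intensities into fixed-width m/z bins.

    `peaks` is an iterable of (mz, intensity) pairs. The bin width is an
    illustrative assumption, not a value taken from the text.
    """
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=0.01):
    """Cosine similarity between two binned spectra (0 = no shared peaks)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum is treated as dissimilar to everything
    return dot / (norm_a * norm_b)
```

A pairwise matrix of `1 - cosine_similarity(...)` values could then serve as the distance input to a clustering method. Neutral loss patterns could be compared the same way by replacing each fragment m/z with the precursor m/z minus the fragment m/z.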

blupADC: An R package and shiny toolkit for comprehensive genetic data analysis in animal and plant breeding

10.1101/2021.09.09.459557 ◽

2021 ◽

Author(s):

Quanshun Mei ◽

Chuanke Fu ◽

Jieling Li ◽

Shuhong Zhao ◽

Tao Xiang

Keyword(s):

Genetic Analysis ◽

Plant Breeding ◽

Genomic Data ◽

R Package ◽

Genotype Imputation ◽

Supplementary Information ◽

Composition Analysis ◽

Relationship Matrix ◽

Link Type ◽

Plant Breeding Program

Abstract:

Summary: Genetic analysis is a systematic and complex procedure in animal and plant breeding. With the fast development of high-throughput genotyping techniques and algorithms, animal and plant breeding has entered a genomic era. However, routine animal and plant breeding programs lack software that can process comprehensive genetic analyses. To make the whole genetic analysis in animal and plant breeding straightforward, we developed a powerful, robust and fast R package that includes genomic data format conversion, genomic data quality control and genotype imputation, breed composition analysis, pedigree tracing, analysis and visualization, pedigree-based and genomic-based relationship matrix construction, and genomic evaluation. In addition, to simplify the application of this package, we also developed a shiny toolkit for users.

Availability and implementation: blupADC is developed primarily in R with core functions written in C++. The development version is maintained at https://github.com/TXiang-lab/blupADC.

Supplementary information: Supplementary data are available online.

gwasurvivr: an R package for genome wide survival analysis

10.1101/326033 ◽

2018 ◽

Author(s):

Abbas A Rizvi ◽

Ezgi Karaesmen ◽

Martin Morgan ◽

Leah Preus ◽

Junke Wang ◽

...

Keyword(s):

Survival Analysis ◽

Cox Model ◽

R Package ◽

Supplementary Information ◽

Parameter Estimates ◽

Survival Analyses ◽

Link Type ◽

Genome Wide ◽

Size Number ◽

Simple Interface

Abstract:

Summary: To address the limited software options for performing survival analyses with millions of SNPs, we developed gwasurvivr, an R/Bioconductor package with a simple interface for conducting genome wide survival analyses using VCF (outputted from Michigan or Sanger imputation servers), IMPUTE2 or PLINK files. To decrease the number of iterations needed for convergence when optimizing the parameter estimates in the Cox model we modified the R package survival; covariates in the model are first fit without the SNP, and those parameter estimates are used as initial points. We benchmarked gwasurvivr with other software capable of conducting genome wide survival analysis (genipe, SurvivalGWAS_SV, and GWASTools). gwasurvivr is significantly faster and shows better scalability as sample size, number of SNPs and number of covariates increase.

Availability and implementation: gwasurvivr, including source code, documentation, and vignette, is available at: http://bioconductor.org/packages/gwasurvivr

Contact: Abbas Rizvi, [email protected]; Lara E Sucheston-Campbell, [email protected]

Supplementary information: Supplementary data are available at https://github.com/suchestoncampbelllab/gwasurvivr_manuscript

ClusterMine: a Knowledge-integrated Clustering Approach based on Expression Profiles of Gene Sets

10.1101/255711 ◽

2018 ◽

Author(s):

Hong-Dong Li ◽

Yunpei Xu ◽

Xiaoshu Zhu ◽

Quan Liu ◽

Gilbert S. Omenn ◽

...

Keyword(s):

Expression Profiles ◽

R Package ◽

Biological Data ◽

Supplementary Information ◽

Consensus Clustering ◽

Cluster Membership ◽

Link Type ◽

Novel Approach ◽

Gene Sets ◽

Biological Interpretation

Abstract:

Motivation: Clustering analysis is essential for understanding complex biological data. In widely used methods such as hierarchical clustering (HC) and consensus clustering (CC), expression profiles of all genes are often used to assess similarity between samples for clustering. These methods output sample clusters, but are not able to provide information about which gene sets (functions) contribute most to the clustering. So interpretability of their results is limited. We hypothesized that integrating prior knowledge of annotated biological processes would not only achieve satisfying clustering performance but also, more importantly, enable potential biological interpretation of clusters.

Results: Here we report ClusterMine, a novel approach that identifies clusters by assessing functional similarity between samples through integrating known annotated gene sets, e.g., in Gene Ontology. In addition to outputting cluster membership of each sample as conventional approaches do, it outputs gene sets that are most likely to contribute to the clustering, a feature facilitating biological interpretation. Using three cancer datasets, two single cell RNA-sequencing based cell differentiation datasets, one cell cycle dataset and two datasets of cells of different tissue origins, we found that ClusterMine achieved similar or better clustering performance and that top-scored gene sets prioritized by ClusterMine are biologically relevant.

Implementation and availability: ClusterMine is implemented as an R package and is freely available at: www.genemine.org/[email protected]

Supplementary Information: Supplementary data are available at Bioinformatics online.

MutSpot: detection of non-coding mutation hotspots in cancer genomes

10.1101/740944 ◽

2019 ◽

Cited By ~ 1

Author(s):

Yu Amanda Guo ◽

Mei Mei Chang ◽

Anders Jacobsen Skanderup

Keyword(s):

Somatic Mutations ◽

R Package ◽

Supplementary Information ◽

Patient Specific ◽

Supplementary Data ◽

Link Type ◽

Genome Wide ◽

Cancer Genomes ◽

User Friendly ◽

Regulatory Dna

Abstract:

Summary: Recurrence and clustering of somatic mutations (hotspots) in cancer genomes may indicate positive selection and involvement in tumorigenesis. MutSpot performs genome-wide inference of mutation hotspots in non-coding and regulatory DNA of cancer genomes. MutSpot performs feature selection across hundreds of epigenetic and sequence features followed by estimation of position- and patient-specific background somatic mutation probabilities. MutSpot is user-friendly, works on a standard workstation, and scales to thousands of cancer genomes.

Availability and implementation: MutSpot is implemented as an R package and is available at https://github.com/skandlab/MutSpot/

Supplementary information: Supplementary data are available at https://github.com/skandlab/MutSpot/

BiomeHorizon: visualizing microbiome time series data in R

10.1101/2021.08.29.458140 ◽

2021 ◽

Author(s):

Isaac Fink ◽

Richard J. Abdill ◽

Ran Blekhman ◽

Laura Grieneisen

Keyword(s):

Time Series ◽

Open Source ◽

Time Series Data ◽

R Package ◽

Supplementary Information ◽

Series Data ◽

Link Type ◽

Microbiome Research ◽

Microbiome Data ◽

Over Time

Abstract:

Summary: A key aspect of microbiome research is analysis of longitudinal dynamics using time series data. A method to visualize both the proportional and absolute change in the abundance of multiple taxa across multiple subjects over time is needed. We developed BiomeHorizon, an open-source R package that visualizes longitudinal compositional microbiome data using horizon plots.

Availability and Implementation: BiomeHorizon is available at https://github.com/blekhmanlab/biomehorizon/ and released under the MIT license. A guide with step-by-step instructions for using the package is provided at https://blekhmanlab.github.io/biomehorizon/. The guide also provides code to reproduce all plots in this [email protected], [email protected], [email protected]

Supplementary information: None

Plasmid Profiler: Comparative Analysis of Plasmid Content in WGS Data

10.1101/121350 ◽

2017 ◽

Cited By ~ 2

Author(s):

Adrian Zetner ◽

Jennifer Cabral ◽

Laura Mataseje ◽

Natalie C Knox ◽

Philip Mabon ◽

...

Keyword(s):

Comparative Analysis ◽

De Novo ◽

Sequence Data ◽

Health Agency ◽

R Package ◽

Whole Genome Sequence ◽

Reference Sequence ◽

Supplementary Information ◽

Plasmid Content ◽

Link Type

Abstract:

Summary: Comparative analysis of bacterial plasmids from whole genome sequence (WGS) data generated from short read sequencing is challenging. This is due to the difficulty in identifying contigs harbouring plasmid sequence data, and further difficulty in assembling such contigs into a full plasmid. As such, few software programs and bioinformatics pipelines exist to perform comprehensive comparative analyses of plasmids within and amongst sequenced isolates. To address this gap, we have developed Plasmid Profiler, a pipeline to perform comparative plasmid content analysis without the need for de novo assembly. The pipeline is designed to rapidly identify plasmid sequences by mapping reads to a plasmid reference sequence database. Predicted plasmid sequences are then annotated with their incompatibility group, if known. The pipeline allows users to query plasmids for genes or regions of interest and visualize results as an interactive heat map.

Availability and Implementation: Plasmid Profiler is freely available software released under the Apache 2.0 open source software license.

A stand-alone version of the entire Plasmid Profiler pipeline is available as a Docker container at https://hub.docker.com/r/phacnml/plasmidprofiler_0_1_6/.

The conda recipe for the Plasmid R package is available at: https://anaconda.org/bioconda/r-plasmidprofiler

The custom Plasmid Profiler R package is also available as a CRAN package at https://cran.r-project.org/web/packages/Plasmidprofiler/index.html

Galaxy tools associated with the pipeline are available as a Galaxy tool suite at https://toolshed.g2.bx.psu.edu/repository?repository_id=55e082200d16a504

The source code is available at: https://github.com/phac-nml/plasmidprofiler

The Galaxy implementation is available at: https://github.com/phac-nml/plasmidprofiler-galaxy

Contact: Email: [email protected]: National Microbiology Laboratory, Public Health Agency of Canada, 1015 Arlington Street, Winnipeg, Manitoba, Canada

Supplementary information: Documentation: http://plasmid-profiler.readthedocs.io/en/latest/

Evaluating single-cell cluster stability using the Jaccard similarity index

10.1101/2020.05.26.116640 ◽

2020 ◽

Cited By ~ 1

Author(s):

Ming Tang ◽

Yasin Kaymaz ◽

Brandon Logeman ◽

Stephen Eichhorn ◽

ZhengZheng S. Liang ◽

...

Keyword(s):

Single Cell ◽

Clustering Algorithms ◽

Similarity Index ◽

R Package ◽

Supplementary Information ◽

Clustering Methods ◽

K Nearest Neighbor ◽

Jaccard Similarity ◽

Cluster Stability ◽

Link Type

Abstract:

Motivation: One major goal of single-cell RNA sequencing (scRNAseq) experiments is to identify novel cell types. With increasingly large scRNAseq datasets, unsupervised clustering methods can now produce detailed catalogues of transcriptionally distinct groups of cells in a sample. However, the interpretation of these clusters is challenging for both technical and biological reasons. Popular clustering algorithms are sensitive to parameter choices, and can produce different clustering solutions with even small changes in the number of principal components used, the k nearest neighbor, and the resolution parameters, among others.

Results: Here, we present a set of tools to evaluate cluster stability by subsampling, which can guide parameter choice and aid in biological interpretation. The R package scclusteval and the accompanying Snakemake workflow implement all steps of the pipeline: subsampling the cells, repeating the clustering with Seurat, and estimation of cluster stability using the Jaccard similarity index. The Snakemake workflow takes advantage of high-performance computing clusters and dispatches jobs in parallel to available CPUs to speed up the analysis. The scclusteval package provides functions to facilitate the analysis of the output, including a series of rich visualizations.

Availability: R package scclusteval: https://github.com/crazyhottommy/scclusteval Snakemake workflow: https://github.com/crazyhottommy/[email protected], [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.
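
The stability estimate described above can be sketched in a few lines of Python. This is an illustration of the Jaccard-based idea only, not the scclusteval API: the function names `jaccard` and `cluster_stability` are hypothetical, clusters are represented as plain sets of cell identifiers, and restricting each original cluster to the subsampled cells before scoring is an assumption about how the comparison is set up.

```python
def jaccard(a, b):
    """Jaccard similarity index: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


def cluster_stability(original, resampled):
    """Best Jaccard match for each original cluster in one subsampling run.

    `original` and `resampled` map cluster labels to sets of cell IDs.
    Each original cluster is first restricted to the cells that were
    actually subsampled (an assumption made for this sketch).
    """
    sampled_cells = set()
    for cells in resampled.values():
        sampled_cells |= set(cells)

    scores = {}
    for label, cells in original.items():
        kept = set(cells) & sampled_cells
        scores[label] = max(
            (jaccard(kept, other) for other in resampled.values()),
            default=0.0,
        )
    return scores
```

Repeating this over many subsampling runs and summarizing the per-cluster scores (for example by their mean or median) gives a stability profile for each original cluster; clusters whose best match is consistently low are candidates for merging or for a different parameter choice.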

ampir: an R package for fast genome-wide prediction of antimicrobial peptides

10.1101/2020.05.07.082412 ◽

2020 ◽

Author(s):

Legana C.H.W Fingerhut ◽

David J. Miller ◽

Jan M. Strugnell ◽

Norelle L. Daly ◽

Ira R. Cooke

Keyword(s):

Antimicrobial Peptides ◽

Pharmaceutical Research ◽

R Package ◽

Supplementary Information ◽

Link Type ◽

Genome Data ◽

Classification Framework ◽

Genome Wide ◽

Genes Encoding ◽

Feature Calculation

Abstract:

Summary: Antimicrobial peptides (AMPs) are key components of the innate immune system that protect against pathogens, regulate the microbiome, and are promising targets for pharmaceutical research. Computational tools based on machine learning have the potential to aid discovery of genes encoding novel AMPs but existing approaches are not designed for genome-wide scans. To facilitate such genome-wide discovery of AMPs we developed a fast and accurate AMP classification framework, ampir. ampir is designed for high throughput, integrates well with existing bioinformatics pipelines, and has much higher classification accuracy than existing methods when applied to whole genome data.

Availability and Implementation: ampir is implemented primarily in R with core feature calculation methods written in C++. Release versions are available via CRAN and work on all major operating systems. The development version is maintained at https://github.com/legana/[email protected]; [email protected]

Supplementary information: Supplementary data are available at https://github.com/legana/amp_pub
