Maftools: Efficient analysis, visualization and summarization of MAF files from large-scale cohort based cancer studies

2016 ◽  
Author(s):  
Anand Mayakonda ◽  
H Phillip Koeffler

Abstract: Mutation Annotation Format (MAF) has become a standard file format for storing somatic and germline variants derived from sequencing of large cohorts of cancer samples. A MAF file contains a list of all variants detected in a sample, along with various annotations associated with each putative variant; it forms the basis for many downstream analyses and provides a complete landscape of the cohort. Here we introduce maftools, an R package that provides a rich source of functions for performing various analyses, visualizations and summarizations of MAF files. Maftools uses the data.table library for faster processing and summarization and ggplot2 for generating rich, publication-quality visualizations. Maftools also takes advantage of the S4 class system for better data representation, with easy-to-use and flexible functions. Availability and implementation: maftools is implemented as an R package available at https://github.com/PoisonAlien/maftools. Contact: [email protected]
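A minimal sketch of the package's core workflow (function names follow the maftools documentation; the MAF path is a placeholder):

    library(maftools)

    # Read a cohort MAF file; read.maf summarizes variants per gene and sample on load
    laml <- read.maf(maf = "cohort_variants.maf")  # placeholder path

    # Cohort-level overview: variant classifications, types and per-sample burden
    plotmafSummary(maf = laml, rmOutlier = TRUE, addStat = "median")

    # Waterfall plot (oncoplot) of the ten most frequently mutated genes
    oncoplot(maf = laml, top = 10)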

Author(s):  
Zachary B Abrams ◽  
Caitlin E Coombes ◽  
Suli Li ◽  
Kevin R Coombes

Abstract Summary: Unsupervised machine learning provides tools for researchers to uncover latent patterns in large-scale data, based on calculated distances between observations. Methods to visualize high-dimensional data based on these distances can elucidate subtypes and interactions within multi-dimensional and high-throughput data. However, researchers can select from a vast number of distance metrics and visualizations, each with its own strengths and weaknesses. The Mercator R package facilitates selection of a biologically meaningful distance from 10 metrics, together appropriate for binary, categorical and continuous data, and visualization with 5 standard and high-dimensional graphics tools. Mercator provides a user-friendly pipeline for informaticians or biologists to perform unsupervised analyses, from exploratory pattern recognition to production of publication-quality graphics. Availability and implementation: Mercator is freely available from the Comprehensive R Archive Network (https://cran.r-project.org/web/packages/Mercator/index.html).
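As a rough base-R illustration of the workflow Mercator streamlines (this is not the package's own API; see its CRAN vignette for that), one can compute a binary Jaccard-style distance and inspect it with two of the standard views:

    set.seed(42)
    # Toy binary matrix: 60 samples x 40 binary features
    X <- matrix(rbinom(60 * 40, 1, 0.3), nrow = 60)

    # Jaccard-type distance between samples (one of several metrics Mercator offers)
    d <- dist(X, method = "binary")

    # Two complementary views of the same distance matrix
    plot(hclust(d), labels = FALSE, main = "Hierarchical clustering")
    mds <- cmdscale(d, k = 2)
    plot(mds, xlab = "MDS 1", ylab = "MDS 2", main = "Multidimensional scaling")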


2020 ◽  
Author(s):  
Liya Ming ◽  
Yang Zou ◽  
Yiming Zhao ◽  
Luna Zhang ◽  
Ningning He ◽  
...  

ABSTRACT: A large number of post-translational modifications (PTMs) in proteins are buried in the unassigned mass spectrometric (MS) spectra of shotgun proteomics datasets. Because modified peptide fragments are low in abundance relative to their non-modified counterparts, it is critical to develop tools that allow facile evaluation of PTM assignments based on the MS/MS spectra. Such tools should preferably allow comparison of fragment ion spectra and retention times between modified and unmodified peptide pairs or groups. Herein, we describe MMS2plot, an R package for visualizing peptide-spectrum matches (PSMs) for multiple peptides. MMS2plot features a batch mode and generates output images in vector graphics formats, which facilitates evaluation and publication of the PSM assignments. We expect MMS2plot to play an important role in PTM discovery from large-scale proteomics datasets generated by LC (liquid chromatography)-MS/MS. The MMS2plot package is freely available at https://github.com/lileir/MMS2plot under the GPL-3 license.
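MMS2plot's own interface is documented in the repository; purely to illustrate the kind of comparison it batches, here is a schematic base-R mirror (butterfly) plot of a modified/unmodified spectrum pair with made-up peaks:

    # Schematic mirror plot: modified peptide spectrum up, unmodified down
    # (m/z and intensity values below are illustrative only)
    mz_mod    <- c(175.1, 304.2, 433.2, 562.3, 691.3)
    int_mod   <- c(30, 80, 100, 55, 20)
    mz_unmod  <- c(175.1, 304.2, 353.2, 482.3, 611.3)
    int_unmod <- c(35, 75, 90, 60, 25)

    plot(NULL, xlim = range(mz_mod, mz_unmod), ylim = c(-100, 100),
         xlab = "m/z", ylab = "Relative intensity",
         main = "Modified (up) vs unmodified (down)")
    segments(mz_mod, 0, mz_mod, int_mod, col = "red")
    segments(mz_unmod, 0, mz_unmod, -int_unmod, col = "blue")
    abline(h = 0)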


Author(s):  
Virdiansyah Permana ◽  
Rahmat Shoureshi

This study presents a new approach to determining the controllability and observability of a large-scale nonlinear dynamic thermal system using graph theory. The novelty of this method lies in adapting graph theory to a nonlinear class of systems and establishing a graphical condition that gives necessary and sufficient terms for such a system to be controllable and observable, equivalent to the analytical Lie algebra rank condition (LARC). A directed graph (digraph) is used to model the system, and the rules for its adaptation to the nonlinear class are defined. Necessary and sufficient conditions for controllability and observability are then investigated through a structural property of the digraph called connectability. It is shown that the connectability conditions between input and states, as well as between output and states, of a nonlinear system are equivalent to the LARC. This approach is easier from a computational point of view and is thus useful when dealing with large systems.
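A small sketch of the digraph side of the argument (illustrative only; the paper's nonlinear adaptation rules go beyond plain reachability): input-to-state and state-to-output connectability can be checked by accumulating powers of the adjacency matrix:

    # Digraph as an adjacency matrix: A[i, j] = 1 if there is an edge i -> j
    # Nodes: u (input), x1..x3 (states), y (output)
    nodes <- c("u", "x1", "x2", "x3", "y")
    A <- matrix(0, 5, 5, dimnames = list(nodes, nodes))
    A["u",  "x1"] <- 1
    A["x1", "x2"] <- 1
    A["x2", "x3"] <- 1
    A["x3", "y"]  <- 1

    # Node j is reachable from i if (A + A^2 + ... + A^n)[i, j] > 0
    reach <- function(A) {
      R <- A
      P <- A
      for (k in 2:nrow(A)) {
        P <- (P %*% A > 0) * 1   # paths of length k
        R <- ((R + P) > 0) * 1
      }
      R
    }
    R <- reach(A)
    all(R["u", c("x1", "x2", "x3")] == 1)  # every state connectable from the input
    all(R[c("x1", "x2", "x3"), "y"] == 1)  # output connectable from every state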


2020 ◽  
Author(s):  
Jenna Marie Reps ◽  
Ross Williams ◽  
Seng Chan You ◽  
Thomas Falconer ◽  
Evan Minty ◽  
...  

Abstract Objective: To demonstrate how the Observational Health Data Sciences and Informatics (OHDSI) collaborative network and standardization can be utilized to scale up external validation of patient-level prediction models by enabling validation across a large number of heterogeneous observational healthcare datasets. Materials & Methods: Five previously published prognostic models (ATRIA, CHADS2, CHA2DS2-VASc, Q-Stroke and Framingham) that predict future risk of stroke in patients with atrial fibrillation were replicated using the OHDSI frameworks. A network study was run that enabled the five models to be externally validated across nine observational healthcare datasets spanning three countries and five independent sites. Results: The five existing models were integrated into the OHDSI framework for patient-level prediction, and they obtained mean c-statistics ranging from 0.57 to 0.63 across the six databases with sufficient data to predict stroke within 1 year of initial atrial fibrillation diagnosis in females with atrial fibrillation. This was comparable with existing validation studies. Once the models were replicated, the validation network study was run across the nine datasets within 60 days. An R package for the study was published at https://github.com/OHDSI/StudyProtocolSandbox/tree/master/ExistingStrokeRiskExternalValidation. Discussion: This study demonstrates the ability to scale up external validation of patient-level prediction models using a collaboration of researchers and a data standardization that enables models to be readily shared across data sites. External validation is necessary to understand the transportability and reproducibility of a prediction model, but without collaborative approaches it can take three or more years for a model to be validated by one independent researcher. Conclusion: In this paper we show it is possible to both scale up and speed up external validation by showing how validation can be done across multiple databases in less than 2 months. We recommend that researchers developing new prediction models use the OHDSI network to externally validate their models.
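The study's network code lives in the repository linked above; as a reminder of what the headline metric measures, the c-statistic is simply the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case, computable from ranks (toy data below):

    set.seed(1)
    # Toy external-validation data: predicted 1-year stroke risk vs observed outcome
    risk    <- runif(1000)
    outcome <- rbinom(1000, 1, plogis(3 * risk - 2))

    # c-statistic via the Mann-Whitney rank formula
    cstat <- function(risk, outcome) {
      r  <- rank(risk)
      n1 <- sum(outcome == 1)
      n0 <- sum(outcome == 0)
      (sum(r[outcome == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
    }
    cstat(risk, outcome)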


2021 ◽  
Author(s):  
Gastón Mauro Díaz

1) Hemispherical photography (HP) is a long-standing tool for forest canopy characterization. Low-cost fisheye lenses are now available to convert smartphones into highly portable HP equipment; however, they cannot be used at any time since HP is sensitive to illumination conditions. To obtain sound results outside diffuse-light conditions, a deep-learning-based system would need to be developed. A ready-to-use alternative is the multiscale color-based binarization algorithm, but it can provide moderate-quality results only for open forests. To overcome this limitation, I propose coupling it with the model-based local thresholding algorithm. I call this coupling the MBCB approach. 2) The methods presented here are part of the R package CAnopy IMage ANalysis (caiman), which I am developing. The accuracy of the new MBCB approach was assessed with data from a pine plantation and a broadleaf native forest. 3) The coefficient of determination (R^2) was greater than 0.7 and the root mean square error (RMSE) lower than 20%, both for plant area index calculation. 4) The results suggest that the new MBCB approach allows the calculation of unbiased canopy metrics from smartphone-based HP acquired in sunlight conditions, even for closed canopies. This facilitates large-scale and opportunistic sampling with hemispherical photography.
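caiman's own functions are documented with the package; purely to illustrate the difference between a single global threshold and a local one on a canopy photograph, here is a sketch using the Bioconductor EBImage package (the image path is a placeholder):

    library(EBImage)  # Bioconductor: BiocManager::install("EBImage")

    img  <- readImage("hemiphoto.jpg")  # placeholder path
    blue <- channel(img, "blue")        # blue channel gives strong sky/canopy contrast

    # Global Otsu threshold vs adaptive local threshold
    bin_global <- blue > otsu(blue)
    bin_local  <- thresh(blue, w = 15, h = 15, offset = 0.02)

    display(combine(bin_global, bin_local), all = TRUE)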


2019 ◽  
Author(s):  
Alvin Vista

Cheating detection is an important issue in standardized testing, especially in large-scale settings. Statistical approaches are often computationally intensive and require specialised software. We present a two-stage approach that quickly filters suspected groups using statistical testing on an IRT-based answer-copying index. We also present an approach to mitigate data contamination and improve the performance of the index. The computation of the index was implemented through a modified version of an open-source R package, thus enabling wider access to the method. Using data from PIRLS 2011 (N = 64,232), we conduct a simulation to demonstrate our approach. Type I error was well controlled and no control group was falsely flagged for cheating, while 16 (combined n = 12,569) of the 18 (combined n = 14,149) simulated groups were detected. Implications for system-level cheating detection and further improvements of the approach are discussed.
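The index itself is IRT-based and implemented in the modified package the authors describe; as a schematic of the two-stage idea only, stage one can be a cheap group-level screen, reserving the expensive index for the flagged groups:

    set.seed(7)
    # Toy data: 20 groups, each summarised by a count of suspicious answer matches
    n_pairs <- rep(200, 20)
    p_null  <- 0.05  # assumed match rate under no copying (illustrative)
    matches <- rbinom(20, n_pairs, c(rep(p_null, 18), 0.12, 0.15))  # last 2 groups cheat

    # Stage 1: quick one-sided binomial screen per group, Bonferroni-controlled
    pvals   <- pbinom(matches - 1, n_pairs, p_null, lower.tail = FALSE)
    flagged <- which(p.adjust(pvals, method = "bonferroni") < 0.05)
    flagged  # only these groups proceed to the computationally intensive IRT index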


2020 ◽  
Author(s):  
Atilio O. Rausch ◽  
Maria I. Freiberger ◽  
Cesar O. Leonetti ◽  
Diego M. Luna ◽  
Leandro G. Radusky ◽  
...  

Once folded, natural protein molecules have few energetic conflicts within their polypeptide chains. Many protein structures do, however, contain regions where energetic conflicts remain after folding, i.e. they have highly frustrated regions. These regions, kept in place over evolutionary and physiological timescales, are related to several functional aspects of natural proteins such as protein-protein interactions, small-ligand recognition, catalytic sites and allostery. Here we present FrustratometeR, an R package that easily computes local energetic frustration on a personal computer or a cluster. This package facilitates large-scale analysis of local frustration, point mutants and MD trajectories, allowing straightforward integration of local frustration analysis into pipelines for protein structural analysis. Availability and implementation: https://github.com/proteinphysiologylab/frustratometeR
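A minimal call, following the entry point shown in the package README (argument and helper names may differ across versions; the results directory is a placeholder):

    library(frustratometeR)

    # Compute configurational frustration for a PDB entry (README-style call)
    res <- calculate_frustration(PdbID = "1fhj", Mode = "configurational",
                                 ResultsDir = "~/frustration_results/")  # placeholder dir

    # Contact-map view of local frustration (helper name as in the README; may vary)
    plot_contact_map(res)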


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
B. V. Binoy ◽  
M. A. Naseer ◽  
P. P. Anil Kumar ◽  
Nina Lazar

Purpose: Real estate valuation studies gained popularity with the availability of large-scale property transaction data in the latter part of the twentieth century. Hedonic price modeling (HPM) was the most popular method in the initial years, until it was overtaken by advanced modeling methods in the twenty-first century. Even though a few literature reviews exist on this topic, no comprehensive bibliometric analysis has been conducted in this area. To gain a better understanding of the dynamics of property valuation studies, this paper conducts a bibliometric analysis. Design/methodology/approach: A comprehensive search in the Scopus database, followed by detailed screening, resulted in 1,400 articles. The identified research articles, spanning over five decades (1964–2019), are analyzed using the open-source R package "bibliometrix." Findings: The study found the USA to be the most productive country in various respects, such as the number of publications, number of authors and publication hotspots. The findings also include assessments of publication trends, journals, citations, keywords, co-citation and collaboration networks. An upsurge in the number of publications was observed after the year 2000, owing to improved data availability and better modeling techniques. Research limitations/implications: This study is significant for understanding the major research areas and modeling techniques used in property valuation. Future studies can incorporate multiple database sources and include more articles. Originality/value: The current study is one of the first bibliometric studies on property valuation. Previous studies have not explored the possibilities of geographic information systems in bibliometric research; spatial mapping and analysis of publications provide a geographical perspective on valuation research.
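A minimal bibliometrix session of the kind the study describes (the export filename is a placeholder):

    library(bibliometrix)

    # Convert a Scopus export into a bibliographic data frame
    M <- convert2df("scopus_export.bib", dbsource = "scopus", format = "bibtex")

    # Descriptive analysis: productive countries, authors, sources, citation counts
    results <- biblioAnalysis(M)
    summary(results, k = 10)
    plot(results, k = 10)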

