Design of the TRONCO BioConductor Package for TRanslational ONCOlogy

2015 ◽  
Author(s):  
Marco Antoniotti ◽  
Giulio Caravagna ◽  
Luca De Sano ◽  
Alex Graudenzi ◽  
Giancarlo Mauri ◽  
...  

Models of cancer progression provide insights on the order of accumulation of genetic alterations during cancer development. Algorithms to infer such models from the currently available mutational profiles collected from different cancer patients (cross-sectional data) have been defined in the literature since the late 1990s. These algorithms differ in the way they extract a graphical model of the events modelling the progression, e.g., somatic mutations or copy-number alterations. TRONCO is an R package for TRanslational ONCOlogy which provides a series of functions to assist the user in the analysis of cross-sectional genomic data and, in particular, implements algorithms that aim to model cancer progression by means of the notion of selective advantage. These algorithms have been shown to outperform the current state of the art in the inference of cancer progression models. TRONCO also provides functionalities to load input cross-sectional data, set up the execution of the algorithms, assess the statistical confidence in the results, and visualize the models. Availability. Freely available at http://www.bioconductor.org/ under GPL license; project hosted at http://bimib.disco.unimib.it/ and https://github.com/BIMIB-DISCo/TRONCO. Contact. [email protected]

2015 ◽  
Author(s):  
Giulio Caravagna ◽  
Alex Graudenzi ◽  
Daniele Ramazzotti ◽  
Rebeca Sanz-Pamplona ◽  
Luca De Sano ◽  
...  

The genomic evolution inherent to cancer relates directly to a renewed focus on the voluminous next-generation sequencing (NGS) data, and machine learning for the inference of explanatory models of how the (epi)genomic events are choreographed in cancer initiation and development. However, despite the increasing availability of multiple additional -omics data, this quest has been frustrated by various theoretical and technical hurdles, mostly stemming from the dramatic heterogeneity of the disease. In this paper, we build on our recent work on the "selective advantage" relation among driver mutations in cancer progression and investigate its applicability to the modeling problem at the population level. Here, we introduce PiCnIc (Pipeline for Cancer Inference), a versatile, modular and customizable pipeline to extract ensemble-level progression models from cross-sectional sequenced cancer genomes. The pipeline has many translational implications as it combines state-of-the-art techniques for sample stratification, driver selection, identification of fitness-equivalent exclusive alterations and progression model inference. We demonstrate PiCnIc's ability to reproduce much of the current knowledge on colorectal cancer progression, as well as to suggest novel experimentally verifiable hypotheses.


2016 ◽  
Vol 113 (28) ◽  
pp. E4025-E4034 ◽  
Author(s):  
Giulio Caravagna ◽  
Alex Graudenzi ◽  
Daniele Ramazzotti ◽  
Rebeca Sanz-Pamplona ◽  
Luca De Sano ◽  
...  

The genomic evolution inherent to cancer relates directly to a renewed focus on the voluminous next-generation sequencing data and machine learning for the inference of explanatory models of how the (epi)genomic events are choreographed in cancer initiation and development. However, despite the increasing availability of multiple additional -omics data, this quest has been frustrated by various theoretical and technical hurdles, mostly stemming from the dramatic heterogeneity of the disease. In this paper, we build on our recent work on the “selective advantage” relation among driver mutations in cancer progression and investigate its applicability to the modeling problem at the population level. Here, we introduce PiCnIc (Pipeline for Cancer Inference), a versatile, modular, and customizable pipeline to extract ensemble-level progression models from cross-sectional sequenced cancer genomes. The pipeline has many translational implications because it combines state-of-the-art techniques for sample stratification, driver selection, identification of fitness-equivalent exclusive alterations, and progression model inference. We demonstrate PiCnIc’s ability to reproduce much of the current knowledge on colorectal cancer progression as well as to suggest novel experimentally verifiable hypotheses.


2017 ◽  
Author(s):  
Ramon Diaz-Uriarte

The identification of constraints, due to gene interactions, in the order of accumulation of mutations during cancer progression can allow us to single out therapeutic targets. Cancer progression models (CPMs) use genotype frequency data from cross-sectional samples to try to identify these constraints, and return Directed Acyclic Graphs (DAGs) of genes. On the other hand, fitness landscapes, which map genotypes to fitness, contain all possible paths of tumor progression. Thus, we expect a correspondence between DAGs from CPMs and the fitness landscapes where evolution happened. But many fitness landscapes (e.g., those with reciprocal sign epistasis) cannot be represented by CPMs. Using simulated data under 500 fitness landscapes, I show that CPMs' performance (prediction of genotypes that can exist) degrades with reciprocal sign epistasis. There is large variability in the DAGs inferred from each landscape, which is also affected by mutation rate, detection regime, and fitness landscape features, in ways that depend on CPM method. And the same DAG is often observed in very different landscapes, which differ in more than 50% of their accessible genotypes. Using a pancreatic data set, I show that this many-to-many relationship affects the analysis of empirical data. Fitness landscapes that are widely different from each other can, when evolutionary processes run repeatedly on them, both produce data similar to the empirically observed one, and lead to DAGs that are very different among themselves. Because reciprocal sign epistasis can be common in cancer, these results question the use and interpretation of CPMs.


Author(s):  
Jonas M. B. Haslbeck

Statistical network models such as the Gaussian Graphical Model and the Ising model have become popular tools to analyze multivariate psychological datasets. In many applications, the goal is to compare such network models across groups. In this paper, I introduce a method to estimate group differences in network models that is based on moderation analysis. This method is attractive because it allows one to make comparisons across more than two groups for all parameters within a single model and because it is implemented for all commonly used cross-sectional network models. Next to introducing the method, I evaluate the performance of the proposed method and existing approaches in a simulation study. Finally, I provide a fully reproducible tutorial on how to use the proposed method to compare a network model across three groups using the R-package mgm.


2020 ◽  
Author(s):  
Jonas M B Haslbeck

Statistical network models such as the Gaussian Graphical Model and the Ising model have become popular tools to analyze multivariate psychological data sets. In many applications, the goal is to compare such network models across groups. In this paper, I introduce a method to estimate differences in network models across groups that is based on moderation analysis. This method is attractive because it allows one to make comparisons across more than two groups within a single model, and because it is implemented for all commonly used cross-sectional network models. Next to introducing the method, I evaluate the performance of the proposed method and existing approaches in a simulation study. Finally, I provide a fully reproducible tutorial on how to use the moderation method to compare a network model across three groups using the R-package mgm.


Author(s):  
Warsame Yusuf ◽  
Rostyslav Vyuha ◽  
Carol Bennett ◽  
Yulric Sequeira ◽  
Courtney Maskerine ◽  
...  

Setting. The Canadian Community Health Survey (CCHS) is one of the world's largest ongoing cross-sectional population health surveys, with over 130,000 respondents every two years or over 1.1 million respondents since its inception in 2001. While the survey remains relatively consistent over the years, there are differences between cycles that make it challenging to analyze the survey over time. Intervention. A program package called cchsflow was developed to transform and harmonize CCHS variables to consistent formats across multiple survey cycles. An open science approach was used to maintain transparency, reproducibility and collaboration. Outcomes. The cchsflow R package uses CCHS survey data between 2001 and 2014. Worksheets were created that identify variables, their names in previous cycles, their category structure, and their final variable names. These worksheets were then used to recode variables in each CCHS cycle into consistently named and labelled variables. Survey cycles can then be combined. The package was then added as a GitHub repository to encourage collaboration with other researchers. Implication. The cchsflow package has been added to the Comprehensive R Archive Network (CRAN) and contains support for over 160 CCHS variables, generating a combined data set of over 1 million respondents. By implementing open science practices, cchsflow aims to minimize the amount of time needed to clean and prepare data for the many CCHS users across Canada.


2018 ◽  
Author(s):  
Ramon Diaz-Uriarte ◽  
Claudia Vasallo

Successful prediction of the likely paths of tumor progression is valuable for diagnostic, prognostic, and treatment purposes. Cancer progression models (CPMs) use cross-sectional samples to identify restrictions in the order of accumulation of driver mutations and thus CPMs encode the paths of tumor progression. Here we analyze the performance of four CPMs to examine whether they can be used to predict the true distribution of paths of tumor progression and to estimate evolutionary unpredictability. Employing simulations, we show that if fitness landscapes are single peaked (have a single fitness maximum) there is good agreement between true and predicted distributions of paths of tumor progression when sample sizes are large, but performance is poor with the currently common much smaller sample sizes. Under multi-peaked fitness landscapes (i.e., those with multiple fitness maxima), performance is poor and improves only slightly with sample size. In all cases, detection regime (when tumors are sampled) is a key determinant of performance. Estimates of evolutionary unpredictability from the best performing CPM, among the four examined, tend to overestimate the true unpredictability and the bias is affected by detection regime; CPMs could be useful for estimating upper bounds to the true evolutionary unpredictability. Analysis of twenty-two cancer data sets shows low evolutionary unpredictability for several of the data sets. But most of the predictions of paths of tumor progression are very unreliable, and unreliability increases with the number of features analyzed.
Our results indicate that CPMs could be valuable tools for predicting cancer progression but that, currently, obtaining useful predictions of paths of tumor progression from CPMs is dubious, and emphasize the need for methodological work that can account for the probably multi-peaked fitness landscapes in cancer.
Author Summary. Knowing the likely paths of tumor progression is instrumental for cancer precision medicine as it would allow us to identify genetic targets that block disease progression and to improve therapeutic decisions. Direct information about paths of tumor progression is scarce, but cancer progression models (CPMs), which use as input cross-sectional data on genetic alterations, can be used to predict these paths. CPMs, however, make assumptions about fitness landscapes (genotype-fitness maps) that might not be met in cancer. We examine if four CPMs can be used to predict successfully the distribution of tumor progression paths; we find that some CPMs work well when sample sizes are large and fitness landscapes have a single fitness maximum, but in fitness landscapes with multiple fitness maxima prediction is poor. However, the best performing CPM in our study could be used to estimate evolutionary unpredictability. When we apply the best performing CPM in our study to twenty-two cancer data sets we find that predictions are generally unreliable but that some cancer data sets show low unpredictability. Our results highlight that CPMs could be valuable tools for predicting disease progression, but emphasize the need for methodological work to account for multi-peaked fitness landscapes.


2020 ◽  
Author(s):  
Phillip B. Nicol ◽  
Kevin R. Coombes ◽  
Courtney Deaver ◽  
Oksana A. Chkrebtii ◽  
Subhadeep Paul ◽  
...  

Cancer is the process of accumulating genetic alterations that confer selective advantages to tumor cells. The order in which aberrations occur is not arbitrary, and inferring the order of events is a challenging problem due to the lack of longitudinal samples from tumors. Moreover, a network model of oncogenesis should capture biological facts such as distinct progression trajectories of cancer subtypes and patterns of mutual exclusivity of alterations in the same pathways. In this paper, we present the Disjunctive Bayesian Network (DBN), a novel cancer progression model. Unlike previous models of oncogenesis, DBN naturally captures mutually exclusive alterations. Besides, DBN is flexible enough to represent progression trajectories of cancer subtypes, therefore allowing one to learn the progression network from unstratified data, i.e., mixed samples from multiple subtypes. We provide a scalable genetic algorithm to learn the structure of DBN from cross-sectional cancer data. To test our model, we simulate synthetic data from known progression networks and show that our algorithm infers the ground truth network with high accuracy. Finally, we apply our model to copy number data for colon cancer and mutation data for bladder cancer and observe that the recovered progression network matches known biological facts.


2015 ◽  
Author(s):  
Luca De Sano ◽  
Giulio Caravagna ◽  
Daniele Ramazzotti ◽  
Alex Graudenzi ◽  
Giancarlo Mauri ◽  
...  

Motivation. We introduce TRONCO (TRanslational ONCOlogy), an open-source R package that implements the state-of-the-art algorithms for the inference of cancer progression models from (epi)genomic mutational profiles. TRONCO can be used to extract population-level models describing the trends of accumulation of alterations in a cohort of cross-sectional samples, e.g., retrieved from publicly available databases, and individual-level models that reveal the clonal evolutionary history in single cancer patients, when multiple samples, e.g., multiple biopsies or single-cell sequencing data, are available. The resulting models can provide key hints in uncovering the evolutionary trajectories of cancer, especially for precision medicine or personalized therapy. Availability. TRONCO is released under the GPL license; it is hosted in the Software section at http://bimib.disco.unimib.it/ and archived also at [email protected]

