Evaluation of gene–drug common module identification methods using pharmacogenomics data

Author(s):  
Jie Huang ◽  
Jiazhou Chen ◽  
Bin Zhang ◽  
Lei Zhu ◽  
Hongmin Cai

Abstract Accurately identifying the interactions between genomic factors and the response to cancer drugs plays an important role in drug discovery, drug repositioning and cancer treatment. A number of studies have revealed that interactions between genes and drugs are ‘many-genes-to-many-drugs’ interactions, i.e. common modules, as opposed to ‘one-gene-to-one-drug’ interactions. Such modules can more fully explain the interactions between complex biological regulatory mechanisms and cancer drugs. However, strategies for effectively and robustly identifying the underlying common modules in pharmacogenomics data remain to be improved. In this paper, we aim to provide a detailed evaluation, from a machine learning perspective, of three categories of state-of-the-art common module identification techniques: non-negative matrix factorization (NMF), partial least squares (PLS) and network analyses. We first evaluate the performance of six methods, namely SNMNMF, NetNMF, SNPLS, O2PLS, NSBM and HOGMMNC, using two series of simulated data sets with different noise levels and outlier ratios. Then, we conduct experiments using a real-world data set of 2091 genes and 101 drugs in 392 cancer cell lines and compare the results in terms of biological process term enrichment, gene–drug interactions and drug–drug interactions. Finally, we present interesting findings from our evaluation study and discuss the advantages and drawbacks of each method. Supplementary information: Supplementary file is available at Briefings in Bioinformatics online.
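To make the NMF category concrete, the following is a minimal Python sketch of the shared-basis idea behind NMF-based common module identification: a gene matrix X and a drug-response matrix Y measured on the same cell lines are factorized with one common non-negative factor W, so each module (column of W) links a set of genes (a row of H1) to a set of drugs (a row of H2). It is plain joint NMF with Lee–Seung-style multiplicative updates, not SNMNMF or NetNMF (which add network-based regularization). The simulator, its noise level, outlier ratio and the `top_members` helper are hypothetical stand-ins for the evaluation set-up described above.

```python
import numpy as np

def joint_nmf(X, Y, k, n_iter=500, eps=1e-10, seed=0):
    """Factorize X ≈ W @ H1 and Y ≈ W @ H2 with a shared non-negative basis W.

    X: cell lines x genes, Y: cell lines x drugs, both non-negative.
    Multiplicative updates for ||X - W H1||_F^2 + ||Y - W H2||_F^2.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k))
    H1 = rng.random((k, X.shape[1]))
    H2 = rng.random((k, Y.shape[1]))
    for _ in range(n_iter):
        H1 *= (W.T @ X) / (W.T @ W @ H1 + eps)
        H2 *= (W.T @ Y) / (W.T @ W @ H2 + eps)
        W *= (X @ H1.T + Y @ H2.T) / (W @ (H1 @ H1.T + H2 @ H2.T) + eps)
    return W, H1, H2

def simulate(n=100, genes=50, drugs=20, k=3, noise=0.1, outlier_ratio=0.0, seed=1):
    """Hypothetical simulator: shared low-rank signal + Gaussian noise + sparse outliers."""
    rng = np.random.default_rng(seed)
    W = rng.random((n, k))
    X = W @ rng.random((k, genes))
    Y = W @ rng.random((k, drugs))
    for M in (X, Y):
        M += noise * rng.standard_normal(M.shape)        # Gaussian noise (in place)
        M[rng.random(M.shape) < outlier_ratio] += 5.0    # large sparse outliers
    return np.clip(X, 0, None), np.clip(Y, 0, None)      # NMF needs non-negative input

def relative_error(X, Y, W, H1, H2):
    """Joint relative reconstruction error of both factorizations."""
    num = np.linalg.norm(X - W @ H1) ** 2 + np.linalg.norm(Y - W @ H2) ** 2
    den = np.linalg.norm(X) ** 2 + np.linalg.norm(Y) ** 2
    return float(np.sqrt(num / den))

def top_members(H, j, n_top=5):
    """Hypothetical helper: indices of the n_top genes (H1) or drugs (H2) loading most on module j."""
    return np.argsort(H[j])[::-1][:n_top]

for noise, outliers in [(0.0, 0.0), (0.1, 0.0), (0.1, 0.05)]:
    X, Y = simulate(noise=noise, outlier_ratio=outliers)
    W, H1, H2 = joint_nmf(X, Y, k=3)
    print(f"noise={noise}, outliers={outliers}: "
          f"relative error={relative_error(X, Y, W, H1, H2):.3f}, "
          f"module 0 genes={top_members(H1, 0)}, drugs={top_members(H2, 0)}")
```

Sweeping the noise level and outlier ratio while tracking the reconstruction error (and, with known ground truth, module recovery) mirrors the two series of simulations in spirit; the published methods would replace `joint_nmf`.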

2018 ◽  
Author(s):  
Hanchen Yu ◽  
Stewart Fotheringham ◽  
Ziqi Li ◽  
Taylor Oshan ◽  
Wei Kang ◽  
...  

A recent paper (Fotheringham et al. 2017) expands the well-known Geographically Weighted Regression (GWR) framework significantly by allowing the bandwidth or smoothing factor in GWR to be derived separately for each covariate in the model – a framework referred to as Multiscale GWR (MGWR). However, one limitation of the MGWR framework is that, until now, no inference about the local parameter estimates was possible. Formally, the so-called “hat matrix,” which maps the observed response vector to the predicted response vector, was available in GWR but not in MGWR. This paper addresses this limitation by reframing GWR as a Generalized Additive Model (GAM), extending this framework to MGWR and then deriving standard errors for the local parameters in MGWR. In addition, we demonstrate how the effective number of parameters (ENP) can be obtained for the overall fit of an MGWR model and for each of the covariates within the model. This statistic is essential for comparing model fit between MGWR, GWR and traditional global models, as well as for adjusting for multiple hypothesis tests. We demonstrate these advances to the MGWR framework with both a simulated data set and a real-world data set.
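For intuition about the hat matrix and ENP, here is a minimal Python sketch for classic single-bandwidth GWR, where the quantities have a closed form. Row i of the hat matrix S is x_iᵀ(XᵀW_iX)⁻¹XᵀW_i, so ŷ = Sy and ENP = tr(S), and the local standard errors are the square roots of σ²·diag(C_iC_iᵀ) with C_i = (XᵀW_iX)⁻¹XᵀW_i. The Gaussian kernel, the σ² = RSS/(n − tr(S)) estimate and the simulated data are assumptions for illustration; the MGWR derivation via backfitting described in the abstract is not reproduced here.

```python
import numpy as np

def gwr(coords, X, y, bandwidth):
    """Single-bandwidth GWR with a Gaussian kernel (classic GWR, not MGWR).

    Returns local coefficients (n x p), hat matrix S (n x n), ENP = tr(S),
    and local standard errors (n x p).
    """
    n, p = X.shape
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    K = np.exp(-0.5 * (d / bandwidth) ** 2)      # row i: kernel weights for location i
    betas, S, C = np.empty((n, p)), np.empty((n, n)), np.empty((n, p, n))
    for i in range(n):
        XtW = X.T * K[i]                           # X^T W_i
        C[i] = np.linalg.solve(XtW @ X, XtW)       # C_i = (X^T W_i X)^{-1} X^T W_i
        betas[i] = C[i] @ y                        # local estimate beta_i = C_i y
        S[i] = X[i] @ C[i]                         # row i of the hat matrix
    enp = np.trace(S)                              # effective number of parameters
    resid = y - S @ y
    sigma2 = resid @ resid / (n - enp)             # one common residual-variance estimate
    se = np.sqrt(sigma2 * np.einsum("ipj,ipj->ip", C, C))  # sqrt(sigma2 * diag(C_i C_i^T))
    return betas, S, enp, se

# Hypothetical data: the slope on x1 drifts from west to east.
rng = np.random.default_rng(0)
n = 100
coords = rng.uniform(0, 1, (n, 2))
x1 = rng.standard_normal(n)
X = np.column_stack([np.ones(n), x1])
y = 1.0 + (1.0 + 2.0 * coords[:, 0]) * x1 + 0.1 * rng.standard_normal(n)
for bw in (0.1, 0.3, 1e6):
    _, _, enp, _ = gwr(coords, X, y, bw)
    print(f"bandwidth={bw}: ENP={enp:.2f}")
```

As the bandwidth grows, every location receives the same weights, GWR collapses to global OLS and ENP falls to the number of columns of X; smaller bandwidths spend more effective parameters.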


2014 ◽  
Author(s):  
Graham Jones ◽  
Bengt Oxelman

Motivation: The multispecies coalescent model provides a formal framework for the assignment of individual organisms to species, where the species are modeled as the branches of the species tree. So far, none of the available approaches has simultaneously co-estimated all the relevant parameters in the model without restricting the parameter space by requiring a guide tree and/or prior assignment of individuals to clusters or species. Results: We present DISSECT, which explores the full space of possible clusterings of individuals and species tree topologies in a Bayesian framework. It uses an approximation to avoid the need for reversible-jump MCMC, in the form of a prior that is a modification of the birth-death prior for the species tree. It incorporates a spike near zero in the density for node heights. The model has two extra parameters: one controls the degree of approximation, and the second controls the prior distribution on the numbers of species. It is implemented as part of BEAST and requires only a few changes from a standard *BEAST analysis. The method is evaluated on simulated data and demonstrated on an empirical data set. The method is shown to be insensitive to the degree of approximation, but quite sensitive to the second parameter, suggesting that large numbers of sequences are needed to draw firm conclusions. Availability: http://code.google.com/p/beast-mcmc/, http://www.indriid.com/dissectinbeast.html Contact: www.indriid.com Supplementary information: Supplementary material is available.


Author(s):  
Tao Sun ◽  
Mengci Li ◽  
Xiangtian Yu ◽  
Dandan Liang ◽  
Guoxiang Xie ◽  
...  

Abstract Motivation: Metabolome and microbiome disorders are closely associated with human health, and there is great demand for dual-omics interaction analysis. Here, we designed and developed an integrative platform, 3MCor, for metabolome and microbiome correlation analysis that is guided by phenotype and takes confounders into account. Results: Many traditional and novel correlation analysis methods were integrated for intra- and inter-correlation analysis. Three inter-correlation pipelines are provided for global, hierarchical and pairwise analysis. The incorporated network analysis function enables rapid identification of network clusters and key nodes in a complicated correlation network. Complete numerical results (csv files) and rich figures (pdf files) are generated within minutes. To our knowledge, 3MCor is the first platform developed specifically for correlation analysis of the metabolome and microbiome. Its functions were compared with the corresponding modules of existing omics data analysis platforms. A real-world data set was used to demonstrate its simple and flexible operation, comprehensive outputs and distinctive contribution to dual-omics studies. Availability: 3MCor is available at http://3mcor.cn and the backend R script is available at https://github.com/chentianlu/3MCorServer. Supplementary information: Supplementary data are available at Bioinformatics online.
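As an illustration of pairwise inter-correlation with confounder adjustment, the sketch below computes a Spearman correlation matrix between every metabolite and every microbe and, optionally, a partial Spearman correlation that regresses a confounder out of the ranks first. This is a generic technique chosen for illustration, written in Python rather than the platform's R backend, and it is not 3MCor's actual pipeline; the tie-free ranking and the simulated "age" confounder are assumptions.

```python
import numpy as np

def ranks(A):
    """Column-wise ranks; assumes no ties (real abundance data often has ties, e.g. zeros)."""
    return np.argsort(np.argsort(A, axis=0), axis=0).astype(float)

def residualize(A, Z):
    """Remove the linear effect of confounders Z (with an intercept) from each column of A."""
    Z1 = np.column_stack([np.ones(len(A)), Z])
    coef, *_ = np.linalg.lstsq(Z1, A, rcond=None)
    return A - Z1 @ coef

def inter_correlation(M, B, Z=None):
    """Spearman correlation (partial Spearman if confounders Z are given) between
    each metabolite column of M and each microbe column of B; returns (m x b)."""
    RM, RB = ranks(M), ranks(B)
    if Z is not None:
        RM, RB = residualize(RM, Z), residualize(RB, Z)
    RM = (RM - RM.mean(axis=0)) / RM.std(axis=0)
    RB = (RB - RB.mean(axis=0)) / RB.std(axis=0)
    return RM.T @ RB / len(M)

# Hypothetical data: metabolite 0 and microbe 0 share only a confounder;
# metabolite 1 is a monotone function of microbe 1.
rng = np.random.default_rng(0)
n = 200
age = rng.standard_normal(n)
microbe = np.column_stack([3 * age + rng.standard_normal(n), rng.standard_normal(n)])
metabolite = np.column_stack([3 * age + rng.standard_normal(n), np.exp(microbe[:, 1])])
print(np.round(inter_correlation(metabolite, microbe), 2))
print(np.round(inter_correlation(metabolite, microbe, Z=age), 2))
```

The first pair looks strongly correlated until the confounder is removed, while the genuinely linked pair keeps a correlation of 1, which is the kind of distinction confounder-aware correlation analysis is meant to expose.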


Genes ◽  
2020 ◽  
Vol 12 (1) ◽  
pp. 25
Author(s):  
He-Gang Chen ◽  
Xiong-Hui Zhou

Drug repurposing/repositioning, which aims to find novel indications for existing drugs, helps reduce the time and cost of drug development. Over the past decade, gene expression profiles of drug-stimulated samples have been successfully used in drug repurposing. However, most existing methods neglect gene modules and the interactions among them, although cross-talk among pathways is common in drug response. It is therefore essential to develop a method that uses cross-talk information to predict reliable candidate associations. In this study, we developed MNBDR (Module Network Based Drug Repositioning), a novel method that screens drugs based on a module network. It integrates protein–protein interactions and human gene expression profiles to predict drug candidates for diseases. Specifically, MNBDR mines dense modules from the protein–protein interaction (PPI) network and constructs a module network to reveal cross-talk among modules. Then, using the module network together with existing gene expression data sets of drug-stimulated samples and disease samples, we used random walk algorithms to capture essential modules in disease development and proposed a new indicator to screen potential drugs for a given disease. Results showed that MNBDR outperformed popular existing methods. Moreover, functional analysis of the essential modules in the network indicated that our method could reveal biological mechanisms of drug response.
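The random-walk step can be pictured with a generic random walk with restart (RWR) on a module network: a walker repeatedly moves to a neighbouring module or, with probability equal to the restart rate, jumps back to the seed modules, and the steady-state visiting probabilities rank modules by their proximity to the seeds. The Python sketch below assumes a symmetric adjacency matrix, a restart rate of 0.3 and a hypothetical 5-module network; MNBDR's exact walk variant and its drug-screening indicator are not given in the abstract and are not reproduced.

```python
import numpy as np

def random_walk_with_restart(A, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a random walk with restart.

    A: symmetric non-negative adjacency matrix of the module network.
    seeds: indices of restart modules (e.g. hypothetical disease-related modules).
    """
    A = np.asarray(A, dtype=float)
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # avoid division by zero for isolated modules
    P = A / col_sums                       # column-stochastic transition matrix
    p0 = np.zeros(len(A))
    p0[seeds] = 1.0 / len(seeds)
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * P @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:  # L1 convergence check
            return p_next
        p = p_next
    return p

# Hypothetical module network with 5 modules; module 0 is the seed.
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]])
print(np.round(random_walk_with_restart(A, seeds=[0]), 3))
```

Modules that are well connected to the seeds accumulate more probability mass, which is the sense in which a random walk can highlight "essential" modules.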


Author(s):  
M D MacNeil ◽  
J W Buchanan ◽  
M L Spangler ◽  
E Hay

Abstract The objective of this study was to evaluate the effects of various data structures on the genetic evaluation of the binary phenotype of reproductive success. The data were simulated based on an existing pedigree and an underlying fertility phenotype with a heritability of 0.10. A data set of complete observations was generated for all cows. This data set was then modified to mimic the culling of cows when they first failed to reproduce, cows having a missing observation at either their second or fifth opportunity to reproduce (as if they had been selected as donors for embryo transfer), and censoring of records after the sixth opportunity to reproduce (as in a cull-for-age strategy). The data were analyzed using a third-order polynomial random regression model. The EBV of interest for each animal was the sum of the age-specific EBV over the first 10 observations (reproductive success at ages 2 to 11). Thus, the EBV might be interpreted as the genetic expectation of the number of calves produced when a female is given 10 opportunities to calve. Culling open cows reduced the EBV for 3-year-old cows from 8.27 ± 0.03 when open cows were retained to 7.60 ± 0.02 when they were culled. The magnitude of this effect decreased the older cows were when they first failed to reproduce and were subsequently culled. Cows that did not fail over the 11 years of simulated data had an EBV of 9.43 ± 0.01 and 9.35 ± 0.01 based on analyses of the complete data and of the data in which cows that failed to reproduce were culled, respectively. Cows that had a missing observation for their second record had a significantly reduced EBV, but the corresponding effect at the fifth record was negligible. The current study illustrates that culling and management decisions, particularly those that affect the beginning of the trajectory of sustained reproductive success, can influence both the magnitude and accuracy of the resulting EBV.
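The "sum of age-specific EBV" can be made concrete with a small Python sketch. In a random regression model, an animal's EBV at age t is φ(t)ᵀa, where a holds the animal's random regression coefficients and φ(t) the polynomial covariates; summing over ages 2 to 11 gives the lifetime quantity described above. The use of normalized Legendre polynomials (orders 0 to 3), the mapping of ages onto [-1, 1] and the coefficient values are assumptions for illustration, since the abstract only states "third-order polynomial".

```python
import numpy as np
from numpy.polynomial import legendre

AGES = np.arange(2, 12)   # ages 2..11, i.e. the first 10 opportunities to calve

def standardize(age, lo=2, hi=11):
    """Map ages onto [-1, 1], the domain of Legendre polynomials."""
    return -1 + 2 * (age - lo) / (hi - lo)

def legendre_basis(age, order=3):
    """Normalized Legendre covariates phi_k(t) = sqrt((2k+1)/2) * P_k(t), k = 0..order."""
    t = standardize(np.asarray(age, dtype=float))
    cols = []
    for k in range(order + 1):
        c = np.zeros(k + 1)
        c[k] = 1.0                                   # coefficient vector selecting P_k
        cols.append(np.sqrt((2 * k + 1) / 2) * legendre.legval(t, c))
    return np.column_stack(cols)

def lifetime_ebv(coefs, ages=AGES):
    """Sum of age-specific EBVs phi(t) @ coefs over the given ages."""
    return float(legendre_basis(ages).sum(axis=0) @ np.asarray(coefs, dtype=float))

# Hypothetical random regression coefficients for one animal.
a = [1.2, -0.05, -0.02, 0.01]
print(np.round(legendre_basis(AGES) @ np.asarray(a), 3))   # age-specific EBVs
print(round(lifetime_ebv(a), 3))                          # their sum over ages 2-11
```

Because the ages are symmetric around the midpoint of [-1, 1], the odd-order terms cancel in the sum, so the lifetime EBV here is driven by the even-order coefficients; a missing or censored early record changes how precisely those coefficients are estimated.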


2021 ◽  
Vol 4 (1) ◽  
pp. 251524592095492
Author(s):  
Marco Del Giudice ◽  
Steven W. Gangestad

Decisions made by researchers while analyzing data (e.g., how to measure variables, how to handle outliers) are sometimes arbitrary, without an objective justification for choosing one alternative over another. Multiverse-style methods (e.g., specification curve, vibration of effects) estimate an effect across an entire set of possible specifications to expose the impact of hidden degrees of freedom and/or obtain robust, less biased estimates of the effect of interest. However, if specifications are not truly arbitrary, multiverse-style analyses can produce misleading results, potentially hiding meaningful effects within a mass of poorly justified alternatives. So far, a key question has received scant attention: How does one decide whether alternatives are arbitrary? We offer a framework and conceptual tools for doing so. We discuss three kinds of a priori nonequivalence among alternatives—measurement nonequivalence, effect nonequivalence, and power/precision nonequivalence. The criteria we review lead to three decision scenarios: Type E decisions (principled equivalence), Type N decisions (principled nonequivalence), and Type U decisions (uncertainty). In uncertain scenarios, multiverse-style analysis should be conducted in a deliberately exploratory fashion. The framework is discussed with reference to published examples and illustrated with the help of a simulated data set. Our framework will help researchers reap the benefits of multiverse-style methods while avoiding their pitfalls.
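A specification curve is easy to sketch: enumerate every combination of analytic choices, estimate the effect under each, and sort the estimates. The Python example below uses two hypothetical choice dimensions (an outlier rule and whether to adjust for a covariate) on simulated data. Whether adjusting for the covariate is an arbitrary choice (Type E) or changes the quantity being estimated (Type N) is exactly the kind of judgment the framework asks researchers to make before pooling specifications; the code itself makes no such judgment.

```python
import itertools
import numpy as np

def slope(x, y, cov, adjust):
    """OLS coefficient of x, optionally adjusting for a covariate."""
    cols = [np.ones_like(x), x] + ([cov] if adjust else [])
    beta, *_ = np.linalg.lstsq(np.column_stack(cols), y, rcond=None)
    return float(beta[1])

def keep_mask(y, rule):
    """Hypothetical outlier rules: keep all, or drop |z| above 3 or 2.5 SD."""
    if rule == "keep":
        return np.ones(len(y), dtype=bool)
    z = (y - y.mean()) / y.std()
    return np.abs(z) < {"trim_3sd": 3.0, "trim_2.5sd": 2.5}[rule]

def specification_curve(x, y, cov):
    """Estimate the effect under every combination of choices, sorted by estimate."""
    results = []
    for rule, adjust in itertools.product(["keep", "trim_3sd", "trim_2.5sd"], [False, True]):
        m = keep_mask(y, rule)
        results.append((slope(x[m], y[m], cov[m], adjust), rule, adjust))
    return sorted(results)

# Simulated data: the covariate confounds x and y; a few outliers are added.
rng = np.random.default_rng(42)
n = 200
cov = rng.standard_normal(n)
x = rng.standard_normal(n) + 0.5 * cov
y = 0.3 * x + 0.5 * cov + rng.standard_normal(n)
y[:5] += 8.0
for est, rule, adjust in specification_curve(x, y, cov):
    print(f"{est:+.3f}  outliers={rule:<10}  adjust_for_cov={adjust}")
```

In this simulation the unadjusted specifications estimate a confounded quantity, so averaging them with the adjusted ones would mix nonequivalent alternatives, illustrating the pitfall the abstract warns about.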


Author(s):  
Mrugank Bhaskarkumar Parmar ◽  
Shital Panchal

This drug repositioning study was performed on drugs that have been on the market for more than a decade and are approved, with well-established efficacy and safety in humans. The objective was to reposition existing non-cancer drug therapies with well-characterized pharmacologic profiles for cancer treatment, as antineoplastic agents with greater efficacy and minimal toxicity. We retrieved source data from the FDA Adverse Event Reporting System (FAERS) covering 13 years, from 2004 to 2016, and analysed them using a pharmacovigilance approach, ‘a proposed future novel pharmaceutical tool for drug repositioning’. Signal management activity was performed for the statistical analysis. The statistical analysis indicated that propranolol, metformin, pioglitazone, dabigatran and nitroglycerin are existing non-cancer drugs that merit direct or indirect repositioning for cancer treatment and antineoplastic activity. Further studies retrieving source data from other regulatory databases (e.g. EudraVigilance of the EMA and VigiFlow of the WHO) and post-marketing surveillance studies with the same objective may support our results on repositioning existing drugs through a pharmacovigilance approach.
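The abstract does not name the statistic used for signal management, so as an assumption, here is a Python sketch of two disproportionality measures commonly applied to spontaneous-report databases such as FAERS: the proportional reporting ratio (PRR) and the reporting odds ratio (ROR) computed from a 2x2 table of report counts. The screening rule (at least 3 cases, PRR ≥ 2, lower 95% bound of the ROR above 1) is one widely used convention, and the counts are hypothetical, not FAERS data.

```python
import math

def disproportionality(a, b, c, d):
    """2x2 report counts:
    a = target drug & event, b = target drug & other events,
    c = other drugs & event, d = other drugs & other events.
    Returns PRR, ROR and the 95% confidence interval of the ROR.
    Zero cells would need a continuity correction, which is not handled here."""
    prr = (a / (a + b)) / (c / (c + d))
    ror = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)       # standard error of ln(ROR)
    ci = (math.exp(math.log(ror) - 1.96 * se), math.exp(math.log(ror) + 1.96 * se))
    return prr, ror, ci

def is_signal(a, b, c, d, min_cases=3):
    """Hypothetical screening rule: enough cases, PRR >= 2 and ROR lower bound > 1."""
    prr, _, (lower, _) = disproportionality(a, b, c, d)
    return a >= min_cases and prr >= 2 and lower > 1

# Hypothetical counts, not FAERS data.
print(disproportionality(20, 980, 100, 98900))
print(is_signal(20, 980, 100, 98900), is_signal(10, 990, 1000, 99000))
```

A PRR of 1 means the event is reported for the drug in the same proportion as for all other drugs; repositioning analyses may also look for the opposite pattern (events reported less often than expected), which would use the same table.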


Author(s):  
Marcelo N. de Sousa ◽  
Ricardo Sant’Ana ◽  
Rigel P. Fernandes ◽  
Julio Cesar Duarte ◽  
José A. Apolinário ◽  
...  

Abstract In outdoor RF localization systems, particularly where line of sight cannot be guaranteed or where multipath effects are severe, information about the terrain may improve the position estimate. Given the difficulty of obtaining real data, a ray-tracing fingerprint is a viable option. Nevertheless, although they present good simulation results, systems trained only on simulated features suffer performance degradation when used to process real-life data. This work aims to improve localization accuracy when using ray-tracing fingerprints and a small amount of field data obtained from an adverse environment where a large number of measurements is not an option. We employ machine learning (ML) algorithms to exploit the multipath information, selecting random forest and gradient boosting, both considered efficient tools in the literature. In a strict simulation scenario (simulated data for training, validation and testing), we obtained the same good results reported in the literature (error around 2 m). In a real-world system (simulated data for training, real data for validation and testing), both ML algorithms resulted in a mean positioning error of around 100 m. We have also obtained experimental results for noisy features (with artificially added Gaussian noise) and mismatched features (with a null subset of features). From the simulations carried out in this work, our study revealed that enhancing the ML model with a small amount of real-world data improves overall localization performance. Of the ML algorithms employed herein, we also observed that, under noisy conditions, random forest achieved a slightly better result than gradient boosting; however, they achieved similar results in the mismatch experiment.
This work’s practical implication is that multipath information, once discarded by older localization techniques, now represents a significant source of information whenever prior knowledge is available to train the ML algorithm.
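The fingerprinting workflow and the noisy-feature experiment can be sketched in a few lines of Python. To stay self-contained, the sketch swaps in a k-nearest-neighbour fingerprint matcher in place of the random forest and gradient boosting models used in the study, and it replaces ray-tracing output with a hypothetical log-distance path-loss feature per transmitter. What it does show faithfully is the evaluation idea: train on simulated fingerprints, query with perturbed features, and report the mean positioning error in metres.

```python
import numpy as np

def knn_locate(train_feats, train_pos, query_feats, k=1):
    """Estimate each query position as the mean position of its k nearest fingerprints."""
    d = np.linalg.norm(query_feats[:, None, :] - train_feats[None, :, :], axis=-1)
    nearest = np.argsort(d, axis=1)[:, :k]
    return train_pos[nearest].mean(axis=1)

def mean_positioning_error(pred, true):
    """Mean Euclidean distance (in metres) between estimated and true positions."""
    return float(np.linalg.norm(pred - true, axis=1).mean())

# Hypothetical scene: 4 transmitters at the corners of a 100 m x 100 m area.
anchors = np.array([[0, 0], [100, 0], [0, 100], [100, 100]], dtype=float)
axis = np.arange(0, 101, 5.0)
grid = np.stack(np.meshgrid(axis, axis), axis=-1).reshape(-1, 2)   # 21 x 21 fingerprint points

def fingerprints(pos):
    """Hypothetical log-distance path-loss features (dB) standing in for ray-tracing output."""
    d = np.linalg.norm(pos[:, None, :] - anchors[None, :, :], axis=-1) + 1.0
    return -20.0 * np.log10(d)

rng = np.random.default_rng(0)
train = fingerprints(grid)
for sigma in (0.0, 1.0, 3.0):                      # std of added Gaussian noise in dB
    query = train + sigma * rng.standard_normal(train.shape)
    pred = knn_locate(train, grid, query, k=3)
    print(f"noise sigma={sigma} dB: mean error={mean_positioning_error(pred, grid):.1f} m")
```

Replacing `knn_locate` with a trained regressor, and replacing the synthetic query features with field measurements, would bring the sketch closer to the simulated-train / real-test set-up the abstract describes.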


2021 ◽  
pp. 1-13
Author(s):  
Hailin Liu ◽  
Fangqing Gu ◽  
Zixian Lin

Transfer learning methods exploit similarities between different datasets to improve performance on a target task by transferring knowledge from source tasks. “What to transfer” is a main research issue in transfer learning. Existing transfer learning methods generally need to acquire the shared parameters by integrating human knowledge. However, in many real applications, which parameters can be shared is not known beforehand. A transfer learning model is essentially a special multi-objective optimization problem. Consequently, this paper proposes a novel auto-sharing parameter technique for transfer learning based on multi-objective optimization and solves the optimization problem using a multi-swarm particle swarm optimizer. Each task objective is optimized simultaneously by a sub-swarm. The current best particle from the sub-swarm of the target task is used to guide the search of particles of the source tasks, and vice versa. The target and source tasks are solved jointly by sharing the information of the best particle, which works as an inductive bias. Experiments on several synthetic data sets and two real-world data sets (a school data set and a landmine data set) show that the proposed algorithm is effective.
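The core mechanism, one sub-swarm per task with each sub-swarm also attracted to the other task's best particle, can be sketched as a multi-swarm particle swarm optimizer in Python. The velocity rule adds a third attraction term (weight c3) toward the other swarm's best on top of the usual personal-best and swarm-best terms. This exact update, the coefficient values and the two quadratic "tasks" are assumptions for illustration; the paper's parameter-sharing encoding is not described in the abstract.

```python
import numpy as np

def multi_swarm_pso(objectives, dim, n_particles=20, n_iter=200,
                    w=0.6, c1=1.4, c2=1.4, c3=0.4, seed=0):
    """One sub-swarm per task; each is also attracted to the other tasks' best particle.

    Hypothetical velocity rule: inertia + personal best + own-swarm best
    + other-swarm best (the c3 term). Requires at least two objectives.
    """
    rng = np.random.default_rng(seed)
    T = len(objectives)
    X = rng.uniform(-5, 5, (T, n_particles, dim))
    V = np.zeros_like(X)
    pbest = X.copy()
    pbest_val = np.array([[f(x) for x in X[t]] for t, f in enumerate(objectives)])
    gbest = np.array([pbest[t][pbest_val[t].argmin()] for t in range(T)])
    for _ in range(n_iter):
        for t, f in enumerate(objectives):
            guide = np.delete(gbest, t, axis=0).mean(axis=0)   # best of the other task(s)
            r1, r2, r3 = rng.random((3, n_particles, dim))
            V[t] = (w * V[t] + c1 * r1 * (pbest[t] - X[t])
                    + c2 * r2 * (gbest[t] - X[t]) + c3 * r3 * (guide - X[t]))
            X[t] += V[t]
            vals = np.array([f(x) for x in X[t]])
            better = vals < pbest_val[t]
            pbest[t][better] = X[t][better]                    # update personal bests in place
            pbest_val[t][better] = vals[better]
            gbest[t] = pbest[t][pbest_val[t].argmin()]
    return gbest, pbest_val.min(axis=1)

# Hypothetical related tasks whose optima are close but not identical.
target = lambda x: float(np.sum((x - 1.0) ** 2))
source = lambda x: float(np.sum((x - 1.2) ** 2))
best_x, best_val = multi_swarm_pso([target, source], dim=2)
print("target best:", best_x[0], best_val[0])
print("source best:", best_x[1], best_val[1])
```

The shared-best term acts as the inductive bias: when the tasks' optima are close, each swarm is steered toward a promising region found by the other, while its own personal and swarm bests keep it anchored to its own objective.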

