Predicting clinical outcomes from large scale cancer genomic profiles with deep survival models

Mapping Intimacies ◽

10.1101/131367 ◽

2017 ◽

Cited By ~ 1

Author(s):

Safoora Yousefi ◽

Fatemeh Amrollahi ◽

Mohamed Amgad ◽

Coco Dong ◽

Joshua E. Lewis ◽

...

Keyword(s):

Clinical Outcomes ◽

Open Source Software ◽

Large Scale ◽

Prediction Models ◽

Genomic Medicine ◽

Optimization Methods ◽

Bayesian Optimization ◽

Survival Models ◽

High Dimensional ◽

Prognostic Accuracy

ABSTRACTTranslating the vast data generated by genomic platforms into accurate predictions of clinical outcomes is a fundamental challenge in genomic medicine. Many prediction methods face limitations in learning from the high-dimensional profiles generated by these platforms, and rely on experts to hand-select a small number of features for training prediction models. In this paper, we demonstrate how deep learning and Bayesian optimization methods that have been remarkably successful in general high-dimensional prediction tasks can be adapted to the problem of predicting cancer outcomes. We perform an extensive comparison of Bayesian optimized deep survival models and other state of the art machine learning methods for survival analysis, and describe a framework for interpreting deep survival models using a risk backpropagation technique. Finally, we illustrate that deep survival models can successfully transfer information across diseases to improve prognostic accuracy. We provide an open-source software implementation of this framework called SurvivalNet that enables automatic training, evaluation and interpretation of deep survival models.

Download Full-text

Predicting clinical outcomes from large scale cancer genomic profiles with deep survival models

Scientific Reports ◽

10.1038/s41598-017-11817-6 ◽

2017 ◽

Vol 7 (1) ◽

Cited By ~ 68

Author(s):

Safoora Yousefi ◽

Fatemeh Amrollahi ◽

Mohamed Amgad ◽

Chengliang Dong ◽

Joshua E. Lewis ◽

...

Keyword(s):

Clinical Outcomes ◽

Large Scale ◽

Survival Models

Download Full-text

Learning manipulation skills from a single demonstration

The International Journal of Robotics Research ◽

10.1177/0278364917743795 ◽

2017 ◽

Vol 37 (1) ◽

pp. 137-154 ◽

Cited By ~ 6

Author(s):

Peter Englert ◽

Marc Toussaint

Keyword(s):

State Of The Art ◽

Optimization Methods ◽

Black Box ◽

Bayesian Optimization ◽

High Dimensional ◽

Control Methods ◽

Benchmark Experiment ◽

Real Robot ◽

Optimal Control Methods ◽

Single Demonstration

We consider the scenario where a robot is demonstrated a manipulation skill once and should then use only a few trials on its own to learn to reproduce, optimize, and generalize that same skill. A manipulation skill is generally a high-dimensional policy. To achieve the desired sample efficiency, we need to exploit the inherent structure in this problem. With our approach, we propose to decompose the problem into analytically known objectives, such as motion smoothness, and black-box objectives, such as trial success or reward, depending on the interaction with the environment. The decomposition allows us to leverage and combine (i) constrained optimization methods to address analytic objectives, (ii) constrained Bayesian optimization to explore black-box objectives, and (iii) inverse optimal control methods to eventually extract a generalizable skill representation. The algorithm is evaluated on a synthetic benchmark experiment and compared with state-of-the-art learning methods. We also demonstrate the performance on real-robot experiments with a PR2.

Download Full-text

Scalable3-BO: Big Data Meets HPC - A Scalable Asynchronous Parallel High-Dimensional Bayesian Optimization Framework on Supercomputers

10.1115/detc2021-70828 ◽

2021 ◽

Author(s):

Anh Tran

Keyword(s):

Big Data ◽

Optimization Methods ◽

Global Optimum ◽

Bayesian Optimization ◽

Sequential Optimization ◽

High Dimensional ◽

Computational Resource ◽

Data Points ◽

Computationally Expensive ◽

Computational Budget

Abstract Bayesian optimization (BO) is a flexible and powerful framework that is suitable for computationally expensive simulation-based applications and guarantees statistical convergence to the global optimum. While remaining as one of the most popular optimization methods, its capability is hindered by the size of data, the dimensionality of the considered problem, and the nature of sequential optimization. These scalability issues are intertwined with each other and must be tackled simultaneously. In this work, we propose the Scalable3-BO framework, which employs sparse GP as the underlying surrogate model to scope with Big Data and is equipped with a random embedding to efficiently optimize high-dimensional problems with low effective dimensionality. The Scalable3-BO framework is further leveraged with asynchronous parallelization feature, which fully exploits the computational resource on HPC within a computational budget. As a result, the proposed Scalable3-BO framework is scalable in three independent perspectives: with respect to data size, dimensionality, and computational resource on HPC. The goal of this work is to push the frontiers of BO beyond its well-known scalability issues and minimize the wall-clock waiting time for optimizing high-dimensional computationally expensive applications. We demonstrate the capability of Scalable3-BO with 1 million data points, 10,000-dimensional problems, with 20 concurrent workers in an HPC environment.

Download Full-text

A Fast Clustering Algorithm for Large-scale and High Dimensional Data

ACTA AUTOMATICA SINICA ◽

10.3724/sp.j.1004.2009.00859 ◽

2009 ◽

Vol 35 (7) ◽

pp. 859-866

Author(s):

Ming LIU ◽

Xiao-Long WANG ◽

Yuan-Chao LIU

Keyword(s):

Large Scale ◽

Clustering Algorithm ◽

High Dimensional Data ◽

High Dimensional

Download Full-text

Bioactivity Prediction Based on Matched Molecular Pair and Matched Molecular Series Methods

Current Pharmaceutical Design ◽

10.2174/1381612826666200427111309 ◽

2020 ◽

Vol 26 (33) ◽

pp. 4195-4205

Author(s):

Xiaoyu Ding ◽

Chen Cui ◽

Dingyan Wang ◽

Jihui Zhao ◽

Mingyue Zheng ◽

...

Keyword(s):

Prediction Model ◽

Large Scale ◽

Prediction Models ◽

Predictive Accuracy ◽

Lead Optimization ◽

Consensus Method ◽

Molecular Pair ◽

Bioactivity Prediction ◽

Compound Synthesis ◽

Consensus Modeling

Background: Enhancing a compound’s biological activity is the central task for lead optimization in small molecules drug discovery. However, it is laborious to perform many iterative rounds of compound synthesis and bioactivity tests. To address the issue, it is highly demanding to develop high quality in silico bioactivity prediction approaches, to prioritize such more active compound derivatives and reduce the trial-and-error process. Methods: Two kinds of bioactivity prediction models based on a large-scale structure-activity relationship (SAR) database were constructed. The first one is based on the similarity of substituents and realized by matched molecular pair analysis, including SA, SA_BR, SR, and SR_BR. The second one is based on SAR transferability and realized by matched molecular series analysis, including Single MMS pair, Full MMS series, and Multi single MMS pairs. Moreover, we also defined the application domain of models by using the distance-based threshold. Results: Among seven individual models, Multi single MMS pairs bioactivity prediction model showed the best performance (R2 = 0.828, MAE = 0.406, RMSE = 0.591), and the baseline model (SA) produced the most lower prediction accuracy (R2 = 0.798, MAE = 0.446, RMSE = 0.637). The predictive accuracy could further be improved by consensus modeling (R2 = 0.842, MAE = 0.397 and RMSE = 0.563). Conclusion: An accurate prediction model for bioactivity was built with a consensus method, which was superior to all individual models. Our model should be a valuable tool for lead optimization.

Download Full-text

‘CTRL’: an online, Dynamic Consent and participant engagement platform working towards solving the complexities of consent in genomic research

European Journal of Human Genetics ◽

10.1038/s41431-020-00782-w ◽

2021 ◽

Author(s):

Matilda A. Haas ◽

Harriet Teare ◽

Megan Prictor ◽

Gabi Ceregra ◽

Miranda E. Vidgen ◽

...

Keyword(s):

Large Scale ◽

Genomic Medicine ◽

International Standards ◽

Security And Privacy ◽

Genomic Research ◽

Return Of Results ◽

Future Research ◽

Design And Development ◽

Participant Engagement ◽

Dynamic Consent

AbstractThe complexities of the informed consent process for participating in research in genomic medicine are well-documented. Inspired by the potential for Dynamic Consent to increase participant choice and autonomy in decision-making, as well as the opportunities for ongoing participant engagement it affords, we wanted to trial Dynamic Consent and to do so developed our own web-based application (web app) called CTRL (control). This paper documents the design and development of CTRL, for use in the Australian Genomics study: a health services research project building evidence to inform the integration of genomic medicine into mainstream healthcare. Australian Genomics brought together a multi-disciplinary team to develop CTRL. The design and development process considered user experience; security and privacy; the application of international standards in data sharing; IT, operational and ethical issues. The CTRL tool is now being offered to participants in the study, who can use CTRL to keep personal and contact details up to date; make consent choices (including indicate preferences for return of results and future research use of biological samples, genomic and health data); follow their progress through the study; complete surveys, contact the researchers and access study news and information. While there are remaining challenges to implementing Dynamic Consent in genomic research, this study demonstrates the feasibility of building such a tool, and its ongoing use will provide evidence about the value of Dynamic Consent in large-scale genomic research programs.

Download Full-text

PRISMS-Fatigue computational framework for fatigue analysis in polycrystalline metals and alloys

npj Computational Materials ◽

10.1038/s41524-021-00506-8 ◽

2021 ◽

Vol 7 (1) ◽

Author(s):

Mohammadreza Yaghoobi ◽

Krzysztof S. Stopka ◽

Aaditya Lakshmanan ◽

Veera Sundararaghavan ◽

John E. Allison ◽

...

Keyword(s):

Open Source ◽

Open Source Software ◽

Large Scale ◽

Metals And Alloys ◽

Analysis Tool ◽

Computational Framework ◽

Crystal Plasticity Finite Element ◽

Polycrystalline Metals ◽

Simulation Based ◽

Open Source Framework

AbstractThe PRISMS-Fatigue open-source framework for simulation-based analysis of microstructural influences on fatigue resistance for polycrystalline metals and alloys is presented here. The framework uses the crystal plasticity finite element method as its microstructure analysis tool and provides a highly efficient, scalable, flexible, and easy-to-use ICME community platform. The PRISMS-Fatigue framework is linked to different open-source software to instantiate microstructures, compute the material response, and assess fatigue indicator parameters. The performance of PRISMS-Fatigue is benchmarked against a similar framework implemented using ABAQUS. Results indicate that the multilevel parallelism scheme of PRISMS-Fatigue is more efficient and scalable than ABAQUS for large-scale fatigue simulations. The performance and flexibility of this framework is demonstrated with various examples that assess the driving force for fatigue crack formation of microstructures with different crystallographic textures, grain morphologies, and grain numbers, and under different multiaxial strain states, strain magnitudes, and boundary conditions.

Download Full-text

Influence of Wind Power on Modeling of Bidding Strategy in a Promising Power Market with a Modified Gravitational Search Algorithm

Applied Sciences ◽

10.3390/app11104438 ◽

2021 ◽

Vol 11 (10) ◽

pp. 4438

Author(s):

Satyendra Singh ◽

Manoj Fozdar ◽

Hasmat Malik ◽

Maria del Valle Fernández Moreno ◽

Fausto Pedro García Márquez

Keyword(s):

Wind Power ◽

Electricity Market ◽

Large Scale ◽

Search Algorithm ◽

Gravitational Search Algorithm ◽

Optimization Methods ◽

Bidding Strategy ◽

Wind Speed Profile ◽

Gravitational Search ◽

Modified Gravitational Search Algorithm

It is expected that large-scale producers of wind energy will become dominant players in the future electricity market. However, wind power output is irregular in nature and it is subjected to numerous fluctuations. Due to the effect on the production of wind power, producing a detailed bidding strategy is becoming more complicated in the industry. Therefore, in view of these uncertainties, a competitive bidding approach in a pool-based day-ahead energy marketplace is formulated in this paper for traditional generation with wind power utilities. The profit of the generating utility is optimized by the modified gravitational search algorithm, and the Weibull distribution function is employed to represent the stochastic properties of wind speed profile. The method proposed is being investigated and simplified for the IEEE-30 and IEEE-57 frameworks. The results were compared with the results obtained with other optimization methods to validate the approach.

Download Full-text

Cerebrospinal fluid metabolomics identifies 19 brain-related phenotype associations

Communications Biology ◽

10.1038/s42003-020-01583-z ◽

2021 ◽

Vol 4 (1) ◽

Author(s):

Daniel J. Panyard ◽

Kyeong Mo Kim ◽

Burcu F. Darst ◽

Yuetiva K. Deming ◽

Xiaoyuan Zhong ◽

...

Keyword(s):

Cerebrospinal Fluid ◽

Drug Targets ◽

Large Scale ◽

Prediction Models ◽

Genome Wide Association ◽

Large Samples ◽

Genome Wide ◽

Metabolomic Data ◽

Related Phenotype ◽

Omic Data

AbstractThe study of metabolomics and disease has enabled the discovery of new risk factors, diagnostic markers, and drug targets. For neurological and psychiatric phenotypes, the cerebrospinal fluid (CSF) is of particular importance. However, the CSF metabolome is difficult to study on a large scale due to the relative complexity of the procedure needed to collect the fluid. Here, we present a metabolome-wide association study (MWAS), which uses genetic and metabolomic data to impute metabolites into large samples with genome-wide association summary statistics. We conduct a metabolome-wide, genome-wide association analysis with 338 CSF metabolites, identifying 16 genotype-metabolite associations (metabolite quantitative trait loci, or mQTLs). We then build prediction models for all available CSF metabolites and test for associations with 27 neurological and psychiatric phenotypes, identifying 19 significant CSF metabolite-phenotype associations. Our results demonstrate the feasibility of MWAS to study omic data in scarce sample types.

Download Full-text

Data Quality Measures and Efficient Evaluation Algorithms for Large-Scale High-Dimensional Data

Applied Sciences ◽

10.3390/app11020472 ◽

2021 ◽

Vol 11 (2) ◽

pp. 472

Author(s):

Hyeongmin Cho ◽

Sangkyun Lee

Keyword(s):

Machine Learning ◽

Data Quality ◽

Large Scale ◽

High Dimensional Data ◽

Quality Measures ◽

Training Data ◽

Measure Data ◽

High Dimensional ◽

Small Scale ◽

Class Separability

Machine learning has been proven to be effective in various application areas, such as object and speech recognition on mobile systems. Since a critical key to machine learning success is the availability of large training data, many datasets are being disclosed and published online. From a data consumer or manager point of view, measuring data quality is an important first step in the learning process. We need to determine which datasets to use, update, and maintain. However, not many practical ways to measure data quality are available today, especially when it comes to large-scale high-dimensional data, such as images and videos. This paper proposes two data quality measures that can compute class separability and in-class variability, the two important aspects of data quality, for a given dataset. Classical data quality measures tend to focus only on class separability; however, we suggest that in-class variability is another important data quality factor. We provide efficient algorithms to compute our quality measures based on random projections and bootstrapping with statistical benefits on large-scale high-dimensional data. In experiments, we show that our measures are compatible with classical measures on small-scale data and can be computed much more efficiently on large-scale high-dimensional datasets.

Download Full-text