scholarly journals The Structural Information Filtered Features Potential for Machine Learning calculations of energies and forces of atomic systems.

2019 ◽  
Author(s):  
Jorge Arturo Hernandez Zeledon
2021 ◽  
Vol 108 (Supplement_3) ◽  
Author(s):  
J Bote ◽  
J F Ortega-Morán ◽  
C L Saratxaga ◽  
B Pagador ◽  
A Picón ◽  
...  

Abstract INTRODUCTION New non-invasive technologies for improving early diagnosis of colorectal cancer (CRC) are demanded by clinicians. Optical Coherence Tomography (OCT) provides sub-surface structural information and offers diagnosis capabilities of colon polyps, further improved by machine learning methods. Databases of OCT images are necessary to facilitate algorithms development and testing. MATERIALS AND METHODS A database has been acquired from rat colonic samples with a Thorlabs OCT system with 930nm centre wavelength that provides 1.2KHz A-scan rate, 7μm axial resolution in air, 4μm lateral resolution, 1.7mm imaging depth in air, 6mm x 6mm FOV, and 107dB sensitivity. The colon from anaesthetised animals has been excised and samples have been extracted and preserved for ex-vivo analysis with the OCT equipment. RESULTS This database consists of OCT 3D volumes (C-scans) and 2D images (B-scans) of murine samples from: 1) healthy tissue, for ground-truth comparison (18 samples; 66 C-scans; 17,478 B-scans); 2) hyperplastic polyps, obtained from an induced colorectal hyperplastic murine model (47 samples; 153 C-scans; 42,450 B-scans); 3) neoplastic polyps (adenomatous and adenocarcinomatous), obtained from clinically validated Pirc F344/NTac-Apcam1137 rat model (232 samples; 564 C-scans; 158,557 B-scans); and 4) unknown tissue (polyp adjacent, presumably healthy) (98 samples; 157 C-scans; 42,070 B-scans). CONCLUSIONS A novel extensive ex-vivo OCT database of murine CRC model has been obtained and will be openly published for the research community. It can be used for classification/segmentation machine learning methods, for correlation between OCT features and histopathological structures, and for developing new non-invasive in-situ methods of diagnosis of colorectal cancer.


2021 ◽  
Author(s):  
Philippe Auguste Robert ◽  
Rahmad Akbar ◽  
Robert Frank ◽  
Milena Pavlović ◽  
Michael Widrich ◽  
...  

Machine learning (ML) is a key technology to enable accurate prediction of antibody-antigen binding, a prerequisite for in silico vaccine and antibody design. Two orthogonal problems hinder the current application of ML to antibody-specificity prediction and the benchmarking thereof: (i) The lack of a unified formalized mapping of immunological antibody specificity prediction problems into ML notation and (ii) the unavailability of large-scale training datasets. Here, we developed the Absolut! software suite that allows the parameter-based unconstrained generation of synthetic lattice-based 3D-antibody-antigen binding structures with ground-truth access to conformational paratope, epitope, and affinity. We show that Absolut!-generated datasets recapitulate critical biological sequence and structural features that render antibody-antigen binding prediction challenging. To demonstrate the immediate, high-throughput, and large-scale applicability of Absolut!, we have created an online database of 1 billion antibody-antigen structures, the extension of which is only constrained by moderate computational resources. We translated immunological antibody specificity prediction problems into ML tasks and used our database to investigate paratope-epitope binding prediction accuracy as a function of structural information encoding, dataset size, and ML method, which is unfeasible with existing experimental data. Furthermore, we found that in silico investigated conditions, predicted to increase antibody specificity prediction accuracy, align with and extend conclusions drawn from experimental antibody-antigen structural data. In summary, the Absolut! framework enables the development and benchmarking of ML strategies for biotherapeutics discovery and design.


PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e10381
Author(s):  
Rohit Nandakumar ◽  
Valentin Dinu

Throughout the history of drug discovery, an enzymatic-based approach for identifying new drug molecules has been primarily utilized. Recently, protein–protein interfaces that can be disrupted to identify small molecules that could be viable targets for certain diseases, such as cancer and the human immunodeficiency virus, have been identified. Existing studies computationally identify hotspots on these interfaces, with most models attaining accuracies of ~70%. Many studies do not effectively integrate information relating to amino acid chains and other structural information relating to the complex. Herein, (1) a machine learning model has been created and (2) its ability to integrate multiple features, such as those associated with amino-acid chains, has been evaluated to enhance the ability to predict protein–protein interface hotspots. Virtual drug screening analysis of a set of hotspots determined on the EphB2-ephrinB2 complex has also been performed. The predictive capabilities of this model offer an AUROC of 0.842, sensitivity/recall of 0.833, and specificity of 0.850. Virtual screening of a set of hotspots identified by the machine learning model developed in this study has identified potential medications to treat diseases caused by the overexpression of the EphB2-ephrinB2 complex, including prostate, gastric, colorectal and melanoma cancers which are linked to EphB2 mutations. The efficacy of this model has been demonstrated through its successful ability to predict drug-disease associations previously identified in literature, including cimetidine, idarubicin, pralatrexate for these conditions. In addition, nadolol, a beta blocker, has also been identified in this study to bind to the EphB2-ephrinB2 complex, and the possibility of this drug treating multiple cancers is still relatively unexplored.


2019 ◽  
Author(s):  
Ge Liu ◽  
Haoyang Zeng ◽  
Jonas Mueller ◽  
Brandon Carter ◽  
Ziheng Wang ◽  
...  

AbstractThe precise targeting of antibodies and other protein therapeutics is required for their proper function and the elimination of deleterious off-target effects. Often the molecular structure of a therapeutic target is unknown and randomized methods are used to design antibodies without a model that relates antibody sequence to desired properties. Here we present a machine learning method that can design human Immunoglobulin G (IgG) antibodies with target affinities that are superior to candidates from phage display panning experiments within a limited design budget. We also demonstrate that machine learning can improve target-specificity by the modular composition of models from different experimental campaigns, enabling a new integrative approach to improving target specificity. Our results suggest a new path for the discovery of therapeutic molecules by demonstrating that predictive and differentiable models of antibody binding can be learned from high-throughput experimental data without the need for target structural data.SignificanceAntibody based therapeutics must meet both affinity and specificity metrics, and existing in vitro methods for meeting these metrics are based upon randomization and empirical testing. We demonstrate that with sufficient target-specific training data machine learning can suggest novel antibody variable domain sequences that are superior to those observed during training. Our machine learning method does not require any target structural information. We further show that data from disparate antibody campaigns can be combined by machine learning to improve antibody specificity.


2018 ◽  
Vol 11 (1) ◽  
pp. 34
Author(s):  
Alfan Farizki Wicaksono ◽  
Sharon Raissa Herdiyana ◽  
Mirna Adriani

Someone's understanding and stance on a particular controversial topic can be influenced by daily news or articles he consume everyday. Unfortunately, readers usually do not realize that they are reading controversial articles. In this paper, we address the problem of automatically detecting controversial article from citizen journalism media. To solve the problem, we employ a supervised machine learning approach with several hand-crafted features that exploits linguistic information, meta-data of an article, structural information in the commentary section, and sentiment expressed inside the body of an article. The experimental results shows that our proposed method manages to perform the addressed task effectively. The best performance so far is achieved when we use all proposed feature with Logistic Regression as our model (82.89\% in terms of accuracy). Moreover, we found that information from commentary section (structural features) contributes most to the classification task.


2021 ◽  
Author(s):  
Haibin Di ◽  
Chakib Kada Kloucha ◽  
Cen Li ◽  
Aria Abubakar ◽  
Zhun Li ◽  
...  

Abstract Delineating seismic stratigraphic features and depositional facies is of importance to successful reservoir mapping and identification in the subsurface. Robust seismic stratigraphy interpretation is confronted with two major challenges. The first one is to maximally automate the process particularly with the increasing size of seismic data and complexity of target stratigraphies, while the second challenge is to efficiently incorporate available structures into stratigraphy model building. Machine learning, particularly convolutional neural network (CNN), has been introduced into assisting seismic stratigraphy interpretation through supervised learning. However, the small amount of available expert labels greatly restricts the performance of such supervised CNN. Moreover, most of the exiting CNN implementations are based on only amplitude, which fails to use necessary structural information such as faults for constraining the machine learning. To resolve both challenges, this paper presents a semi-supervised learning workflow for fault-guided seismic stratigraphy interpretation, which consists of two components. The first component is seismic feature engineering (SFE), which aims at learning the provided seismic and fault data through a unsupervised convolutional autoencoder (CAE), while the second one is stratigraphy model building (SMB), which aims at building an optimal mapping function between the features extracted from the SFE CAE and the target stratigraphic labels provided by an experienced interpreter through a supervised CNN. Both components are connected by embedding the encoder of the SFE CAE into the SMB CNN, which forces the SMB learning based on these features commonly existing in the entire study area instead of those only at the limited training data; correspondingly, the risk of overfitting is greatly eliminated. More innovatively, the fault constraint is introduced by customizing the SMB CNN of two output branches, with one to match the target stratigraphies and the other to reconstruct the input fault, so that the fault continues contributing to the process of SMB learning. The performance of such fault-guided seismic stratigraphy interpretation is validated by an application to a real seismic dataset, and the machine prediction not only matches the manual interpretation accurately but also clearly illustrates the depositional process in the study area.


2020 ◽  
Author(s):  
Sheng Ye ◽  
Guozhen Zhang ◽  
Jun Jiang

<div> <p>Here we demonstrate by a proof-of-concept simulation of IR spectra of complex of spike protein of SARS-CoV-2 and human ACE2, that a time-resolved spectroscopy may monitor the real-time structural information of the protein-protein complexes of interest, with the help of a machine learning protocol. The significant speedup of our approach relative to conventional quantum chemistry approach suggests a promising way of accelerating the development of real-time spectroscopy study of protein dynamics.</p> </div>


Author(s):  
Stefan Windmann ◽  
Christian Kühnert

AbstractIn this paper, a new information model for machine learning applications is introduced, which allows for a consistent acquisition and semantic annotation of process data, structural information and domain knowledge from industrial productions systems. The proposed information model is based on Industry 4.0 components and IEC 61360 component descriptions. To model sensor data, components of the OGC SensorThings model such as data streams and observations have been incorporated in this approach. Machine learning models can be integrated into the information model in terms of existing model serving frameworks like PMML or Tensorflowgraph. Based on the proposed information model, a tool chain for automatic knowledge extraction is introduced and the automatic classification of unstructured text is investigated as a particular application case for the proposed tool chain.


2020 ◽  
Author(s):  
Sheng Ye ◽  
Guozhen Zhang ◽  
Jun Jiang

<div> <p>Here we demonstrate by a proof-of-concept simulation of IR spectra of complex of spike protein of SARS-CoV-2 and human ACE2, that a time-resolved spectroscopy may monitor the real-time structural information of the protein-protein complexes of interest, with the help of a machine learning protocol. The significant speedup of our approach relative to conventional quantum chemistry approach suggests a promising way of accelerating the development of real-time spectroscopy study of protein dynamics.</p> </div>


2020 ◽  
Vol 127 (21) ◽  
pp. 215108
Author(s):  
Jorge Arturo Hernandez Zeledon ◽  
Aldo H. Romero ◽  
Pengju Ren ◽  
Xiaodong Wen ◽  
Yongwang Li ◽  
...  

Sign in / Sign up

Export Citation Format

Share Document