The Structural Information Filtered Features Potential for Machine Learning calculations of energies and forces of atomic systems.

A NOVEL EXTENSIVE EX-VIVO OCT DATABASE FROM MURINE MODELS OF COLORECTAL CANCER

British Journal of Surgery ◽

10.1093/bjs/znab160.030 ◽

2021 ◽

Vol 108 (Supplement_3) ◽

Author(s):

J Bote ◽

J F Ortega-Morán ◽

C L Saratxaga ◽

B Pagador ◽

A Picón ◽

...

Keyword(s):

Colorectal Cancer ◽

Machine Learning ◽

Structural Information ◽

Ex Vivo ◽

Ground Truth ◽

Colon Polyps ◽

Learning Methods ◽

Non Invasive ◽

Machine Learning Methods ◽

In Situ Methods

Abstract

Introduction: Clinicians are calling for new non-invasive technologies to improve early diagnosis of colorectal cancer (CRC). Optical Coherence Tomography (OCT) provides sub-surface structural information and can support the diagnosis of colon polyps, a capability further improved by machine learning methods. Databases of OCT images are necessary to facilitate algorithm development and testing.

Materials and Methods: A database has been acquired from rat colonic samples with a Thorlabs OCT system with a 930 nm centre wavelength that provides a 1.2 kHz A-scan rate, 7 μm axial resolution in air, 4 μm lateral resolution, 1.7 mm imaging depth in air, a 6 mm × 6 mm field of view (FOV), and 107 dB sensitivity. The colon was excised from anaesthetised animals, and samples were extracted and preserved for ex-vivo analysis with the OCT equipment.

Results: The database consists of OCT 3D volumes (C-scans) and 2D images (B-scans) of murine samples from: 1) healthy tissue, for ground-truth comparison (18 samples; 66 C-scans; 17,478 B-scans); 2) hyperplastic polyps, obtained from an induced colorectal hyperplastic murine model (47 samples; 153 C-scans; 42,450 B-scans); 3) neoplastic polyps (adenomatous and adenocarcinomatous), obtained from the clinically validated Pirc F344/NTac-Apcam1137 rat model (232 samples; 564 C-scans; 158,557 B-scans); and 4) unknown tissue (polyp-adjacent, presumably healthy) (98 samples; 157 C-scans; 42,070 B-scans).

Conclusions: A novel extensive ex-vivo OCT database of a murine CRC model has been obtained and will be openly published for the research community. It can be used for classification and segmentation with machine learning methods, for correlating OCT features with histopathological structures, and for developing new non-invasive in-situ methods for diagnosing colorectal cancer.
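
The per-class counts reported in this abstract can be tallied to see the overall size and balance of the database. The following is a minimal sketch: the numbers are copied from the abstract, while the dictionary layout and the function names are illustrative assumptions and not part of the published database.

```python
# Per-class counts copied from the abstract: (samples, C-scans, B-scans).
# The dictionary layout and key names are illustrative assumptions.
COUNTS = {
    "healthy": (18, 66, 17_478),
    "hyperplastic": (47, 153, 42_450),
    "neoplastic": (232, 564, 158_557),
    "unknown": (98, 157, 42_070),
}


def totals(counts):
    """Sum samples, C-scans and B-scans across all tissue classes."""
    samples = sum(c[0] for c in counts.values())
    c_scans = sum(c[1] for c in counts.values())
    b_scans = sum(c[2] for c in counts.values())
    return samples, c_scans, b_scans


def class_share(counts, key):
    """Fraction of all B-scans that belong to one tissue class."""
    return counts[key][2] / totals(counts)[2]


if __name__ == "__main__":
    print(totals(COUNTS))
    print(round(class_share(COUNTS, "neoplastic"), 3))
```

Because the neoplastic class holds well over half of the B-scans, anyone training a classifier on these counts would probably want to account for class imbalance.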

A billion synthetic 3D-antibody-antigen complexes enable unconstrained machine-learning formalized investigation of antibody specificity prediction

10.1101/2021.07.06.451258 ◽

2021 ◽

Author(s):

Philippe Auguste Robert ◽

Rahmad Akbar ◽

Robert Frank ◽

Milena Pavlović ◽

Michael Widrich ◽

...

Keyword(s):

Machine Learning ◽

In Silico ◽

Prediction Accuracy ◽

Large Scale ◽

Structural Information ◽

Antigen Binding ◽

Antibody Specificity ◽

Binding Prediction ◽

Information Encoding ◽

Prediction Problems

Machine learning (ML) is a key technology to enable accurate prediction of antibody-antigen binding, a prerequisite for in silico vaccine and antibody design. Two orthogonal problems hinder the current application of ML to antibody-specificity prediction and the benchmarking thereof: (i) The lack of a unified formalized mapping of immunological antibody specificity prediction problems into ML notation and (ii) the unavailability of large-scale training datasets. Here, we developed the Absolut! software suite that allows the parameter-based unconstrained generation of synthetic lattice-based 3D-antibody-antigen binding structures with ground-truth access to conformational paratope, epitope, and affinity. We show that Absolut!-generated datasets recapitulate critical biological sequence and structural features that render antibody-antigen binding prediction challenging. To demonstrate the immediate, high-throughput, and large-scale applicability of Absolut!, we have created an online database of 1 billion antibody-antigen structures, the extension of which is only constrained by moderate computational resources. We translated immunological antibody specificity prediction problems into ML tasks and used our database to investigate paratope-epitope binding prediction accuracy as a function of structural information encoding, dataset size, and ML method, which is unfeasible with existing experimental data. Furthermore, we found that in silico investigated conditions, predicted to increase antibody specificity prediction accuracy, align with and extend conclusions drawn from experimental antibody-antigen structural data. In summary, the Absolut! framework enables the development and benchmarking of ML strategies for biotherapeutics discovery and design.

Developing a machine learning model to identify protein–protein interaction hotspots to facilitate drug discovery

PeerJ ◽

10.7717/peerj.10381 ◽

2020 ◽

Vol 8 ◽

pp. e10381

Author(s):

Rohit Nandakumar ◽

Valentin Dinu

Keyword(s):

Machine Learning ◽

Amino Acid ◽

Drug Discovery ◽

Structural Information ◽

Learning Model ◽

Protein Protein Interaction ◽

Drug Molecules ◽

Machine Learning Model ◽

Disease Associations ◽

History Of

Throughout the history of drug discovery, an enzymatic-based approach for identifying new drug molecules has been primarily utilized. Recently, protein–protein interfaces that can be disrupted to identify small molecules that could be viable targets for certain diseases, such as cancer and the human immunodeficiency virus, have been identified. Existing studies computationally identify hotspots on these interfaces, with most models attaining accuracies of ~70%. Many studies do not effectively integrate information relating to amino acid chains and other structural information relating to the complex. Herein, (1) a machine learning model has been created and (2) its ability to integrate multiple features, such as those associated with amino-acid chains, has been evaluated to enhance the ability to predict protein–protein interface hotspots. Virtual drug screening analysis of a set of hotspots determined on the EphB2-ephrinB2 complex has also been performed. The model achieves an AUROC of 0.842, a sensitivity/recall of 0.833, and a specificity of 0.850. Virtual screening of a set of hotspots identified by the machine learning model developed in this study has identified potential medications to treat diseases caused by the overexpression of the EphB2-ephrinB2 complex, including prostate, gastric, colorectal and melanoma cancers, which are linked to EphB2 mutations. The efficacy of this model has been demonstrated through its ability to predict drug-disease associations previously identified in the literature, including cimetidine, idarubicin, and pralatrexate for these conditions. In addition, nadolol, a beta blocker, has been identified in this study as binding to the EphB2-ephrinB2 complex, and the possibility of this drug treating multiple cancers is still relatively unexplored.
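
For readers unfamiliar with the metrics quoted in this abstract, the sketch below shows how sensitivity, specificity, and AUROC are generally computed. It is not the authors' evaluation code: the function names are illustrative, and the counts and scores in the usage block are made-up toy values, not data from the study.

```python
def sensitivity(tp, fn):
    """True-positive rate (recall): share of real hotspots that were found."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True-negative rate: share of non-hotspots correctly rejected."""
    return tn / (tn + fp)


def auroc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive receives a
    higher score than a randomly chosen negative; ties count as half.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Toy values for illustration only; they are NOT taken from the paper.
    print(sensitivity(tp=40, fn=10))
    print(specificity(tn=45, fp=5))
    print(auroc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))
```

The pairwise AUROC is quadratic in the number of examples, which is fine for small test sets; a rank-based formulation would be the usual choice for larger ones.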

Antibody Complementarity Determining Region Design Using High-Capacity Machine Learning

10.1101/682880 ◽

2019 ◽

Cited By ~ 1

Author(s):

Ge Liu ◽

Haoyang Zeng ◽

Jonas Mueller ◽

Brandon Carter ◽

Ziheng Wang ◽

...

Keyword(s):

Machine Learning ◽

Structural Information ◽

High Capacity ◽

Training Data ◽

Proper Function ◽

Integrative Approach ◽

Machine Learning Method ◽

Learning Method ◽

Target Specificity

Abstract: The precise targeting of antibodies and other protein therapeutics is required for their proper function and the elimination of deleterious off-target effects. Often the molecular structure of a therapeutic target is unknown and randomized methods are used to design antibodies without a model that relates antibody sequence to desired properties. Here we present a machine learning method that can design human Immunoglobulin G (IgG) antibodies with target affinities that are superior to candidates from phage display panning experiments within a limited design budget. We also demonstrate that machine learning can improve target specificity by the modular composition of models from different experimental campaigns, enabling a new integrative approach to improving target specificity. Our results suggest a new path for the discovery of therapeutic molecules by demonstrating that predictive and differentiable models of antibody binding can be learned from high-throughput experimental data without the need for target structural data.

Significance: Antibody-based therapeutics must meet both affinity and specificity metrics, and existing in vitro methods for meeting these metrics are based upon randomization and empirical testing. We demonstrate that, with sufficient target-specific training data, machine learning can suggest novel antibody variable domain sequences that are superior to those observed during training. Our machine learning method does not require any target structural information. We further show that data from disparate antibody campaigns can be combined by machine learning to improve antibody specificity.

Detecting Controversial Articles on Citizen Journalism

Jurnal Ilmu Komputer dan Informasi ◽

10.21609/jiki.v11i1.478 ◽

2018 ◽

Vol 11 (1) ◽

pp. 34

Author(s):

Alfan Farizki Wicaksono ◽

Sharon Raissa Herdiyana ◽

Mirna Adriani

Keyword(s):

Machine Learning ◽

Structural Information ◽

Structural Features ◽

The Body ◽

Supervised Machine Learning ◽

Citizen Journalism ◽

Learning Approach ◽

Daily News ◽

Machine Learning Approach ◽

Controversial Topic

A person's understanding of and stance on a particular controversial topic can be influenced by the daily news or articles they consume every day. Unfortunately, readers usually do not realize that they are reading controversial articles. In this paper, we address the problem of automatically detecting controversial articles from citizen journalism media. To solve the problem, we employ a supervised machine learning approach with several hand-crafted features that exploit linguistic information, meta-data of an article, structural information in the commentary section, and sentiment expressed inside the body of an article. The experimental results show that our proposed method performs the addressed task effectively. The best performance so far is achieved when we use all proposed features with Logistic Regression as our model (82.89% in terms of accuracy). Moreover, we found that information from the commentary section (structural features) contributes most to the classification task.
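
As an illustration of the kind of classifier this abstract refers to, the sketch below trains a logistic regression model with plain gradient descent on hand-crafted features. It is a toy under stated assumptions, not the authors' pipeline: the two feature names (`reply_ratio`, `sentiment_score`), the four data rows, the learning rate, and the epoch count are all hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(rows, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = sigmoid(score) - y  # derivative of log loss w.r.t. score
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Return 1 (controversial) if the predicted probability is >= 0.5."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


# Hypothetical toy data: each row is [reply_ratio, sentiment_score].
ROWS = [[0.1, 0.0], [0.2, 0.1], [0.9, 0.8], [0.8, 1.0]]
LABELS = [0, 0, 1, 1]

if __name__ == "__main__":
    w, b = train_logistic_regression(ROWS, LABELS)
    print([predict(w, b, x) for x in ROWS])
```

In practice a library implementation with regularization and held-out evaluation would be the more likely choice; the hand-written loop is only meant to make the model's mechanics visible.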

Fault-Guided Seismic Stratigraphy Interpretation via Semi-Supervised Learning

10.2118/207218-ms ◽

2021 ◽

Author(s):

Haibin Di ◽

Chakib Kada Kloucha ◽

Cen Li ◽

Aria Abubakar ◽

Zhun Li ◽

...

Keyword(s):

Machine Learning ◽

Supervised Learning ◽

Model Building ◽

Structural Information ◽

Mapping Function ◽

Seismic Stratigraphy ◽

Training Data ◽

Entire Study ◽

Depositional Process ◽

Convolutional Autoencoder

Abstract: Delineating seismic stratigraphic features and depositional facies is important to successful reservoir mapping and identification in the subsurface. Robust seismic stratigraphy interpretation is confronted with two major challenges. The first is to maximally automate the process, particularly with the increasing size of seismic data and complexity of target stratigraphies, while the second is to efficiently incorporate available structures into stratigraphy model building. Machine learning, particularly the convolutional neural network (CNN), has been introduced to assist seismic stratigraphy interpretation through supervised learning. However, the small amount of available expert labels greatly restricts the performance of such supervised CNNs. Moreover, most of the existing CNN implementations are based only on amplitude and fail to use necessary structural information, such as faults, to constrain the machine learning. To resolve both challenges, this paper presents a semi-supervised learning workflow for fault-guided seismic stratigraphy interpretation, which consists of two components. The first component is seismic feature engineering (SFE), which aims at learning the provided seismic and fault data through an unsupervised convolutional autoencoder (CAE), while the second is stratigraphy model building (SMB), which aims at building an optimal mapping function between the features extracted from the SFE CAE and the target stratigraphic labels provided by an experienced interpreter through a supervised CNN. Both components are connected by embedding the encoder of the SFE CAE into the SMB CNN, which forces the SMB to learn from features that exist throughout the entire study area rather than only in the limited training data; correspondingly, the risk of overfitting is greatly reduced.
More innovatively, the fault constraint is introduced by customizing the SMB CNN with two output branches, one to match the target stratigraphies and the other to reconstruct the input fault, so that the fault continues to contribute to the SMB learning process. The performance of such fault-guided seismic stratigraphy interpretation is validated by an application to a real seismic dataset, and the machine prediction not only matches the manual interpretation accurately but also clearly illustrates the depositional process in the study area.
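
The two-output-branch idea described in this abstract can be pictured as a combined training objective: one term for the stratigraphy labels and one for reconstructing the fault input. The abstract does not state which loss functions or weighting were used, so the sketch below is an assumption throughout: cross-entropy for the stratigraphy branch, mean squared error for the fault branch, and the `fault_weight` parameter are all hypothetical choices.

```python
import math


def cross_entropy(class_probs, label):
    """Negative log-probability assigned to the correct stratigraphy class."""
    return -math.log(max(class_probs[label], 1e-12))


def mean_squared_error(pred, target):
    """Average squared difference between reconstructed and input fault values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def two_branch_loss(strat_probs, strat_label, fault_pred, fault_target,
                    fault_weight=1.0):
    """Combine both branch losses into one objective.

    fault_weight is a hypothetical knob controlling how strongly the
    fault-reconstruction branch influences training.
    """
    strat_loss = cross_entropy(strat_probs, strat_label)
    fault_loss = mean_squared_error(fault_pred, fault_target)
    return strat_loss + fault_weight * fault_loss


if __name__ == "__main__":
    # Toy per-pixel example: two stratigraphy classes, two fault values.
    print(two_branch_loss([0.5, 0.5], 0, [0.0, 0.0], [1.0, 0.0], fault_weight=2.0))
```

Because both terms are minimized through a shared network, gradients from the fault branch keep shaping the shared features, which is one way to read the abstract's statement that the fault keeps contributing to learning.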

AI-based Spectroscopic Monitoring of Real-time Interactions between SARS-CoV-2 and Human ACE2

10.26434/chemrxiv.12671618.v1 ◽

2020 ◽

Author(s):

Sheng Ye ◽

Guozhen Zhang ◽

Jun Jiang

Keyword(s):

Machine Learning ◽

Real Time ◽

Protein Dynamics ◽

Structural Information ◽

Protein Complexes ◽

Spike Protein ◽

Proof Of Concept ◽

Spectroscopy Study ◽

Time Resolved ◽

Spectroscopic Monitoring

Here we demonstrate, through a proof-of-concept simulation of the IR spectra of the complex formed by the SARS-CoV-2 spike protein and human ACE2, that time-resolved spectroscopy may monitor the real-time structural information of protein-protein complexes of interest with the help of a machine learning protocol. The significant speedup of our approach relative to the conventional quantum chemistry approach suggests a promising way of accelerating the development of real-time spectroscopic studies of protein dynamics.

Information modeling and knowledge extraction for machine learning applications in industrial production systems

Machine Learning for Cyber Physical Systems - Technologien für die intelligente Automation ◽

10.1007/978-3-662-62746-4_8 ◽

2020 ◽

pp. 73-81

Author(s):

Stefan Windmann ◽

Christian Kühnert

Keyword(s):

Machine Learning ◽

Domain Knowledge ◽

Structural Information ◽

Semantic Annotation ◽

Information Model ◽

Knowledge Extraction ◽

Information Modeling ◽

Sensor Data ◽

Tool Chain ◽

Machine Learning Applications

Abstract: In this paper, a new information model for machine learning applications is introduced, which allows for consistent acquisition and semantic annotation of process data, structural information and domain knowledge from industrial production systems. The proposed information model is based on Industry 4.0 components and IEC 61360 component descriptions. To model sensor data, components of the OGC SensorThings model such as data streams and observations have been incorporated in this approach. Machine learning models can be integrated into the information model in terms of existing model serving frameworks like PMML or TensorFlow graph. Based on the proposed information model, a tool chain for automatic knowledge extraction is introduced, and the automatic classification of unstructured text is investigated as a particular application case for the proposed tool chain.

AI-based Spectroscopic Monitoring of Real-time Interactions between SARS-CoV-2 and Human ACE2

10.26434/chemrxiv.12671618 ◽

2020 ◽

Author(s):

Sheng Ye ◽

Guozhen Zhang ◽

Jun Jiang

Keyword(s):

Machine Learning ◽

Real Time ◽

Protein Dynamics ◽

Structural Information ◽

Protein Complexes ◽

Spike Protein ◽

Proof Of Concept ◽

Spectroscopy Study ◽

Time Resolved ◽

Spectroscopic Monitoring

Here we demonstrate, through a proof-of-concept simulation of the IR spectra of the complex formed by the SARS-CoV-2 spike protein and human ACE2, that time-resolved spectroscopy may monitor the real-time structural information of protein-protein complexes of interest with the help of a machine learning protocol. The significant speedup of our approach relative to the conventional quantum chemistry approach suggests a promising way of accelerating the development of real-time spectroscopic studies of protein dynamics.

The structural information filtered features (SIFF) potential: Maximizing information stored in machine-learning descriptors for materials prediction

Journal of Applied Physics ◽

10.1063/5.0002252 ◽

2020 ◽

Vol 127 (21) ◽

pp. 215108

Author(s):

Jorge Arturo Hernandez Zeledon ◽

Aldo H. Romero ◽

Pengju Ren ◽

Xiaodong Wen ◽

Yongwang Li ◽

...

Keyword(s):

Machine Learning ◽

Structural Information
