Learning from the 2018 Western Japan Heavy Rains to Detect Floods during the 2019 Hagibis Typhoon

2020 ◽  
Vol 12 (14) ◽  
pp. 2244
Author(s):  
Luis Moya ◽  
Erick Mas ◽  
Shunichi Koshimura

Applications of machine learning to remote sensing data appear to be endless. However, its use for damage identification during early response in the aftermath of a large-scale disaster faces a specific issue: collecting training data right after a disaster is costly, time-consuming, and often impossible. This study analyzes a possible solution to that issue: training data collected from past disaster events are used to calibrate a discriminant function, so that affected areas in a current disaster can be identified in near real time. This paper reports the performance of a supervised machine learning classifier trained on data from the 2018 heavy rainfall in Okayama Prefecture, Japan, and applied to identify floods caused by Typhoon Hagibis in eastern Japan on 12 October 2019. The results show moderate agreement with flood maps provided by local governments and public institutions, and support the assumption that information from previous disasters can be used to identify a current disaster in near real time.
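The workflow described above can be sketched in a few lines: a discriminant function is calibrated once on labeled data from the past event, and only the fast prediction step runs on the new event. This is a minimal illustration, assuming per-pixel feature arrays (e.g., pre/post-event SAR backscatter change) have already been extracted; the file names, the logistic-regression stand-in for the discriminant function, and the 0.5 threshold are hypothetical, not the authors' implementation.

```python
# Hedged sketch: calibrate a discriminant function on features from a past
# disaster (2018 Okayama rains) and apply it to a new event (2019 Hagibis).
# File names and the classifier choice are hypothetical placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# X_2018: per-pixel features from the 2018 event (e.g., pre/post-event
# SAR backscatter change); y_2018: flooded (1) / non-flooded (0) labels.
X_2018 = np.load("okayama_2018_features.npy")
y_2018 = np.load("okayama_2018_labels.npy")

# Calibrate the discriminant function once, offline.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_2018, y_2018)

# Near-real-time step: score the new event with the pre-trained model only.
X_2019 = np.load("hagibis_2019_features.npy")
flood_probability = clf.predict_proba(X_2019)[:, 1]
flood_map = flood_probability > 0.5
```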

2021 ◽  
Author(s):  
Aurore Lafond ◽  
Maurice Ringer ◽  
Florian Le Blay ◽  
Jiaxu Liu ◽  
Ekaterina Millan ◽  
...  

Abnormal surface pressure is typically the first indicator of a number of problematic events, including kicks, losses, washouts, and stuck pipe. These events account for 60–70% of all drilling-related nonproductive time, so their early and accurate detection has the potential to save the industry billions of dollars. Detecting these events today requires an expert user watching multiple curves, which can be costly and subject to human error. The solution presented in this paper aims to augment traditional models with new machine learning techniques that detect these events automatically and support monitoring of the well while drilling. Today’s real-time monitoring systems employ complex physical models to estimate surface standpipe pressure while drilling. These require many inputs and are difficult to calibrate. Machine learning is an alternative method to predict pump pressure, but on its own it needs significant labelled training data, which is often lacking in the drilling world. The new system combines these approaches: a machine learning framework enables automated learning, while the physical models compensate for any gaps in the training data. The system uses only standard surface measurements, is fully automated, and is continuously retrained while drilling to ensure the most accurate pressure prediction. In addition, a stochastic (Bayesian) machine learning technique is used, which yields not only a prediction of the pressure but also the uncertainty and confidence of this prediction. Finally, the new system includes a data quality control workflow: it discards periods of low data quality from the pressure anomaly detection and enables smarter real-time event analysis. The new system has been tested on historical wells using a new test and validation framework. The framework runs the system automatically on large volumes of both historical and simulated data, so that the results can be cross-referenced with observations. In this paper, we show the results of the automated test framework as well as the capabilities of the new system in two specific case studies, one on land and one offshore. Moreover, large-scale statistics highlight the reliability and efficiency of this new detection workflow. The new system builds on the trend in our industry to better capture and utilize digital data for optimizing drilling.
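A minimal sketch of one way such a hybrid could be wired together, assuming the Bayesian component learns the residual between a physics-based standpipe-pressure estimate and the measured pressure, and flags samples far outside the predictive interval. The physics_pressure() placeholder, the feature set, and the 3-sigma threshold are assumptions for illustration, not the paper's proprietary system.

```python
# Hedged sketch: a Bayesian residual model on top of a physics-based
# standpipe-pressure estimate. physics_pressure() and the feature columns
# are hypothetical stand-ins for the real calibrated components.
import numpy as np
from sklearn.linear_model import BayesianRidge

def physics_pressure(flow_rate, mud_density, depth):
    # Placeholder hydraulic model; the real system uses calibrated physics.
    return 0.052 * mud_density * depth + 0.01 * flow_rate ** 1.8

# Standard surface measurements (hypothetical arrays of equal length).
flow_rate, mud_density, depth, measured_spp = (
    np.load(name) for name in ("flow.npy", "density.npy", "depth.npy", "spp.npy")
)

X = np.column_stack([flow_rate, mud_density, depth])
physics_estimate = physics_pressure(flow_rate, mud_density, depth)

# The ML part learns only the residual, so the physics model covers any
# gaps in the labelled training data.
residual_model = BayesianRidge()
residual_model.fit(X, measured_spp - physics_estimate)

# Stochastic prediction: mean and standard deviation, so an anomaly can be
# flagged when the measured pressure falls outside the predictive interval.
mean_res, std_res = residual_model.predict(X, return_std=True)
predicted_spp = physics_estimate + mean_res
anomaly = np.abs(measured_spp - predicted_spp) > 3 * std_res
```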


2021 ◽  
Vol 11 (2) ◽  
pp. 472
Author(s):  
Hyeongmin Cho ◽  
Sangkyun Lee

Machine learning has been proven to be effective in various application areas, such as object and speech recognition on mobile systems. Since the availability of large training datasets is a critical key to machine learning success, many datasets are being disclosed and published online. From a data consumer's or manager's point of view, measuring data quality is an important first step in the learning process: we need to determine which datasets to use, update, and maintain. However, not many practical ways to measure data quality are available today, especially for large-scale, high-dimensional data such as images and videos. This paper proposes two data quality measures that can compute class separability and in-class variability, two important aspects of data quality, for a given dataset. Classical data quality measures tend to focus only on class separability; however, we suggest that in-class variability is another important data quality factor. We provide efficient algorithms to compute our quality measures based on random projections and bootstrapping, with statistical benefits on large-scale, high-dimensional data. In experiments, we show that our measures are compatible with classical measures on small-scale data and can be computed much more efficiently on large-scale, high-dimensional datasets.
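A rough sketch of the general recipe (random projections plus bootstrapping), not the paper's exact measures: each bootstrap resample is projected to a lower dimension, within-class spread serves as the in-class-variability score, and the ratio of between-centroid to within-class distances serves as the separability score. The projection dimension, number of resamples, and distance definitions are illustrative assumptions.

```python
# Hedged sketch of the general idea: estimate class separability and
# in-class variability on random projections of the data, averaged over
# bootstrap resamples, to stay cheap on high-dimensional inputs.
import numpy as np
from sklearn.random_projection import GaussianRandomProjection

def quality_measures(X, y, n_boot=10, dim=32, seed=0):
    rng = np.random.default_rng(seed)
    sep_scores, var_scores = [], []
    for _ in range(n_boot):
        idx = rng.integers(0, len(X), size=len(X))            # bootstrap resample
        Xp = GaussianRandomProjection(n_components=dim).fit_transform(X[idx])
        yb = y[idx]
        classes = np.unique(yb)
        centroids = np.array([Xp[yb == c].mean(axis=0) for c in classes])
        within = np.mean([np.linalg.norm(Xp[yb == c] - centroids[i], axis=1).mean()
                          for i, c in enumerate(classes)])
        between = np.mean([np.linalg.norm(ci - cj)
                           for i, ci in enumerate(centroids)
                           for cj in centroids[i + 1:]])
        sep_scores.append(between / within)                    # class separability
        var_scores.append(within)                              # in-class variability
    return np.mean(sep_scores), np.mean(var_scores)
```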


2020 ◽  
Vol 8 (Suppl 3) ◽  
pp. A62-A62
Author(s):  
Dattatreya Mellacheruvu ◽  
Rachel Pyke ◽  
Charles Abbott ◽  
Nick Phillips ◽  
Sejal Desai ◽  
...  

Background: Accurately identified neoantigens can be effective therapeutic agents in both adjuvant and neoadjuvant settings. A key challenge for neoantigen discovery has been the availability of accurate prediction models for MHC peptide presentation. We have shown previously that our proprietary model based on (i) large-scale, in-house mono-allelic data, (ii) custom features that model antigen processing, and (iii) advanced machine learning algorithms has strong performance. We have extended this work by systematically integrating large quantities of high-quality, publicly available data, implementing new modelling algorithms, and rigorously testing our models. These extensions lead to substantial improvements in performance and generalizability. Our algorithm, named Systematic HLA Epitope Ranking Pan Algorithm (SHERPA™), is integrated into the ImmunoID NeXT Platform®, our immuno-genomics and transcriptomics platform specifically designed to enable the development of immunotherapies. Methods: In-house immunopeptidomic data was generated using stably transfected HLA-null K562 cell lines that express a single HLA allele of interest, followed by immunoprecipitation using the W6/32 antibody and LC-MS/MS. Public immunopeptidomics data was downloaded from repositories such as MassIVE and processed uniformly using in-house pipelines to generate peptide lists filtered at a 1% false discovery rate. Other metrics (features) were either extracted from source data or generated internally by re-processing samples utilizing the ImmunoID NeXT Platform. Results: We have generated large-scale and high-quality immunopeptidomics data from approximately 60 mono-allelic cell lines that unambiguously assign peptides to their presenting alleles, and used these data to create our primary models. Briefly, our primary ‘binding’ algorithm models MHC-peptide binding using the peptide and the binding pockets, while our primary ‘presentation’ model uses additional features to model antigen processing and presentation. Both primary models have significantly higher precision across all recall values in multiple test data sets, including mono-allelic cell lines and multi-allelic tissue samples. To further improve the performance of our model, we expanded the diversity of our training set using high-quality, publicly available mono-allelic immunopeptidomics data. Furthermore, multi-allelic data was integrated by resolving peptide-to-allele mappings using our primary models. We then trained a new model using the expanded training data and a new composite machine learning architecture. The resulting secondary model further improves performance and generalizability across several tissue samples. Conclusions: Improving technologies for neoantigen discovery is critical for many therapeutic applications, including personalized neoantigen vaccines and neoantigen-based biomarkers for immunotherapies. Our new and improved algorithm (SHERPA) has significantly higher performance than a state-of-the-art public algorithm and furthers this objective.
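The multi-allelic deconvolution step mentioned above can be illustrated with a toy sketch: each observed peptide is assigned to the allele in the sample for which a primary model returns the highest presentation score. The primary_score interface and the toy scoring function below are hypothetical stand-ins for the proprietary SHERPA primary models.

```python
# Hedged, toy illustration of resolving peptide-to-allele mappings in
# multi-allelic samples using primary-model scores.
def resolve_peptide_alleles(peptides, sample_alleles, primary_score):
    """Assign each peptide to the sample allele with the highest primary-model score."""
    assignments = {}
    for peptide in peptides:
        scores = {allele: primary_score(peptide, allele) for allele in sample_alleles}
        assignments[peptide] = max(scores, key=scores.get)
    return assignments

# Toy usage; real scores would come from the trained presentation model.
toy_score = lambda peptide, allele: len(set(peptide) & set(allele)) / len(peptide)
print(resolve_peptide_alleles(["SIINFEKL"], ["HLA-A*02:01", "HLA-B*07:02"], toy_score))
```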


2021 ◽  
Vol 13 (3) ◽  
pp. 368
Author(s):  
Christopher A. Ramezan ◽  
Timothy A. Warner ◽  
Aaron E. Maxwell ◽  
Bradley S. Price

The size of the training data set is a major determinant of classification accuracy. Nevertheless, the collection of a large training data set for supervised classifiers can be a challenge, especially for studies covering a large area, which may be typical of many real-world applied projects. This work investigates how variations in training set size, ranging from a large sample size (n = 10,000) to a very small sample size (n = 40), affect the performance of six supervised machine-learning algorithms applied to classify large-area high-spatial-resolution (HR) (1–5 m) remotely sensed data within the context of a geographic object-based image analysis (GEOBIA) approach. GEOBIA, in which adjacent similar pixels are grouped into image-objects that form the unit of the classification, offers the potential benefit of allowing the use of multiple additional variables, such as measures of object geometry and texture, thereby increasing the dimensionality of the classification input data. The six supervised machine-learning algorithms are support vector machines (SVM), random forests (RF), k-nearest neighbors (k-NN), single-layer perceptron neural networks (NEU), learning vector quantization (LVQ), and gradient-boosted trees (GBM). RF, the algorithm with the highest overall accuracy, was notable for its negligible decrease in overall accuracy, 1.0%, when the training sample size decreased from 10,000 to 315 samples. GBM provided similar overall accuracy to RF; however, the algorithm was very expensive in terms of training time and computational resources, especially with large training sets. In contrast to RF and GBM, NEU and SVM were particularly sensitive to decreasing sample size, with NEU classifications generally producing overall accuracies that were on average slightly higher than SVM classifications for larger sample sizes, but lower than SVM for the smallest sample sizes; NEU, however, required a longer processing time. The k-NN classifier saw less of a drop in overall accuracy than NEU and SVM as training set size decreased; however, the overall accuracies of k-NN were typically lower than those of the RF, NEU, and SVM classifiers. LVQ generally had the lowest overall accuracy of all six methods, but was relatively insensitive to sample size, down to the smallest sample sizes. Overall, due to its relatively high accuracy with small training sample sets, minimal variation in overall accuracy between very large and small sample sets, and relatively short processing time, RF was a good classifier for large-area land-cover classifications of HR remotely sensed data, especially when training data are scarce. However, as the performance of different supervised classifiers varies in response to training set size, investigating multiple classification algorithms is recommended to achieve optimal accuracy for a project.
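A minimal sketch of this kind of sample-size experiment, assuming object-level feature and label arrays from the GEOBIA segmentation are already available; the file names, hyperparameters, and the subset of three classifiers shown (RF, SVM, k-NN) are illustrative choices, not the study's exact configuration.

```python
# Hedged sketch: subsample the training pool at several sizes and compare
# overall accuracy of three classifiers. Feature/label files are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X_pool, y_pool = np.load("objects_train_X.npy"), np.load("objects_train_y.npy")
X_test, y_test = np.load("objects_test_X.npy"), np.load("objects_test_y.npy")

classifiers = {
    "RF": RandomForestClassifier(n_estimators=500, random_state=0),
    "SVM": SVC(kernel="rbf"),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
}

rng = np.random.default_rng(0)
for n in (40, 315, 1000, 10000):
    idx = rng.choice(len(X_pool), size=n, replace=False)   # random training subset
    for name, clf in classifiers.items():
        clf.fit(X_pool[idx], y_pool[idx])
        acc = accuracy_score(y_test, clf.predict(X_test))
        print(f"n={n:>5}  {name:<4} overall accuracy = {acc:.3f}")
```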


2021 ◽  
Author(s):  
Arturo Magana-Mora ◽  
Mohammad AlJubran ◽  
Jothibasu Ramasamy ◽  
Mohammed AlBassam ◽  
Chinthaka Gooneratne ◽  
...  

Objective/Scope. Lost circulation events (LCEs) are among the top causes of drilling nonproductive time (NPT). The presence of natural fractures and vugular formations causes loss of drilling fluid circulation. Drilling depleted zones with incorrect mud weights can also lead to drilling-induced losses. LCEs can also develop into additional drilling hazards, such as stuck pipe incidents, kicks, and blowouts. An LCE is traditionally diagnosed only when there is a reduction in mud volume in the mud pits, in the case of moderate losses, or a reduction of the mud column in the annulus, in the case of total losses. Using machine learning (ML) to predict the presence of a loss zone and estimate fracture parameters in advance is very beneficial, as it can immediately alert the drilling crew so that they can take the required actions to mitigate or cure LCEs. Methods, Procedures, Process. Although different computational methods have been proposed for the prediction of LCEs, there is a need to further improve the models and reduce the number of false alarms. Robust and generalizable ML models require a sufficiently large amount of data that captures the different parameters and scenarios representing an LCE. For this, we derived a framework that automatically searches through historical data, locates LCEs, and extracts the surface drilling and rheology parameters surrounding such events. Results, Observations, and Conclusions. We derived different ML models using various algorithms and evaluated them with a data-split technique at the level of wells to find the most suitable model for the prediction of an LCE. From the model comparison, the random forest classifier achieved the best results and successfully predicted LCEs before they occurred. The developed LCE model is designed to be implemented in the real-time drilling portal as an aid to the drilling engineers and the rig crew to minimize or avoid NPT. Novel/Additive Information. The main contribution of this study is the analysis of real-time surface drilling parameters and sensor data to predict an LCE from a statistically representative number of wells. The large-scale analysis of several wells that appropriately describe the different conditions before an LCE is critical for avoiding model undertraining or lack of generalization. Finally, we formulated the prediction of LCEs as a time-series problem and considered parameter trends to accurately determine the early signs of LCEs.
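A minimal sketch of the well-level data split and trend-based framing described above: rolling-window trends of surface parameters serve as features, and GroupKFold keeps every sample from a given well in a single fold, so the evaluation never mixes wells between training and validation. The CSV file, column names, window length, and hyperparameters are hypothetical placeholders, not the study's actual configuration.

```python
# Hedged sketch: well-level cross-validation with trend features for LCE
# prediction. File and column names are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

df = pd.read_csv("surface_drilling_data.csv")          # one row per time step
# Rolling-window trends of surface parameters as early-warning features.
for col in ("standpipe_pressure", "flow_out", "pit_volume"):
    df[f"{col}_trend"] = df.groupby("well_id")[col].transform(
        lambda s: s.rolling(window=30, min_periods=1).mean().diff()
    )
df = df.dropna()

features = [c for c in df.columns if c.endswith("_trend")]
X, y, groups = df[features], df["lce_label"], df["well_id"]

# GroupKFold ensures no well appears in both training and validation folds,
# i.e., a data split at the level of wells.
clf = RandomForestClassifier(n_estimators=300, random_state=0)
scores = cross_val_score(clf, X, y, cv=GroupKFold(n_splits=5), groups=groups)
print("per-fold accuracy:", scores)
```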


2019 ◽  
Author(s):  
Mojtaba Haghighatlari ◽  
Gaurav Vishwakarma ◽  
Mohammad Atif Faiz Afzal ◽  
Johannes Hachmann

We present a multitask, physics-infused deep learning model to accurately and efficiently predict refractive indices (RIs) of organic molecules, and we apply it to a library of 1.5 million compounds. We show that it outperforms earlier machine learning models by a significant margin, and that incorporating known physics into data-derived models provides valuable guardrails. Using a transfer learning approach, we augment the model to reproduce results consistent with higher-level computational chemistry training data, but with a considerably reduced number of corresponding calculations. Prediction errors of machine learning models are typically smallest for commonly observed target property values, consistent with the distribution of the training data. However, since our goal is to identify candidates with unusually large RI values, we propose a strategy to boost the performance of our model in the remoter areas of the RI distribution: we bias the model with respect to the under-represented classes of molecules that have values in the high-RI regime. By adopting a metric popular in web search engines, we evaluate our effectiveness in ranking top candidates. We confirm that the models developed in this study can reliably predict the RIs of the top 1,000 compounds, and are thus able to capture their ranking. We believe that this is the first study to develop a data-derived model that ensures the reliability of RI predictions by model augmentation in the extrapolation region on such a large scale. These results underscore the tremendous potential of machine learning in facilitating molecular (hyper)screening approaches on a massive scale and in accelerating the discovery of new compounds and materials, such as high-RI organic molecules for applications in opto-electronics.
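A simplified sketch of the biasing idea, assuming descriptor and RI arrays are available: under-represented high-RI training molecules receive larger sample weights, and the fitted model is scored with a ranking metric of the kind used by web search engines (NDCG here). The RI threshold, weight factor, gradient-boosting stand-in, and file names are illustrative assumptions, not the authors' physics-infused architecture.

```python
# Hedged sketch: upweight the sparse high-RI regime during training and
# evaluate how well the top candidates are ranked. Data files are hypothetical.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import ndcg_score

X_train, y_train = np.load("descriptors_train.npy"), np.load("ri_train.npy")
X_test, y_test = np.load("descriptors_test.npy"), np.load("ri_test.npy")

# Bias the fit toward the under-represented high-RI molecules.
high_ri = y_train > 1.7                      # hypothetical threshold
weights = np.where(high_ri, 5.0, 1.0)

model = GradientBoostingRegressor(random_state=0)
model.fit(X_train, y_train, sample_weight=weights)

# Ranking-oriented evaluation focused on recovering the top candidates.
y_pred = model.predict(X_test)
print("NDCG@1000:", ndcg_score(y_test[None, :], y_pred[None, :], k=1000))
```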


Sensors ◽  
2020 ◽  
Vol 20 (3) ◽  
pp. 800 ◽  
Author(s):  
Irshad Khan ◽  
Seonhwa Choi ◽  
Young-Woo Kwon

Detecting earthquakes using smartphones or IoT devices in real time is an arduous and challenging task, not only because of the hard real-time constraint but also because of the similarity between earthquake signals and non-earthquake signals (i.e., noise or other activities). Moreover, the variety of human activities makes it even more difficult when a smartphone is used as an earthquake-detecting sensor. To that end, in this article, we leverage a machine learning technique with earthquake features rather than traditional seismic methods. First, we split the detection task into two categories: a static environment and a dynamic environment. Then, we experimentally evaluate different features and propose the most appropriate machine learning model and features for the static environment to tackle the issue of noisy components and detect earthquakes in real time with lower false-alarm rates. The proposed model shows promising results not only on the given dataset but also on unseen data, pointing to the generalization characteristics of the model. Finally, we demonstrate that the proposed model can also be used in the dynamic environment if it is trained with a different dataset.
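A minimal sketch of the feature-plus-classifier approach for the static environment, assuming fixed-length acceleration windows and binary labels are available: a few simple window statistics feed a classifier that separates earthquake motion from noise. The specific features, data files, and random-forest choice are illustrative assumptions, not the exact model proposed in the article.

```python
# Hedged sketch: simple statistics over short acceleration windows feed a
# classifier that separates earthquake from non-earthquake motion.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

def window_features(window):
    """Features computed per fixed-length acceleration window."""
    return [
        np.percentile(window, 75) - np.percentile(window, 25),  # interquartile range
        np.mean(np.abs(window)),                                 # mean amplitude
        np.sum(np.diff(np.sign(window)) != 0) / len(window),     # zero-crossing rate
        np.max(np.abs(window)),                                  # peak acceleration
    ]

windows = np.load("acc_windows.npy")      # shape: (n_windows, samples_per_window)
labels = np.load("acc_labels.npy")        # 1 = earthquake, 0 = noise/activity

X = np.array([window_features(w) for w in windows])
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.3, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```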

