Machine Learning Predictors of Extreme Events Occurring in Complex Dynamical Systems

Entropy ◽  
2019 ◽  
Vol 21 (10) ◽  
pp. 925 ◽  
Author(s):  
Stephen Guth ◽  
Themistoklis P. Sapsis

The ability to characterize and predict extreme events is a vital topic in fields ranging from finance to ocean engineering. Typically, the most extreme events are also the rarest, and it is this property that makes data collection and direct simulation challenging. We consider the problem of deriving optimal predictors of extremes directly from data characterizing a complex system, by formulating the problem in the context of binary classification. Specifically, we assume that a training dataset consists of: (i) an indicator time series specifying whether or not an extreme event occurs; and (ii) observable time series, which are employed to formulate efficient predictors. We employ and assess standard binary classification criteria for the selection of optimal predictors, such as total and balanced error and area under the curve, in the context of extreme event prediction. For physical systems with sufficient separation between the extreme and regular events, i.e., extremes distinguishably larger than regular events, we prove the existence of optimal extreme event thresholds that lead to efficient predictors. Moreover, motivated by the special character of extreme events, i.e., their very low rate of occurrence, we formulate a new objective function for the selection of predictors. This objective is constructed from the same principles as receiver operating characteristic curves, and exhibits a geometric connection to the regime separation property. We demonstrate the application of the new selection criterion to the advance prediction of intermittent extreme events in two challenging complex systems: the Majda–McLaughlin–Tabak model, a 1D nonlinear dispersive wave model, and the 2D Kolmogorov flow model, which exhibits extreme dissipation events.
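The selection criteria named in this abstract (area under the ROC curve, balanced error) can be illustrated with a small self-contained sketch. This is not the authors' code; the labels, scores and threshold below are made-up toy data in which extremes (label 1) are rare, mimicking the class imbalance the paper addresses.

```python
# Toy sketch of the standard binary-classification criteria for a
# threshold-based extreme-event predictor. All data are illustrative.

def roc_auc(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def balanced_error(labels, scores, threshold):
    """Mean of the false-positive and false-negative rates at a threshold."""
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    return 0.5 * (fp / labels.count(0) + fn / labels.count(1))

# Rare extremes (label 1) with an observable that separates them well.
labels = [0] * 8 + [1] * 2
scores = [0.1, 0.2, 0.15, 0.3, 0.25, 0.4, 0.2, 0.35, 0.8, 0.6]
print(roc_auc(labels, scores))              # 1.0 for this toy data
print(balanced_error(labels, scores, 0.5))  # 0.0 for this toy data
```

Balanced error rather than total error matters here because, with only 2 extremes out of 10 samples, a predictor that never fires would still score 80% raw accuracy.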

2020 ◽  
Vol 142 (4) ◽  
Author(s):  
Yuxiang Ma ◽  
Changfu Yuan ◽  
Congfang Ai ◽  
Guohai Dong

Abstract The generation of two freak waves in a broadband and a narrowband random series registered in the experiments of Li, J. X., Li, P. F., and Liu, S. X. (2013, “Observations of Freak Waves in Random Wave Field in 2D Experimental Wave Flume,” China Ocean Eng., 27(5), pp. 659–670) is precisely reconstructed using a fully non-hydrostatic water wave model. The simulation results indicate that even though the background spectral bandwidths are different, the evolution processes of the two freak waves are similar. Both freak waves emerge quickly during the transition from normal states to extreme events, and both can persist over a long distance, i.e., approximately 5 peak wavelengths. The reconstructed time series at locations backward and forward of where the freak waves were recorded reveal that the largest freak wave crests were not captured in the experiment. The freak waves gradually emerged from an intense wave group, and very deep troughs were also formed in the evolution process. Although the two freak waves were generated via different spectral bandwidth processes, the generation mechanisms of the rogue waves were similar. By analyzing the time series of the freak wave groups, the formation of the freak waves is found to result from the combined effect of dispersive focusing, third-order resonant wave interactions, and higher harmonics.


2021 ◽  
Vol 7 (12) ◽  
pp. eabd4177
Author(s):  
Carole H. Sudre ◽  
Karla A. Lee ◽  
Mary Ni Lochlainn ◽  
Thomas Varsavsky ◽  
Benjamin Murray ◽  
...  

As no single symptom can predict disease severity or the need for dedicated medical support in coronavirus disease 2019 (COVID-19), we asked whether documenting symptom time series over the first few days informs outcome. Unsupervised time series clustering over symptom presentation was performed on data collected from a training dataset of completed cases enrolled early in the COVID Symptom Study smartphone application, yielding six distinct symptom presentations. Clustering was validated on an independent replication dataset gathered between 1 and 28 May 2020. Using the first 5 days of symptom logging, the ROC-AUC (receiver operating characteristic area under the curve) for the need for respiratory support was 78.8%, substantially outperforming personal characteristics alone (ROC-AUC 69.5%). Such an approach could be used to monitor at-risk patients and predict medical resource requirements days before they are needed.
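As a toy illustration of the unsupervised clustering step (not the study's pipeline, which derived six clusters from real symptom time series), a minimal k-means over made-up binary symptom-presence vectors might look like:

```python
# Illustrative k-means over hypothetical symptom-presence vectors.
# Patients, symptoms and k=2 are assumptions for the sketch only.

def kmeans(points, k, iters=10):
    # Deterministic init from the first k points (fine for this toy data).
    centers = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean).
        for i, p in enumerate(points):
            assign[i] = min(range(k), key=lambda c: sum(
                (a - b) ** 2 for a, b in zip(p, centers[c])))
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for i, p in enumerate(points) if assign[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

# Each vector: presence of (fever, cough, fatigue, dyspnoea, confusion).
patients = [
    (1, 1, 1, 0, 0),  # mild-looking presentation
    (0, 0, 0, 1, 1),  # severe-looking presentation
    (1, 1, 0, 0, 0),
    (1, 0, 1, 0, 0),
    (0, 0, 1, 1, 1),
    (0, 1, 0, 1, 1),
]
print(kmeans(patients, k=2))  # [0, 1, 0, 0, 1, 1]
```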


2004 ◽  
Vol 155 (5) ◽  
pp. 142-145 ◽  
Author(s):  
Claudio Defila

The record-breaking heatwave of 2003 also had an impact on the vegetation in Switzerland. To examine its influence, seven phenological late spring and summer phases were evaluated, together with six autumn phases, from a selection of stations. Of the 122 chosen phenological time series in the late spring and summer phases, 30% set a new record (earliest arrival). The proportion of very early arrivals is very high, and the mean deviation from the norm is between 10 and 20 days. The situation was less extreme in autumn, where 20% of the 103 chosen time series set a new record. The majority of the phenological arrivals fell into the class «normal», but the class «very early» is still well represented. The mean precocity lies between five and twenty days. As far as the leaf shedding of the beech is concerned, there was even a slight delay of around six days. The evaluation shows that the heatwave of 2003 strongly influenced the phenological events of spring and summer.


2021 ◽  
Author(s):  
Ali Abdolali ◽  
Andre van der Westhuysen ◽  
Zaizhong Ma ◽  
Avichal Mehra ◽  
Aron Roland ◽  
...  

Abstract Various uncertainties exist in a hindcast due to the inability of numerical models to resolve all the complicated atmosphere-sea interactions, and the lack of certain ground truth observations. Here, a comprehensive analysis of an atmospheric model in hindcast mode (the Hurricane Weather Research and Forecasting model, HWRF) and its 40 ensembles during severe events is conducted, evaluating the model accuracy and uncertainty for hurricane track parameters, and for wind speed collected along satellite altimeter tracks and at stationary point observations. Subsequently, the downstream spectral wave model WAVEWATCH III is forced by two sets of wind field data, each comprising 40 members. The first set is randomly extracted from the original HWRF simulations, and the second is based on the spread of best-track parameters. The atmospheric model spread and the wave model error along satellite altimeter tracks and at stationary point observations are estimated. The study of Hurricane Irma reveals that wind and wave observations during this extreme event lie within the ensemble spreads. While both models have wide spreads over areas with landmass, the maximum uncertainty in the atmospheric model is at the hurricane eye, in contrast to the wave model.
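The kind of ensemble-spread check described in this abstract can be sketched in a few lines. This is an illustrative stand-in, not the study's code; the wind speeds, the 40-member count, and the two-standard-deviation criterion are all assumptions.

```python
# Toy check of whether an observation falls within an ensemble spread.
# All values below are made up for illustration.

def ensemble_spread(members):
    """Mean and (population) standard deviation across ensemble members."""
    n = len(members)
    mean = sum(members) / n
    var = sum((m - mean) ** 2 for m in members) / n
    return mean, var ** 0.5

def within_spread(obs, members, n_sigma=2.0):
    """True if the observation lies within n_sigma standard deviations
    of the ensemble mean."""
    mean, sd = ensemble_spread(members)
    return abs(obs - mean) <= n_sigma * sd

members = [28.0 + 0.25 * i for i in range(40)]  # hypothetical wind speeds (m/s)
print(ensemble_spread(members))
print(within_spread(33.0, members))   # True: close to the ensemble mean
print(within_spread(50.0, members))   # False: far outside the spread
```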


2021 ◽  
pp. 016555152199804
Author(s):  
Qian Geng ◽  
Ziang Chuai ◽  
Jian Jin

To provide junior researchers with domain-specific concepts efficiently, an automatic approach for academic profiling is needed. First, to obtain the personal records of a given scholar, typical supervised approaches often utilise structured data, such as the infobox in Wikipedia, as the training dataset, but this may lead to a severe mislabelling problem when such data are used to train a model directly. To address this problem, a new relation embedding method is proposed for fine-grained entity typing, in which the initial vector of entities and a new penalty scheme are considered, based on the semantic distance of entities and relations. Also, to highlight critical concepts relevant to renowned scholars, scholars’ selective bibliographies, which contain massive numbers of academic terms, are analysed by a newly proposed extraction method based on logistic regression, the AdaBoost algorithm and learning-to-rank techniques. It bridges the gap that conventional supervised methods only return binary classification results and fail to help researchers understand the relative importance of selected concepts. Experiments on academic profiling and corresponding benchmark datasets demonstrate that the proposed approaches notably outperform existing methods. The proposed techniques provide an automatic way for junior researchers to obtain organised knowledge in a specific domain, including scholars’ background information and domain-specific concepts.


2021 ◽  
Vol 502 (2) ◽  
pp. 2513-2517
Author(s):  
Stavros Akras ◽  
Denise R Gonçalves ◽  
Alvaro Alvarez-Candal ◽  
Claudio B Pereira

ABSTRACT We report the validation of a recently proposed infrared (IR) selection criterion for symbiotic stars (SySts). Spectroscopic data were obtained for seven candidates, selected from the SySt candidates of Akras et al. by employing the new supplementary IR selection criterion for SySts in the VST/OmegaCAM Photometric H-Alpha Survey. Five of them turned out to be genuine SySts after the detection of H α, He ii, and [O iii] emission lines as well as TiO molecular bands. The characteristic O vi Raman-scattered line is also detected in one of these SySts. According to their IR colours and optical spectra, all five newly discovered SySts are classified as S-type. The high rate of true SySts detections of this work demonstrates that the combination of the H α emission and the new IR criterion improves the selection of target lists for follow-up observations by minimizing the number of contaminants and optimizing the observing time.


2020 ◽  
Vol 0 (0) ◽  
Author(s):  
Alain Hecq ◽  
Li Sun

Abstract We propose a model selection criterion to distinguish purely causal from purely noncausal models in the framework of quantile autoregressions (QAR). We also present asymptotics for the i.i.d. case with regularly varying distributed innovations in QAR. This new modelling perspective is appealing for investigating the presence of bubbles in economic and financial time series, and is an alternative to approximate maximum likelihood methods. We illustrate our analysis using hyperinflation episodes in Latin American countries.
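A minimal sketch of the check (pinball) loss that underlies quantile autoregression, with a crude grid search for a QAR(1) slope, may help fix ideas. The paper's estimator and its causal/noncausal selection criterion are considerably more involved; the series, quantile and grid below are made up.

```python
# Toy quantile-autoregression fit via the check (pinball) loss.
# Illustrative only; not the paper's estimator.

def check_loss(u, tau):
    """Pinball loss: tau*u for u >= 0, (tau - 1)*u for u < 0."""
    return tau * u if u >= 0 else (tau - 1.0) * u

def qar1_slope(series, tau, grid):
    """Grid-search the slope phi minimizing the summed check loss of the
    QAR(1) residuals y_t - phi * y_{t-1} at quantile tau."""
    def total(phi):
        return sum(check_loss(y1 - phi * y0, tau)
                   for y0, y1 in zip(series, series[1:]))
    return min(grid, key=total)

# A slowly decaying toy series, roughly y_t ~ 0.9 * y_{t-1}.
series = [1.0, 0.9, 0.85, 0.8, 0.7, 0.66, 0.6, 0.55, 0.5, 0.44]
grid = [i / 100 for i in range(50, 131)]  # candidate slopes 0.50 .. 1.30
print(qar1_slope(series, 0.5, grid))      # median regression slope, ~0.91
```

At tau = 0.5 the check loss reduces to (half) the absolute error, so the fit is a median autoregression; other quantiles weight over- and under-predictions asymmetrically.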


2021 ◽  
Vol 11 (9) ◽  
pp. 3836
Author(s):  
Valeri Gitis ◽  
Alexander Derendyaev ◽  
Konstantin Petrov ◽  
Eugene Yurkov ◽  
Sergey Pirogov ◽  
...  

Prostate cancer (PCa) is the second most frequent malignancy (after lung cancer). Preoperative staging of PCa is the basis for the selection of adequate treatment tactics. In particular, an urgent problem is the classification of indolent and aggressive forms of PCa in patients in the initial stages of the tumor process. To solve this problem, we propose a new binary classification machine-learning method. The proposed method of monotonic functions uses a model in which the form of the disease is determined by the severity of the patient’s condition. It is assumed that the smaller the deviation of the indicators from the normal values inherent in healthy people, the milder the patient’s condition. This assumption means that the severity (form) of the disease can be represented by monotonic functions of the degree to which the patient’s indicators deviate beyond the normal range. The method is used to classify patients with indolent and aggressive forms of prostate cancer according to pretreatment data. The learning algorithm is nonparametric and, at the same time, allows the classification results to be explained in the form of a logical function. To do so, one indicates to the algorithm either the threshold probability of successful classification of patients with the indolent form of PCa, or the threshold probability of misclassification of patients with the aggressive form of PCa. The examples of logical rules given in the article show that they are quite simple and can be easily interpreted in terms of preoperative indicators of the form of the disease.
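The core modelling assumption, that severity is a monotone function of how far each indicator deviates beyond its normal range, can be sketched as follows. This is an illustration of the idea only, not the authors' algorithm; the indicator names, normal ranges and patient values are hypothetical.

```python
# Hypothetical sketch: severity as a monotone function of deviations
# beyond normal ranges. Indicators and ranges are made up.

def deviation(value, low, high):
    """How far a value lies outside the normal range [low, high]; 0 inside."""
    if value < low:
        return low - value
    if value > high:
        return value - high
    return 0.0

def severity_score(indicators, normal_ranges):
    """Monotone aggregate: increasing any single deviation can never
    decrease the score, matching the paper's monotonicity assumption."""
    return sum(deviation(v, *normal_ranges[k]) for k, v in indicators.items())

normal_ranges = {"psa": (0.0, 4.0), "volume": (20.0, 30.0)}   # hypothetical
indolent = {"psa": 5.0, "volume": 28.0}
aggressive = {"psa": 15.0, "volume": 45.0}
print(severity_score(indolent, normal_ranges))     # 1.0
print(severity_score(aggressive, normal_ranges))   # 26.0
```

A classifier built on such a score would then threshold it, which is what makes the resulting decision rules expressible as simple, interpretable logical conditions on the individual indicators.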


Water ◽  
2021 ◽  
Vol 13 (6) ◽  
pp. 859
Author(s):  
Giorgio Bellotti ◽  
Leopoldo Franco ◽  
Claudia Cecioni

Hindcast wind and wave data, available on a coarse-resolution global grid (the Copernicus ERA5 dataset), are downscaled by means of the numerical model SWAN (Simulating WAves Nearshore) to produce time series of wave conditions at high resolution along the Italian coasts in the central Tyrrhenian Sea. In order to achieve the proper spatial resolution along the coast, the finite element version of the model is used. Wave data time series on the ERA5 grid are used to specify boundary conditions for the wave model at the offshore sides of the computational domain. The wind field is fed to the model to account for local wave generation. The modeled sea states are compared against the multiple wave records available in the area, in order to calibrate and validate the model. The model results are in good agreement with direct measurements, both in terms of wave climate and wave extremes. The results show that, using the present modeling chain, it is possible to build a reliable nearshore wave parameter database with high spatial resolution. Such a database, once prepared for coastal areas, possibly at the national level, can be of high value for many engineering activities related to coastal area management, and can provide fundamental information for the development of operational coastal services.


Water ◽  
2020 ◽  
Vol 12 (7) ◽  
pp. 2058 ◽  
Author(s):  
Larissa Rolim ◽  
Francisco de Souza Filho

Improved water resource management relies on accurate analyses of the past dynamics of hydrological variables. The presence of low-frequency structures in hydrologic time series is an important feature: it can modify the probability of extreme events occurring on different time scales, which makes the risk associated with extreme events dynamic, changing from one decade to another. This article proposes a methodology capable of dynamically detecting and predicting low-frequency streamflow variability (16–32 years) that is significant in the wavelet power spectrum. The Standardized Runoff Index (SRI), the Pruned Exact Linear Time (PELT) algorithm, the Breaks For Additive Seasonal and Trend (BFAST) method, and a hidden Markov model (HMM) were used to identify shifts in the low-frequency component; the HMM was also used to forecast it. The regime shifts detected by the BFAST approach are not entirely consistent with the results of the other methods. A common shift occurs in the mid-1980s and can be attributed to the construction of the reservoir. Climate variability modulates the streamflow low-frequency variability, and anthropogenic activities and climate change can modify this modulation. The identification of shifts reveals the impact of the low-frequency component on the streamflow time series, showing that low-frequency variability conditions the flows of a given year.
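The HMM-based shift detection mentioned in this abstract can be sketched with a two-state Gaussian HMM decoded by the Viterbi algorithm. This is a minimal illustration, not the paper's implementation; the state means, standard deviation, transition probability and the streamflow-like series are all assumed.

```python
import math

# Toy two-state Gaussian HMM regime decoding via Viterbi.
# All parameters and data are illustrative assumptions.

def viterbi_2state(obs, means, sd, p_stay=0.95):
    """Most likely state sequence (0 = low regime, 1 = high regime)."""
    def logem(x, m):
        # Log density of a Gaussian emission with mean m, std dev sd.
        return -0.5 * math.log(2 * math.pi * sd * sd) - (x - m) ** 2 / (2 * sd * sd)

    logp = [math.log(0.5) + logem(obs[0], m) for m in means]
    back = []
    for x in obs[1:]:
        step, new = [], []
        for s in range(2):
            cands = [logp[r] + math.log(p_stay if r == s else 1 - p_stay)
                     for r in range(2)]
            best = max(range(2), key=lambda r: cands[r])
            step.append(best)
            new.append(cands[best] + logem(x, means[s]))
        back.append(step)
        logp = new
    # Backtrack from the best final state.
    state = max(range(2), key=lambda s: logp[s])
    path = [state]
    for step in reversed(back):
        state = step[state]
        path.append(state)
    return path[::-1]

# A standardized index sitting near -1, then shifting to near +1.
series = [-1.1, -0.9, -1.0, -1.2, -0.8, 1.0, 0.9, 1.1, 0.95, 1.05]
print(viterbi_2state(series, means=(-1.0, 1.0), sd=0.3))
# [0, 0, 0, 0, 0, 1, 1, 1, 1, 1] -- the regime shift at index 5
```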

