Machine Learning Predictors of Extreme Events Occurring in Complex Dynamical Systems

Entropy ◽  
2019 ◽  
Vol 21 (10) ◽  
pp. 925 ◽  
Author(s):  
Stephen Guth ◽  
Themistoklis P. Sapsis

The ability to characterize and predict extreme events is a vital topic in fields ranging from finance to ocean engineering. Typically, the most extreme events are also the rarest, and it is this property that makes data collection and direct simulation challenging. We consider the problem of deriving optimal predictors of extremes directly from data characterizing a complex system, by formulating the problem in the context of binary classification. Specifically, we assume that a training dataset consists of: (i) an indicator time series specifying whether or not an extreme event occurs; and (ii) observable time series, which are employed to formulate efficient predictors. We employ and assess standard binary classification criteria for the selection of optimal predictors, such as total and balanced error and area under the curve, in the context of extreme event prediction. For physical systems with sufficient separation between the extreme and regular events, i.e., extremes distinguishably larger than regular events, we prove the existence of optimal extreme event thresholds that lead to efficient predictors. Moreover, motivated by the special character of extreme events, i.e., their very low rate of occurrence, we formulate a new objective function for the selection of predictors. This objective is constructed from the same principles as receiver operating characteristic curves, and exhibits a geometric connection to the regime separation property. We demonstrate the application of the new selection criterion to the advance prediction of intermittent extreme events in two challenging complex systems: the Majda–McLaughlin–Tabak model, a 1D nonlinear dispersive wave model, and the 2D Kolmogorov flow model, which exhibits extreme dissipation events.
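The selection criteria named in this abstract (area under the ROC curve, balanced error) can be illustrated with a small self-contained sketch. This is not the authors' code; the labels, scores and threshold below are made-up toy data in which extremes (label 1) are rare, mimicking the class imbalance the paper addresses.

```python
# Toy sketch of the standard binary-classification criteria for a
# threshold-based extreme-event predictor. All data are illustrative.

def roc_auc(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def balanced_error(labels, scores, threshold):
    """Mean of the false-positive and false-negative rates at a threshold."""
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    return 0.5 * (fp / labels.count(0) + fn / labels.count(1))

# Rare extremes (label 1) with an observable that separates them well.
labels = [0] * 8 + [1] * 2
scores = [0.1, 0.2, 0.15, 0.3, 0.25, 0.4, 0.2, 0.35, 0.8, 0.6]
print(roc_auc(labels, scores))              # 1.0 for this toy data
print(balanced_error(labels, scores, 0.5))  # 0.0 for this toy data
```

Balanced error rather than total error matters here because, with only 2 extremes out of 10 samples, a predictor that never fires would still score 80% raw accuracy.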

2020 ◽  
Vol 142 (4) ◽  
Author(s):  
Yuxiang Ma ◽  
Changfu Yuan ◽  
Congfang Ai ◽  
Guohai Dong

Abstract The generation of two freak waves in a broadband and a narrowband random series registered in the experiments of Li, J. X., Li, P. F., and Liu, S. X. (2013, “Observations of Freak Waves in Random Wave Field in 2D Experimental Wave Flume,” China Ocean Eng., 27(5), pp. 659–670) is precisely reconstructed using a fully non-hydrostatic water wave model. The simulation results indicate that even though the background spectral bandwidths are different, the evolution processes of the two freak waves are similar. Both freak waves emerge quickly during the transition from normal states to extreme events, and both can persist over a long distance, i.e., approximately 5 peak wavelengths. The reconstructed time series at locations backward and forward of where the freak waves were recorded reveal that the largest freak wave crests were not captured in the experiment. The freak waves gradually emerged from an intense wave group, and very deep troughs were also formed in the evolution process. Although the two freak waves were generated via different spectral bandwidth processes, the generation mechanisms of the rogue waves were similar. By analyzing the time series of the freak wave groups, the formation of the freak waves is found to result from the combined effect of dispersive focusing, third-order resonant wave interactions, and higher harmonics.


2021 ◽  
Vol 7 (12) ◽  
pp. eabd4177
Author(s):  
Carole H. Sudre ◽  
Karla A. Lee ◽  
Mary Ni Lochlainn ◽  
Thomas Varsavsky ◽  
Benjamin Murray ◽  
...  

As no single symptom can predict disease severity or the need for dedicated medical support in coronavirus disease 2019 (COVID-19), we asked whether documenting symptom time series over the first few days informs outcome. Unsupervised time series clustering over symptom presentation was performed on data collected from a training dataset of completed cases enrolled early in the COVID Symptom Study smartphone application, yielding six distinct symptom presentations. Clustering was validated on an independent replication dataset gathered between 1 and 28 May 2020. Using the first 5 days of symptom logging, the ROC-AUC (receiver operating characteristic area under the curve) for the need for respiratory support was 78.8%, substantially outperforming personal characteristics alone (ROC-AUC 69.5%). Such an approach could be used to monitor at-risk patients and predict medical resource requirements days before they are needed.
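As a toy illustration of the unsupervised clustering step (not the study's pipeline, which derived six clusters from real symptom time series), a minimal k-means over made-up binary symptom-presence vectors might look like:

```python
# Illustrative k-means over hypothetical symptom-presence vectors.
# Patients, symptoms and k=2 are assumptions for the sketch only.

def kmeans(points, k, iters=10):
    # Deterministic init from the first k points (fine for this toy data).
    centers = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean).
        for i, p in enumerate(points):
            assign[i] = min(range(k), key=lambda c: sum(
                (a - b) ** 2 for a, b in zip(p, centers[c])))
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for i, p in enumerate(points) if assign[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

# Each vector: presence of (fever, cough, fatigue, dyspnoea, confusion).
patients = [
    (1, 1, 1, 0, 0),  # mild-looking presentation
    (0, 0, 0, 1, 1),  # severe-looking presentation
    (1, 1, 0, 0, 0),
    (1, 0, 1, 0, 0),
    (0, 0, 1, 1, 1),
    (0, 1, 0, 1, 1),
]
print(kmeans(patients, k=2))  # [0, 1, 0, 0, 1, 1]
```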


2004 ◽  
Vol 155 (5) ◽  
pp. 142-145 ◽  
Author(s):  
Claudio Defila

The record-breaking heatwave of 2003 also had an impact on the vegetation in Switzerland. To examine its influence, seven phenological late spring and summer phases were evaluated, together with six autumn phases, from a selection of stations. Of the 122 chosen phenological time series in the late spring and summer phases, 30% set a new record (earliest arrival). The proportion of very early arrivals is very high, and the mean deviation from the norm is between 10 and 20 days. The situation was less extreme in autumn, where 20% of the 103 chosen time series set a new record. The majority of the phenological arrivals fell into the class «normal», but the class «very early» is still well represented. The mean precocity lies between five and twenty days. As far as the leaf shedding of the beech is concerned, there was even a slight delay of around six days. The evaluation shows that the heatwave of 2003 strongly influenced the phenological events of spring and summer.


2021 ◽  
Author(s):  
Ali Abdolali ◽  
Andre van der Westhuysen ◽  
Zaizhong Ma ◽  
Avichal Mehra ◽  
Aron Roland ◽  
...  

Abstract Various uncertainties exist in a hindcast due to the inability of numerical models to resolve all the complicated atmosphere-sea interactions, and the lack of certain ground truth observations. Here, a comprehensive analysis of an atmospheric model in hindcast mode (the Hurricane Weather Research and Forecasting model, HWRF) and its 40 ensembles during severe events is conducted, evaluating the model accuracy and uncertainty for hurricane track parameters, and for wind speed collected along satellite altimeter tracks and at stationary point observations. Subsequently, the downstream spectral wave model WAVEWATCH III is forced by two sets of wind field data, each comprising 40 members. The first set is randomly extracted from the original HWRF simulations, and the second is based on the spread of best-track parameters. The atmospheric model spread and the wave model error along satellite altimeter tracks and at stationary point observations are estimated. The study of Hurricane Irma reveals that wind and wave observations during this extreme event lie within the ensemble spreads. While both models have wide spreads over areas with landmass, the maximum uncertainty in the atmospheric model is at the hurricane eye, in contrast to the wave model.
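The kind of ensemble-spread check described in this abstract can be sketched in a few lines. This is an illustrative stand-in, not the study's code; the wind speeds, the 40-member count, and the two-standard-deviation criterion are all assumptions.

```python
# Toy check of whether an observation falls within an ensemble spread.
# All values below are made up for illustration.

def ensemble_spread(members):
    """Mean and (population) standard deviation across ensemble members."""
    n = len(members)
    mean = sum(members) / n
    var = sum((m - mean) ** 2 for m in members) / n
    return mean, var ** 0.5

def within_spread(obs, members, n_sigma=2.0):
    """True if the observation lies within n_sigma standard deviations
    of the ensemble mean."""
    mean, sd = ensemble_spread(members)
    return abs(obs - mean) <= n_sigma * sd

members = [28.0 + 0.25 * i for i in range(40)]  # hypothetical wind speeds (m/s)
print(ensemble_spread(members))
print(within_spread(33.0, members))   # True: close to the ensemble mean
print(within_spread(50.0, members))   # False: far outside the spread
```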


2021 ◽  
pp. 016555152199804
Author(s):  
Qian Geng ◽  
Ziang Chuai ◽  
Jian Jin

To provide junior researchers with domain-specific concepts efficiently, an automatic approach for academic profiling is needed. First, to obtain the personal records of a given scholar, typical supervised approaches often utilise structured data, such as the infobox in Wikipedia, as the training dataset, but this may lead to a severe mislabelling problem when such data are used to train a model directly. To address this problem, a new relation embedding method is proposed for fine-grained entity typing, in which the initial vector of entities and a new penalty scheme are considered, based on the semantic distance of entities and relations. Also, to highlight critical concepts relevant to renowned scholars, scholars’ selective bibliographies, which contain massive numbers of academic terms, are analysed by a newly proposed extraction method based on logistic regression, the AdaBoost algorithm and learning-to-rank techniques. It bridges the gap that conventional supervised methods only return binary classification results and fail to help researchers understand the relative importance of selected concepts. Experiments on academic profiling and corresponding benchmark datasets demonstrate that the proposed approaches notably outperform existing methods. The proposed techniques provide an automatic way for junior researchers to obtain organised knowledge in a specific domain, including scholars’ background information and domain-specific concepts.


2021 ◽  
Vol 502 (2) ◽  
pp. 2513-2517
Author(s):  
Stavros Akras ◽  
Denise R Gonçalves ◽  
Alvaro Alvarez-Candal ◽  
Claudio B Pereira

ABSTRACT We report the validation of a recently proposed infrared (IR) selection criterion for symbiotic stars (SySts). Spectroscopic data were obtained for seven candidates, selected from the SySt candidates of Akras et al. by employing the new supplementary IR selection criterion for SySts in the VST/OmegaCAM Photometric H-Alpha Survey. Five of them turned out to be genuine SySts after the detection of H α, He ii, and [O iii] emission lines as well as TiO molecular bands. The characteristic O vi Raman-scattered line is also detected in one of these SySts. According to their IR colours and optical spectra, all five newly discovered SySts are classified as S-type. The high rate of true SySts detections of this work demonstrates that the combination of the H α emission and the new IR criterion improves the selection of target lists for follow-up observations by minimizing the number of contaminants and optimizing the observing time.


2020 ◽  
Vol 0 (0) ◽  
Author(s):  
Alain Hecq ◽  
Li Sun

Abstract We propose a model selection criterion to distinguish purely causal from purely noncausal models in the framework of quantile autoregressions (QAR). We also present asymptotics for the i.i.d. case with regularly varying distributed innovations in QAR. This new modelling perspective is appealing for investigating the presence of bubbles in economic and financial time series, and is an alternative to approximate maximum likelihood methods. We illustrate our analysis using hyperinflation episodes in Latin American countries.
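A minimal sketch of the check (pinball) loss that underlies quantile autoregression, with a crude grid search for a QAR(1) slope, may help fix ideas. The paper's estimator and its causal/noncausal selection criterion are considerably more involved; the series, quantile and grid below are made up.

```python
# Toy quantile-autoregression fit via the check (pinball) loss.
# Illustrative only; not the paper's estimator.

def check_loss(u, tau):
    """Pinball loss: tau*u for u >= 0, (tau - 1)*u for u < 0."""
    return tau * u if u >= 0 else (tau - 1.0) * u

def qar1_slope(series, tau, grid):
    """Grid-search the slope phi minimizing the summed check loss of the
    QAR(1) residuals y_t - phi * y_{t-1} at quantile tau."""
    def total(phi):
        return sum(check_loss(y1 - phi * y0, tau)
                   for y0, y1 in zip(series, series[1:]))
    return min(grid, key=total)

# A slowly decaying toy series, roughly y_t ~ 0.9 * y_{t-1}.
series = [1.0, 0.9, 0.85, 0.8, 0.7, 0.66, 0.6, 0.55, 0.5, 0.44]
grid = [i / 100 for i in range(50, 131)]  # candidate slopes 0.50 .. 1.30
print(qar1_slope(series, 0.5, grid))      # median regression slope, ~0.91
```

At tau = 0.5 the check loss reduces to (half) the absolute error, so the fit is a median autoregression; other quantiles weight over- and under-predictions asymmetrically.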


2021 ◽  
Vol 11 (9) ◽  
pp. 3836
Author(s):  
Valeri Gitis ◽  
Alexander Derendyaev ◽  
Konstantin Petrov ◽  
Eugene Yurkov ◽  
Sergey Pirogov ◽  
...  

Prostate cancer (PCa) is the second most frequent malignancy (after lung cancer). Preoperative staging of PCa is the basis for the selection of adequate treatment tactics. In particular, an urgent problem is the classification of indolent and aggressive forms of PCa in patients in the initial stages of the tumor process. To solve this problem, we propose a new binary classification machine-learning method. The proposed method of monotonic functions uses a model in which the form of the disease is determined by the severity of the patient’s condition. It is assumed that the smaller the deviation of the indicators from the normal values inherent in healthy people, the milder the patient’s condition. This assumption means that the severity (form) of the disease can be represented by monotonic functions of the degree to which the patient’s indicators deviate beyond the normal range. The method is used to classify patients with indolent and aggressive forms of prostate cancer according to pretreatment data. The learning algorithm is nonparametric and, at the same time, allows the classification results to be explained in the form of a logical function. To do so, one indicates to the algorithm either the threshold probability of successful classification of patients with the indolent form of PCa, or the threshold probability of misclassification of patients with the aggressive form of PCa. The examples of logical rules given in the article show that they are quite simple and can be easily interpreted in terms of preoperative indicators of the form of the disease.
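The core modelling assumption, that severity is a monotone function of how far each indicator deviates beyond its normal range, can be sketched as follows. This is an illustration of the idea only, not the authors' algorithm; the indicator names, normal ranges and patient values are hypothetical.

```python
# Hypothetical sketch: severity as a monotone function of deviations
# beyond normal ranges. Indicators and ranges are made up.

def deviation(value, low, high):
    """How far a value lies outside the normal range [low, high]; 0 inside."""
    if value < low:
        return low - value
    if value > high:
        return value - high
    return 0.0

def severity_score(indicators, normal_ranges):
    """Monotone aggregate: increasing any single deviation can never
    decrease the score, matching the paper's monotonicity assumption."""
    return sum(deviation(v, *normal_ranges[k]) for k, v in indicators.items())

normal_ranges = {"psa": (0.0, 4.0), "volume": (20.0, 30.0)}   # hypothetical
indolent = {"psa": 5.0, "volume": 28.0}
aggressive = {"psa": 15.0, "volume": 45.0}
print(severity_score(indolent, normal_ranges))     # 1.0
print(severity_score(aggressive, normal_ranges))   # 26.0
```

A classifier built on such a score would then threshold it, which is what makes the resulting decision rules expressible as simple, interpretable logical conditions on the individual indicators.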


Water ◽  
2021 ◽  
Vol 13 (6) ◽  
pp. 859
Author(s):  
Giorgio Bellotti ◽  
Leopoldo Franco ◽  
Claudia Cecioni

Hindcast wind and wave data, available on a coarse-resolution global grid (the Copernicus ERA5 dataset), are downscaled by means of the numerical model SWAN (Simulating WAves Nearshore) to produce time series of wave conditions at high resolution along the Italian coasts in the central Tyrrhenian Sea. In order to achieve the proper spatial resolution along the coast, the finite element version of the model is used. Wave data time series on the ERA5 grid are used to specify boundary conditions for the wave model at the offshore sides of the computational domain. The wind field is fed to the model to account for local wave generation. The modeled sea states are compared against the multiple wave records available in the area, in order to calibrate and validate the model. The model results are in good agreement with direct measurements, both in terms of wave climate and wave extremes. The results show that, using the present modeling chain, it is possible to build a reliable nearshore wave parameter database with high spatial resolution. Such a database, once prepared for coastal areas, possibly at the national level, can be of high value for many engineering activities related to coastal area management, and can provide fundamental information for the development of operational coastal services.


Water ◽  
2020 ◽  
Vol 12 (7) ◽  
pp. 2058 ◽  
Author(s):  
Larissa Rolim ◽  
Francisco de Souza Filho

Improved water resource management relies on accurate analyses of the past dynamics of hydrological variables. The presence of low-frequency structures in hydrologic time series is an important feature: it can modify the probability of extreme events occurring on different time scales, which makes the risk associated with extreme events dynamic, changing from one decade to another. This article proposes a methodology capable of dynamically detecting and predicting low-frequency streamflow variability (16–32 years) that is significant in the wavelet power spectrum. The Standardized Runoff Index (SRI), the Pruned Exact Linear Time (PELT) algorithm, the Breaks For Additive Seasonal and Trend (BFAST) method, and a hidden Markov model (HMM) were used to identify shifts in the low-frequency component; the HMM was also used to forecast it. The regime shifts detected by the BFAST approach are not entirely consistent with the results of the other methods. A common shift occurs in the mid-1980s and can be attributed to the construction of the reservoir. Climate variability modulates the streamflow low-frequency variability, and anthropogenic activities and climate change can modify this modulation. The identification of shifts reveals the impact of the low-frequency component on the streamflow time series, showing that low-frequency variability conditions the flows of a given year.
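The HMM-based shift detection mentioned in this abstract can be sketched with a two-state Gaussian HMM decoded by the Viterbi algorithm. This is a minimal illustration, not the paper's implementation; the state means, standard deviation, transition probability and the streamflow-like series are all assumed.

```python
import math

# Toy two-state Gaussian HMM regime decoding via Viterbi.
# All parameters and data are illustrative assumptions.

def viterbi_2state(obs, means, sd, p_stay=0.95):
    """Most likely state sequence (0 = low regime, 1 = high regime)."""
    def logem(x, m):
        # Log density of a Gaussian emission with mean m, std dev sd.
        return -0.5 * math.log(2 * math.pi * sd * sd) - (x - m) ** 2 / (2 * sd * sd)

    logp = [math.log(0.5) + logem(obs[0], m) for m in means]
    back = []
    for x in obs[1:]:
        step, new = [], []
        for s in range(2):
            cands = [logp[r] + math.log(p_stay if r == s else 1 - p_stay)
                     for r in range(2)]
            best = max(range(2), key=lambda r: cands[r])
            step.append(best)
            new.append(cands[best] + logem(x, means[s]))
        back.append(step)
        logp = new
    # Backtrack from the best final state.
    state = max(range(2), key=lambda s: logp[s])
    path = [state]
    for step in reversed(back):
        state = step[state]
        path.append(state)
    return path[::-1]

# A standardized index sitting near -1, then shifting to near +1.
series = [-1.1, -0.9, -1.0, -1.2, -0.8, 1.0, 0.9, 1.1, 0.95, 1.05]
print(viterbi_2state(series, means=(-1.0, 1.0), sd=0.3))
# [0, 0, 0, 0, 0, 1, 1, 1, 1, 1] -- the regime shift at index 5
```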

