Evaluation and Comparison of Random Forest and A-LSTM Networks for Large-scale Winter Wheat Identification

2019 ◽  
Vol 11 (14) ◽  
pp. 1665 ◽  
Author(s):  
Tianle He ◽  
Chuanjie Xie ◽  
Qingsheng Liu ◽  
Shiying Guan ◽  
Gaohuan Liu

Machine learning comprises a group of powerful state-of-the-art techniques for land cover classification and cropland identification. In this paper, we proposed and evaluated two models, based on random forest (RF) and attention-based long short-term memory (A-LSTM) networks, that can learn directly from the raw surface reflectance of remote sensing (RS) images for large-scale winter wheat identification in the Huanghuaihai region (North-Central China). We used a time series of Moderate Resolution Imaging Spectroradiometer (MODIS) images over one growing season and the corresponding winter wheat distribution map for the experiments. Each training sample was derived from the raw surface reflectance of the MODIS time-series images. Both models achieved state-of-the-art performance in identifying winter wheat, with F1 scores of 0.72 for RF and 0.71 for A-LSTM. We also analyzed the impact of the pixel-mixing effect: training with pure-mixed-pixel samples (a training set consisting of both pure and mixed cells, thus retaining the original distribution of the data) was more precise than training with only pure-pixel samples (samples whose entire pixel area belongs to one class). We also analyzed variable importance along the temporal series, and the data acquired in March or April contributed more than the data acquired at other times. Both models could predict winter wheat coverage in past years or in other regions with similar winter wheat growing seasons. The experiments in this paper showed the effectiveness and significance of our methods.
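The RF branch of such a pipeline, including the per-date variable-importance analysis described above, can be sketched with scikit-learn on synthetic data. The reflectance values, date/band counts, and the injected spring green-up signal below are illustrative stand-ins, not the MODIS data used in the paper:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_dates, n_bands = 2000, 23, 7   # e.g. 23 composites x 7 MODIS land bands
X = rng.random((n_samples, n_dates * n_bands))   # raw reflectance stand-in
y = rng.integers(0, 2, n_samples)                # 1 = winter wheat
# inject a weak spring green-up signal (dates 5-7) into the wheat class
X[y == 1, 5 * n_bands:8 * n_bands] += 0.3

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
f1 = f1_score(y_te, rf.predict(X_te))

# variable importance along the temporal series: sum band importances per date
importance_per_date = rf.feature_importances_.reshape(n_dates, n_bands).sum(axis=1)
```

With real data, a peak of `importance_per_date` around the March-April composites would mirror the finding reported above.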

Author(s):  
Yuheng Hu ◽  
Yili Hong

Residents often rely on newspapers and television to gather hyperlocal news for community awareness and engagement. More recently, social media have emerged as an increasingly important source of hyperlocal news. Thus far, the literature on using social media to create desirable societal benefits, such as civic awareness and engagement, is still in its infancy. One key challenge in this research stream is distilling information from noisy social media data streams and delivering it to community members in a timely and accurate manner. In this work, we develop SHEDR (social media–based hyperlocal event detection and recommendation), an end-to-end neural event detection and recommendation framework with a particular use case for Twitter to facilitate residents’ information seeking about hyperlocal events. The key model innovation in SHEDR lies in the design of the hyperlocal event detector and the event recommender. First, we harness the power of two popular deep neural network models, the convolutional neural network (CNN) and long short-term memory (LSTM), in a novel joint CNN-LSTM model to characterize spatiotemporal dependencies for capturing unusualness in a region of interest, which is classified as a hyperlocal event. Next, we develop a neural pairwise ranking algorithm for recommending detected hyperlocal events to residents based on their interests. To alleviate the sparsity issue and improve personalization, our algorithm incorporates several types of contextual information covering topic, social, and geographical proximities. We perform comprehensive evaluations based on two large-scale data sets comprising geotagged tweets covering Seattle and Chicago. We demonstrate the effectiveness of our framework in comparison with several state-of-the-art approaches. We show that our hyperlocal event detection and recommendation models consistently and significantly outperform other approaches in terms of precision, recall, and F-1 scores.
Summary of Contribution: In this paper, we focus on a novel and important, yet largely underexplored application of computing—how to improve civic engagement in local neighborhoods via local news sharing and consumption based on social media feeds. To address this question, we propose two new computational and data-driven methods: (1) a deep learning–based hyperlocal event detection algorithm that scans spatially and temporally to detect hyperlocal events from geotagged Twitter feeds; and (2) a personalized deep learning–based hyperlocal event recommender system that systematically integrates several contextual cues such as topical, geographical, and social proximity to recommend the detected hyperlocal events to potential users. We conduct a series of experiments to examine our proposed models. The outcomes demonstrate that our algorithms are significantly better than the state-of-the-art models and can provide users with more relevant information about the local neighborhoods that they live in, which in turn may boost their community engagement.
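The pairwise ranking idea behind such a recommender can be illustrated with a minimal BPR-style (Bayesian personalized ranking) update in NumPy. The embedding sizes, learning rate, and synthetic interactions below are assumptions for illustration only; the model described above additionally folds in topical, social, and geographical proximity features:

```python
import numpy as np

rng = np.random.default_rng(1)
n_users, n_events, d = 50, 30, 8
U = rng.normal(scale=0.1, size=(n_users, d))   # user embeddings
V = rng.normal(scale=0.1, size=(n_events, d))  # event embeddings

def bpr_step(u, pos, neg, lr=0.05, reg=0.01):
    """One stochastic ascent step on ln sigmoid(U[u] . (V[pos] - V[neg]))."""
    x = U[u] @ (V[pos] - V[neg])
    g = 1.0 / (1.0 + np.exp(x))                # = sigmoid(-x)
    U[u] += lr * (g * (V[pos] - V[neg]) - reg * U[u])
    V[pos] += lr * (g * U[u] - reg * V[pos])
    V[neg] -= lr * (g * U[u] + reg * V[neg])

# synthetic interactions: each user attended one "positive" event
positives = rng.integers(0, n_events, size=n_users)
for _ in range(5000):
    u = int(rng.integers(n_users))
    neg = int(rng.integers(n_events))
    if neg != positives[u]:
        bpr_step(u, int(positives[u]), neg)

# recommend: rank all events for user 0 by dot-product score
scores = V @ U[0]
ranking = np.argsort(-scores)
```

After training, events a user interacted with score higher than random events, which is the property the pairwise objective optimizes directly.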


Author(s):  
Sri Hartini ◽  
Zuherman Rustam ◽  
Glori Stephani Saragih ◽  
María Jesús Segovia Vargas

Banks have a crucial role in the financial system. When many banks suffer a crisis at once, financial instability can follow. Based on their impact, banking crises can be divided into two categories, namely systemic and non-systemic crises. When systemic crises happen, even stable banks may go bankrupt. Hence, this paper proposes a random forest for estimating the probability of banking crises as a preventive measure. Random forest is well known as a robust technique for both classification and regression, being relatively insensitive to outliers and resistant to overfitting. The experiments were constructed using a financial crisis database containing a sample of 79 countries in the period 1981-1999 (annual data). This dataset has 521 samples, consisting of 164 crisis and 357 non-crisis cases. The experiments showed that using 90 percent of the data for training delivered 0.98 accuracy, 0.92 sensitivity, 1.00 precision, and a 0.96 F1-score, the highest scores among the training proportions tested. These results are also better than those of state-of-the-art methods applied to the same dataset. The proposed method therefore shows promising results for predicting the probability of banking crises.
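A minimal scikit-learn sketch of this setup follows, with synthetic features standing in for the 521-sample crisis database (the real feature set and class separability are not reproduced here; the class counts and 90 percent training split are taken from the abstract):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_crisis, n_noncrisis, n_feat = 164, 357, 10
# synthetic macro-financial indicators: crisis cases shifted by one std dev
X = np.vstack([rng.normal(1.0, 1.0, (n_crisis, n_feat)),
               rng.normal(0.0, 1.0, (n_noncrisis, n_feat))])
y = np.array([1] * n_crisis + [0] * n_noncrisis)

# 90 percent of the data for training, as in the reported best split
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.9,
                                          stratify=y, random_state=0)
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
pred = rf.predict(X_te)
metrics = {
    "accuracy": accuracy_score(y_te, pred),
    "sensitivity": recall_score(y_te, pred),   # recall on the crisis class
    "precision": precision_score(y_te, pred),
    "f1": f1_score(y_te, pred),
}
```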


2020 ◽  
Vol 12 (19) ◽  
pp. 3207
Author(s):  
Ioannis Papoutsis ◽  
Charalampos Kontoes ◽  
Stavroula Alatza ◽  
Alexis Apostolakis ◽  
Constantinos Loupasakis

Advances in synthetic aperture radar (SAR) interferometry have enabled the seamless monitoring of the Earth’s crust deformation. The dense archive of the Sentinel-1 Copernicus mission provides unprecedented spatial and temporal coverage; however, time-series analysis of such big data volumes requires high computational efficiency. We present P-PSI, a novel, parallelized, end-to-end processing chain for the fully automated assessment of line-of-sight ground velocities through persistent scatterer interferometry (PSI), tailored to scale to the vast multitemporal archive of Sentinel-1 data. P-PSI is designed to transparently access different and complementary Sentinel-1 repositories and download the appropriate datasets for PSI. To make it efficient for large-scale applications, we re-engineered and parallelized interferogram creation and multitemporal interferometric processing, and introduced distributed implementations to best use computing cores and provide resourceful storage management. We also propose a new algorithm that further enhances processing efficiency by establishing a non-uniform patch grid, informed by land use, based on the expected number of persistent scatterers. P-PSI achieves an overall speed-up factor of five for the processing of a full Sentinel-1 frame on a 20-core server. The processing chain was tested on a large-scale project to calculate and monitor deformation patterns over the entire extent of the Greek territory—our own Interferometric SAR (InSAR) Greece project. Time-series InSAR analysis was performed on about 12 TB of input data, corresponding to more than 760 Single Look Complex Sentinel-1A and B images mostly covering mainland Greece in the period 2015–2019. InSAR Greece provides detailed ground motion information on more than 12 million distinct locations, offering completely new insights into the impact of geophysical and anthropogenic activities at this geographic scale.
This new information is critical to enhancing our understanding of the underlying mechanisms and provides valuable input to risk assessment models. We showcase this through the identification of various characteristic geohazard locations in Greece and discuss their criticality. The selected geohazard locations, chosen from among a thousand, cover a wide range of catastrophic events, including landslides, land subsidence, and structural failures at various scales, ranging from a few hundred square meters up to the basin scale. The study enriches the large catalog of geophysics-related phenomena maintained by the GeObservatory portal of the Center of Earth Observation Research and Satellite Remote Sensing BEYOND of the National Observatory of Athens, opening new knowledge to the wider scientific community.
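The non-uniform patch grid can be sketched in one dimension: patch boundaries are placed so that each patch contains a roughly equal expected number of persistent scatterers, giving dense urban areas small patches and sparse rural areas large ones. The density values below are illustrative; the real algorithm works on 2-D grids informed by land-use maps:

```python
import numpy as np

def equal_ps_patches(density, n_patches):
    """Split a 1-D strip into patches with roughly equal expected PS counts.

    density: expected persistent scatterers per pixel column (1-D array).
    Returns patch boundary indices of length n_patches + 1.
    """
    cdf = np.cumsum(density)
    targets = np.linspace(0.0, cdf[-1], n_patches + 1)
    interior = np.searchsorted(cdf, targets[1:-1]) + 1
    return np.concatenate(([0], interior, [len(density)]))

# toy density: an urban core (dense PS) flanked by rural land (sparse PS)
density = np.concatenate([np.full(400, 0.05), np.full(200, 1.0), np.full(400, 0.05)])
bounds = equal_ps_patches(density, 4)
counts = [density[a:b].sum() for a, b in zip(bounds[:-1], bounds[1:])]
```

Each patch then carries a comparable estimation workload, which is what balances the cores in a parallel run.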


2019 ◽  
Vol 12 (1) ◽  
pp. 45 ◽  
Author(s):  
Jiaqi Chen ◽  
Jiming Lv ◽  
Ning Li ◽  
Qingwei Wang ◽  
Jian Wang

There are a large number of lakes with a beaded distribution in the semi-arid areas of the Inner Mongolian Plateau, and some of them have degraded or even disappeared during the past three decades. We studied the reasons for the disappearance of these lakes by determining how they are replenished and how the natural and social environment of each basin affects them, with the aim of saving these gradually disappearing lakes. Based on remote sensing images and hydrological analysis, this paper studied the recharge of Daihai Lake and Huangqihai Lake. A deep learning method was used to establish the time series of lake evolution; the same method, combined with an innovative woodland- and farmland-extraction method, was used to build the time series of land-cover composition in the basins. Using relevant survey data, combined with soil water infiltration tests and water chemistry and isotopic signature analyses of various water bodies, we found that the area of Daihai Lake is largest in the dry season and smallest in the rainy season, whereas Huangqihai Lake does not show this pattern. In addition, we calculated the specific recharge and consumption of the study basins. These experiments indicated that exogenous groundwater recharges Daihai Lake directly through the faults at its bottom, while it recharges Huangqihai Lake indirectly through rivers. Large-scale exploitation of groundwater for agricultural irrigation and industrial production is the main cause of lake degradation, and reducing the extraction of groundwater for agricultural irrigation is an important measure to restore lake ecology.


2014 ◽  
Vol 40 (4) ◽  
pp. 837-881 ◽  
Author(s):  
Mohammad Taher Pilehvar ◽  
Roberto Navigli

The evaluation of several tasks in lexical semantics is often limited by the lack of large amounts of manual annotations, not only for training purposes, but also for testing purposes. Word Sense Disambiguation (WSD) is a case in point, as hand-labeled datasets are particularly hard and time-consuming to create. Consequently, evaluations tend to be performed on a small scale, which does not allow for in-depth analysis of the factors that determine a system's performance. In this paper, we address this issue by means of a realistic simulation of large-scale evaluation for the WSD task. We do this by providing two main contributions: First, we put forward two novel approaches to the wide-coverage generation of semantically aware pseudowords (i.e., artificial words capable of modeling real polysemous words); second, we leverage the most suitable type of pseudoword to create large pseudosense-annotated corpora, which enable a large-scale experimental framework for the comparison of state-of-the-art supervised and knowledge-based algorithms. Using this framework, we study the impact of supervision and knowledge on the two major disambiguation paradigms and perform an in-depth analysis of the factors which affect their performance.
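The classic pseudoword construction (conflating monosemous words into one artificial ambiguous word) can be sketched as follows. Note that the paper's contribution is to make this generation semantically aware; the naive random pairing below deliberately does not capture that, and the toy vocabulary is an assumption:

```python
import random

def make_pseudowords(monosemous, senses_per_pseudoword=2, seed=0):
    """Conflate monosemous words into artificial polysemous 'pseudowords'.

    Each pseudoword, e.g. 'banana_door', stands for an artificial lemma whose
    'senses' are the original words: every corpus occurrence of a constituent
    is relabelled with the pseudoword, and the constituent becomes the gold sense.
    """
    rng = random.Random(seed)
    words = list(monosemous)
    rng.shuffle(words)
    groups = [words[i:i + senses_per_pseudoword]
              for i in range(0, len(words) - senses_per_pseudoword + 1,
                             senses_per_pseudoword)]
    return {"_".join(g): g for g in groups}

vocab = ["banana", "door", "violin", "glacier", "turbine", "sonnet"]
pseudo = make_pseudowords(vocab)
```

Disambiguating the pseudoword back to its constituent then serves as a WSD task with free gold annotations.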


2012 ◽  
Vol 9 (1) ◽  
pp. 175-214
Author(s):  
D. González-Zeas ◽  
L. Garrote ◽  
A. Iglesias ◽  
A. Sordo-Ward

Abstract. An important aspect of assessing the impact of climate change on water availability is to have monthly time series representative of the current situation. In this context, a simple methodology is presented for application in large-scale studies in regions where a properly calibrated hydrologic model is not available, using the output variables simulated by regional climate models (RCMs) of the European project PRUDENCE under current climate conditions (period 1961–1990). The methodology compares different interpolation methods and alternatives to generate annual time series that minimize the bias with respect to observed values. The objective is to identify the best alternative for obtaining bias-corrected, monthly runoff time series from the output of RCM simulations. This study uses information from 338 basins in Spain that cover the entire mainland territory and whose observed values of naturalised runoff have been estimated by the distributed hydrological model SIMPA. Four interpolation methods for downscaling runoff to the basin scale from 10 RCMs are compared, with emphasis on the ability of each method to reproduce the observed behavior of this variable. The alternatives consider the use of the direct runoff of the RCMs and the mean annual runoff calculated using five functional forms of the aridity index, defined as the ratio between potential evaporation and precipitation. In addition, a comparison with respect to the global runoff reference of the UNH/GRDC dataset is evaluated, as a benchmark "best estimator" of current runoff on a large scale. Results show that the bias is minimised using the original direct interpolation method, and the best alternative for bias correction of the monthly direct runoff time series of RCMs is the UNH/GRDC dataset, although the formula proposed by Schreiber also gives good results.
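Schreiber's functional form, one of the aridity-index alternatives compared, estimates mean annual runoff directly from precipitation and potential evaporation via R = P·exp(−PET/P). A minimal sketch, with illustrative basin values (the 338 Spanish basins and RCM outputs are not reproduced here):

```python
import numpy as np

def schreiber_runoff(precip, pet):
    """Mean annual runoff from Schreiber's aridity-index relation.

    R = P * exp(-phi), with aridity index phi = PET / P.
    """
    precip = np.asarray(precip, dtype=float)
    pet = np.asarray(pet, dtype=float)
    return precip * np.exp(-pet / precip)

# toy basins: humid (P > PET) and semi-arid (PET > P), values in mm/yr
P = np.array([1200.0, 450.0])
PET = np.array([700.0, 900.0])
R = schreiber_runoff(P, PET)
runoff_coeff = R / P   # fraction of precipitation that becomes runoff
```

As expected from the formula, the runoff coefficient falls as the aridity index rises, so the semi-arid basin converts a much smaller share of precipitation into runoff.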


MAUSAM ◽  
2021 ◽  
Vol 52 (1) ◽  
pp. 67-82
Author(s):  
J. R. KULKARNI ◽  
M. MUJUMDAR ◽  
S. P. GHARGE ◽  
V. SATYAN ◽  
G. B. PANT

Earlier investigations into the epochal behavior of fluctuations in All India Summer Monsoon Rainfall (AISMR) have indicated the existence of a Low Frequency Mode (LFM) in the 60–70-year range. One probable source of this variability is changes in solar irradiance. To investigate this, a 128-year time series of solar irradiance data (1871–1998) has been examined. The Wavelet Transform (WT) method is applied to extract the LFM from these time series, which show a very good correspondence. A case study has been carried out to test the sensitivity of AISMR to solar irradiance. The General Circulation Model (GCM) of the Center of Ocean-Land-Atmosphere (COLA) has been integrated in a control run (using the climatological value of the solar constant, i.e., 1365 Wm-2) and in an enhanced solar constant condition (enhanced by 10 Wm-2) for the summer monsoon season of 1986. The study shows that the large-scale atmospheric circulation over the Indian region in the enhanced-solar-constant scenario is favorable to good monsoon activity. A conceptual model for the impact of solar irradiance on the AISMR at the LFM is also suggested.
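Extracting a low-frequency mode with a Morlet wavelet can be sketched in NumPy. The series below is synthetic (a 65-year sinusoid plus noise over the 1871–1998 window); the analysis above of course uses the actual AISMR and irradiance records:

```python
import numpy as np

def morlet_cwt(signal, period, dt=1.0, omega0=6.0):
    """Continuous wavelet transform of `signal` at a single Morlet scale.

    The scale is chosen so the wavelet's Fourier period equals `period`.
    """
    scale = period * (omega0 + np.sqrt(2.0 + omega0**2)) / (4.0 * np.pi)
    n = signal.size
    out = np.empty(n, dtype=complex)
    for b in range(n):  # translate the wavelet along the series
        eta = (np.arange(n) - b) * dt / scale
        psi = np.pi**-0.25 * np.exp(1j * omega0 * eta - eta**2 / 2.0)
        out[b] = np.sum(signal * np.conj(psi)) * dt / np.sqrt(scale)
    return out

# synthetic 128-year series: a 65-year low-frequency mode buried in noise
years = np.arange(1871, 1999)
rng = np.random.default_rng(0)
series = (np.sin(2 * np.pi * (years - 1871) / 65.0)
          + 0.5 * rng.standard_normal(years.size))
power_lfm = np.abs(morlet_cwt(series - series.mean(), period=65.0))
```

The wavelet power at the 60–70-year scale then traces the amplitude of the LFM through time, which is how the correspondence between the two records can be assessed.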


Author(s):  
Peng Liu ◽  
Yijie Ding ◽  
Ying Rong ◽  
Dong Chen

Cell penetrating peptides (CPPs) are short peptides that can carry biomolecules of varying sizes across the cell membrane into the cytoplasm. Correctly identifying CPPs is the basis for studying their functions and mechanisms. Here, we propose a novel CPP predictor that is able to predict CPPs and their uptake efficiency. In our method, five feature descriptors are applied to encode the sequence and compose a hybrid feature vector. Afterward, the wrapper + random forest algorithm is employed, which combines feature selection with the prediction process to find features that are crucial for identifying CPPs. The jackknife cross validation result shows that our predictor is comparable to state-of-the-art CPP predictors, and our method reduces the feature dimension, which improves computational efficiency and avoids overfitting, allowing our predictor to be adopted to identify large-scale CPP data.
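A hedged scikit-learn analogue of the wrapper + random forest scheme with jackknife validation follows; `RFE` stands in for the paper's wrapper strategy, and the synthetic descriptors stand in for the five-descriptor hybrid vector:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import LeaveOneOut, cross_val_score

# toy stand-in for hybrid peptide descriptors: 60 samples x 40 features
X, y = make_classification(n_samples=60, n_features=40, n_informative=6,
                           random_state=0)
rf = RandomForestClassifier(n_estimators=50, random_state=0)

# wrapper-style selection: recursively drop the least important features
selector = RFE(rf, n_features_to_select=10, step=5).fit(X, y)
X_sel = selector.transform(X)

# jackknife (leave-one-out) cross-validation on the reduced feature set
acc = cross_val_score(rf, X_sel, y, cv=LeaveOneOut()).mean()
```

As in the abstract, the reduced dimension cuts training cost per fold and limits overfitting relative to the full feature vector.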


2021 ◽  
Vol 13 (22) ◽  
pp. 4599
Author(s):  
Félix Quinton ◽  
Loic Landrieu

While annual crop rotations play a crucial role in agricultural optimization, they have been largely ignored for automated crop type mapping. In this paper, we take advantage of the increasing quantity of annotated satellite data and propose a deep learning approach that simultaneously models the inter- and intra-annual agricultural dynamics for yearly parcel classification. Along with simple training adjustments, our model provides an improvement of over 6.3% mIoU over the current state of the art in crop classification, and a reduction of over 21% in the error rate. Furthermore, we release the first large-scale multi-year agricultural dataset, with over 300,000 annotated parcels.
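The mIoU metric in which the improvement is reported can be computed from a confusion matrix as follows (the toy labels are for illustration only):

```python
import numpy as np

def mean_iou(y_true, y_pred, n_classes):
    """Mean intersection-over-union across classes, via a confusion matrix."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)       # accumulate (true, pred) pairs
    inter = np.diag(cm)                      # per-class intersection
    union = cm.sum(0) + cm.sum(1) - inter    # per-class union
    return np.mean(inter / np.maximum(union, 1))

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
miou = mean_iou(y_true, y_pred, 3)   # IoUs 1/3, 2/3, 1/2 -> mean 0.5
```

Unlike overall accuracy, mIoU weights every class equally, which matters for rare crop types.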

