Downscaling TRMM Monthly Precipitation Using Google Earth Engine and Google Cloud Computing

Abdelrazek Elnashar; Hongwei Zeng; Bingfang Wu; Ning Zhang; Fuyou Tian; Miao Zhang; Weiwei Zhu; Nana Yan; Zeqiang Chen; Zhiyu Sun; Xinghua Wu; Yuan Li

doi:10.3390/rs12233860

Remote Sensing; doi:10.3390/rs12233860; 2020; Vol 12 (23); pp. 3860

Author(s): Abdelrazek Elnashar; Hongwei Zeng; Bingfang Wu; Ning Zhang; Fuyou Tian; ...

Keyword(s): Machine Learning; Vegetation Index; Google Earth; Upstream Region; Support Vector; Monthly Precipitation; Spatiotemporal Resolution; High Spatiotemporal Resolution; TRMM Precipitation; Google Earth Engine

Accurate precipitation data at high spatiotemporal resolution are critical for land and water management at the basin scale. We proposed a downscaling framework for Tropical Rainfall Measuring Mission (TRMM) precipitation products that integrates Google Earth Engine (GEE) and Google Colaboratory (Colab). Three machine learning methods, Gradient Boosting Regressor (GBR), Support Vector Regressor (SVR), and Artificial Neural Network (ANN), were compared in the framework. Three vegetation indices (Normalized Difference Vegetation Index, NDVI; Enhanced Vegetation Index, EVI; Leaf Area Index, LAI), topography, and geolocation were selected as geospatial predictors to perform the downscaling. The framework can automatically optimize the models’ parameters, estimate feature importance, and downscale the TRMM product to 1 km. The spatial downscaling of TRMM from 25 km to 1 km was achieved by using the relationships between annual precipitation and annually averaged vegetation indices. The monthly precipitation maps were then derived from the annual downscaled precipitation by disaggregation. According to validation in the Great Mekong upstream region, the ANN yielded the best performance when simulating the annual TRMM precipitation. The most sensitive vegetation index for downscaling TRMM was LAI, followed by EVI. Compared with existing downscaling methods, the proposed framework can be run online for any given region, using a wide range of machine learning tools and environmental variables, to generate a precipitation product with high spatiotemporal resolution.
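The abstract does not state the disaggregation formula. A minimal sketch of one common choice, assumed here purely for illustration, is ratio-based disaggregation: each month's share of the coarse-cell annual total is applied to the downscaled annual value of a fine pixel inside that cell. The function name and the sample numbers are hypothetical.

```python
def disaggregate_monthly(fine_annual_mm, coarse_monthly_mm):
    """Split a downscaled annual total into 12 monthly values.

    Assumption (not taken from the paper): the fine pixel inherits the
    monthly fractions of the coarse cell that contains it.
    """
    if len(coarse_monthly_mm) != 12:
        raise ValueError("expected 12 monthly values")
    coarse_annual = sum(coarse_monthly_mm)
    if coarse_annual == 0:
        # A completely dry coarse cell carries no seasonal signal.
        return [0.0] * 12
    return [fine_annual_mm * m / coarse_annual for m in coarse_monthly_mm]


# Hypothetical coarse-cell monthly totals (mm) and one downscaled 1 km pixel.
coarse = [10, 15, 30, 60, 120, 200, 250, 230, 150, 80, 30, 25]
monthly = disaggregate_monthly(1320.0, coarse)
```

By construction the twelve monthly values sum back to the downscaled annual total, so the annual spatial pattern is preserved while the seasonal cycle comes from the coarse product.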

Object-Oriented LULC Classification in Google Earth Engine Combining SNIC, GLCM, and Machine Learning Algorithms

Remote Sensing; doi:10.3390/rs12223776; 2020; Vol 12 (22); pp. 3776

Author(s): Andrea Tassi; Marco Vizzari

Keyword(s): Machine Learning; Central Italy; Object Oriented; Google Earth; Machine Learning Algorithms; Support Vector; Landsat 8; Good Reliability; Google Earth Engine; Occurrence Matrix

Google Earth Engine (GEE) is a versatile cloud platform in which pixel-based (PB) and object-oriented (OO) Land Use–Land Cover (LULC) classification approaches can be implemented, thanks to the availability of many state-of-the-art functions comprising various Machine Learning (ML) algorithms. OO approaches, including both object segmentation and object textural analysis, are still uncommon in the GEE environment, probably because of the difficulty of chaining the proper functions and of tuning the various parameters to stay within the GEE computational limits. In this context, this work aimed to develop and test an OO classification approach combining the Simple Non-Iterative Clustering (SNIC) algorithm to identify spatial clusters, the Gray-Level Co-occurrence Matrix (GLCM) to calculate cluster textural indices, and one of two ML algorithms (Random Forest (RF) or Support Vector Machine (SVM)) to perform the final classification. A Principal Components Analysis (PCA) is applied to the seven main GLCM indices to synthesize in one band the textural information used for the OO classification. The proposed approach is implemented in user-friendly, freely available GEE code that performs the OO classification, allows various parameters to be tuned (e.g., choosing the input bands, selecting the classification algorithm, testing various segmentation scales), and compares the result with a PB approach. The accuracy of the OO and PB classifications can be assessed both visually and through two confusion matrices that can be used to calculate the relevant statistics (producer’s, user’s, and overall accuracy (OA)). The proposed methodology was broadly tested in a 154 km² study area in the Lake Trasimeno area (central Italy), using Landsat 8 (L8), Sentinel 2 (S2), and PlanetScope (PS) data. The area was selected for its complex LULC mosaic, mainly composed of artificial surfaces, annual and permanent crops, small lakes, and wooded areas. In the study area, the tests produced the following results on the different datasets (OA: PB RF (L8 = 72.7%, S2 = 82%, PS = 74.2%), PB SVM (L8 = 79.1%, S2 = 80.2%, PS = 74.8%), OO RF (L8 = 64%, S2 = 89.3%, PS = 77.9%), OO SVM (L8 = 70.4%, S2 = 86.9%, PS = 73.9%)). The broad application of the code demonstrated very good reliability of the whole process, although the OO classification was sometimes too demanding on higher-resolution data, given the available GEE computational resources.
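The producer’s, user’s, and overall accuracy mentioned above can all be read off a confusion matrix. The sketch below is a plain-Python illustration, not the authors' GEE code; it assumes rows are reference classes and columns are predicted classes, and the matrix values are made up.

```python
def accuracy_metrics(matrix):
    """Return (overall, producers, users) from a square confusion matrix.

    Convention assumed here: rows = reference classes, columns = predicted.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    # Producer's accuracy: correct / reference total (1 - omission error).
    producers = [matrix[i][i] / row_sums[i] if row_sums[i] else 0.0 for i in range(n)]
    # User's accuracy: correct / predicted total (1 - commission error).
    users = [matrix[j][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
    return correct / total, producers, users


# Hypothetical 3-class confusion matrix.
cm = [[50, 5, 5],
      [4, 40, 6],
      [6, 4, 30]]
oa, pa, ua = accuracy_metrics(cm)
```

If the matrix is stored with the opposite orientation, the producer's and user's accuracies simply swap roles.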

Mapping Regional Soil Organic Matter Based on Sentinel-2A and MODIS Imagery Using Machine Learning Algorithms and Google Earth Engine

Remote Sensing; doi:10.3390/rs13152934; 2021; Vol 13 (15); pp. 2934

Author(s): Meiwei Zhang; Meinan Zhang; Haoxuan Yang; Yuanliang Jin; Xinle Zhang; ...

Keyword(s): Machine Learning; Organic Matter; Soil Organic Matter; Learning Algorithms; Google Earth; Machine Learning Algorithms; Support Vector; Full Band; Google Earth Engine; Sentinel 2A

Many studies have attempted to predict soil organic matter (SOM), yet producing high-precision, high-resolution SOM maps remains a challenge because it is difficult to select appropriate satellite data sources and prediction algorithms. This study aimed to investigate the influence of different remotely sensed images and machine learning algorithms on SOM prediction. We constructed two comparative experiments, i.e., full-band and common-band variable datasets of Sentinel-2A and MODIS images, using Google Earth Engine (GEE). The predictive performances of random forest (RF), artificial neural network (ANN), and support vector regression (SVR) algorithms were evaluated, and a SOM map was generated for the Songnen Plain. Results showed that the model based on the full-band Sentinel-2A dataset achieved the best performance. The application of Sentinel-2A data resulted in mean relative improvements (RIs) of 7.67% and 5.87%, respectively. In all of the predicted scenarios, the RF achieved a lower root mean squared error (RMSE = 0.68%) and a higher coefficient of determination (R² = 0.67) than ANN and SVR. The resultant SOM map accurately characterized the SOM spatial distribution. Therefore, Sentinel-2A data have clear advantages over MODIS owing to their higher spectral and spatial resolutions, and the combination of the RF algorithm and GEE is an effective approach to SOM mapping.
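The two scores reported above can be computed as in the following sketch. It assumes R² is defined as 1 - SS_res / SS_tot; the abstract does not say which definition was used (some studies report the squared correlation instead), and the sample values are invented.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between paired observations and predictions."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical SOM contents (%) at four sample points.
obs = [2.0, 3.0, 4.0, 5.0]
pred = [2.5, 2.5, 4.5, 4.5]
error = rmse(obs, pred)
fit = r_squared(obs, pred)
```

RMSE is expressed in the units of the target (here percent SOM), which is why the abstract quotes it with a percent sign, while R² is unitless.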

Mapping Paddy Rice Fields by Combining Multi-Temporal Vegetation Index and Synthetic Aperture Radar Remote Sensing Data Using Google Earth Engine Machine Learning Platform

Remote Sensing; doi:10.3390/rs12182992; 2020; Vol 12 (18); pp. 2992; Cited By ~ 1

Author(s): Nengcheng Chen; Lixiaona Yu; Xiang Zhang; Yonglin Shen; Linglin Zeng; ...

Keyword(s): Machine Learning; Remote Sensing; Synthetic Aperture Radar; Vegetation Index; Vegetation Indices; Google Earth; Paddy Rice; Synthetic Aperture; Google Earth Engine; Data Missing

Knowledge of the area and spatial distribution of paddy rice fields is important for water resource management. However, accurate mapping of paddy rice has long been a challenge because of its spatiotemporal discontinuity and short duration. To address this problem, this study proposed a paddy rice area extraction approach that combines optical vegetation indices with synthetic aperture radar (SAR) data. The method is designed to overcome the missing-data problem caused by cloud contamination and the spatiotemporal discontinuities of the traditional optical remote sensing method. More specifically, Sentinel-1A SAR and Sentinel-2 multispectral imager (MSI) Level-2A imagery are used to identify paddy rice at high temporal and spatial resolution. Three vegetation indices, namely the normalized difference vegetation index (NDVI), enhanced vegetation index (EVI), and land surface water index (LSWI), are estimated from optical bands. Two polarization bands (VH (vertical-horizontal) and VV (vertical-vertical)) are used to overcome the cloud contamination problem. The approach was applied with the random forest machine learning algorithm on the Google Earth Engine platform, with the Jianghan Plain in China as the experimental area. The results of 39 experiments revealed the effect of different factors. First, the combination of the VV and VH bands performed better than other polarization bands; the average producer’s accuracy (PA) for paddy rice was 72.79%, 1.58% higher than that of the second-best option, VH. Second, the combination of the three indices also gave better results than the alternatives, with an average PA of 73.82%, 1.42% higher than using NDVI alone. The classification results showed that the best combination is EVI with the VV and VH polarization bands: the producer’s accuracy for paddy rice was 76.67%, with an overall accuracy (OA) of 66.07% and a Kappa statistic of 0.45. However, NDVI, EVI, and VH performed better in mapping the morphology. The results demonstrate that the method developed in this study can be successfully applied to cloud-prone areas to map paddy rice and overcome the data gaps caused by cloud and rain during the paddy growing season.
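For reference, the three optical indices named above are usually computed from surface reflectance as sketched below. The formulas are the commonly used ones (EVI with its standard coefficients); the abstract does not spell out the exact formulation or band choices, so treat both as assumptions. Inputs are reflectances scaled to 0..1, and the sample pixel is invented.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)


def evi(nir, red, blue):
    """Enhanced vegetation index with the commonly used coefficients."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


def lswi(nir, swir):
    """Land surface water index (NIR vs. shortwave infrared)."""
    return (nir - swir) / (nir + swir)


# Hypothetical reflectances for one vegetated pixel.
pixel = {"nir": 0.5, "red": 0.1, "blue": 0.05, "swir": 0.2}
indices = {
    "NDVI": ndvi(pixel["nir"], pixel["red"]),
    "EVI": evi(pixel["nir"], pixel["red"], pixel["blue"]),
    "LSWI": lswi(pixel["nir"], pixel["swir"]),
}
```

LSWI is the index that responds to surface water, which is why it is often paired with NDVI and EVI when flooded paddy fields need to be separated from other vegetation.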

Assessing the Effect of Training Sampling Design on the Performance of Machine Learning Classifiers for Land Cover Mapping Using Multi-Temporal Remote Sensing Data and Google Earth Engine

Remote Sensing; doi:10.3390/rs13081433; 2021; Vol 13 (8); pp. 1433

Author(s): Shobitha Shetty; Prasun Kumar Gupta; Mariana Belgiu; S. K. Srivastav

Keyword(s): Machine Learning; Remote Sensing; Random Sampling; Sampling Design; Remote Sensing Data; Google Earth; Machine Learning Classifiers; Learning Classifiers; Multi Temporal; Google Earth Engine

Machine learning classifiers are increasingly used for Land Use and Land Cover (LULC) mapping from remote sensing images. However, choosing the right classifier requires understanding the main factors that influence their performance. The present study first investigated the effect of training sampling design on the classification results obtained with the Random Forest (RF) classifier and then compared its performance with that of other machine learning classifiers for LULC mapping using multi-temporal satellite remote sensing data and the Google Earth Engine (GEE) platform. We evaluated the impact of three sampling methods, namely Stratified Equal Random Sampling (SRS(Eq)), Stratified Proportional Random Sampling (SRS(Prop)), and Stratified Systematic Sampling (SSS), on the classification results obtained by the RF-trained LULC model. Our results showed that the SRS(Prop) method favors major classes while achieving good overall accuracy. The SRS(Eq) method provides good class-level accuracies, even for minority classes, whereas the SSS method performs well for areas with large intra-class variability. In the classifier comparison, RF outperformed Classification and Regression Trees (CART), Support Vector Machine (SVM), and Relevance Vector Machine (RVM) at a >95% confidence level. The performance of the CART and SVM classifiers was found to be similar. RVM achieved good classification results with a limited number of training samples.
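The difference between the equal and proportional stratified designs comes down to how a fixed sample budget is allocated across classes. The sketch below illustrates only that allocation step (the systematic design is not shown); the function names and class sizes are hypothetical.

```python
def allocate_equal(class_sizes, total_samples):
    """Equal allocation: the same quota per class, capped by class size."""
    per_class = total_samples // len(class_sizes)
    return {name: min(per_class, size) for name, size in class_sizes.items()}


def allocate_proportional(class_sizes, total_samples):
    """Proportional allocation: quota follows each class's share of the map.

    Rounding means the quotas may not sum exactly to total_samples.
    """
    population = sum(class_sizes.values())
    return {
        name: round(total_samples * size / population)
        for name, size in class_sizes.items()
    }


# Hypothetical class sizes (number of candidate pixels per class).
sizes = {"forest": 6000, "cropland": 3000, "water": 900, "urban": 100}
equal = allocate_equal(sizes, 1000)
proportional = allocate_proportional(sizes, 1000)
```

In this toy case the rare "urban" class receives 100 samples under equal allocation but only 10 under proportional allocation, which mirrors the reported tendency of SRS(Prop) to favor major classes.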

Exploratory Analysis of Driving Force of Wildfires in Australia: An Application of Machine Learning within Google Earth Engine

Remote Sensing; doi:10.3390/rs13010010; 2020; Vol 13 (1); pp. 10

Author(s): Andrea Sulova; Jamal Jokar Arsanjani

Keyword(s): Climate Change; Machine Learning; Random Forest; Google Earth; Summer Season; Driving Factors; Machine Learning Algorithms; Classification And Regression Tree; Training Dataset; Google Earth Engine

Recent studies have suggested that, due to climate change, the number of wildfires across the globe has been increasing and will continue to grow. The massive wildfires that hit Australia during the 2019–2020 summer season raised questions about the extent to which wildfire risk can be linked to various climate, environmental, topographical, and social factors, and about how to predict fire occurrences so that preventive measures can be taken. Hence, the main objective of this study was to develop an automated, cloud-based workflow for generating a training dataset of fire events at a continental level, using freely available remote sensing data at a reasonable computational expense, as input to machine learning models. As a result, a data-driven model was set up on the Google Earth Engine platform, which is publicly accessible and open to further adjustment. The training dataset was applied to different machine learning algorithms, i.e., Random Forest, Naïve Bayes, and Classification and Regression Tree. The findings show that Random Forest outperformed the other algorithms, and it was therefore used to explore the driving factors through variable importance analysis. The study maps the probability of fire occurrence across Australia and identifies the potential driving factors of Australian wildfires for the 2019–2020 summer season. The methodological approach, results, and conclusions can be of great importance to policymakers, environmentalists, and climate change researchers, among others.
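The abstract does not say how variable importance was computed. As an illustration only, the sketch below uses permutation importance, a model-agnostic stand-in rather than Random Forest's built-in measure: shuffle one feature at a time and record the drop in accuracy. The toy model and data are hypothetical.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which the model reproduces the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(predict, rows, labels, n_features, seed=0):
    """Accuracy drop after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        drops.append(baseline - accuracy(predict, shuffled, labels))
    return drops


def toy_model(row):
    """Hypothetical stand-in for a trained classifier.

    Predicts "fire" (1) when feature 0 exceeds 30; feature 1 is ignored.
    """
    return 1 if row[0] > 30.0 else 0


rows = [[float(t), float(i % 7)] for i, t in enumerate(range(10, 50, 2))]
labels = [toy_model(r) for r in rows]
importance = permutation_importance(toy_model, rows, labels, n_features=2)
```

A feature the model never uses, such as feature 1 here, gets an importance of exactly zero, while shuffling a feature the model relies on can only lower the accuracy.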

Machine Learning Comparison and Parameter Setting Methods for the Detection of Dump Sites for Construction and Demolition Waste Using the Google Earth Engine

Remote Sensing; doi:10.3390/rs13040787; 2021; Vol 13 (4); pp. 787

Author(s): Lei Zhou; Ting Luo; Mingyi Du; Qiang Chen; Yang Liu; ...

Keyword(s): Machine Learning; Remote Sensing; Google Earth; Construction And Demolition Waste; Parameterization Scheme; Classification Methods; Demolition Waste; Optimal Method; Identification Method; Google Earth Engine

Machine learning has been successfully used for object recognition within images. Because of the complexity of the spectrum and texture of construction and demolition waste (C&DW), it is difficult to construct an automatic identification method for C&DW based on machine learning and remote sensing data sources. Machine learning includes many types of algorithms, and different algorithms and parameters identify C&DW with different effectiveness. Exploring the optimal method for automatic remote sensing identification of C&DW is therefore an important step toward the intelligent supervision of C&DW. This study investigates the megacity of Beijing, which faces a high risk of C&DW pollution. To improve the classification accuracy of C&DW, buildings, vegetation, water, and crops were selected as comparative training samples based on the Google Earth Engine (GEE), and Sentinel-2 was used as the data source. Three typical machine learning classification methods (classification and regression trees (CART), random forest (RF), and support vector machine (SVM)) were selected to classify C&DW from remote sensing images. Using empirical methods, the experimental trial method, and the grid search method, the optimal parameterization scheme of the three classification methods was studied to determine the optimal machine-learning-based method for remote sensing identification of C&DW. Through accuracy evaluation and ground verification, the overall recognition accuracies of CART, RF, and SVM for C&DW were 73.12%, 98.05%, and 85.62%, respectively, under the optimal parameterization scheme determined in this study. Among these algorithms, RF with 50 decision trees was a better C&DW identification method than CART and SVM. This study explores a robust machine learning method for automatic remote sensing identification of C&DW and provides a scientific basis for the intelligent supervision and resource utilization of C&DW.
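The grid search method mentioned above can be sketched in a few lines: evaluate every combination of candidate parameter values and keep the best-scoring one. The parameter names and the scoring function below are hypothetical placeholders for "train a classifier, then measure validation accuracy"; they are not the study's settings or results.

```python
from itertools import product


def grid_search(score, grid):
    """Exhaustively score every parameter combination; return the best."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


def toy_score(params):
    """Hypothetical score surface, hardcoded to peak at (50, 2)."""
    return (
        1.0
        - abs(params["n_trees"] - 50) / 100
        - abs(params["min_leaf"] - 2) / 10
    )


best, best_score = grid_search(
    toy_score, {"n_trees": [10, 50, 100], "min_leaf": [1, 2, 5]}
)
```

The cost grows with the product of the candidate list lengths, which is one reason studies often combine grid search with empirical starting values, as the abstract describes.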

A Semi-Automated Object-Based Gully Networks Detection Using Different Machine Learning Models: A Case Study of Bowen Catchment, Queensland, Australia

Sensors; doi:10.3390/s19224893; 2019; Vol 19 (22); pp. 4893; Cited By ~ 22

Author(s): Hejar Shahabi; Ben Jarihani; Sepideh Tavakkoli Piralilou; David Chittleborough; Mohammadtaghi Avand; ...

Keyword(s): Machine Learning; Image Segmentation; Vegetation Index; Scale Parameter; Slope Aspect; Support Vector; Topographic Wetness Index; Slope Length; Optimal Scale; Object Based

Gully erosion is a dominant source of sediment and particulates to the Great Barrier Reef (GBR) World Heritage area. We selected the Bowen catchment, a tributary of the Burdekin Basin, as our study area; the region has a high density of gully networks. We aimed to develop a semi-automated, object-based gully network detection process using a combination of multi-source and multi-scale remote sensing and ground-based data. An advanced approach was employed by integrating geographic object-based image analysis (GEOBIA) with current machine learning (ML) models. These included artificial neural networks (ANN), support vector machines (SVM), and random forests (RF), as well as an ensemble stacking ML model to deal with the spatial scaling problem in gully network detection. Spectral indices such as the normalized difference vegetation index (NDVI) and topographic conditioning factors such as elevation, slope, aspect, topographic wetness index (TWI), slope length (SL), and curvature were generated from Sentinel 2A images and the ALOS 12-m digital elevation model (DEM), respectively. For image segmentation, the ESP2 tool was used to obtain three optimal scale factors. The accuracy of each scale in image segmentation was evaluated using the object pureness index (OPI), object matching index (OMI), and object fitness index (OFI). The scale parameter of 45, with an OFI of 0.94 (the OFI being a combination of the OPI and OMI indices), proved to be the optimal scale parameter for image segmentation. Segmented objects based on scale 45 were then overlaid with 70% and 30% of a prepared gully inventory map to select the ML models’ training and testing objects, respectively. Precision, Recall, and the F1 measure were used to evaluate the models’ performance quantitatively. Integration of GEOBIA with the stacking model at a scale of 45 yielded the highest accuracy in detecting gully networks, with an F1 measure of 0.89. We conclude that adopting an optimal scale for object definition in GEOBIA and applying ensemble stacking of ML models resulted in higher accuracy in the detection of gully networks.
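Precision, Recall, and the F1 measure follow directly from the counts of true positives, false positives, and false negatives. The sketch below uses the standard definitions; the counts are invented and do not come from the study.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard detection scores from true/false positive and negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts of gully objects in a test set.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
```

Because F1 is a harmonic mean, it stays close to the lower of the two scores, so a detector cannot reach a high F1 by maximizing precision or recall alone.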

Application of Geospatial-Data-Based Machine Learning for Optimizing Agricultural Land During and After the Pandemic

Seminar Nasional Geomatika; doi:10.24895/sng.2020.0-0.1131; 2021; pp. 161

Author(s): Royyannuur Kurniawan Endrayanto; Adharul Muttaqin

Keyword(s): Machine Learning; Random Forest; Early Warning Systems; Google Earth; Warning Systems; Land Data Assimilation; Google Earth Engine; Land Data Assimilation System; Data Assimilation System; Assimilation System

Agriculture is an important sector because it meets the need for food, a basic necessity. Food supply remains a pressing issue, especially during the current COVID-19 pandemic. Meeting food needs is also closely tied to the quantity of food produced by farmers. The environment is one of the factors that determine the success of agricultural activities. Indonesia's diverse environmental conditions, such as temperature and precipitation levels, mean that the potential food crops differ from region to region. Efforts are therefore needed to optimize agricultural land production based on the environmental factors of each region. These efforts are expected to help maintain food security both during and after the pandemic. This study introduces the use of geospatial data to classify food crop types with a machine learning algorithm as a way of optimizing agricultural land. The data used come from the Famine Early Warning Systems Network (FEWS NET) Land Data Assimilation System (FLDAS). The machine learning algorithm used is the Random Forest classification algorithm. The technologies used are Google Colab, Google Earth Engine, and Python. The aim of this study is to classify the food crops with the best potential for cultivation in a given region based on the existing environmental conditions.

Vegetation Types Mapping Using Multi-Temporal Landsat Images in the Google Earth Engine Platform

Remote Sensing; doi:10.3390/rs13224683; 2021; Vol 13 (22); pp. 4683

Author(s): Masoumeh Aghababaei; Ataollah Ebrahimi; Ali Asghar Naghipour; Esmaeil Asadi; Jochem Verrelst

Keyword(s): Vegetation Index; Google Earth; Optimal Time; Quantitative Detection; Landsat 8; Vegetation Types; Landsat Images; Multi Temporal; Google Earth Engine; Land Covers

Vegetation Types (VTs) are important managerial units, and their identification serves as an essential tool for the conservation of land covers. Despite a long history of Earth observation applications to assess and monitor land covers, the quantitative detection of sparse VTs remains problematic, especially in arid and semiarid areas. This research aimed to identify appropriate multi-temporal datasets to improve the accuracy of VT classification in a heterogeneous landscape in Central Zagros, Iran. To do so, the Normalized Difference Vegetation Index (NDVI) temporal profile of each VT in the study area was first identified for 2018, 2019, and 2020. These data revealed strong seasonal phenological patterns and key periods of VT separation, which led us to select the optimal time series images for use in the VT classification. We then compared single-date and multi-temporal datasets of Landsat 8 images within the Google Earth Engine (GEE) platform as input to the Random Forest classifier for VT detection. The single-date classification gave a median Overall Kappa (OK) and Overall Accuracy (OA) of 51% and 64%, respectively. In contrast, using multi-temporal images led to an overall kappa of 74% and an overall accuracy of 81%. Thus, the exploitation of multi-temporal datasets favored accurate VT classification. In addition, the presented results underline that available open-access cloud-computing platforms such as GEE facilitate the identification of optimal periods and multi-temporal imagery for VT classification.
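The kappa values quoted above correct the overall accuracy for agreement expected by chance. A minimal sketch of Cohen's kappa computed from a confusion matrix follows; the matrix is invented, and the function assumes chance agreement is below 1 (otherwise the denominator is zero).

```python
def cohen_kappa(matrix):
    """Cohen's kappa from a square confusion matrix of counts."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(matrix[i][i] for i in range(n)) / total
    # Chance agreement: sum over classes of (row total * column total) / total^2.
    expected = sum(
        sum(matrix[i]) * sum(matrix[k][i] for k in range(n)) for i in range(n)
    ) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical two-class confusion matrix.
cm = [[40, 10],
      [10, 40]]
kappa = cohen_kappa(cm)
```

In this toy case the overall accuracy is 0.8 but kappa is 0.6, which shows why kappa is typically lower than overall accuracy for the same classification, as in the figures reported above.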

Mapping Native Cerrado Vegetation in the Três Lagoas-MS Region with Google Earth Engine

Revista Brasileira de Cartografia; doi:10.14393/rbcv71n3-47461; 2019; Vol 71 (3); pp. 702-725

Author(s): Nayara Vasconcelos Estrabis; José Marcato Junior; Hemerson Pistori

Keyword(s): Support Vector Machine; Random Forest; Google Earth; Support Vector; Landsat 8; Landsat 8 OLI; Google Earth Engine

The Cerrado is one of Brazil's biomes and the second largest in South America. It is of great importance because of its biodiversity and ecosystem and, above all, because it serves as a reservoir, or “sponge”, that distributes water to the other biomes, in addition to containing the headwaters of some of South America's largest river basins. However, owing to human activities (notably cattle ranching and silviculture) and the reduction of native vegetation, this biome is under threat. Considered a biodiversity hotspot, the Cerrado may no longer exist by 2050. Given the need to preserve it, the objective of this work was to investigate the use of machine learning algorithms to map the native vegetation in the region of the municipality of Três Lagoas, using the Google Earth Engine cloud platform. The process was carried out with a Landsat-8 OLI image dated 10 October 2018 and with the Random Forest (RF) and Support Vector Machine (SVM) algorithms. In the validation of the classification, RF and SVM yielded kappa indices of 0.94 and 0.97, respectively. Compared with SVM, RF produced a noisier classification. Finally, native vegetation was found to cover approximately 2556 km² with RF and 2873 km² with SVM.
