A Comparative Assessment of Ensemble-Based Machine Learning and Maximum Likelihood Methods for Mapping Seagrass Using Sentinel-2 Imagery in Tauranga Harbor, New Zealand

2020 ◽  
Vol 12 (3) ◽  
pp. 355 ◽  
Author(s):  
Nam Thang Ha ◽  
Merilyn Manley-Harris ◽  
Tien Dat Pham ◽  
Ian Hawes

Seagrass has been acknowledged as a productive blue carbon ecosystem that is in significant decline across much of the world. A first step toward conservation is the mapping and monitoring of extant seagrass meadows. Several methods are currently in use, but mapping the resource from satellite images using machine learning is not widely applied, despite its successful use in various comparable applications. This research aimed to develop a novel approach for seagrass monitoring using state-of-the-art machine learning with data from Sentinel-2 imagery. We used Tauranga Harbor, New Zealand as a validation site, for which extensive ground truth data are available, to compare ensemble machine learning methods involving random forests (RF), rotation forests (RoF), and canonical correlation forests (CCF) with the more traditional maximum likelihood classifier (MLC) technique. Using a group of validation metrics including F1, precision, recall, accuracy, and the McNemar test, our results indicated that the machine learning techniques outperformed the MLC, with RoF as the best performer (F1 scores of 0.75 and 0.91 for sparse and dense seagrass meadows, respectively). To our knowledge, this study is the first comparison of various ensemble-based methods for seagrass mapping, and the approach promises to enhance the accuracy of seagrass monitoring.
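The validation metrics named above can be computed from paired label/prediction lists; a minimal sketch follows, with the McNemar statistic (continuity-corrected chi-square on the discordant pairs) used to compare two classifiers on the same test set. The toy labels are illustrative, not the study's data.

```python
# Per-class precision, recall, F1, and accuracy, plus the McNemar test
# statistic for comparing two classifiers, all from plain label lists.

def binary_metrics(y_true, y_pred, positive=1):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = len(y_true) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return precision, recall, f1, accuracy

def mcnemar_statistic(y_true, pred_a, pred_b):
    """Continuity-corrected chi-square statistic on the discordant pairs:
    b = A right / B wrong, c = A wrong / B right."""
    b = sum(1 for t, a, c in zip(y_true, pred_a, pred_b) if a == t and c != t)
    c = sum(1 for t, a, c in zip(y_true, pred_a, pred_b) if a != t and c == t)
    if b + c == 0:
        return 0.0
    return (abs(b - c) - 1) ** 2 / (b + c)
```

A large McNemar statistic (compared against a chi-square distribution with one degree of freedom) indicates the two classifiers' error patterns differ significantly, which is how such a comparison typically supports claims like "RoF outperformed MLC".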

Author(s):  
◽  
S. S. Ray

Crop classification and recognition is an important application of remote sensing. In recent years, machine learning classification techniques have been emerging for crop classification. Google Earth Engine (GEE) is a platform for exploring multiple satellite datasets with advanced classification techniques without even downloading the satellite data. The main objective of this study was to explore the ability of different machine learning classifiers, namely Random Forest (RF), Classification and Regression Trees (CART), and Support Vector Machine (SVM), for crop classification. High-resolution Sentinel-2 MSI optical data (10 m) were used for classification of the major crops on the Indian Agricultural Research Institute (IARI) farm for the Rabi season 2016. Around 100 crop fields (~400 hectares) at IARI were analysed, and smartphone-based ground truth data were collected. The best cloud-free Sentinel-2 MSI image (5 Feb 2016) was selected for classification by automatic filtering on the percentage-cloud-cover property in GEE. Polygons derived from the ground truth data were used as the training feature spaces for the machine learning classifiers. After classification, accuracy was assessed through the confusion matrix (producer and user accuracy), the kappa coefficient, and the F value. The study found that, using the GEE cloud platform, satellite data access, filtering, and pre-processing could be done very efficiently. In terms of overall classification accuracy and kappa coefficient, the Random Forest classifier (93.3%, 0.9178) performed better than the SVM (74.3%, 0.6867) and CART (73.4%, 0.6755) classifiers. For validation, data from the Field Operation Service Unit (FOSU) division of IARI were used, and encouraging results were obtained.
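The accuracy assessment described above reduces to arithmetic on the confusion matrix; a minimal sketch, with rows taken as reference classes and columns as predicted classes (the 2x2 matrix in the test is made up, not the study's data):

```python
# Overall accuracy (observed agreement) and Cohen's kappa coefficient
# computed from a square confusion matrix given as nested lists.

def accuracy_and_kappa(cm):
    k = len(cm)
    n = sum(sum(row) for row in cm)
    diag = sum(cm[i][i] for i in range(k))
    po = diag / n                                   # observed agreement = overall accuracy
    row_tot = [sum(row) for row in cm]              # reference-class totals
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # predicted totals
    pe = sum(r * c for r, c in zip(row_tot, col_tot)) / (n * n)    # chance agreement
    kappa = (po - pe) / (1 - pe)
    return po, kappa
```

Producer and user accuracies fall out of the same totals: the diagonal entry divided by the row total and the column total, respectively.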


2021 ◽  
Author(s):  
Michael Tarasiou

This paper presents DeepSatData, a pipeline for automatically generating satellite imagery datasets for training machine learning models. We also discuss design considerations, with emphasis on dense classification tasks such as semantic segmentation. The implementation presented makes use of freely available Sentinel-2 data, which allows the generation of the large-scale datasets required for training deep neural networks (DNNs). We discuss issues faced from the point of view of DNN training and evaluation, such as checking the quality of ground truth data, and comment on the scalability of the approach.
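One ground-truth quality check of the kind the authors mention can be sketched as filtering label tiles before they enter a training set. This is a hypothetical illustration with names and thresholds of our own choosing, not DeepSatData's actual code or API:

```python
# Hypothetical quality gate for a dense-labeling tile: reject tiles whose
# label raster is mostly unlabeled (nodata) or dominated by a single class,
# since such tiles contribute little signal to semantic-segmentation training.
from collections import Counter

def tile_label_ok(labels, nodata=0, max_nodata_frac=0.5, max_class_frac=0.95):
    flat = [v for row in labels for v in row]
    counts = Counter(flat)
    n = len(flat)
    if counts.get(nodata, 0) / n > max_nodata_frac:
        return False  # too many unlabeled pixels
    labelled = n - counts.get(nodata, 0)
    top = max(c for k, c in counts.items() if k != nodata)
    if top / labelled > max_class_frac:
        return False  # one class dominates the labelled pixels
    return True
```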


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Adèle Weber Zendrera ◽  
Nataliya Sokolovska ◽  
Hédi A. Soula

In this manuscript, we propose a novel approach to assess relationships between the environment and metabolic networks. We used a comprehensive dataset of more than 5000 prokaryotic species from which we derived the metabolic networks. From the reconstructed graphs we compute the scope, which is the set of all metabolites and reactions that can potentially be synthesized when the network is provided with external metabolites. Using machine learning techniques, we show that the scope is an excellent predictor of taxonomic and environmental variables, namely growth temperature, oxygen tolerance, and habitat. In the literature, metabolites and pathways are rarely used to discriminate species. We make use of the structure underlying the scope, its metabolites and pathways, to construct the predictive models, which gives additional information on the metabolic pathways needed to discriminate the species, information that is often absent from other metabolic network properties. For example, in the particular case of growth temperature, glutathione biosynthesis pathways are specific to species growing in cold environments, whereas tungsten metabolism is specific to species in warm environments, as hinted at in the current literature. From a machine learning perspective, the scope reduces the dimension of our data and can thus be considered an interpretable graph embedding.
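The scope defined above is the fixed point of a network-expansion iteration: starting from the external seed metabolites, repeatedly fire every reaction whose substrates are all available and add its products, until nothing new appears. A minimal sketch with toy reactions given as (substrates, products) pairs:

```python
# Network expansion: compute the scope of a metabolic network from a set
# of seed (external) metabolites. Each reaction is a (substrates, products)
# pair of metabolite-name tuples.

def scope(reactions, seeds):
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for subs, prods in reactions:
            # Fire the reaction only if all substrates are reachable
            # and it still yields at least one new metabolite.
            if set(subs) <= available and not set(prods) <= available:
                available |= set(prods)
                changed = True
    return available
```

With seeds {A}, the toy network A→B, B+C→D yields the scope {A, B}; adding C to the seeds unlocks the second reaction and the scope grows to {A, B, C, D}, which is the kind of environment-dependent signal the predictive models exploit.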


Webology ◽  
2021 ◽  
Vol 18 (Special Issue 01) ◽  
pp. 183-195
Author(s):  
Thingbaijam Lenin ◽  
N. Chandrasekaran

Students’ academic performance is one of the most important parameters for evaluating the standard of any institute. It has become of paramount importance for any institute to identify students at risk of underperforming, failing, or even dropping out of a course. Machine learning techniques may be used to develop a model for predicting a student’s performance as early as the time of admission. The task, however, is challenging because the educational data available for modelling are usually imbalanced. We explore ensemble machine learning techniques, namely a bagging algorithm, random forest (rf), and the boosting algorithms adaptive boosting (adaboost), stochastic gradient boosting (gbm), and extreme gradient boosting (xgbTree), in an attempt to develop a model for predicting the performance of students at a private university in Meghalaya using three categories of data: demographic, prior academic record, and personality. The collected data are highly imbalanced and also contain missing values. We employ the k-nearest neighbour (knn) data imputation technique to tackle the missing values. The models are developed on the imputed data with 10-fold cross-validation and are evaluated using precision, specificity, recall, and kappa metrics. As the data are imbalanced, we avoid using accuracy as the metric for evaluating the models and instead use balanced accuracy and the F-score. We compare the ensemble techniques with the single classifier C4.5. The best results are provided by random forest and adaboost, with an F-score of 66.67%, balanced accuracy of 75%, and accuracy of 96.94%.
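The preference for balanced accuracy on imbalanced data can be made concrete: balanced accuracy is the mean per-class recall, so a model that ignores the minority class is penalised even when plain accuracy looks high. A minimal sketch with made-up labels:

```python
# Balanced accuracy = mean of per-class recall. On a 9:1 class split,
# predicting the majority class everywhere scores 0.9 plain accuracy
# but only 0.5 balanced accuracy.
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)
```

This mirrors the gap the authors report: 96.94% accuracy alongside only 75% balanced accuracy indicates the minority (at-risk) class is markedly harder to predict.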


2021 ◽  
Vol 14 (6) ◽  
pp. 997-1005
Author(s):  
Sandeep Tata ◽  
Navneet Potti ◽  
James B. Wendt ◽  
Lauro Beltrão Costa ◽  
Marc Najork ◽  
...  

Extracting structured information from templatic documents is an important problem with the potential to automate many real-world business workflows, such as payment, procurement, and payroll. The core challenge is that such documents can be laid out in a virtually infinite variety of ways. A good solution to this problem is one that generalizes well not only to known templates, such as invoices from a known vendor, but also to unseen ones. We developed a system called Glean to tackle this problem. Given a target schema for a document type and some labeled documents of that type, Glean uses machine learning to automatically extract structured information from other documents of that type. In this paper, we describe the overall architecture of Glean and discuss three key data management challenges: 1) managing the quality of ground truth data, 2) generating training data for the machine learning model using labeled documents, and 3) building tools that help a developer rapidly build and improve a model for a given document type. Through empirical studies on a real-world dataset, we show that these data management techniques allow us to train a model that is over 5 F1 points better than the exact same model architecture without them. We argue that for such information-extraction problems, designing abstractions that carefully manage the training data is at least as important as choosing a good model architecture.
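Challenge 2 above, generating training data from labeled documents, can be sketched as matching each labeled field value against candidate text spans found in the document, labeling matches positive and the rest negative. This is our hypothetical illustration of the idea, not Glean's actual code:

```python
# Hypothetical training-data generation: each (field, candidate span) pair
# becomes a training example, labeled positive when the span matches the
# value a human annotator assigned to that field.

def make_training_examples(candidates, labeled_fields):
    """candidates: list of (field_name, span_text) pairs found in a document.
    labeled_fields: dict mapping field_name -> ground-truth value string.
    Returns (field_name, span_text, is_positive) triples."""
    examples = []
    for field, span in candidates:
        target = labeled_fields.get(field)
        positive = target is not None and span.strip() == target.strip()
        examples.append((field, span, positive))
    return examples
```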


Inventions ◽  
2020 ◽  
Vol 5 (4) ◽  
pp. 57
Author(s):  
Attique Ur Rehman ◽  
Tek Tjing Lie ◽  
Brice Vallès ◽  
Shafiqur Rahman Tito

Recent advancements in computational capabilities and the deployment of smart meters have revived non-intrusive load monitoring as one of the promising techniques of energy monitoring. Toward effective energy monitoring, this paper presents a non-invasive load inference approach assisted by feature selection and ensemble machine learning techniques. To evaluate and validate the proposed approach, one of the major residential load elements with solid potential for energy efficiency applications, water heating, is considered. Moreover, to approximate real-life deployment, digital simulations are carried out on low-sampling-rate real-world load measurements from the New Zealand GREEN Grid database, using MATLAB and Python (Scikit-Learn) as simulation tools. The employed learning models, both standalone and ensemble, are trained on a single household’s load data and later tested rigorously on a set of diverse households’ load data to validate their generalization capability. The paper presents a comprehensive performance evaluation of the proposed approach in the context of event detection, feature selection, and learning models. Based on the presented study and the corresponding analysis of the results, we conclude that the proposed approach generalizes well to unseen testing data and yields promising results in terms of non-invasive load inference.
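The event-detection stage mentioned above is commonly a threshold on step changes in the aggregate power signal; a minimal sketch, with the threshold and sample values illustrative rather than taken from the paper:

```python
# Simple NILM event detector: flag time steps where the aggregate power
# reading jumps by more than a threshold (in watts) relative to the
# previous sample, marking likely appliance on/off transitions.

def detect_events(power, threshold=500.0):
    """Return indices i where |power[i] - power[i-1]| > threshold."""
    return [i for i in range(1, len(power))
            if abs(power[i] - power[i - 1]) > threshold]
```

Detected events are then the anchors around which features (step magnitude, duration, time of day, and so on) are extracted and fed to the standalone or ensemble classifiers.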

