Improving the Quality of Satellite Imagery Based on Ground-Truth Data from Rain Gauge Stations

2018 ◽  
Vol 10 (3) ◽  
pp. 398 ◽  
Author(s):  
Ana Militino ◽  
M. Ugarte ◽  
Unai Pérez-Goya

Monica M. Cole (Bedford College, London, U. K.). In contributing to a discussion of the use of multispectral satellite imagery in the exploration for petroleum and minerals covered by Mr Peters I wish to emphasize four points, some of which are relevant also to statements made by Dr Curran in his presentation. The first point is that remotely sensed imagery is a tool and its interpretation a technique to be used as appropriate and integrated with other techniques in mineral exploration. Mr Peters has reviewed the potential of multispectral satellite imagery and emphasized its value in initial reconnaissance studies notably for the identification of geological structures and lithologies. I would emphasize also its value at more advanced stages of exploration when reinterpretation of imagery at large scales and with reference to ground truth data can yield valuable information. My second point, which follows naturally from the first, is that effective interpretation of remotely sensed imagery requires an appreciation of the geographical environment as well as the geological environment. It is reflectances from the components of the geographical environment that produce the colours and tones seen on the colour composites generated from Landsat imagery. Except in arid areas largely devoid of plant cover, in natural terrain reflectances from vegetation dominate over those from soils and bedrock. Their contribution increases with increasing density of cover. The reflectances from different types of vegetation and from individual plant species, however, vary greatly, depending on the geometry of the canopy, the colour of foliage, the size, shape, angle, etc., of leaves, and the turgidity, water content and nutrient status of leaf cells. It is the differences in vegetation cover producing differing reflectances that permit the discrimination of lithologies and identification of structures on colour composites generated from Landsat imagery. 
In some areas, however, relict laterite, superficial cover, former and ephemeral drainage systems, and other physiographic features that are the legacies of geomorphological processes complicate these relations. They need to be understood for effective evaluation of imagery for geological purposes, and in this context there is no substitute for field investigations, which are essential for acquiring the required ground truth data.


Author(s):  
Shan-Tai Chen ◽  
Shung-Lin Dou ◽  
Wann-Jin Chen ◽  
...  

The systematic approach we propose for classifying oceanic rainfall intensity during the typhoon season consists of two major steps: 1) identifying the rain areas and 2) classifying rainfall intensity into normal and heavy for these areas. The heterogeneous hierarchical classifier (HHC), an ensemble model we developed for accurately identifying heavy rainfall events, consists of a set of base classifiers. The base classifiers are independently constructed through heterogeneous data mining approaches such as artificial neural networks, decision trees, and self-organizing maps. Data from the Tropical Rainfall Measuring Mission (TRMM) microwave imager (TMI) from 2000 to 2005 are used to create the classification models. TRMM precipitation radar (PR) data and rain gauge data from the Automatic Rainfall and Meteorological Telemetry System (ARMTS) are used as ground truth to evaluate the models. Two thirds of the dataset is used for model training and one third for testing. Experimental results show that the proposed model classifies rainfall intensity with high accuracy and outperforms previously published methods.
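A heterogeneous ensemble in the spirit of the HHC can be sketched with off-the-shelf components. This is a hypothetical illustration using scikit-learn on synthetic features, with a k-nearest-neighbours model standing in for the self-organizing map (which has no standard scikit-learn implementation):

```python
# Heterogeneous ensemble: base classifiers from different model families
# vote on the final rainfall class (e.g. normal vs. heavy).
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for TMI brightness-temperature features.
X, y = make_classification(n_samples=600, n_features=9, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/3, random_state=0)

hhc_like = VotingClassifier(
    estimators=[
        ("ann", MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)),
        ("tree", DecisionTreeClassifier(max_depth=5, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),  # stand-in for the SOM
    ],
    voting="hard",  # majority vote of the independently trained members
)
hhc_like.fit(X_tr, y_tr)
print(round(hhc_like.score(X_te, y_te), 2))
```

The two-thirds/one-third split mirrors the train/test protocol reported in the abstract; the actual HHC arranges its members hierarchically rather than in a flat vote.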


Author(s):  
Christopher Toth ◽  
Wonho Suh ◽  
Vetri Elango ◽  
Ramik Sadana ◽  
Angshuman Guin ◽  
...  

Basic traffic counts are among the key elements in transportation planning and forecasting. As emerging data collection technologies proliferate, the availability of traffic count data will expand by orders of magnitude. However, availability of data does not always guarantee data accuracy, and it is essential that observed data are compared with ground truth data. Little research or guidance is available that ensures the quality of ground truth data with which the count results of automated technologies can be compared. To address the issue of ground truth data based on manual counts, a manual traffic counting application was developed for an Android tablet. Unlike other manual count applications, this application allows data collectors to replay and toggle through the video in supervisory mode to review and correct counts made in the first pass. For system verification, the review function of the application was used to count and recount freeway traffic in videos from the Atlanta, Georgia, metropolitan area. Initial counts and reviewed counts were compared, and improvements in count accuracy were assessed. The results indicated the benefit of the review process and suggested that this application could minimize human error and provide more accurate ground truth traffic count data for use in transportation planning applications and for model verification.


Semantic Web ◽  
2020 ◽  
pp. 1-19
Author(s):  
Anca Dumitrache ◽  
Oana Inel ◽  
Benjamin Timmermans ◽  
Carlos Ortiz ◽  
Robert-Jan Sips ◽  
...  

The process of gathering ground truth data through human annotation is a major bottleneck in the use of information extraction methods for populating the Semantic Web. Crowdsourcing-based approaches are gaining popularity in the attempt to solve the issues related to the volume of data and the lack of annotators. Typically, these practices use inter-annotator agreement as a measure of quality. However, in many domains, such as event detection, there is ambiguity in the data, as well as a multitude of perspectives on the information examples. We present an empirically derived methodology for efficiently gathering ground truth data in a diverse set of use cases covering a variety of domains and annotation tasks. Central to our approach is the use of CrowdTruth metrics that capture inter-annotator disagreement. We show that measuring disagreement is essential for acquiring a high-quality ground truth. We achieve this by comparing the quality of data aggregated with CrowdTruth metrics against majority vote, over a set of diverse crowdsourcing tasks: Medical Relation Extraction, Twitter Event Identification, News Event Extraction, and Sound Interpretation. We also show that an increased number of crowd workers leads to growth and stabilization in the quality of annotations, going against the usual practice of employing a small number of annotators.
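The contrast between majority vote and disagreement-preserving aggregation can be illustrated with a toy example (this is not the actual CrowdTruth metrics, which are considerably more elaborate; the units and labels below are invented):

```python
from collections import Counter

# Toy crowd annotations: unit -> labels from different workers.
annotations = {
    "sent1": ["event", "event", "event", "no_event"],
    "sent2": ["event", "no_event", "no_event", "event"],  # genuinely ambiguous
}

def majority_vote(labels):
    # Collapses each unit to a single label, discarding any disagreement.
    return Counter(labels).most_common(1)[0][0]

def label_scores(labels):
    # Disagreement-preserving score: fraction of workers choosing each
    # label, so an ambiguous unit keeps a graded, multi-label answer.
    total = len(labels)
    return {lab: n / total for lab, n in Counter(labels).items()}

for unit, labels in annotations.items():
    print(unit, majority_vote(labels), label_scores(labels))
```

On `sent2` the vote is split 2–2, so majority vote must pick a winner arbitrarily, whereas the graded scores record the ambiguity explicitly.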


2020 ◽  
Vol 24 ◽  
pp. 63-86
Author(s):  
Francisco Mena ◽  
Ricardo Ñanculef ◽  
Carlos Valle

The lack of annotated data is one of the major barriers facing machine learning applications today. Learning from crowds, i.e. collecting ground-truth data from multiple inexpensive annotators, has become a common method to cope with this issue. It has recently been shown that modeling the varying quality of the annotations obtained in this way is fundamental for obtaining satisfactory performance in tasks where inexpert annotators may represent the majority but not the most trusted group. Unfortunately, existing techniques represent annotation patterns for each annotator individually, making the models difficult to estimate in large-scale scenarios. In this paper, we present two models to address these problems. Both methods are based on the hypothesis that it is possible to learn collective annotation patterns by introducing confusion matrices that involve groups of data points or annotators. The first approach clusters data points with a common annotation pattern, regardless of the annotators from which the labels have been obtained. Implicitly, this method attributes annotation mistakes to the complexity of the data itself and not to the variable behavior of the annotators. The second approach explicitly maps annotators to latent groups that are collectively parametrized to learn a common annotation pattern. Our experimental results show that, compared with other methods for learning from crowds, both methods have advantages in scenarios with a large number of annotators and a small number of annotations per annotator.
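The core idea of sharing one confusion matrix across a whole annotator group can be sketched as follows. The matrices and group assignments here are illustrative assumptions (the paper learns both from data), and the aggregation shown is a plain Bayesian combination with a uniform prior:

```python
import numpy as np

# Shared confusion matrices, one per annotator *group* rather than per
# annotator (rows: true label, cols: reported label). Illustrative values.
confusion = {
    "expert": np.array([[0.95, 0.05],
                        [0.10, 0.90]]),
    "novice": np.array([[0.70, 0.30],
                        [0.40, 0.60]]),
}

def posterior(labels):
    """Posterior over the true label {0, 1} given (group, reported) pairs,
    assuming a uniform prior and conditionally independent annotators."""
    p = np.ones(2)
    for group, reported in labels:
        p *= confusion[group][:, reported]  # likelihood of each true label
    return p / p.sum()

# One expert saying 1 outweighs two novices saying 0.
print(posterior([("expert", 1), ("novice", 0), ("novice", 0)]))
```

Because all annotators in a group share parameters, the number of matrices to estimate grows with the number of groups, not the number of annotators, which is what makes the approach tractable at scale.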


2021 ◽  
Author(s):  
Michael Tarasiou

This paper presents DeepSatData, a pipeline for automatically generating satellite imagery datasets for training machine learning models. We also discuss design considerations, with emphasis on dense classification tasks, e.g. semantic segmentation. The implementation presented makes use of freely available Sentinel-2 data, which allows the generation of the large-scale datasets required for training deep neural networks (DNNs). We discuss issues faced from the point of view of DNN training and evaluation, such as checking the quality of ground truth data, and comment on the scalability of the approach.


2022 ◽  
Vol 14 (2) ◽  
pp. 388
Author(s):  
Zhihao Wei ◽  
Kebin Jia ◽  
Xiaowei Jia ◽  
Pengyu Liu ◽  
Ying Ma ◽  
...  

Monitoring the extent of plateau forests has drawn much attention from governments, given that plateau forests play a key role in global carbon circulation. Despite recent advances in remote-sensing applications of satellite imagery over large regions, accurate mapping of plateau forests remains challenging due to limited ground truth information and high uncertainties in their spatial distribution. In this paper, we aim to generate a better segmentation map for plateau forests using high-resolution satellite imagery with limited ground-truth data. We present the first 2 m spatial resolution large-scale plateau forest dataset of the Sanjiangyuan National Nature Reserve, including 38,708 plateau forest imagery samples and 1187 hand-made, accurate plateau forest ground truth masks. We then propose a few-shot learning method for mapping plateau forests. The proposed method is conducted in two stages: unsupervised feature extraction by leveraging domain knowledge, and model fine-tuning using limited ground truth data. The proposed few-shot learning method reached an F1-score of 84.23% and outperformed state-of-the-art object segmentation methods. This result suggests that the proposed few-shot learning model can support large-scale plateau forest monitoring. The dataset proposed in this paper will soon be available online to the public.
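The two-stage recipe (unsupervised feature extraction, then fine-tuning on a small labelled set) can be sketched with generic stand-ins. PCA and logistic regression below are assumptions replacing the paper's domain-specific feature extractor and segmentation model, and the data is synthetic:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for image-patch features; the labels play the role
# of the hand-made forest masks.
X, y = make_classification(n_samples=2000, n_features=50, random_state=0)

# Stage 1: unsupervised feature extraction on ALL samples -- no labels used.
pca = PCA(n_components=10, random_state=0).fit(X)
Z = pca.transform(X)

# Stage 2: fine-tune a classifier on only a handful of labelled samples.
rng = np.random.default_rng(0)
few = rng.choice(len(X), size=40, replace=False)
clf = LogisticRegression().fit(Z[few], y[few])
print(round(clf.score(Z, y), 2))
```

The point of the split is that the label-hungry part of the model sees only the 40 annotated samples, while the representation is learned from everything.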


2018 ◽  
Author(s):  
Naihui Zhou ◽  
Zachary D Siegel ◽  
Scott Zarecor ◽  
Nigel Lee ◽  
Darwin A Campbell ◽  
...  

Abstract
The accuracy of machine learning tasks critically depends on high-quality ground truth data. Producing good ground truth data therefore typically involves trained professionals; however, this can be costly in time, effort, and money. Here we explore the use of crowdsourcing to generate a large volume of training data of good quality. We explore an image analysis task involving the segmentation of corn tassels from images taken in a field setting. We investigate the accuracy, speed, and other quality metrics when this task is performed by students for academic credit, Amazon MTurk workers, and Master Amazon MTurk workers. We conclude that the Amazon MTurk and Master MTurk workers perform significantly better than the for-credit students, but with no significant difference between the two MTurk worker types. Furthermore, the quality of the segmentation produced by Amazon MTurk workers rivals that of an expert worker. We provide best practices to assess the quality of ground truth data and to compare data quality produced by different sources. We conclude that properly managed crowdsourcing can be used to establish large volumes of viable ground truth data at low cost and high quality, especially in the context of high-throughput plant phenotyping. We also provide several metrics for assessing the quality of the generated datasets.
Author Summary
Food security is a growing global concern. Farmers, plant breeders, and geneticists are hastening to address the challenges presented to agriculture by climate change, dwindling arable land, and population growth. Scientists in the field of plant phenomics are using satellite and drone images to understand how crops respond to a changing environment and to combine genetics and environmental measures to maximize crop growth efficiency. However, the terabytes of image data require new computational methods to extract useful information.
Machine learning algorithms are effective in recognizing select parts of images, but they require high-quality data curated by people to train them, a process that can be laborious and costly. We examined how well crowdsourcing works in providing training data for plant phenomics: specifically, segmenting a corn tassel (the male flower of the corn plant) from the often-cluttered images of a cornfield. We provided images to students and to Amazon MTurkers, the latter being an on-demand workforce brokered by Amazon.com and paid on a task-by-task basis. We report on best practices in crowdsourcing image labeling for phenomics, and compare the different groups on measures such as fatigue and accuracy over time. We find that crowdsourcing is a good way of generating quality labeled data, rivaling that of experts.
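One standard way to compare annotation sources against an expert reference, as this kind of quality assessment requires, is intersection-over-union of the segmentation masks. A minimal sketch on toy masks (the expert/worker masks below are invented):

```python
import numpy as np

def iou(mask_a, mask_b):
    """Intersection-over-union of two boolean segmentation masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union else 1.0

# Toy tassel masks: a hypothetical expert annotation and a crowd one,
# overlapping but offset by one pixel in each direction.
expert = np.zeros((10, 10), dtype=bool); expert[2:6, 2:6] = True
worker = np.zeros((10, 10), dtype=bool); worker[3:7, 3:7] = True
print(round(iou(expert, worker), 3))  # 9 / 23 ≈ 0.391
```

Averaging such per-image scores over each annotator pool gives a single comparable quality number per source (students, MTurk workers, Master workers).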


Author(s):  
P. Glira ◽  
N. Pfeifer ◽  
C. Briese ◽  
C. Ressl

Airborne Laser Scanning (ALS) is an efficient method for the acquisition of dense and accurate point clouds over extended areas. To ensure gapless coverage of the area, point clouds are collected strip-wise with a considerable overlap. The redundant information contained in these overlap areas can be used, together with ground-truth data, to re-calibrate the ALS system and to compensate for systematic measurement errors. This process, usually denoted as strip adjustment, leads to an improved georeferencing of the ALS strips, or in other words, to a higher data quality of the acquired point clouds. We present a fully automatic strip adjustment method that (a) uses the original scanner and trajectory measurements, (b) performs an on-the-job calibration of the entire ALS multi-sensor system, and (c) corrects the trajectory errors individually for each strip. As in the Iterative Closest Point (ICP) algorithm, correspondences are established iteratively and directly between points of overlapping ALS strips (avoiding a time-consuming segmentation and/or interpolation of the point clouds). The suitability of the method for large amounts of data is demonstrated on an ALS block consisting of 103 strips.
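A single ICP-style correspondence step between two overlapping strips can be sketched as follows. The brute-force nearest-neighbour search is a stand-in for the spatial indexing a real implementation would use, and the simulated strips (with a pure vertical offset as the "systematic error") are hypothetical:

```python
import numpy as np

# One ICP-style iteration: match each point of strip B to its nearest
# neighbour in strip A (in planimetry), then estimate the systematic
# height offset from the matched pairs.
rng = np.random.default_rng(0)
strip_a = rng.uniform(0, 100, size=(500, 3))            # columns: x, y, z
strip_b = strip_a + np.array([0.0, 0.0, 0.25])          # simulated 25 cm error

# Brute-force nearest neighbours in (x, y); real point clouds would use
# a k-d tree or similar index instead of an N x N distance matrix.
d2 = ((strip_b[:, None, :2] - strip_a[None, :, :2]) ** 2).sum(axis=-1)
idx = d2.argmin(axis=1)

dz = strip_b[:, 2] - strip_a[idx, 2]                    # height residuals
print(round(float(np.median(dz)), 2))                   # recovers ~0.25
```

In a full adjustment, residuals like `dz` feed a least-squares estimation of calibration and per-strip trajectory corrections, and the matching is repeated until convergence.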


