A First Approach to Aerosol Classification Using Space-Borne Measurement Data: Machine Learning-Based Algorithm and Evaluation

2021 ◽  
Vol 13 (4) ◽  
pp. 609
Author(s):  
Wonei Choi ◽  
Hanlim Lee ◽  
Jeonghyeon Park

A new method was developed for classifying aerosol types, using a machine-learning approach applied to satellite data. An AErosol RObotic NETwork (AERONET)-based aerosol-type dataset was used as the target variable in a random forest (RF) model. The contributions of satellite input variables to the RF-based model were quantified to determine an optimal set of input variables. The new method, based on inputs of satellite variables, allows the classification of seven aerosol types: pure dust, dust-dominant mixed, pollution-dominant mixed aerosols, and pollution aerosols (strongly, moderately, weakly, and non-absorbing). The performance of the model was statistically evaluated using AERONET data excluded from the model training dataset. Model accuracy for classifying the seven aerosol types was 59%, improving to 72% for four types (pure dust, dust-dominant mixed, strongly absorbing, and non-absorbing). The model was also evaluated against an earlier aerosol classification method based on the wavelength dependence of single-scattering albedo (SSA) and fine-mode-fraction values from AERONET. Typical wavelength dependences of SSA for individual aerosol types are consistent with those obtained for aerosol types by the new method. This study demonstrates that an RF-based model is capable of satellite aerosol classification with sensitivity to the contribution of non-spherical particles.
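The core of this approach can be sketched as a random forest classifier trained on satellite-derived features against AERONET-derived type labels, then scored on held-out data. Everything below (feature count, label construction, sample sizes) is a synthetic placeholder, not the paper's actual variables:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Hypothetical satellite input variables (placeholder columns, e.g. AOD,
# Angstrom exponent, UV aerosol index) -- not the paper's feature set.
n = 700
X = rng.normal(size=(n, 3))
# Synthetic multi-class target standing in for AERONET aerosol-type labels.
y = (X[:, 0] + 2 * X[:, 1] > 0).astype(int) * 3 + (X[:, 2] > 0).astype(int)

# Hold out data excluded from training, mirroring the paper's evaluation.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_tr, y_tr)
acc = accuracy_score(y_te, rf.predict(X_te))
```

On real data, per-type accuracies would be inspected separately, since the abstract reports a large gap between the seven-type (59%) and four-type (72%) scores.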

2021 ◽  
Vol 13 (7) ◽  
pp. 1268
Author(s):  
Wonei Choi ◽  
Hanlim Lee ◽  
Daewon Kim ◽  
Serin Kim

The spatial coverage of satellite aerosol classification was improved using a random forest (RF) model trained with observational data including target (aerosol type) and input (satellite measurement) variables. The AErosol RObotic NETwork (AERONET) aerosol-type dataset was used for the target variables. Satellite input variables with a high proportion of missing values or low mean-decrease accuracy were excluded from the final input variable set, and good performance in aerosol-type classification was achieved. The performance of the RF-based model was evaluated on the basis of the wavelength dependence of single-scattering albedo (SSA) and fine-mode-fraction values from AERONET. Typical SSA wavelength dependence for individual aerosol types was consistent with that obtained for aerosol types by the RF-based model. The spatial coverage of the RF-based model was also compared with that of previously developed models in a global-scale case study. The study demonstrates that the RF-based model allows satellite aerosol classification with improved spatial coverage, with a performance similar to that of previously developed models.
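The input-variable screening described here (drop columns with many missing values, then rank the rest by mean-decrease accuracy) can be illustrated with scikit-learn's permutation importance; all variable names and values below are made up for the sketch:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(1)
n = 500
# Three candidate satellite variables; only the first one drives the label,
# so the other two should rank low in mean-decrease accuracy.
X = rng.normal(size=(n, 3))
y = (X[:, 0] > 0).astype(int)

# Step 1 (illustrative): drop columns with too many missing values.
missing_fraction = np.isnan(X).mean(axis=0)
keep = missing_fraction < 0.5  # all True for this synthetic X

rf = RandomForestClassifier(n_estimators=100, random_state=1).fit(X, y)
# Mean-decrease accuracy via permutation importance: shuffle one column
# at a time and measure how much the model's score drops.
imp = permutation_importance(rf, X, y, n_repeats=5, random_state=1)
ranked = np.argsort(imp.importances_mean)[::-1]  # most important first
```

Columns at the bottom of `ranked` (and columns failing the missingness filter) would be excluded from the final input set.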


2020 ◽  
Vol 12 (6) ◽  
pp. 965 ◽  
Author(s):  
Nikolaos Siomos ◽  
Ilias Fountoulakis ◽  
Athanasios Natsis ◽  
Theano Drosoglou ◽  
Alkiviadis Bais

In this study, we present an aerosol classification technique based on measurements of a double monochromator Brewer spectrophotometer during the period 1998–2017 in Thessaloniki, Greece. A machine learning clustering procedure based on the Mahalanobis distance metric was applied. The classification process utilizes the UV Single Scattering Albedo (SSA) at 340 nm and the Extinction Angstrom Exponent (EAE) at 320–360 nm obtained from the spectrophotometer. The analysis is supported by measurements from a CIMEL sunphotometer, which were used to establish the training dataset of Brewer measurements. By applying the Mahalanobis distance algorithm to the Brewer timeseries, we automatically assigned measurements to one of the following clusters: Fine Non Absorbing Mixtures (FNA): 64.7%, Black Carbon Mixtures (BC): 17.4%, Dust Mixtures (DUST): 8.1%, and Mixed: 9.8%. We examined the clustering potential of the algorithm by reclassifying the training dataset and comparing it with the original one, and also by using manually classified cases. The typing score of the Mahalanobis algorithm is high for all predominant clusters when compared with the training dataset (FNA: 77.0%, BC: 63.9%, and DUST: 80.3%). Scores were also high when compared with the manually classified dataset (FNA: 100.0%, BC: 66.7%, and DUST: 83.3%). The flags obtained here were applied to the timeseries of the Aerosol Optical Depth (AOD) at 340 nm of the Brewer and the CIMEL in order to compare the two, and to stress the future impact of the proposed clustering technique on climatological studies of the station.
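Mahalanobis-distance classification assigns each observation to the cluster whose centroid is closest after normalising by that cluster's covariance. A minimal sketch in an (SSA, EAE) feature space; the centroids and covariances below are invented for illustration, not the paper's CIMEL-derived cluster statistics:

```python
import numpy as np

def mahalanobis_classify(x, centroids, inv_covs):
    """Assign x to the cluster with the smallest Mahalanobis distance."""
    d = [float(np.sqrt((x - m) @ S_inv @ (x - m)))
         for m, S_inv in zip(centroids, inv_covs)]
    return int(np.argmin(d)), d

# Illustrative 2-D feature vectors: (SSA at 340 nm, EAE at 320-360 nm).
centroids = [np.array([0.95, 1.6]),   # FNA-like cluster (made-up values)
             np.array([0.85, 1.4]),   # BC-like cluster
             np.array([0.90, 0.3])]   # DUST-like cluster
# Diagonal covariances for the sketch; real clusters would be correlated.
inv_covs = [np.linalg.inv(np.diag([0.001, 0.04]))] * 3

label, dists = mahalanobis_classify(np.array([0.94, 1.5]), centroids, inv_covs)
```

A measurement falling far from every centroid (all distances above some threshold) would be flagged as "Mixed" rather than forced into a cluster.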


Atmosphere ◽  
2020 ◽  
Vol 12 (1) ◽  
pp. 10
Author(s):  
Shan Zeng ◽  
Ali Omar ◽  
Mark Vaughan ◽  
Macarena Ortiz ◽  
Charles Trepte ◽  
...  

The Cloud–Aerosol Lidar with Orthogonal Polarization (CALIOP), on board the Cloud–Aerosol Lidar and Infrared Pathfinder Satellite Observations (CALIPSO) platform, is an elastic backscatter lidar that has been providing vertical profiles of the spatial, optical, and microphysical properties of clouds and aerosols since June 2006. Distinguishing between feature types (i.e., clouds vs. aerosols) and subtypes (e.g., ice clouds vs. water clouds, dust vs. smoke) in the CALIOP measurements is currently accomplished using layer-integrated measurements acquired by co-polarized (parallel) and cross-polarized (perpendicular) 532 nm channels and a single 1064 nm channel. Newly developed deep machine learning (DML) semantic segmentation methods can now combine observations from multiple channels with texture information to recognize patterns in data. Instead of focusing on a limited set of layer-integrated values, our new DML feature classification technique uses the full scope of range-resolved information available in the CALIOP attenuated backscatter profiles. In this paper, one of the convolutional neural networks (CNNs), SegNet, a fast and efficient DML model, is used to distinguish aerosol subtypes directly from the CALIOP profiles. The DML method is a 2D range bin-to-range bin aerosol subtype classification algorithm. We compare our new DML results to the classifications generated by CALIOP’s 1D layer-to-layer operational retrieval algorithm. These two methods, which take distinctly different approaches to aerosol classification, agree in over 60% of the comparisons. Higher levels of agreement are found in homogeneous scenes containing only a single aerosol type (i.e., marine, stratospheric aerosols). Disagreement between the two techniques increases in regions containing mixtures of different aerosol types.
The multi-dimensional texture information leveraged by the DML method shows advantages in differentiating between aerosol types based on their classification scores, as well as in distinguishing vertical distributions of aerosol types within individual layers. However, untangling mixtures of aerosol subtypes is still challenging for both the DML and operational algorithms.
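The bin-to-bin comparison between the DML and operational classifications reduces to counting matching range bins on a common grid. A toy illustration with small integer-coded masks (the subtype codes are invented for the sketch):

```python
import numpy as np

# Toy aerosol-subtype masks on a (range bin x profile) grid; integer codes
# stand in for subtypes (e.g. 1 = marine, 2 = dust, 3 = smoke).
dml_mask = np.array([[1, 1, 2, 2],
                     [1, 3, 2, 2],
                     [1, 1, 2, 3]])
operational_mask = np.array([[1, 1, 2, 2],
                             [1, 1, 2, 2],
                             [1, 1, 3, 3]])

# Fraction of range bins on which the two classifications agree.
agreement = float((dml_mask == operational_mask).mean())
```

Stratifying this agreement score by scene type (homogeneous vs. mixed) reproduces the kind of comparison the abstract reports, where single-type scenes agree more often than mixed ones.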


Atmosphere ◽  
2019 ◽  
Vol 10 (3) ◽  
pp. 143 ◽  
Author(s):  
Il-Sung Zo ◽  
Sung-Kyun Shin

We herein present the spectral linear particle depolarization ratios (δp) from an AErosol RObotic NETwork (AERONET) sun/sky radiometer with respect to the aerosol type. AERONET observation sites, which are representative of each aerosol type, were selected for our study. The observation data were filtered using the Ångström exponent (Å), fine-mode fraction (FMF) and single scattering albedo (ω) to ensure that the obtained values of δp were representative of each aerosol condition. We report the spectral δp values provided in the recently released AERONET version 3 inversion product for observations of the following aerosol types: dust, polluted dust, smoke, non-absorbing, moderately-absorbing and highly-absorbing pollution. The AERONET-derived δp values were generally within the range of the δp values measured from lidar observations for each aerosol type. In addition, it was found that the spectral variation of δp differed according to the aerosol type. From the obtained results, we concluded that our findings provide potential insight into the identification and classification of aerosol types using remote sensing techniques.
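The Å/FMF/ω screening described above amounts to boolean threshold masks over the observation record. The cut-off values below are invented for illustration and are not the paper's actual criteria:

```python
import numpy as np

# Toy AERONET-style records: one entry per retrieval.
angstrom = np.array([0.2, 1.6, 1.8, 0.4])    # Angstrom exponent
fmf      = np.array([0.3, 0.9, 0.95, 0.25])  # fine-mode fraction
ssa      = np.array([0.92, 0.98, 0.88, 0.90])  # single scattering albedo

# Illustrative screening thresholds (made up for the sketch; the paper's
# actual per-type cut-offs are not reproduced here).
is_dust_like   = (angstrom < 0.6) & (fmf < 0.4)
is_fine_nonabs = (angstrom > 1.2) & (fmf > 0.8) & (ssa > 0.95)

dust_idx = np.flatnonzero(is_dust_like)  # indices of dust-like retrievals
```

Each retained subset then yields the spectral δp statistics reported per aerosol type.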


2021 ◽  
Author(s):  
Joelle Buxmann ◽  
Martin Osborne ◽  
Mike Protts ◽  
Debbie O'Sullivan

<p>The Met Office operates a ground-based operational network of nine polarisation Raman lidars (aerosol profiling instruments) and sun photometers (column-integrated information). An aerosol classification scheme using supervised machine learning has been developed. The concept of Mahalanobis (~normalised) distance is used to identify the aerosol type from individual Aerosol Robotic Network (AERONET) measurements, including Extinction Angstrom Exponent, Absorption Angstrom Exponent, Single Scattering Albedo and Index of Refraction, for a subset of AERONET stations around the globe with known main aerosol types (training set). The aerosol types include maritime, urban industrial, biomass burning and dust. We build a predictive model from this training set using K-nearest-neighbour machine learning algorithms. The relation of particle depolarisation ratio and lidar ratio from the Raman lidar is used as a sanity check. We apply the model to 3–4 years of AERONET and profiling data across the UK, with instruments evenly distributed across the country, from Camborne in Cornwall to Lerwick in the Shetland Islands. We show more detailed data from a dust event in May 2016, a dust/biomass-burning aerosol mix from October 2017 (hurricane Ophelia) and more recent aerosol transported from the Canadian wildfires in September 2020. AERONET Level 2.0 data are compared to Level 1.5 data in order to determine the implications for the aerosol classification. Level 1.5 data are cloud-screened, but not quality-assured, and may not have the final calibration applied. Level 2.0 data have pre- and post-field calibration applied, are cloud-screened, and are quality-assured. As Level 2.0 data are usually only available after 1–2 years (after a new calibration has been performed), it is important to understand the usefulness of the more readily available Level 1.5 (cloud-screened) data.</p><p>The aim is to build a real-time aerosol classification application that can be used in Nowcasting.</p>
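The K-nearest-neighbour step described above can be sketched with scikit-learn; the (EAE, AAE, SSA) training clusters below are synthetic stand-ins with invented centroid values, not the real AERONET station data:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(2)

# Hypothetical training set: rows of (EAE, AAE, SSA) per AERONET record.
# Labels: 0 = maritime, 1 = urban industrial, 2 = biomass burning, 3 = dust.
X_train = np.vstack([
    rng.normal([0.4, 1.0, 0.98], 0.05, size=(50, 3)),  # maritime
    rng.normal([1.6, 1.1, 0.95], 0.05, size=(50, 3)),  # urban industrial
    rng.normal([1.8, 1.8, 0.88], 0.05, size=(50, 3)),  # biomass burning
    rng.normal([0.3, 2.2, 0.92], 0.05, size=(50, 3)),  # dust
])
y_train = np.repeat([0, 1, 2, 3], 50)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
# Classify a new observation near the biomass-burning cluster centre.
pred = int(knn.predict([[1.75, 1.85, 0.87]])[0])
```

In an operational setting the lidar-derived depolarisation/lidar-ratio relation would then serve as the independent sanity check on the predicted type.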


Author(s):  
Dhilsath Fathima.M ◽  
S. Justin Samuel ◽  
R. Hari Haran

Aim: This work aims to develop an improved and robust machine learning model for predicting Myocardial Infarction (MI), which could have substantial clinical impact. Objectives: This paper explains how to build a machine-learning-based computer-aided analysis system for early and accurate prediction of Myocardial Infarction (MI), which utilizes the Framingham Heart Study dataset for validation and evaluation. This proposed computer-aided analysis model will support medical professionals in predicting myocardial infarction proficiently. Methods: The proposed model utilizes mean imputation to remove the missing values from the dataset, then applies principal component analysis (PCA) to extract the optimal features from the dataset to enhance the performance of the classifiers. After PCA, the reduced features are partitioned into a training dataset and a testing dataset: 70% of the data are given as input to four well-known classifiers (support vector machine, k-nearest neighbor, logistic regression and decision tree) to train them, and the remaining 30% are used to evaluate the output of the machine learning model using performance metrics such as the confusion matrix, classifier accuracy, precision, sensitivity, F1-score, and the AUC-ROC curve. Results: The outputs of the classifiers were evaluated using these performance measures, and we observed that logistic regression provides higher accuracy than the K-NN, SVM and decision tree classifiers, while PCA performs well as a feature extraction method to enhance the performance of the proposed model. From these analyses, we conclude that logistic regression has good mean accuracy and standard deviation of accuracy compared with the other three algorithms. The AUC-ROC curves of the proposed classifiers, shown in Figures 4 and 5, indicate that logistic regression exhibits a good AUC-ROC score, i.e., around 70%, compared to the k-NN and decision tree algorithms.
Conclusion: From the result analysis, we infer that this proposed machine learning model can act as an optimal decision-making system to predict acute myocardial infarction at an earlier stage than existing machine-learning-based prediction models, and that it is capable of predicting the presence of acute myocardial infarction in humans from heart disease risk factors, in order to decide when to start lifestyle modification and medical treatment to prevent heart disease.
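The described pipeline (mean imputation, PCA feature extraction, 70/30 split, classifier training) can be sketched with scikit-learn. The data below are synthetic stand-ins, not the Framingham variables, and only the best-performing classifier from the abstract (logistic regression) is shown:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(3)
n = 400
# Synthetic stand-in for risk-factor columns (10 features).
X = rng.normal(size=(n, 10))
y = (X[:, 0] + X[:, 1] - X[:, 2] > 0).astype(int)  # toy MI label

# Mean imputation for missing values, as the abstract describes.
X[rng.random(X.shape) < 0.05] = np.nan
col_means = np.nanmean(X, axis=0)
X = np.where(np.isnan(X), col_means, X)

# 70/30 split, PCA feature extraction, then logistic regression.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=3)
model = make_pipeline(PCA(n_components=8), LogisticRegression(max_iter=1000))
model.fit(X_tr, y_tr)
acc = accuracy_score(y_te, model.predict(X_te))
```

The same pipeline object can be reused with the SVM, k-NN and decision tree estimators to reproduce the four-way comparison.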


2020 ◽  
Author(s):  
Joseph Prinable ◽  
Peter Jones ◽  
David Boland ◽  
Alistair McEwan ◽  
Cindy Thamrin

BACKGROUND The ability to continuously monitor breathing metrics may have implications for general health as well as respiratory conditions such as asthma. However, few studies have focused on breathing due to a lack of available wearable technologies. OBJECTIVE To examine the performance of two machine learning algorithms in extracting breathing metrics from a finger-based pulse oximeter, which is amenable to long-term monitoring. METHODS Pulse oximetry data were collected from 11 healthy and 11 asthma subjects who breathed at a range of controlled respiratory rates. UNET and Long Short-Term Memory (LSTM) algorithms were applied to the data, and results were compared against breathing metrics derived from respiratory inductance plethysmography measured simultaneously as a reference. RESULTS The UNET vs LSTM models provided breathing metrics that were strongly correlated with those from the reference signal (all p<0.001, except for inspiratory:expiratory ratio). The following relative mean biases (95% confidence intervals) were observed: inspiration time 1.89(-52.95, 56.74)% vs 1.30(-52.15, 54.74)%, expiration time -3.70(-55.21, 47.80)% vs -4.97(-56.84, 46.89)%, inspiratory:expiratory ratio -4.65(-87.18, 77.88)% vs -5.30(-87.07, 76.47)%, inter-breath intervals -2.39(-32.76, 27.97)% vs -3.16(-33.69, 27.36)%, and respiratory rate 2.99(-27.04 to 33.02)% vs 3.69(-27.17 to 34.56)%. CONCLUSIONS Both machine learning models show strong correlation and good comparability with the reference, with low bias though wide variability, for deriving breathing metrics in asthma and healthy cohorts. Future efforts should focus on improving the performance of these models, e.g. by increasing the size of the training dataset at the lower breathing rates. CLINICALTRIAL Sydney Local Health District Human Research Ethics Committee (#LNR\16\HAWKE99 ethics approval).
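Relative-mean-bias figures of this kind can be computed with a Bland–Altman-style summary. Interpreting the reported intervals as ±1.96 SD limits of agreement is an assumption on our part, and the sample values below are invented:

```python
import numpy as np

def relative_bias_limits(pred, ref):
    """Relative mean bias and 95% limits of agreement, in percent.

    A Bland-Altman-style summary (an assumed reading of how the
    'relative mean bias (95% CI)' figures above are formed).
    """
    rel = 100.0 * (np.asarray(pred) - np.asarray(ref)) / np.asarray(ref)
    bias = float(rel.mean())
    half_width = 1.96 * float(rel.std(ddof=1))
    return bias, bias - half_width, bias + half_width

# Toy inspiration times (s): model output vs inductance plethysmography.
ref = np.array([1.2, 1.5, 1.1, 1.4, 1.3])
pred = np.array([1.25, 1.45, 1.15, 1.42, 1.28])
bias, lo, hi = relative_bias_limits(pred, ref)
```

A small `bias` with wide `(lo, hi)` limits matches the paper's pattern of low bias but wide variability.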


2020 ◽  
Author(s):  
Mikołaj Morzy ◽  
Bartłomiej Balcerzak ◽  
Adam Wierzbicki ◽  
Adam Wierzbicki

BACKGROUND With the rapidly accelerating spread of false medical information on the Web, the task of establishing the credibility of online sources of medical information becomes a pressing necessity. The sheer number of websites offering questionable medical information presented as reliable and actionable suggestions with possibly harmful effects poses an additional requirement for potential solutions, as they have to scale to the size of the problem. Machine learning is one such solution which, when properly deployed, can be an effective tool in fighting medical disinformation on the Web. OBJECTIVE We present a comprehensive framework for designing and curating machine learning training datasets for online medical information credibility assessment. We show how the annotation process should be constructed and what pitfalls should be avoided. Our main objective is to provide researchers from the medical and computer science communities with guidelines on how to construct datasets for machine learning models for various areas of medical information wars. METHODS The key component of our approach is the active annotation process. We begin by outlining the annotation protocol for the curation of a high-quality training dataset, which can then be augmented and rapidly extended by employing the human-in-the-loop paradigm in machine learning training. To circumvent the cold start problem of insufficient gold standard annotations, we propose a pre-processing pipeline consisting of representation learning, clustering, and re-ranking of sentences for the acceleration of the training process and the optimization of human resources involved in the annotation.
RESULTS We collect over 10 000 annotations of sentences related to selected subjects (psychiatry, cholesterol, autism, antibiotics, vaccines, steroids, birth methods, food allergy testing) for less than $7 000 employing 9 highly qualified annotators (certified medical professionals) and we release this dataset to the general public. We develop an active annotation framework for more efficient annotation of non-credible medical statements. The results of the qualitative analysis support our claims of the efficacy of the presented method. CONCLUSIONS A set of very diverse incentives is driving the widespread dissemination of medical disinformation on the Web. An effective strategy of countering this spread is to use machine learning for automatically establishing the credibility of online medical information. This, however, requires a thoughtful design of the training pipeline. In this paper we present a comprehensive framework of active annotation. In addition, we publish a large curated dataset of medical statements labelled as credible, non-credible, or neutral.
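The cold-start pipeline described in the Methods (representation learning, clustering, re-ranking) can be sketched with TF-IDF features as a stand-in for a learned representation; the sentences, cluster count, and ranking rule below are invented for the illustration:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Toy sentence pool standing in for scraped medical statements.
sentences = [
    "vaccines cause autism in children",
    "vaccines are safe and effective",
    "antibiotics do not work on viral infections",
    "antibiotics cure every cold quickly",
    "cholesterol levels respond to diet",
    "statins lower cholesterol in most patients",
]

# Step 1: representation (TF-IDF here, instead of a neural encoder).
X = TfidfVectorizer().fit_transform(sentences)

# Step 2: cluster similar statements so annotators see coherent groups.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Step 3: re-rank by distance to the assigned centroid, so the most
# representative statements are sent to annotators first.
dists = km.transform(X)
order = np.argsort(dists[np.arange(len(sentences)), km.labels_])
```

Gold-standard labels collected on the top-ranked sentences then seed the human-in-the-loop training of the credibility model.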


2020 ◽  
Vol 13 (1) ◽  
pp. 10
Author(s):  
Andrea Sulova ◽  
Jamal Jokar Arsanjani

Recent studies have suggested that, due to climate change, the number of wildfires across the globe has been increasing and continues to grow. The recent massive wildfires, which hit Australia during the 2019–2020 summer season, raised questions as to what extent the risk of wildfires can be linked to various climate, environmental, topographical, and social factors, and how to predict fire occurrences to take preventive measures. Hence, the main objective of this study was to develop an automated, cloud-based workflow for generating a training dataset of fire events at a continental level using freely available remote sensing data, with a reasonable computational expense, for injecting into machine learning models. As a result, a data-driven model was set up in the Google Earth Engine platform, which is publicly accessible and open for further adjustments. The training dataset was applied to different machine learning algorithms, i.e., Random Forest, Naïve Bayes, and Classification and Regression Tree. The findings show that Random Forest outperformed the other algorithms, and hence it was used further to explore the driving factors using variable importance analysis. The study indicates the probability of fire occurrences across Australia, as well as identifying the potential driving factors of Australian wildfires for the 2019–2020 summer season. The methodical approach, achieved results, and drawn conclusions can be of great importance to policymakers, environmentalists, and climate change researchers, among others.
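The model-comparison step (Random Forest vs. Naïve Bayes vs. CART) can be sketched with cross-validated scores; the fire-driver variables and labels below are synthetic stand-ins, not the Earth Engine dataset:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
n = 600
# Synthetic stand-ins for fire drivers (e.g. temperature, dryness, slope).
X = rng.normal(size=(n, 4))
y = ((X[:, 0] + 0.5 * X[:, 1] ** 2) > 0.5).astype(int)  # toy fire/no-fire label

models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=4),
    "naive_bayes": GaussianNB(),
    "cart": DecisionTreeClassifier(random_state=4),
}
scores = {name: float(cross_val_score(m, X, y, cv=5).mean())
          for name, m in models.items()}
best = max(scores, key=scores.get)
```

The winning model's feature importances (`feature_importances_` on the fitted forest) would then drive the variable importance analysis of the fire drivers.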

