A New Satellite-Based Retrieval of Low-Cloud Liquid-Water Path Using Machine Learning and Meteosat SEVIRI Data

Miae Kim; Jan Cermak; Hendrik Andersen; Julia Fuchs; Roland Stirnberg

doi:10.3390/rs12213475

Remote Sensing ◽

10.3390/rs12213475 ◽

2020 ◽

Vol 12 (21) ◽

pp. 3475

Author(s):

Miae Kim ◽

Jan Cermak ◽

Hendrik Andersen ◽

Julia Fuchs ◽

Roland Stirnberg

Keyword(s):

Machine Learning ◽

Liquid Water ◽

Ground Truth ◽

Learning Model ◽

Statistical Machine Learning ◽

Liquid Water Path ◽

Ground Truth Data ◽

Water Path ◽

Machine Learning Model ◽

Low Cloud

Clouds are one of the major uncertainties of the climate system. The study of cloud processes requires information on cloud physical properties, in particular liquid water path (LWP). This parameter is commonly retrieved from satellite data using look-up table approaches. However, existing LWP retrievals come with uncertainties related to assumptions inherent in physical retrievals. Here, we present a new retrieval technique for cloud LWP based on a statistical machine learning model. The approach utilizes spectral information from geostationary satellite channels of Meteosat Spinning-Enhanced Visible and Infrared Imager (SEVIRI), as well as satellite viewing geometry. As ground truth, data from CloudNet stations were used to train the model. We found that LWP predicted by the machine-learning model agrees substantially better with CloudNet observations than a current physics-based product, the Climate Monitoring Satellite Application Facility (CM SAF) CLoud property dAtAset using SEVIRI, edition 2 (CLAAS-2), highlighting the potential of such approaches for future retrieval developments.

Front Cover: A Machine Learning Model to Classify Dynamic Processes in Liquid Water (ChemPhysChem 1/2022)

ChemPhysChem ◽

10.1002/cphc.202100868 ◽

2022 ◽

Vol 23 (1) ◽

Author(s):

Jie Huang ◽

Gang Huang ◽

Shiben Li

Keyword(s):

Machine Learning ◽

Liquid Water ◽

Learning Model ◽

Dynamic Processes ◽

Front Cover ◽

Machine Learning Model

Glean

Proceedings of the VLDB Endowment ◽

10.14778/3447689.3447703 ◽

2021 ◽

Vol 14 (6) ◽

pp. 997-1005

Author(s):

Sandeep Tata ◽

Navneet Potti ◽

James B. Wendt ◽

Lauro Beltrão Costa ◽

Marc Najork ◽

...

Keyword(s):

Machine Learning ◽

Data Management ◽

Real World ◽

Empirical Studies ◽

Ground Truth ◽

Training Data ◽

Ground Truth Data ◽

Document Type ◽

Machine Learning Model ◽

Structured Information

Extracting structured information from templatic documents is an important problem with the potential to automate many real-world business workflows such as payment, procurement, and payroll. The core challenge is that such documents can be laid out in a virtually unlimited number of ways. A good solution to this problem is one that generalizes well not only to known templates, such as invoices from a known vendor, but also to unseen ones. We developed a system called Glean to tackle this problem. Given a target schema for a document type and some labeled documents of that type, Glean uses machine learning to automatically extract structured information from other documents of that type. In this paper, we describe the overall architecture of Glean and discuss three key data management challenges: 1) managing the quality of ground truth data, 2) generating training data for the machine learning model using labeled documents, and 3) building tools that help a developer rapidly build and improve a model for a given document type. Through empirical studies on a real-world dataset, we show that these data management techniques allow us to train a model that is over 5 F1 points better than the exact same model architecture without the techniques we describe. We argue that for such information-extraction problems, designing abstractions that carefully manage the training data is at least as important as choosing a good model architecture.
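To make the "F1 points" comparison concrete, the following is a minimal sketch of how an F1 score over extracted fields could be computed. It is not Glean's evaluation code: the function name `f1_score`, the (field, value) pair representation, and the invoice values are all hypothetical and chosen only for illustration.

```python
def f1_score(predicted, truth):
    """F1 over sets of (field, value) pairs; a pair counts only if it matches exactly."""
    true_pos = len(predicted & truth)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)  # share of extracted pairs that are correct
    recall = true_pos / len(truth)         # share of true pairs that were extracted
    return 2 * precision * recall / (precision + recall)


# Hypothetical invoice fields, invented for this example.
truth = {("invoice_id", "A-17"), ("total", "120.00"), ("due_date", "2021-03-01")}
baseline = {("invoice_id", "A-17"), ("total", "12.00")}   # one wrong value
improved = {("invoice_id", "A-17"), ("total", "120.00")}  # both values correct

f1_baseline = f1_score(baseline, truth)
f1_improved = f1_score(improved, truth)
# One "F1 point" is 0.01 on the 0..1 scale, so the gain is reported times 100.
gain_points = 100 * (f1_improved - f1_baseline)
print(f"baseline={f1_baseline:.2f} improved={f1_improved:.2f} gain={gain_points:.0f} points")
```

The toy gain here is far larger than the improvement reported in the abstract; the numbers only show how the metric is assembled from precision and recall.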

Spatial Variability of Liquid Water Path in Marine Low Cloud: The Importance of Mesoscale Cellular Convection

Journal of Climate ◽

10.1175/jcli3702.1 ◽

2006 ◽

Vol 19 (9) ◽

pp. 1748-1764 ◽

Cited By ~ 233

Author(s):

Robert Wood ◽

Dennis L. Hartmann

Keyword(s):

Spatial Variability ◽

Liquid Water ◽

Large Scale ◽

Marine Boundary Layer ◽

Liquid Water Path ◽

Water Path ◽

Geographical Regions ◽

Low Cloud ◽

Moderate Resolution Imaging Spectroradiometer ◽

Subtropical Oceans

Liquid water path (LWP) mesoscale spatial variability in marine low cloud over the eastern subtropical oceans is examined using two months of daytime retrievals from the Moderate Resolution Imaging Spectroradiometer (MODIS) on the NASA Terra satellite. Approximately 20 000 scenes of size 256 km × 256 km are used in the analysis. It is found that cloud fraction is strongly linked with the LWP variability in the cloudy fraction of the scene. It is shown here that in most cases LWP spatial variance is dominated by horizontal scales of 10–50 km, and increases as the variance-containing scale increases, indicating the importance of organized mesoscale cellular convection (MCC). A neural network technique is used to classify MODIS scenes by the spatial variability type (no MCC, closed MCC, open MCC, cellular but disorganized). It is shown how the different types tend to occupy distinct geographical regions and different physical regimes within the subtropics, although the results suggest considerable overlap of the large-scale meteorological conditions associated with each scene type. It is demonstrated that both the frequency of occurrence and the variance-containing horizontal scale of the MCC increase as the marine boundary layer (MBL) depth increases. However, for the deepest MBLs, the MCC tends to be replaced by clouds containing cells but lacking organization. In regions where MCC is prevalent, a lack of sensitivity of the MCC type (open or closed) to the large-scale meteorology was found, suggesting a mechanism internal to the MBL may be important in determining MCC type. The results indicate that knowledge of the physics of MCC will be required to completely understand and predict low cloud coverage and variability in the subtropics.

The Impact of Low Clouds on Surface Shortwave Radiation in the ECMWF Model

Monthly Weather Review ◽

10.1175/mwr-d-11-00316.1 ◽

2012 ◽

Vol 140 (11) ◽

pp. 3783-3794 ◽

Cited By ~ 22

Author(s):

Maike Ahlgrimm ◽

Richard Forbes

Keyword(s):

Great Plains ◽

Liquid Water ◽

Shortwave Radiation ◽

Fair Weather ◽

Liquid Water Path ◽

Water Path ◽

Surface Irradiance ◽

Low Clouds ◽

Low Cloud ◽

The Impact

The long-term measurement records from the Atmospheric Radiation Measurement site on the Southern Great Plains show evidence of a bias in the ECMWF model’s surface irradiance. Based on previous studies, which have suggested that summertime shallow clouds may contribute to the bias, an evaluation of 146 days with observed nonprecipitating fair-weather cumulus clouds is performed. In-cloud liquid water path and effective radius are both overestimated in the model with liquid water path dominating to produce clouds that are too reflective. These are compensated by occasional cloud-free days in the model such that the fair-weather cumulus regime overall does not contribute significantly to the multiyear daytime mean surface irradiance bias of 23 W m⁻². To further explore the origin of the bias, observed and modeled cloud fraction profiles over 6 years are classified and sorted based on the surface irradiance bias associated with each sample pair. Overcast low cloud conditions during the spring and fall seasons are identified as a major contributor. For samples with low cloud present in both observations and model, opposing surface irradiance biases are found for overcast and broken cloud cover conditions. A reduction of cloud liquid to a third for broken low clouds and an increase by a factor of 1.5 in overcast situations improves agreement with the observed liquid water path distribution. This approach of combining the model shortwave bias with a cloud classification helps to identify compensating errors in the model, providing guidance for a targeted improvement of cloud parameterizations.

The Role of Nonconvective Condensation Processes in Response of Surface Shortwave Cloud Radiative Forcing to El Niño Warming

Journal of Climate ◽

10.1175/jcli-d-13-00632.1 ◽

2014 ◽

Vol 27 (17) ◽

pp. 6721-6736 ◽

Cited By ~ 20

Author(s):

Lijuan Li ◽

Bin Wang ◽

Guang J. Zhang

Keyword(s):

Liquid Water ◽

El Niño ◽

Radiative Forcing ◽

El Nino ◽

Atmospheric Stability ◽

Cloud Amount ◽

Liquid Water Path ◽

Cloud Radiative Forcing ◽

Water Path ◽

Low Cloud

The weak response of surface shortwave cloud radiative forcing (SWCF) to El Niño over the equatorial Pacific remains a common problem in many contemporary climate models. This study shows that two versions of the Grid-Point Atmospheric Model of the Institute of Atmospheric Physics (IAP)/State Key Laboratory of Numerical Modeling for Atmospheric Sciences and Geophysical Fluid Dynamics (LASG) (GAMIL) produce distinctly different surface SWCF responses to El Niño. The earlier version, GAMIL1, underestimates this response, whereas the latest version, GAMIL2, simulates it well. To understand the causes for the different SWCF responses between the two simulations, the authors analyze the underlying physical mechanisms. Results indicate the enhanced stratiform condensation and evaporation in GAMIL2 play a key role in improving the simulations of multiyear annual mean water vapor (or relative humidity), cloud fraction, and in-cloud liquid water path (ICLWP) and hence in reducing the biases of SWCF and rainfall responses to El Niño due to all of the improved dynamical (vertical velocity at 500 hPa), cloud amount, and liquid water path (LWP) responses. The largest contribution to the SWCF response improvement in GAMIL2 is from LWP in the Niño-4 region and from low-cloud cover and LWP in the Niño-3 region. Furthermore, as a crucial factor in the low-cloud response, the atmospheric stability change in the lower layers is significantly influenced by the nonconvective heating variation during La Niña.

A machine learning model to classify dynamic processes in liquid water

ChemPhysChem ◽

10.1002/cphc.202100599 ◽

2021 ◽

Author(s):

Jie Huang ◽

Gang Huang ◽

Shiben Li

Keyword(s):

Machine Learning ◽

Liquid Water ◽

Learning Model ◽

Dynamic Processes ◽

Machine Learning Model

Training and Validating a Machine Learning Model for the Sensor-Based Monitoring of Lying Behavior in Dairy Cows on Pasture and in the Barn

Animals ◽

10.3390/ani11092660 ◽

2021 ◽

Vol 11 (9) ◽

pp. 2660

Author(s):

Lara Schmeling ◽

Golnaz Elmamooz ◽

Phan Thai Hoang ◽

Anastasiia Kozar ◽

Daniela Nicklas ◽

...

Keyword(s):

Machine Learning ◽

Dairy Cows ◽

Ground Truth ◽

Monitoring Systems ◽

Ground Truth Data ◽

Machine Learning Model ◽

Video Observations ◽

Standing Up ◽

Sensitivity Specificity ◽

Lying Down

Monitoring systems assist farmers in monitoring the health of dairy cows by predicting behavioral patterns (e.g., lying) and their changes with machine learning models. However, the available systems were developed either for indoors or for pasture and fail to predict the behavior in other locations. Therefore, the goal of our study was to train and evaluate a model for the prediction of lying on a pasture and in the barn. On three farms, 7–11 dairy cows each were equipped with the prototype of the monitoring system containing an accelerometer, a magnetometer and a gyroscope. Video observations on the pasture and in the barn provided ground truth data. We used 34.5 h of data from pasture for training and 480.5 h from both locations for evaluation. Among the configurations compared, a random forest using an orientation-independent feature set with 5 s non-overlapping windows achieved the highest accuracy. Sensitivity, specificity and accuracy were 95.6%, 80.5% and 87.4%, respectively. Accuracy on the pasture (93.2%) exceeded accuracy in the barn (81.4%). Ruminating while standing was most often confused with lying. Out of individual lying bouts, 95.6% and 93.4% were identified on the pasture and in the barn, respectively. Adding a model for standing up events and lying down events could improve the prediction of lying in the barn.
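As a rough illustration of the two ingredients named in this abstract, the sketch below cuts a sensor stream into non-overlapping windows and computes sensitivity, specificity and accuracy from binary labels (1 = lying). It is not the study's code: the function names, the window size expressed in samples (the sampling rate is not given here) and the toy labels are assumptions made for the example.

```python
def non_overlapping_windows(samples, size):
    """Split a sequence into consecutive windows of `size` samples, dropping any remainder."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


def binary_metrics(truth, predicted):
    """Sensitivity, specificity and accuracy for 0/1 labels (assumes both classes occur)."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return {
        "sensitivity": tp / (tp + fn),  # share of true lying windows that were detected
        "specificity": tn / (tn + fp),  # share of non-lying windows correctly rejected
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical stream of 23 samples cut into windows of 5 samples each.
windows = non_overlapping_windows(list(range(23)), size=5)

# Toy per-window labels, invented for illustration (1 = lying, 0 = not lying).
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = binary_metrics(truth, predicted)
print(len(windows), metrics)
```

In a setup like the one described, each window would be turned into a feature vector and classified; the metrics are then computed per window against the video-based ground truth.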

Intercomparisons of liquid water path based on SEVIRI images and gradient boosting regression trees with in-situ observations and satellite-derived products

10.5194/egusphere-egu2020-18806 ◽

2020 ◽

Author(s):

Miae Kim ◽

Jan Cermak ◽

Hendrik Andersen ◽

Julia Fuchs ◽

Roland Stirnberg

Keyword(s):

Machine Learning ◽

Liquid Water ◽

Climate Models ◽

Regression Trees ◽

Boosted Regression Trees ◽

Gradient Boosting ◽

Liquid Water Path ◽

Water Path ◽

First Results

This contribution presents a technique for the machine-learning-based retrieval of cloud liquid water path. Cloud effects are among the major uncertainties in climate models for estimating and predicting the Earth’s energy budget. The study of cloud processes requires information on cloud physical properties, such as the liquid water path (LWP), which is commonly retrieved from satellite sensors using look-up table approaches. However, the accuracy of LWP varies temporally and spatially, also due to assumptions inherent in any physical retrieval. The aim of this study is to improve the accuracy of LWP and analyze quantitatively the accuracy and its errors. To this end, a statistical LWP retrieval was developed using spectral information from geostationary satellite channels (Meteosat Spinning-Enhanced Visible and Infrared Imager, SEVIRI), and satellite viewing geometry. The machine-learning method chosen is gradient-boosted regression trees (GBRTs), which is an ensemble of decision trees but more effective than traditional tree-based models. This study reports on first results, as well as a comparison between the GBRT-derived LWP estimates and those from the SEVIRI-based products of the Climate Monitoring Satellite Application Facility (CM-SAF, CLAAS-A2), as well as MODIS products. We use case studies for individual in-situ measurement sites in Europe under varying meteorological conditions to determine the factors influencing LWP retrieval quality.
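The idea behind gradient-boosted regression trees can be sketched in a few lines: each new tree is fitted to the residuals of the current ensemble, and its output is added with a small learning rate. The example below is a deliberately simplified illustration, not the retrieval described above: it uses depth-1 trees (stumps), a single made-up input feature instead of SEVIRI channels and viewing geometry, synthetic targets instead of LWP observations, and hypothetical function names and hyperparameters.

```python
def fit_stump(xs, residuals):
    """Find the threshold split on one feature that minimizes squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the largest value so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def fit_gbrt(xs, ys, n_trees=50, learning_rate=0.1):
    """Boosting for squared loss: every stump is fitted to the current residuals."""
    base = sum(ys) / len(ys)  # start from the mean of the targets
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        predictions = [p + learning_rate * (left_mean if x <= threshold else right_mean)
                       for x, p in zip(xs, predictions)]
    return base, stumps, learning_rate


def predict(model, x):
    """Sum the shrunken contributions of all stumps on top of the base value."""
    base, stumps, learning_rate = model
    return base + learning_rate * sum(l if x <= t else r for t, l, r in stumps)


def mean_squared_error(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Synthetic data, invented for illustration: one feature and a nonlinear target.
xs = [0.1 * i for i in range(20)]
ys = [x ** 2 for x in xs]

model = fit_gbrt(xs, ys)
mse_baseline = mean_squared_error(ys, [model[0]] * len(ys))
mse_boosted = mean_squared_error(ys, [predict(model, x) for x in xs])
print(f"constant baseline MSE={mse_baseline:.4f}, boosted MSE={mse_boosted:.4f}")
```

The errors printed here are training errors only; judging a real retrieval would require held-out reference data, such as the in-situ sites mentioned in the abstract.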

A Machine Learning Model to Classify Dynamic Processes in Liquid Water

ChemPhysChem ◽

10.1002/cphc.202100867 ◽

2022 ◽

Vol 23 (1) ◽

Author(s):

Jie Huang ◽

Gang Huang ◽

Shiben Li

Keyword(s):

Machine Learning ◽

Liquid Water ◽

Learning Model ◽

Dynamic Processes ◽

Machine Learning Model

Overfitting, Model Tuning, and Evaluation of Prediction Performance

Multivariate Statistical Machine Learning Methods for Genomic Prediction ◽

10.1007/978-3-030-89010-0_4 ◽

2022 ◽

pp. 109-139

Author(s):

Osval Antonio Montesinos López ◽

Abelardo Montesinos López ◽

Jose Crossa

Keyword(s):

Machine Learning ◽

Predictive Modeling ◽

Learning Model ◽

Prediction Performance ◽

Training Data ◽

Statistical Machine Learning ◽

Machine Learning Model ◽

Data Points ◽

The Difference ◽

Model Tuning

Overfitting happens when a statistical machine learning model learns the noise as well as the signal present in the training data. Underfitting, on the other hand, occurs when only a few predictors are included in the statistical machine learning model, so that it represents the complete structure of the data pattern poorly. This problem also arises when the training data set is too small; an underfitted model then does a poor job of fitting the training data and predicts new data points unsatisfactorily. This chapter describes the importance of the trade-off between prediction accuracy and model interpretability, as well as the difference between explanatory and predictive modeling: explanatory modeling minimizes bias, whereas predictive modeling seeks to minimize the combination of bias and estimation variance. We assess the importance and different methods of cross-validation as well as the importance and strategies of tuning that are key to the successful use of some statistical machine learning methods. We explain the most important metrics for evaluating the prediction performance for continuous, binary, categorical, and count response variables.
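The link between overfitting, tuning and cross-validation can be illustrated with a small sketch. It is not taken from the chapter: it uses k-nearest-neighbour regression as a stand-in model because it is easy to write from scratch, and the data, function names and candidate values of the tuning parameter are assumptions. With one neighbour the model reproduces every training point exactly (zero training error) while its error on held-out folds stays above zero, which is the signature of overfitting.

```python
import random


def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, points, k):
    """Mean squared error of a k-NN model built on `train`, evaluated on `points`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in points) / len(points)


def k_fold_cv(data, k_neighbors, n_folds=5):
    """Return (mean training MSE, mean validation MSE) over n_folds folds."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    train_errors, val_errors = [], []
    for i, held_out in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        train_errors.append(mse(train, train, k_neighbors))
        val_errors.append(mse(train, held_out, k_neighbors))
    return sum(train_errors) / n_folds, sum(val_errors) / n_folds


# Synthetic data, invented for illustration: a linear signal plus Gaussian noise.
rng = random.Random(0)
data = [(x / 10, (x / 10) + rng.gauss(0, 0.5)) for x in range(50)]
rng.shuffle(data)

# Tuning: compare candidate neighbour counts by their cross-validated error.
results = {k: k_fold_cv(data, k) for k in (1, 5, 15)}
for k, (train_err, val_err) in results.items():
    print(f"k={k:2d} train MSE={train_err:.3f} validation MSE={val_err:.3f}")
```

Choosing the candidate with the lowest validation error, rather than the lowest training error, is the basic tuning strategy that cross-validation supports.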
