Life beneath the ice: jellyfish and ctenophores from the Ross Sea, Antarctica, with an image-based training set for machine learning

Biodiversity Data Journal ◽

10.3897/bdj.9.e69374 ◽

2021 ◽

Vol 9 ◽

Author(s):

Gerlien Verhaegen ◽

Emiliano Cimoli ◽

Dhugal Lindsay

Keyword(s):

Machine Learning ◽

Southern Ocean ◽

Environmental Changes ◽

Ross Sea ◽

Survey Methods ◽

Gelatinous Zooplankton ◽

Video Annotation ◽

Zooplankton Species ◽

Training Set

Southern Ocean ecosystems are currently experiencing increased environmental changes and anthropogenic pressures, urging scientists to report on their biodiversity and biogeography. Two major taxonomically diverse and trophically important gelatinous zooplankton groups that have, however, stayed largely understudied until now are the cnidarian jellyfish and ctenophores. This data scarcity is predominantly due to many of these fragile, soft-bodied organisms being easily fragmented and/or destroyed with traditional net sampling methods. Progress in alternative survey methods including, for instance, optics-based methods is slowly starting to overcome these obstacles. As video annotation by human observers is both time-consuming and financially costly, machine-learning techniques should be developed for the analysis of in situ/in aqua image-based datasets. This requires taxonomically accurate training sets for correct species identification and the present paper is the first to provide such data. In this study, we twice conducted three week-long in situ optics-based surveys of jellyfish and ctenophores found under the ice in the McMurdo Sound, Antarctica. Our study constitutes the first optics-based survey of gelatinous zooplankton in the Ross Sea and the first study to use in situ/in aqua observations to describe taxonomic and some trophic and behavioural characteristics of gelatinous zooplankton from the Southern Ocean. Despite the small geographic and temporal scales of our study, we provided new undescribed morphological traits for all observed gelatinous zooplankton species (eight cnidarian and four ctenophore species). Three ctenophores and one leptomedusa likely represent undescribed species. Furthermore, along with the photography and videography, we prepared a Common Objects in Context (COCO) dataset, so that this study is the first to provide a taxonomist-ratified image training set for future machine-learning algorithm development concerning Southern Ocean gelatinous zooplankton species.
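For readers unfamiliar with the COCO layout mentioned in the abstract, the following is a minimal sketch of what an object-detection annotation file in that format generally looks like. It is not taken from the published dataset: the file names, image sizes, bounding boxes and category names are hypothetical placeholders, and only the top-level keys ("images", "categories", "annotations") and the [x, y, width, height] bounding-box convention follow the standard COCO detection format.

```python
import json
from collections import Counter


def build_example_coco():
    """Return a tiny COCO-style dictionary (all values are made-up placeholders)."""
    return {
        "images": [
            {"id": 1, "file_name": "frame_0001.jpg", "width": 1920, "height": 1080},
            {"id": 2, "file_name": "frame_0002.jpg", "width": 1920, "height": 1080},
        ],
        "categories": [
            # Hypothetical labels; a real training set would use ratified taxon names.
            {"id": 1, "name": "cnidarian_placeholder", "supercategory": "Cnidaria"},
            {"id": 2, "name": "ctenophore_placeholder", "supercategory": "Ctenophora"},
        ],
        "annotations": [
            # bbox is [x_min, y_min, width, height] in pixels.
            {"id": 1, "image_id": 1, "category_id": 1,
             "bbox": [100.0, 150.0, 200.0, 180.0], "area": 36000.0, "iscrowd": 0},
            {"id": 2, "image_id": 1, "category_id": 2,
             "bbox": [900.0, 400.0, 120.0, 300.0], "area": 36000.0, "iscrowd": 0},
            {"id": 3, "image_id": 2, "category_id": 2,
             "bbox": [50.0, 60.0, 80.0, 90.0], "area": 7200.0, "iscrowd": 0},
        ],
    }


def annotations_per_category(coco):
    """Count annotations for each category name."""
    names = {c["id"]: c["name"] for c in coco["categories"]}
    return dict(Counter(names[a["category_id"]] for a in coco["annotations"]))


def find_inconsistencies(coco):
    """Return annotation ids that reference a missing image/category or leave the image."""
    images = {img["id"]: img for img in coco["images"]}
    category_ids = {c["id"] for c in coco["categories"]}
    bad = []
    for ann in coco["annotations"]:
        img = images.get(ann["image_id"])
        if img is None or ann["category_id"] not in category_ids:
            bad.append(ann["id"])
            continue
        x, y, w, h = ann["bbox"]
        if x < 0 or y < 0 or x + w > img["width"] or y + h > img["height"]:
            bad.append(ann["id"])
    return bad


if __name__ == "__main__":
    coco = build_example_coco()
    print(json.dumps(annotations_per_category(coco), indent=2))
    print("inconsistent annotation ids:", find_inconsistencies(coco))
```

A consistency check of this kind is a common first step before feeding any COCO-style file to a detection framework, since annotations with dangling ids or out-of-frame boxes tend to surface later as hard-to-trace training errors.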

Life beneath the ice: jellyfish and ctenophores from the Ross Sea, Antarctica, with an image-based training set for machine learning (project)

MorphoBank datasets ◽

10.7934/p3993 ◽

2021 ◽

Author(s):

G Verhaegen ◽

E Cimoli ◽

D Lindsay

Keyword(s):

Machine Learning ◽

Ross Sea ◽

Training Set

Using Machine Learning for Estimating Rice Chlorophyll Content from In Situ Hyperspectral Data

Remote Sensing ◽

10.3390/rs12183104 ◽

2020 ◽

Vol 12 (18) ◽

pp. 3104

Author(s):

Gangqiang An ◽

Minfeng Xing ◽

Binbin He ◽

Chunhua Liao ◽

Xiaodong Huang ◽

...

Keyword(s):

Machine Learning ◽

Chlorophyll Content ◽

Precision Agriculture ◽

Rate Of Change ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Training Set ◽

Validation Set

Chlorophyll is an essential pigment for photosynthesis in crops, and leaf chlorophyll content can be used as an indicator for crop growth status and help guide nitrogen fertilizer applications. Estimating crop chlorophyll content plays an important role in precision agriculture. In this study, a variable, the rate of change in reflectance between wavelengths ‘a’ and ‘b’ (RCRWa-b), derived from in situ hyperspectral remote sensing data, was combined with four advanced machine learning techniques, Gaussian process regression (GPR), random forest regression (RFR), support vector regression (SVR), and gradient boosting regression tree (GBRT), to estimate the chlorophyll content (measured by a portable soil–plant analysis development meter) of rice. The performances of the four machine learning models were assessed and compared using root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R²). The results revealed that four features of RCRWa-b, RCRW551.0–565.6, RCRW739.5–743.5, RCRW684.4–687.1 and RCRW667.9–672.0, were effective in estimating the chlorophyll content of rice, and the RFR model generated the highest prediction accuracy (training set: RMSE = 1.54, MAE = 1.23 and R² = 0.95; validation set: RMSE = 2.64, MAE = 1.99 and R² = 0.80). The GPR model was found to have the strongest generalization (training set: RMSE = 2.83, MAE = 2.16 and R² = 0.77; validation set: RMSE = 2.97, MAE = 2.30 and R² = 0.76). We conclude that RCRWa-b is a useful variable to estimate chlorophyll content of rice, and RFR and GPR are powerful machine learning algorithms for estimating the chlorophyll content of rice.
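The three scores named in the abstract can be computed as in the sketch below. It uses the common textbook definitions (the coefficient of determination as one minus the ratio of residual to total sum of squares); the abstract does not state which variant the authors used, so that choice is an assumption, and the sample readings are invented for illustration rather than taken from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def r2(y_true, y_pred):
    """Coefficient of determination, defined here as 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical chlorophyll-meter readings and model predictions.
    measured = [40.0, 42.0, 38.0, 45.0]
    predicted = [41.0, 41.0, 39.0, 44.0]
    print("RMSE:", rmse(measured, predicted))
    print("MAE: ", mae(measured, predicted))
    print("R2:  ", r2(measured, predicted))
```

Comparing the same scores on training and validation data, as the abstract does, is what separates the highest-scoring model from the one that generalizes best: a small gap between the two sets indicates less overfitting.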

Remotely sensed primary production in the western Ross Sea: results of in situ tuned models

Antarctic Science ◽

10.1017/s095410200300107x ◽

2003 ◽

Vol 15 (1) ◽

pp. 77-84 ◽

Cited By ~ 13

Author(s):

R. BARBINI ◽

F. COLAO ◽

R. FANTONI ◽

L. FIORANI ◽

A. PALUCCI ◽

...

Keyword(s):

Southern Ocean ◽

Primary Production ◽

Ross Sea ◽

Wide Field ◽

Production Models ◽

Wide Field Of View ◽

And Performance ◽

The Antarctic ◽

Global Carbon

The Southern Ocean plays an important role in the global carbon cycle and, as a consequence, in the planetary climate equilibrium. The Ross Sea is one of the more productive regions in the Southern Ocean, due to strong phytoplankton blooms occurring during summer. Satellite remote sensing is a powerful tool for investigating such phenomena, especially if the bio-optical algorithms are tuned with in situ data. In this paper, after having compared the Sea-viewing Wide Field-of-view Sensor (SeaWiFS) and the ENEA Lidar Fluorosensor (ELF), the SeaWiFS chlorophyll a (Chl a) algorithm is tuned in the Ross Sea by means of the ELF measurements. The Chl a concentrations obtained in this way have been the basis for estimating productivity values and their evolution during summer 1997–98. Three primary production models have been used, providing information on their accuracy and performance in the Antarctic environment. Our investigations suggest that the primary production was lower than usual during the period 3 December 1997–16 January 1998.

Shell dissolution observed in Limacina helicina antarctica from the Ross Sea, Antarctica: paired shell characteristics and in situ seawater chemistry

10.5194/bg-2016-467 ◽

2016 ◽

Cited By ~ 2

Author(s):

Kevin M. Johnson ◽

Umihiko Hoshijima ◽

Cailan S. Sugano ◽

Alice T. Nguyen ◽

Gretchen E. Hofmann

Keyword(s):

Southern Ocean ◽

Ocean Acidification ◽

Field Study ◽

Ross Sea ◽

Carbonate Chemistry ◽

Active Dissolution ◽

In Situ Conditions ◽

Single Station ◽

Limacina Helicina

Abstract. The euthecosome (shelled) Antarctic pteropod, Limacina helicina antarctica, is a dominant member of the Southern Ocean macrozooplankton community, and due to its aragonitic shell, is thought to be particularly vulnerable to ocean acidification and under-saturation conditions that are predicted in the future. Notably, pteropods in surface waters and near the continental shelf in the Ross Sea are highly vulnerable as these regions are predicted to be seasonally under-saturated within 2–3 decades. Carbonate chemistry data are rare for this region and here we present the results of a 6-week field study and report patterns of dissolution of juvenile pteropods along with carbonate chemistry of seawater at the time of collection. Conducted in McMurdo Sound in the south Ross Sea in the Pacific sector of the Southern Ocean, L. h. antarctica was successfully collected in plankton tows through the fast sea ice at a single station at 50 m. During the 6-week field study, ocean pH was relatively stable, ranging from 7.988 in October to 8.029 by early December. Calculated saturation states for aragonite (Ωarag) over the 6-week study period ranged from 1.16 to 1.24. Pteropods collected at each sampling time point were prepared for SEM and analysis revealed that roughly 63 % of the shells displayed some degree of shell irregularities suggesting that active dissolution of the aragonitic shell was ongoing under in situ conditions. These results add to the accumulating evidence that shelled pteropods of the Southern Ocean are experiencing aragonite under-saturation events in the present-day that lead to a majority of individuals displaying shell dissolution. Predicted changes to the carbonate system in the Southern Ocean from ocean acidification will likely expand the intensity and duration of these under-saturation events, increasing the need to better understand how pteropods will fare in response to ocean acidification.

Southern Ocean Cloud and Aerosol data: a compilation of measurements from the 2018 Southern Ocean Ross Sea Marine Ecosystems and Environment voyage

10.5194/essd-2020-321 ◽

2020 ◽

Author(s):

Stefanie Kremser ◽

Mike Harvey ◽

Peter Kuma ◽

Sean Hartery ◽

Alexia Saint-Macary ◽

...

Keyword(s):

Southern Ocean ◽

Climate Models ◽

Dimethyl Sulfide ◽

Prediction Models ◽

Climate Model ◽

Ross Sea ◽

Evaluation Studies ◽

Weather Prediction ◽

Data Sets

Abstract. Due to its remote location and extreme weather conditions, atmospheric in situ measurements are rare in the Southern Ocean. As a result, aerosol-cloud interactions in this region are poorly understood and remain a major source of uncertainty in climate models. This, in turn, contributes substantially to persistent biases in climate model simulations, numerical weather prediction models and reanalyses. It has been shown in previous studies that in situ and ground-based remote sensing measurements across the Southern Ocean are critical for complementing satellite data sets due to the importance of boundary layer and low-level cloud processes. These processes are poorly sampled by satellite-based measurements which are typically obscured by near-continuous overlying cloud cover observed in this region. In this work we present a comprehensive set of ship-based aerosol and meteorological observations collected on the TAN1802 voyage of R/V Tangaroa across the Southern Ocean, from Wellington, New Zealand, to the Ross Sea, Antarctica. The voyage was carried out from 8 February to 21 March, 2018. Many distinct, but contemporaneous, data sets were collected throughout the voyage. The compiled data sets include measurements from a range of instruments, such as (i) meteorological conditions at the sea surface and profile measurements; (ii) the size and concentration of particles; (iii) trace gases dissolved in the ocean surface such as dimethyl sulfide and carbonyl sulfide; and (iv) remotely sensed observations of low clouds. Here, we describe the voyage, the instruments, data processing, and provide a brief overview of some of the data products available. We encourage the scientific community to use these measurements for further analysis and model evaluation studies, in particular, for studies of Southern Ocean clouds, aerosol and their interaction. The data sets presented in this study are publicly available at https://doi.org/10.5281/zenodo.4060237 (Kremser et al. 2020).

Southern Ocean cloud and aerosol data: a compilation of measurements from the 2018 Southern Ocean Ross Sea Marine Ecosystems and Environment voyage

Earth System Science Data ◽

10.5194/essd-13-3115-2021 ◽

2021 ◽

Vol 13 (7) ◽

pp. 3115-3153

Author(s):

Stefanie Kremser ◽

Mike Harvey ◽

Peter Kuma ◽

Sean Hartery ◽

Alexia Saint-Macary ◽

...

Keyword(s):

Southern Ocean ◽

Dimethyl Sulfide ◽

Prediction Models ◽

Ross Sea ◽

Weather Prediction ◽

Marine Ecosystem ◽

Shortwave Radiation ◽

Data Sets ◽

Aerosol Cloud

Abstract. Due to its remote location and extreme weather conditions, atmospheric in situ measurements are rare in the Southern Ocean. As a result, aerosol–cloud interactions in this region are poorly understood and remain a major source of uncertainty in climate models. This, in turn, contributes substantially to persistent biases in climate model simulations such as the well-known positive shortwave radiation bias at the surface, as well as biases in numerical weather prediction models and reanalyses. It has been shown in previous studies that in situ and ground-based remote sensing measurements across the Southern Ocean are critical for complementing satellite data sets due to the importance of boundary layer and low-level cloud processes. These processes are poorly sampled by satellite-based measurements and are often obscured by multiple overlying cloud layers. Satellite measurements also do not constrain the aerosol–cloud processes very well with imprecise estimation of cloud condensation nuclei. In this work, we present a comprehensive set of ship-based aerosol and meteorological observations collected on the 6-week Southern Ocean Ross Sea Marine Ecosystem and Environment voyage (TAN1802) of RV Tangaroa across the Southern Ocean, from Wellington, New Zealand, to the Ross Sea, Antarctica. The voyage was carried out from 8 February to 21 March 2018. Many distinct, but contemporaneous, data sets were collected throughout the voyage. The compiled data sets include measurements from a range of instruments, such as (i) meteorological conditions at the sea surface and profile measurements; (ii) the size and concentration of particles; (iii) trace gases dissolved in the ocean surface such as dimethyl sulfide and carbonyl sulfide; and (iv) remotely sensed observations of low clouds. Here, we describe the voyage, the instruments, and data processing, and provide a brief overview of some of the data products available. We encourage the scientific community to use these measurements for further analysis and model evaluation studies, in particular, for studies of Southern Ocean clouds, aerosol, and their interaction. The data sets presented in this study are publicly available at https://doi.org/10.5281/zenodo.4060237 (Kremser et al., 2020).

Global soil moisture data derived through machine learning trained with in-situ measurements

Scientific Data ◽

10.1038/s41597-021-00964-1 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Sungmin O. ◽

Rene Orth

Keyword(s):

Machine Learning ◽

Soil Moisture ◽

Large Scale ◽

Short Term Memory ◽

Temporal Dynamics ◽

Soil Moisture Data ◽

Wide Range ◽

Global Soil

Abstract. While soil moisture information is essential for a wide range of hydrologic and climate applications, spatially-continuous soil moisture data is only available from satellite observations or model simulations. Here we present a global, long-term dataset of soil moisture derived through machine learning trained with in-situ measurements, SoMo.ml. We train a Long Short-Term Memory (LSTM) model to extrapolate daily soil moisture dynamics in space and in time, based on in-situ data collected from more than 1,000 stations across the globe. SoMo.ml provides multi-layer soil moisture data (0–10 cm, 10–30 cm, and 30–50 cm) at 0.25° spatial and daily temporal resolution over the period 2000–2019. The performance of the resulting dataset is evaluated through cross validation and inter-comparison with existing soil moisture datasets. SoMo.ml performs especially well in terms of temporal dynamics, making it particularly useful for applications requiring time-varying soil moisture, such as anomaly detection and memory analyses. SoMo.ml complements the existing suite of modelled and satellite-based datasets given its distinct derivation, to support large-scale hydrological, meteorological, and ecological analyses.

Snow Depth Fusion Based on Machine Learning Methods for the Northern Hemisphere

Remote Sensing ◽

10.3390/rs13071250 ◽

2021 ◽

Vol 13 (7) ◽

pp. 1250

Author(s):

Yanxing Hu ◽

Tao Che ◽

Liyun Dai ◽

Lin Xiao

Keyword(s):

Machine Learning ◽

Northern Hemisphere ◽

Snow Depth ◽

Learning Algorithm ◽

Random Forest Regression ◽

Machine Learning Methods ◽

Long Time ◽

In Situ Observations ◽

Input Variables

In this study, a machine learning algorithm was introduced to fuse gridded snow depth datasets. The input variables of the machine learning method included geolocation (latitude and longitude), topographic data (elevation), gridded snow depth datasets and in situ observations. A total of 29,565 in situ observations were used to train and optimize the machine learning algorithm. A total of five gridded snow depth datasets—Advanced Microwave Scanning Radiometer for the Earth Observing System (AMSR-E) snow depth, Global Snow Monitoring for Climate Research (GlobSnow) snow depth, Long time series of daily snow depth over the Northern Hemisphere (NHSD) snow depth, ERA-Interim snow depth and Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2) snow depth—were used as input variables. The first three snow depth datasets are retrieved from passive microwave brightness temperature or assimilation with in situ observations, while the last two are snow depth datasets obtained from meteorological reanalysis data with a land surface model and data assimilation system. Then, three machine learning methods, i.e., Artificial Neural Networks (ANN), Support Vector Regression (SVR), and Random Forest Regression (RFR), were used to produce a fused snow depth dataset from 2002 to 2004. The RFR model performed best and was thus used to produce a new snow depth product from the fusion of the five snow depth datasets and auxiliary data over the Northern Hemisphere from 2002 to 2011. The fused snow-depth product was verified at five well-known snow observation sites. The R2 of Sodankylä, Old Aspen, and Reynolds Mountains East were 0.88, 0.69, and 0.63, respectively. At the Swamp Angel Study Plot and Weissfluhjoch observation sites, which have an average snow depth exceeding 200 cm, the fused snow depth did not perform well. 
The spatial patterns of the average snow depth were analyzed seasonally, and the average snow depths of autumn, winter, and spring were 5.7, 25.8, and 21.5 cm, respectively. In the future, random forest regression will be used to produce a long time series of a fused snow depth dataset over the Northern Hemisphere or other specific regions.

Classification of multiwavelength transients with Machine Learning

Monthly Notices of the Royal Astronomical Society ◽

10.1093/mnras/staa3873 ◽

2020 ◽

Author(s):

K Sooknunan ◽

M Lochner ◽

Bruce A Bassett ◽

H V Peiris ◽

R Fender ◽

...

Keyword(s):

Machine Learning ◽

Small Sample ◽

Light Curves ◽

Machine Learning Techniques ◽

Optical Data ◽

Test Time ◽

Test Accuracy ◽

Training Set ◽

The Impact

Abstract. With the advent of powerful telescopes such as the Square Kilometer Array and the Vera C. Rubin Observatory, we are entering an era of multiwavelength transient astronomy that will lead to a dramatic increase in data volume. Machine learning techniques are well suited to address this data challenge and rapidly classify newly detected transients. We present a multiwavelength classification algorithm consisting of three steps: (1) interpolation and augmentation of the data using Gaussian processes; (2) feature extraction using wavelets; (3) classification with random forests. Augmentation provides improved performance at test time by balancing the classes and adding diversity into the training set. In the first application of machine learning to the classification of real radio transient data, we apply our technique to the Green Bank Interferometer and other radio light curves. We find we are able to accurately classify most of the eleven classes of radio variables and transients after just eight hours of observations, achieving an overall test accuracy of 78%. We fully investigate the impact of the small sample size of 82 publicly available light curves and use data augmentation techniques to mitigate the effect. We also show that, on a significantly larger simulated representative training set, the algorithm achieves an overall accuracy of 97%, illustrating that the method is likely to provide excellent performance on future surveys. Finally, we demonstrate the effectiveness of simultaneous multiwavelength observations by showing how incorporating just one optical data point into the analysis improves the accuracy of the worst performing class by 19%.

Evaluating Machine Learning and Geostatistical Methods for Spatial Gap-filling of Monthly ESA CCI Soil Moisture in China

Remote Sensing ◽

10.3390/rs13142848 ◽

2021 ◽

Vol 13 (14) ◽

pp. 2848

Author(s):

Hao Sun ◽

Qian Xu

Keyword(s):

Machine Learning ◽

Soil Moisture ◽

Water Resource Management ◽

Large Scale ◽

Gap Filling ◽

Spatial Continuity ◽

Study Results ◽

Data Gaps

Obtaining large-scale, long-term, and spatially continuous soil moisture (SM) data is crucial for climate change research, hydrology, and water resource management. ESA CCI SM is one such large-scale, long-term SM product (spanning more than 40 years to date). However, it contains data gaps, especially over China, due to limitations in remote sensing of SM such as complex topography, human-induced radio frequency interference (RFI), and vegetation disturbances. These gaps prevent the CCI SM data from achieving spatial continuity, which motivates the study of gap-filling methods. In order to develop suitable methods to fill the gaps of CCI SM over the whole area of China, we compared typical Machine Learning (ML) methods, including the Random Forest method (RF), the Feedforward Neural Network method (FNN), and the Generalized Linear Model (GLM), with a geostatistical method, i.e., Ordinary Kriging (OK). More than 30 years of passive–active combined CCI SM from 1982 to 2018 and other biophysical variables such as Normalized Difference Vegetation Index (NDVI), precipitation, air temperature, Digital Elevation Model (DEM), soil type, and in situ SM from the International Soil Moisture Network (ISMN) were utilized in this study. Results indicated that: 1) data gaps in CCI SM are frequent in China and are found not only in cold seasons and areas but also in warm seasons and areas. The ratio of gap pixels to total pixels can be greater than 80%, and its average is around 40%. 2) ML methods can fill all the gaps of CCI SM. Among the ML methods, RF had the best performance in fitting the relationship between CCI SM and biophysical variables. 3) Over simulated gap areas, RF had a comparable performance with OK, and both greatly outperformed the FNN and GLM methods. 4) Over in situ SM networks, RF achieved better performance than the OK method. 5) We also explored various strategies for gap-filling CCI SM. Results demonstrated that the strategy of constructing a monthly model with one RF for simulating monthly average SM and another RF for simulating monthly SM disturbance achieved the best performance. This strategy, combined with an ML method such as RF, is suggested in this study for filling the gaps of CCI SM in China.
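The gap statistic quoted in the abstract (gap pixels divided by all pixels) can be illustrated with the short sketch below. The grid values, the use of NaN or None to mark a missing retrieval, and the function names are assumptions made for illustration; the abstract does not describe how the authors encoded or counted gaps.

```python
import math


def gap_ratio(grid):
    """Fraction of pixels in a 2-D grid that are missing (None or NaN)."""
    total = 0
    gaps = 0
    for row in grid:
        for value in row:
            total += 1
            if value is None or math.isnan(value):
                gaps += 1
    if total == 0:
        raise ValueError("grid has no pixels")
    return gaps / total


def mean_gap_ratio(monthly_grids):
    """Average the per-month gap ratio over a sequence of monthly grids."""
    ratios = [gap_ratio(grid) for grid in monthly_grids]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    nan = math.nan
    # Hypothetical monthly soil moisture grids (m3/m3); NaN marks a gap pixel.
    january = [
        [0.21, nan, 0.25, nan, 0.30],
        [nan, 0.18, 0.22, 0.27, nan],
    ]
    july = [
        [0.31, 0.29, 0.25, nan, 0.30],
        [0.28, 0.18, 0.22, 0.27, nan],
    ]
    print("January gap ratio:", gap_ratio(january))
    print("July gap ratio:   ", gap_ratio(july))
    print("Mean gap ratio:   ", mean_gap_ratio([january, july]))
```

Tracking this ratio month by month is one simple way to see whether gaps cluster in particular seasons, which is the pattern the first result in the abstract addresses.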
