Quantification of the covariation of lake microbiomes and environmental variables using a machine learning‐based framework

Theodor Sperlea; Nico Kreuder; Daniela Beisser; Georges Hattab; Jens Boenigk; Dominik Heider

doi:10.1111/mec.15872

Functional prediction of environmental variables using metabolic networks

Scientific Reports; doi:10.1038/s41598-021-91486-8; 2021; Vol 11 (1)

Author(s): Adèle Weber Zendrera; Nataliya Sokolovska; Hédi A. Soula

Keyword(s): Machine Learning; Growth Temperature; Environmental Variables; Metabolic Networks; Machine Learning Techniques; Underlying Structure; Glutathione Biosynthesis; Additional Information; Cold Environments; Novel Approach

Abstract: In this manuscript, we propose a novel approach to assess relationships between environment and metabolic networks. We used a comprehensive dataset of more than 5000 prokaryotic species from which we derived the metabolic networks. From the reconstructed graphs we compute the scope, which is the set of all metabolites and reactions that can potentially be synthesized when provided with external metabolites. Using machine learning techniques, we show that the scope is an excellent predictor of taxonomic and environmental variables, namely growth temperature, oxygen tolerance, and habitat. In the literature, metabolites and pathways are rarely used to discriminate species. We make use of the scope's underlying structure (metabolites and pathways) to construct the predictive models, which gives additional information on the important metabolic pathways needed to discriminate the species; this information is often absent in other metabolic network properties. For example, in the particular case of growth temperature, glutathione biosynthesis pathways are specific to species growing in cold environments, whereas tungsten metabolism is specific to species in warm environments, as hinted at in the current literature. From a machine learning perspective, the scope is able to reduce the dimension of our data, and can thus be considered an interpretable graph embedding.
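
The scope described in this abstract can be illustrated with a small network-expansion sketch. This is not the authors' implementation: the function name `compute_scope`, the reaction encoding, and the toy network are all hypothetical, and the sketch assumes irreversible reactions and ignores stoichiometry. Starting from the external (seed) metabolites, it repeatedly fires every reaction whose substrates are all available until nothing new can be produced.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Hypothetical encoding: reaction name -> (substrates, products).
Reaction = Tuple[FrozenSet[str], FrozenSet[str]]


def compute_scope(
    reactions: Dict[str, Reaction], seeds: Iterable[str]
) -> Tuple[Set[str], Set[str]]:
    """Return (reachable metabolites, fireable reactions) from the seeds."""
    metabolites: Set[str] = set(seeds)
    fired: Set[str] = set()
    changed = True
    while changed:  # iterate until a fixed point is reached
        changed = False
        for name, (substrates, products) in reactions.items():
            # A reaction can fire once all of its substrates are available.
            if name not in fired and substrates <= metabolites:
                fired.add(name)
                metabolites |= products
                changed = True
    return metabolites, fired


# Toy network (illustrative only, not real biochemistry data).
toy_network: Dict[str, Reaction] = {
    "r1": (frozenset({"glucose"}), frozenset({"g6p"})),
    "r2": (frozenset({"g6p"}), frozenset({"pyruvate"})),
    "r3": (frozenset({"pyruvate", "oxygen"}), frozenset({"co2"})),
}

scope_metabolites, scope_reactions = compute_scope(toy_network, {"glucose"})
```

With only glucose as a seed, `r3` never fires because oxygen is missing, so the scope contains three metabolites and two reactions. A binary vector marking which metabolites or reactions fall in the scope is one plausible way to feed such a result to a classifier.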

Lacking Demographic, Socioeconomic, and Environmental Variables in Training Machine Learning Algorithms Makes Generalizability Flawed in Asthma Studies

Journal of Allergy and Clinical Immunology; doi:10.1016/j.jaci.2020.12.432; 2021; Vol 147 (2); pp. AB118

Author(s): Emily Chen; Timothy Darby; Sunit Jariwala

Keyword(s): Machine Learning; Environmental Variables; Learning Algorithms; Machine Learning Algorithms

Data-driven and interpretable machine-learning modeling to explore the fine-scale environmental determinants of malaria vectors biting rates in rural Burkina Faso

doi:10.1101/2021.04.13.439583; 2021

Author(s): Paul Taconet; Angélique Porciani; Dieudonné Diloma Soma; Karine Mouline; Frédéric Simard; ...

Keyword(s): Machine Learning; High Resolution; Burkina Faso; Environmental Variables; Data Driven; Malaria Vectors; Environmental Determinants; Breeding Sites; Landscape Variables; Interpretable Machine Learning

Abstract. Background: Improving the knowledge and understanding of the environmental determinants of malaria vector abundance at fine spatiotemporal scales is essential to design locally tailored vector control interventions. This work aimed at exploring the environmental determinants of human-biting activity in the main malaria vectors (Anopheles gambiae s.s., Anopheles coluzzii and Anopheles funestus) in the health district of Diébougou, rural Burkina Faso.

Methods: Anopheles human-biting activity was monitored in 27 villages over 15 months (in 2017-2018), and environmental variables (meteorological and landscape) were extracted from high-resolution satellite imagery. A two-step data-driven modeling study was then carried out. Correlation coefficients between the biting rates of each vector species and the environmental variables, taken at various temporal lags and spatial distances from the biting events, were first calculated. Then, multivariate machine-learning models were generated and interpreted to i) pinpoint primary and secondary environmental drivers of variation in the biting rates of each species and ii) identify complex associations between the environmental conditions and the biting rates.

Results: Meteorological and landscape variables were often significantly correlated with the vectors' biting rates. Many nonlinear associations and thresholds were unveiled by the multivariate models, both for meteorological and landscape variables. From these results, several aspects of the bio-ecology of the main malaria vectors were clarified or hypothesized for the Diébougou area, including breeding site typologies, development and survival rates in relation to weather, flight ranges from breeding sites, and dispersal related to landscape openness.

Conclusions: Using high-resolution data in an interpretable machine-learning modeling framework proved to be an efficient way to enhance the knowledge of the complex links between the environment and the malaria vectors at a local scale. More broadly, the emerging field of interpretable machine learning has significant potential to help improve our understanding of the complex processes leading to malaria transmission.
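
The first modeling step in this abstract, correlating biting rates with environmental variables taken at various temporal lags, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which correlation coefficient was used, so Pearson's is assumed here, and the function names and the toy rainfall and biting series are hypothetical, not data from the study.

```python
from math import sqrt
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes equal lengths and non-constant inputs."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def lagged_correlations(
    driver: Sequence[float], response: Sequence[float], max_lag: int
) -> Dict[int, float]:
    """Correlate response[t] with driver[t - lag] for lag = 0..max_lag."""
    result: Dict[int, float] = {}
    for lag in range(max_lag + 1):
        past_driver = driver[: len(driver) - lag]  # driver shifted back by lag
        aligned_response = response[lag:]
        result[lag] = pearson(past_driver, aligned_response)
    return result


# Toy series (made up): biting follows rainfall with a two-step delay.
rainfall = [0, 5, 1, 8, 2, 9, 0, 4, 3, 7]
biting = [3, 3] + [2 * r + 1 for r in rainfall[:-2]]

correlations = lagged_correlations(rainfall, biting, max_lag=4)
best_lag = max(correlations, key=correlations.get)
```

In this toy series the correlation peaks at a lag of two time steps, which is how a lag scan of this kind would point to the delay between an environmental driver and the response.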

Comparative assessment of environmental variables and machine learning algorithms for maize yield prediction in the US Midwest

Environmental Research Letters; doi:10.1088/1748-9326/ab7df9; 2020; Vol 15 (6); pp. 064005

Author(s): Yanghui Kang; Mutlu Ozdogan; Xiaojin Zhu; Zhiwei Ye; Christopher Hain; ...

Keyword(s): Machine Learning; Environmental Variables; Learning Algorithms; Maize Yield; Comparative Assessment; Machine Learning Algorithms; Yield Prediction; The US; US Midwest

Spatial estimation of chronic respiratory diseases based on machine learning procedures—an approach using remote sensing data and environmental variables in Quito, Ecuador

Applied Geography; doi:10.1016/j.apgeog.2020.102273; 2020; Vol 123; pp. 102273

Author(s): Cesar I. Alvarez-Mendoza; Ana Teodoro; Alberto Freitas; Joao Fonseca

Keyword(s): Machine Learning; Remote Sensing; Environmental Variables; Respiratory Diseases; Remote Sensing Data; Chronic Respiratory Diseases; Sensing Data; Spatial Estimation

A machine learning approach to achieve accurate time series forecast of sea-wave conditions

doi:10.5194/egusphere-egu2020-22666; 2020

Author(s): Giulia Cremonini; Giovanni Besio; Daniele Lagomarsino; Agnese Seminara

Keyword(s): Machine Learning; Environmental Variables; Machine Learning Algorithms; Accurate Estimation; Time Series Forecast; Machine Learning Approach; Physically Based; Physically Based Models; Wave Conditions; Sea Wave

Reliable forecasting of environmental variables is fundamental in managing the risk associated with hazard scenarios. In this work, we use state-of-the-art machine learning algorithms to build forecasting models and to obtain accurate estimates of sea wave conditions. We exploit multivariate time series of environmental variables, extracted either from a hindcast database (provided by MeteOcean Group at DICCA) or from data observed at sparse buoys. In this way, future values of sea wave height can be predicted in order to evaluate the risk associated with incoming scenarios. The aim is to provide new forecasting tools representing an alternative to physically based models, which have a higher computational cost.

Prediction of Cloud Fractional Cover Using Machine Learning

Big Data and Cognitive Computing; doi:10.3390/bdcc5040062; 2021; Vol 5 (4); pp. 62

Author(s): Hanna Svennevik; Michael A. Riegler; Steven Hicks; Trude Storelvmo; Hugo L. Hammer

Keyword(s): Machine Learning; Environmental Variables; Short Term Memory; Model Development; Regression Equation; Multiple Regression Equation; Human Society; Mountain Areas; Fractional Cover; Life On Earth

Climate change is regarded as one of the largest issues of our time, resulting in many unwanted effects on life on earth. Cloud fractional cover (CFC), the portion of the sky covered by clouds, might affect global warming and various other aspects of human society such as agriculture and solar energy production. It is therefore important to improve the projection of future CFC, which is usually projected using numerical climate methods. In this paper, we explore the potential of using machine learning as part of a statistical downscaling framework to project future CFC. We are not aware of any other research that has explored this. We evaluated the potential of two different methods, a convolutional long short-term memory model (ConvLSTM) and a multiple regression equation, to predict CFC from other environmental variables. The predictions were associated with much uncertainty, indicating that the environmental variables used in the study might not contain much information for predicting CFC. Overall, the regression equation performed best, but the ConvLSTM was the better-performing model along some coastal and mountain areas. All aspects of the research analyses are explained, including data preparation, model development, ML training, performance evaluation and visualization.

Predicting Rice Heading Date Using an Integrated Approach Combining a Machine Learning Method and a Crop Growth Model

Frontiers in Genetics; doi:10.3389/fgene.2020.599510; 2020; Vol 11

Author(s): Tai-Shen Chen; Toru Aoike; Masanori Yamasaki; Hiromi Kajiya-Kanegae; Hiroyoshi Iwata

Keyword(s): Machine Learning; Environmental Variables; Heading Date; Integrated Approach; Learning Model; Crop Growth; Day Length; Machine Learning Method; Learning Method; Machine Learning Model

Accurate prediction of heading date under various environmental conditions is expected to facilitate the decision-making process in cultivation management and the breeding process of new cultivars adaptable to the environment. Days to heading (DTH) is a complex trait known to be controlled by multiple genes and genotype-by-environment interactions. Crop growth models (CGMs) have been widely used to predict the phenological development of a plant in an environment; however, they usually require substantial experimental data to calibrate the parameters of the model. The parameters are mostly genotype-specific and are thus usually estimated separately for each cultivar. We propose an integrated approach that links genotype marker data with the developmental genotype-specific parameters of CGMs with a machine learning model, and allows heading date prediction of a new genotype in a new environment. To estimate the parameters, we implemented a Bayesian approach with the advanced Markov chain Monte-Carlo algorithm called the differential evolution adaptive metropolis and conducted the estimation using a large amount of data on heading date and environmental variables. The data comprised sowing and heading dates of 112 cultivars/lines tested at 7 locations for 14 years and the corresponding environmental variables (day length and daily temperature). We compared the predictive accuracy of DTH between the proposed approach, a CGM, and a single machine learning model. The results showed that the extreme learning machine (one of the implemented machine learning models) was superior to the CGM for the prediction of a tested genotype in a tested location. The proposed approach outperformed the machine learning method in the prediction of an untested genotype in an untested location. 
We also evaluated the potential of the proposed approach in the prediction of the distribution of DTH in 103 F2 segregation populations derived from crosses between a common parent, Koshihikari, and 103 cultivars/lines. The results showed a high correlation coefficient (ca. 0.8) between the 10th, 50th, and 90th percentiles of the observed and predicted distributions of DTH. In this study, the integration of a machine learning model and a CGM was better able to predict the heading date of a new rice cultivar in an untested potential environment.

Improving Soil Thickness Estimations Based on Multiple Environmental Variables with Stacking Ensemble Methods

Remote Sensing; doi:10.3390/rs12213609; 2020; Vol 12 (21); pp. 3609

Author(s): Xinchuan Li; Juhua Luo; Xiuliang Jin; Qiaoning He; Yun Niu

Keyword(s): Machine Learning; Soil Properties; Environmental Variables; Learning Algorithms; Machine Learning Algorithms; Accurate Estimation; Support Vector; Topographic Wetness Index; Soil Thickness; Extreme Gradient Boosting

Spatially continuous soil thickness data at large scales are usually not readily available and are often difficult and expensive to acquire. Various machine learning algorithms have become very popular in digital soil mapping to predict and map the spatial distribution of soil properties. Identifying the controlling environmental variables of soil thickness and selecting suitable machine learning algorithms are vitally important in modeling. In this study, 11 quantitative and four qualitative environmental variables were selected to explore the main variables that affect soil thickness. Four commonly used machine learning algorithms (multiple linear regression (MLR), support vector regression (SVR), random forest (RF), and extreme gradient boosting (XGBoost)) were evaluated as individual models to separately predict and obtain a soil thickness distribution map in Henan Province, China. In addition, two stacking ensemble models using the least absolute shrinkage and selection operator (LASSO) and the generalized boosted regression model (GBM) were tested and applied to build the most reliable and accurate estimation model. The results showed that variable selection was a very important part of soil thickness modeling. Topographic wetness index (TWI), slope, elevation, land use and enhanced vegetation index (EVI) were the most influential environmental variables in soil thickness modeling. Comparative results showed that the XGBoost model outperformed the MLR, RF and SVR models. Importantly, the two stacking models achieved higher performance than the single models, especially when using GBM. In terms of accuracy, the proposed stacking method explained 64.0% of the variation in soil thickness. The results of our study provide useful alternative approaches for mapping soil thickness, with potential for use with other soil properties.

Characterizing Groundwater Potential Using GIS-Based Machine Learning Model in Chihe River Basin, China

doi:10.21203/rs.3.rs-1044219/v1; 2021

Author(s): Dejian Wang; Jiazhong Qian; Lei Ma; Weidong Zhao; Di Gao; ...

Keyword(s): Machine Learning; Water Resources; River Basin; Environmental Variables; Groundwater Potential; High Potential; Slope Aspect; Learning Models; Topographic Wetness Index; Machine Learning Models

Abstract: Mapping groundwater potential over space by combining environmental variables and machine learning models is of great significance for regional water resources management. Taking the Chihe River basin in Anhui province as an example, thirteen influence factors were used to predict the spatial distribution of groundwater: elevation, slope, aspect, plan curvature, profile curvature, topographic wetness index (TWI), drainage density, distance to rivers, distance to faults, lithology, soil type, land use, and normalized difference vegetation index (NDVI). The groundwater resource potential in this region was predicted using GIS-based machine learning models, including logistic regression (LR), deep neural network (DNN), and random forest (RF) models. Then, the accuracy of the prediction results was evaluated by calculating the RMSE, MAE and R evaluation indexes. The results show that there is no collinearity among the 13 environmental impact factors, which can therefore serve as environmental variables for the evaluation of regional groundwater potential. The machine learning models show that groundwater potential is concentrated in moderate to high potential areas. Moderate to high potential areas accounted for 81.14% of the region in the LR model, and for 90.36% and 87.55% in the DNN and RF models, respectively. According to these evaluation indexes, all three models have high prediction accuracy, with the LR model performing best. The good prediction capabilities of these machine learning technologies can provide a reliable scientific basis for spatial prediction of groundwater potential and management of water resources.
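
The three evaluation indexes named in this abstract can be computed as in the sketch below. The abstract does not define "R"; the Pearson correlation coefficient between observed and predicted values is assumed here. The function names and the toy values are hypothetical and do not come from the study.

```python
from math import sqrt
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def pearson_r(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation (assumed meaning of 'R'); inputs must vary."""
    n = len(observed)
    mean_o, mean_p = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    norm_o = sqrt(sum((o - mean_o) ** 2 for o in observed))
    norm_p = sqrt(sum((p - mean_p) ** 2 for p in predicted))
    return cov / (norm_o * norm_p)


# Toy values (made up) standing in for observed and predicted potential.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 2.5, 4.5]

error_rmse = rmse(observed, predicted)
error_mae = mae(observed, predicted)
correlation = pearson_r(observed, predicted)
```

Lower RMSE and MAE and a higher R indicate a closer match between predictions and observations, which is the basis on which models such as LR, DNN and RF could be ranked against each other.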
