Quantitative reconstruction of past salinity variations in African lakes: assessment of chironomid-based inference models (Insecta: Diptera) in space and time

2004 ◽  
Vol 61 (6) ◽  
pp. 986-998 ◽  
Author(s):  
Dirk Verschuren ◽  
Brian F Cumming ◽  
Kathleen R Laird

Faunal records of 20 common midge species (Diptera: Chironomidae) in 32 African surface waters with salinities ranging from 20 to 41 000 µS·cm⁻¹ were used to develop inference models for the quantitative reconstruction of past salinity variations from larval chironomid fossils preserved in lake sediments. Weighted-averaging regression and calibration models using presence–absence data (P/A) and presence–absence data with tolerance down-weighting (P/Atol) produced bootstrapped coefficients of determination (r²) of 0.78 and 0.81, respectively, and root mean squared errors (RMSE) of prediction of 0.42 and 0.39 log conductivity units. Because historical conductivity data from African lakes are scarce, model performance was tested in time by comparing chironomid-inferred conductivity estimates with the corresponding diatom-inferred estimates in sediment records from two fluctuating lakes in the Rift Valley of Kenya. A hybrid procedure, in which presence–absence calibration models were applied to abundance-weighted fossil data, yielded significantly higher correlations between chironomid- and diatom-inferred time series (Lake Oloidien AD 1880–1991, r² = 0.76–0.78; Crescent Island Crater AD 900–1993, r² = 0.56–0.61) than applying the same models to presence–absence fossil data (r² = 0.47–0.56 and 0.26–0.42, respectively). Overall, model performance confirms that Chironomidae are valuable bioindicators of natural and man-made changes in the water balance of African lakes.
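The weighted-averaging (WA) steps summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the toy data, the function names and the `min_tol` floor are hypothetical, and the deshrinking regression that full WA models apply is omitted for brevity. Optima and tolerances are estimated from presence–absence training data (log10 conductivity as the environmental variable); the same estimator then accepts either 0/1 values or abundances, which mirrors the hybrid procedure of applying presence–absence calibration to abundance-weighted fossil data.

```python
import math


def wa_optima_tolerances(sites, env):
    """Weighted-average optimum and tolerance of each taxon.

    sites: one list per training lake, one value per taxon (0/1 for P/A data).
    env:   environmental value per lake, e.g. log10 conductivity.
    Assumes every taxon occurs in at least one lake.
    """
    optima, tolerances = [], []
    for k in range(len(sites[0])):
        weights = [site[k] for site in sites]
        total = sum(weights)
        u = sum(w * x for w, x in zip(weights, env)) / total
        var = sum(w * (x - u) ** 2 for w, x in zip(weights, env)) / total
        optima.append(u)
        tolerances.append(math.sqrt(var))
    return optima, tolerances


def wa_estimate(sample, optima, tolerances=None, min_tol=0.1):
    """Weighted average of taxon optima for one (fossil) sample.

    If tolerances are given, each taxon is down-weighted by 1 / t**2
    (the P/Atol variant); min_tol is a hypothetical floor that avoids
    dividing by zero for taxa recorded in a single lake.
    """
    num = den = 0.0
    for k, (y, u) in enumerate(zip(sample, optima)):
        w = y
        if tolerances is not None:
            w /= max(tolerances[k], min_tol) ** 2
        num += w * u
        den += w
    return num / den


# Toy training set: 4 lakes x 3 taxa, log10 conductivity from 1.5 to 4.5.
train = [[1, 0, 0], [1, 1, 0], [0, 1, 1], [0, 0, 1]]
log_cond = [1.5, 2.5, 3.5, 4.5]
optima, tols = wa_optima_tolerances(train, log_cond)

fossil_pa = [1, 1, 0]      # presence-absence fossil sample
fossil_counts = [3, 1, 0]  # same sample as head-capsule counts
print(wa_estimate(fossil_pa, optima, tols))  # P/Atol on P/A data: prints 2.5
print(wa_estimate(fossil_counts, optima))    # hybrid, abundance-weighted: prints 2.25
```

In the hybrid case the optima still come from presence–absence training data; only the fossil weights change from 0/1 to counts.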

2002 ◽  
Vol 59 (6) ◽  
pp. 938-951 ◽  
Author(s):  
Aline Philibert ◽  
Yves T Prairie

Despite the overwhelming tendency in paleolimnology to use both planktonic and benthic diatoms when inferring open-water chemical conditions, it remains questionable whether all taxa are appropriate and necessary to construct useful inference models. We examined this question using a 75-lake training set from Quebec (Canada) to assess whether model performance is affected by the deletion of benthic species. Because benthic species are known to experience very different chemical conditions from their planktonic counterparts, we hypothesized that they would introduce undesirable noise into the calibration. Surprisingly, such important variables as pH, total phosphorus (TP), total nitrogen (TN), and dissolved organic carbon (DOC) were well predicted by weighted-averaging partial least squares (WA-PLS) models based solely on benthic species. Similar results were obtained regardless of lake depth. Although the effective number of occurrences (N2) and the tolerance of species influenced the stability of the model residual error (jackknife), the number of species was the major factor responsible for the weaker inference models based on planktonic diatoms alone. Indeed, when the number of species in WA-PLS models was controlled for, individual planktonic diatom species showed greater predictive power than individual benthic species in inferring open-water chemical conditions.
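The effective number of occurrences (N2) is commonly computed as Hill's N2 diversity of a taxon's abundances across the training-set lakes. The sketch below assumes that definition (the abstract does not spell it out); the example abundances are hypothetical.

```python
def hill_n2(abundances):
    """Hill's N2 = 1 / sum(p_i**2), where p_i is the share of the taxon's
    total abundance found in lake i. It equals the number of lakes when the
    taxon is spread evenly, and approaches 1 when it is concentrated in one."""
    total = sum(abundances)
    if total == 0:
        raise ValueError("taxon absent from every lake")
    return 1.0 / sum((a / total) ** 2 for a in abundances)


print(hill_n2([5, 5, 5, 5]))   # evenly spread over 4 lakes: 4.0
print(hill_n2([20, 0, 0, 0]))  # found in a single lake: 1.0
print(hill_n2([2, 1, 1]))      # uneven: about 2.67
```

A taxon with a low N2 contributes an optimum estimated from effectively few lakes, which is one reason it can destabilize jackknifed errors.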


2016 ◽  
Vol 12 (5) ◽  
pp. 1263-1280 ◽  
Author(s):  
Frazer Matthews-Bird ◽  
Stephen J. Brooks ◽  
Philip B. Holden ◽  
Encarni Montoya ◽  
William D. Gosling

Abstract. Presented here is the first chironomid calibration data set for tropical South America. Surface sediments were collected from 59 lakes across Bolivia (15 lakes), Peru (32 lakes), and Ecuador (12 lakes) between 2004 and 2013 over an altitudinal gradient from 150 m above sea level (a.s.l.) to 4655 m a.s.l., between 0–17° S and 64–78° W. The study sites cover a mean annual temperature (MAT) gradient of 25 °C. In total, 55 chironomid taxa were identified in the 59 calibration lakes. When used as a single explanatory variable, MAT explains 12.9 % of the variance (λ1/λ2 = 1.431). Two inference models were developed, using weighted averaging (WA) and Bayesian methods. The best-performing model using conventional statistical methods was a WA (inverse) model (jackknifed R² = 0.890; jackknifed root mean squared error of prediction (RMSEP) = 2.404 °C; jackknifed mean bias = −0.017 °C; jackknifed maximum bias = 4.665 °C). The Bayesian method produced a model with jackknifed R² = 0.909, RMSEP = 2.373 °C, mean bias = 0.598 °C, and maximum bias = 3.158 °C. Both models were used to infer past temperatures from a ca. 3000-year record from Laguna Pindo in the tropical Andes of Ecuador. Inferred temperatures fluctuated around modern-day conditions but showed significant departures at certain intervals (ca. 1600 cal yr BP; ca. 3000–2500 cal yr BP). Both methods (WA and Bayesian) showed similar patterns of temperature variability; however, the magnitude of the fluctuations differed. In general, the WA method was more variable and often underestimated Holocene temperatures (by ca. −7 ± 2.5 °C relative to the modern period). The Bayesian method provided temperature anomaly estimates for cool periods that lay within the expected range of the Holocene (ca. −3 ± 3.4 °C). The error associated with both reconstructions is consistent with a constant temperature of 20 °C for the past 3000 years. We would caution, however, against over-interpretation at this stage.
At present, the reconstruction can only be deemed qualitative, and more research is required before quantitative estimates can be generated with confidence. Increasing the number, and spread, of lakes in the calibration data set would enable the detection of smaller climate signals.
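The WA (inverse) model and the jackknife statistics quoted above can be illustrated with the minimal sketch below. Assumptions, not taken from the study: abundance-weighted optima, inverse deshrinking by ordinary least squares, residuals defined as predicted minus observed, and maximum bias taken as the largest absolute mean residual among equal-width intervals of the observed gradient (10 by default, a common but not universal choice). Names and toy data are hypothetical, and the Bayesian model is not shown.

```python
import math


def wa_optima(sites, env):
    """Abundance-weighted optimum per taxon; None if the taxon is absent."""
    optima = []
    for k in range(len(sites[0])):
        total = sum(site[k] for site in sites)
        optima.append(None if total == 0 else
                      sum(site[k] * e for site, e in zip(sites, env)) / total)
    return optima


def wa_raw(sample, optima):
    """Initial WA estimate; assumes the sample shares a taxon with the training set."""
    pairs = [(y, u) for y, u in zip(sample, optima) if u is not None and y > 0]
    return sum(y * u for y, u in pairs) / sum(y for y, _ in pairs)


def fit_wa_inverse(sites, env):
    """WA with inverse deshrinking: OLS regression of observed env on raw estimates."""
    optima = wa_optima(sites, env)
    raw = [wa_raw(site, optima) for site in sites]
    mr, me = sum(raw) / len(raw), sum(env) / len(env)
    slope = (sum((r - mr) * (e - me) for r, e in zip(raw, env))
             / sum((r - mr) ** 2 for r in raw))
    intercept = me - slope * mr
    return lambda sample: intercept + slope * wa_raw(sample, optima)


def jackknife_predictions(sites, env):
    """Leave-one-out: refit without lake i, then predict lake i."""
    preds = []
    for i in range(len(sites)):
        model = fit_wa_inverse(sites[:i] + sites[i + 1:], env[:i] + env[i + 1:])
        preds.append(model(sites[i]))
    return preds


def performance(observed, predicted, n_bins=10):
    """RMSEP, mean bias and maximum bias (residual = predicted - observed)."""
    res = [p - o for o, p in zip(observed, predicted)]
    rmsep = math.sqrt(sum(r * r for r in res) / len(res))
    mean_bias = sum(res) / len(res)
    lo, hi = min(observed), max(observed)
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for o, r in zip(observed, res):
        bins[min(int((o - lo) / width), n_bins - 1)].append(r)
    max_bias = max(abs(sum(b) / len(b)) for b in bins if b)
    return rmsep, mean_bias, max_bias


# Toy data: 4 lakes x 3 taxa (counts), mean annual temperature in degrees C.
lakes = [[5, 1, 0], [3, 4, 0], [0, 4, 3], [0, 1, 6]]
mat = [5.0, 10.0, 15.0, 20.0]
jk = jackknife_predictions(lakes, mat)
print(performance(mat, jk, n_bins=2))
```

Because each lake is predicted by a model that never saw it, the jackknifed statistics are less optimistic than apparent (training-set) errors.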


2004 ◽  
Vol 55 (4) ◽  
pp. 471 ◽  
Author(s):  
John Guthrie ◽  
Colin Greensill ◽  
Ray Bowden ◽  
Kerry Walsh

Spectral data were collected from intact and ground macadamia kernels using three instruments (with Si-PbS, Si, and InGaAs detectors) operating over different regions of the spectrum (between 400 and 2500 nm) and employing transmittance, interactance, and reflectance sample presentation strategies. Kernels were assessed on the basis of oil and water content, and with respect to the defect categories of insect damage, rancidity, discoloration, mould growth, germination, and decomposition. Performance statistics for oil content models were acceptable on all instruments (R² > 0.98; RMSECV < 2.5%, which is similar to the reference analysis error), although the model for the instrument employing reflectance optics was inferior to those developed for the instruments employing transmission optics. The spectral positions of the calibration coefficients were consistent with absorbance due to the third overtones of CH2 stretching. Calibration models for moisture content in ground samples were acceptable on all instruments (R² > 0.97; RMSECV < 0.2%), whereas calibration models for intact kernels were relatively poor. Calibration coefficients were more highly weighted around 1360, 740 and 840 nm, consistent with absorbance due to O-H stretching overtones and combination bands. Intact kernels with brown centres or rancidity could be discriminated from each other and from sound kernels using principal component analysis. Part kernels affected by insect damage, discoloration, mould growth, germination, and decomposition could be discriminated from sound kernels. However, discrimination among these defect categories was not distinct and could not be validated on an independent set. It is concluded that there is good potential for a low-cost Si photodiode array instrument to be used to identify some quality defects of intact macadamia kernels, to quantify the oil and moisture content of kernels in the process laboratory, and to measure oil content in-line.
Further work is required to examine the robustness of predictive models across different populations, including growing districts, cultivars and times of harvest.
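RMSECV, quoted above for the oil and moisture models, is the root mean squared error of cross-validated predictions. The sketch below shows leave-one-out cross-validation for a hypothetical single-wavelength linear calibration; the models in the study used calibration coefficients across many wavelengths, so treat this only as an illustration of how the statistic is formed. The data and function names are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares slope and intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


def loo_rmsecv(x, y):
    """Leave-one-out RMSECV: each sample is predicted by a model fitted without it."""
    sq_errors = []
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        sq_errors.append((y[i] - (intercept + slope * x[i])) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))


# Hypothetical absorbance readings versus reference oil content (%).
absorbance = [0.10, 0.12, 0.15, 0.17, 0.20]
oil = [66.0, 69.5, 72.0, 74.5, 78.0]
print(round(loo_rmsecv(absorbance, oil), 3))
```

Comparing RMSECV with the error of the reference method, as the abstract does, indicates whether the spectral model is limited by the spectra or by the reference values themselves.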


2019 ◽  
Vol 147 (10) ◽  
pp. 3633-3647 ◽  
Author(s):  
Q. J. Wang ◽  
Tony Zhao ◽  
Qichun Yang ◽  
David Robertson

Abstract. Statistical calibration of forecasts from numerical weather prediction (NWP) models aims to produce forecasts that are unbiased, reliable in ensemble spread, and as skillful as possible. We suggest that calibrated forecasts should also have a climatology, including seasonality, that is coherent with observations. This is especially important when forecasts approach climatology as forecast skill becomes low, such as at long lead times. However, these aims are challenging to achieve when the data available to establish sophisticated calibration models are limited: many NWP models have only a short period of archived data, typically one year or less, when they become officially operational. In this paper, we introduce a seasonally coherent calibration (SCC) model for working effectively with limited archived NWP data. Detailed rationale and mathematical formulations are presented. In developing the model, three issues are resolved: 1) constructing a calibration model that is sophisticated enough to allow for seasonal variation in the statistical characteristics of raw forecasts and observations, 2) bringing climatology that is representative of long-term statistics into the calibration model, and 3) reducing the number of model parameters through sensible reparameterization to make the model workable with short NWP datasets. A case study is conducted to examine model assumptions and evaluate model performance. We find that the model assumptions are sound, and the developed SCC model produces well-calibrated forecasts.
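The abstract does not give the SCC formulation. As an assumption, and not necessarily the authors' approach, one common way to let a statistical characteristic (for example, a daily mean) vary smoothly through the year while keeping the parameter count small is a low-order Fourier series in day of year: two harmonics need five coefficients instead of 365 daily values. For equally spaced values covering exactly one cycle, the least-squares coefficients reduce to the simple sums below; the synthetic climatology is hypothetical.

```python
import math


def fit_harmonics(values, n_harmonics):
    """Least-squares Fourier coefficients for values sampled at n equally
    spaced steps over exactly one cycle (requires 2 * n_harmonics < n)."""
    n = len(values)
    mean = sum(values) / n
    coeffs = []
    for k in range(1, n_harmonics + 1):
        a = 2 / n * sum(v * math.cos(2 * math.pi * k * t / n) for t, v in enumerate(values))
        b = 2 / n * sum(v * math.sin(2 * math.pi * k * t / n) for t, v in enumerate(values))
        coeffs.append((a, b))
    return mean, coeffs


def seasonal_value(mean, coeffs, t, n):
    """Evaluate the fitted seasonal cycle at time step t of an n-step cycle."""
    return mean + sum(
        a * math.cos(2 * math.pi * k * t / n) + b * math.sin(2 * math.pi * k * t / n)
        for k, (a, b) in enumerate(coeffs, start=1))


# Synthetic daily climatology with an annual and a semi-annual component.
days = 365
clim = [5 + 2 * math.cos(2 * math.pi * t / days) + math.sin(4 * math.pi * t / days)
        for t in range(days)]
mean, coeffs = fit_harmonics(clim, n_harmonics=2)
print(round(mean, 6), [(round(a, 6), round(b, 6)) for a, b in coeffs])
```

With one year of archived forecasts, estimating a handful of smooth seasonal coefficients is far better constrained than estimating a separate value for every day.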


2003 ◽  
Vol 60 (10) ◽  
pp. 1177-1189 ◽  
Author(s):  
Darren G Bos ◽  
Brian F Cumming

To develop models to predict past lake-water nutrient levels, the sedimentary remains of Cladocera were sampled from 53 lakes in central British Columbia, Canada. At the same time, the lakes were sampled for a suite of chemical variables. In addition, a host of physical and spatial explanatory variables were collected from each site. Canonical correspondence analysis showed that total phosphorus (TP), which ranged from 5 to 146 µg·L⁻¹, was the measured environmental variable that best described the differences in species composition among the lakes. Lake depth and surface water temperature were also important in explaining the distribution of cladoceran taxa. Chydorus brevilabris, Daphnia ambigua, Daphnia cf. pulex, and Graptoleberis testudinaria had a preference for eutrophic lakes, whereas Acroperus harpae, Alonella nana, Alonella excisa, Chydorus piger, Daphnia cf. dentifera, and Eubosmina spp. were found in the less productive lakes. Predictive models to estimate TP from species abundance data were developed using weighted-averaging techniques. This research has produced strong and significant inference models, which can now be used to reconstruct past changes in lake trophic status from remains of Cladocera in sediment cores.
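A minimal sketch of a WA transfer function for TP from cladoceran abundances is given below. The abstract does not state the transformation or deshrinking used, so this sketch assumes log10-transformed TP and classical deshrinking (raw WA estimates regressed on observed values, then the regression inverted), one of the standard WA variants. The toy data and function name are hypothetical.

```python
import math


def fit_wa_classical(sites, tp):
    """WA model for TP (µg/L) with a log10 transform and classical deshrinking.

    sites: one list of taxon abundances per lake; tp: measured TP per lake.
    Assumes every taxon occurs in at least one lake.
    """
    env = [math.log10(v) for v in tp]
    optima = []
    for k in range(len(sites[0])):
        total = sum(site[k] for site in sites)
        optima.append(sum(site[k] * e for site, e in zip(sites, env)) / total)

    def raw(sample):
        return sum(y * u for y, u in zip(sample, optima)) / sum(sample)

    r = [raw(site) for site in sites]
    me, mr = sum(env) / len(env), sum(r) / len(r)
    # Classical deshrinking: regress the raw estimates on observed log TP ...
    slope = (sum((e - me) * (ri - mr) for e, ri in zip(env, r))
             / sum((e - me) ** 2 for e in env))
    intercept = mr - slope * me
    # ... then invert that regression and back-transform to µg/L.
    return lambda sample: 10 ** ((raw(sample) - intercept) / slope)


lakes = [[1, 0, 0], [1, 1, 0], [0, 1, 1], [0, 0, 1]]
tp_measured = [10, 20, 40, 80]
model = fit_wa_classical(lakes, tp_measured)
print(round(model([1, 0, 0]), 2), round(model([0, 0, 1]), 2))
```

Averaging optima pulls estimates toward the centre of the gradient; the deshrinking step stretches them back out, here in log space so that the back-transformed TP stays positive.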


JAMIA Open ◽  
2021 ◽  
Vol 4 (3) ◽  
Author(s):  
Anthony Finch ◽  
Alexander Crowell ◽  
Yung-Chieh Chang ◽  
Pooja Parameshwarappa ◽  
Jose Martinez ◽  
...  

Abstract. Objective: Attention networks learn an intelligent weighted-averaging mechanism over a series of entities, improving both performance and interpretability. In this article, we propose a novel time-aware transformer-based network and compare it to another leading model with similar characteristics. We also decompose model performance along several critical axes and examine which features contribute most to our model's performance. Materials and methods: Using data sets representing patient records obtained between 2017 and 2019 by the Kaiser Permanente Mid-Atlantic States medical system, we construct four attentional models with varying levels of complexity on two targets (patient mortality and hospitalization). We examine how incorporating transfer learning and demographic features contributes to model success. We also test the performance of a model proposed in recent medical modeling literature. We compare these models on out-of-sample data using the area under the receiver operating characteristic curve (AUROC) and average precision as measures of performance. We also analyze the attentional weights assigned by these models to patient diagnoses. Results: We found that our model significantly outperformed the alternative on a mortality prediction task (91.96% AUROC against 73.82% AUROC). Our model also outperformed the alternative on the hospitalization task, although the models were considerably more competitive there (82.41% AUROC against 80.33% AUROC). Furthermore, we found that demographic features and transfer-learning features, which are frequently omitted from new models proposed in the electronic medical record (EMR) modeling space, contributed significantly to the success of our model. Discussion: We proposed an original construction of deep learning EMR models that achieved very strong performance.
We found that our model construction outperformed a leading alternative from the literature on several tasks, even when the input data were held constant between them. We obtained further improvements by incorporating several methods that are frequently overlooked in new model proposals, suggesting that these options merit further exploration.
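The "weighted averaging mechanism" that attention provides can be illustrated with generic scaled dot-product attention pooling over a patient's sequence of entity embeddings (for example, diagnoses). This is not the authors' time-aware transformer: the embeddings and the query vector are hypothetical stand-ins for learned parameters, and no temporal encoding is included. The returned weights are the kind of quantity one inspects when analysing attention over diagnoses.

```python
import math


def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(embeddings, query):
    """Score each entity embedding against a query vector, normalise the
    scores with softmax, and return the weighted average embedding plus
    the attention weights."""
    scale = math.sqrt(len(query))  # scaled dot product, as in transformers
    scores = [sum(q * e for q, e in zip(query, emb)) / scale for emb in embeddings]
    weights = softmax(scores)
    pooled = [sum(w * emb[d] for w, emb in zip(weights, embeddings))
              for d in range(len(query))]
    return pooled, weights


# Three hypothetical diagnosis embeddings and a query that favours the first.
diagnoses = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
pooled, weights = attention_pool(diagnoses, query=[2.0, 0.0])
print([round(w, 3) for w in weights], [round(p, 3) for p in pooled])
```

Because the weights are non-negative and sum to one, the pooled vector is a convex combination of the inputs, which is what makes the weights readable as relative importance.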


Author(s):  
Mengmeng Liu ◽  
Iain Colin Prentice ◽  
Cajo J. F. ter Braak ◽  
Sandy P. Harrison

Quantitative reconstructions of past climates are an important resource for evaluating how well climate models reproduce climate changes. One widely used statistical approach for making such reconstructions from fossil biotic assemblages is weighted averaging partial least-squares regression (WA-PLS). There is, however, a known tendency for WA-PLS to yield reconstructions compressed towards the centre of the climate range used for calibration, potentially biasing the reconstructed past climates. We present an improvement of WA-PLS by assuming that: (i) the theoretical abundance of each taxon is unimodal with respect to the climate variable considered; (ii) observed taxon abundances follow a multinomial distribution in which the total abundance of a sample is climatically uninformative; and (iii) the estimate of the climate value at a given site and time makes the observation most probable, i.e. it maximizes the log-likelihood function. This climate estimate is approximated by weighting taxon abundances in WA-PLS by the inverse square of their climate tolerances. We further improve the approach by considering the frequency (fx) of the climate variable in the training dataset. Tolerance-weighted WA-PLS with fx correction greatly reduces the compression bias compared with WA-PLS, and improves model performance in reconstructions based on an extensive modern pollen dataset.
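Assumptions (i)–(iii) above can be made concrete with a small sketch: Gaussian response curves with equal peak heights (a simplifying assumption), multinomial counts, a grid search for the maximum-likelihood climate value, and, for comparison, the inverse-square-tolerance weighted average of optima that the abstract describes as an approximation to it. This is not WA-PLS itself and does not include the fx correction; the taxa, tolerances and counts are hypothetical.

```python
import math


def log_likelihood(x, counts, optima, tols):
    """Multinomial log-likelihood of taxon counts at climate value x, assuming
    Gaussian (unimodal) theoretical abundances with equal peak heights."""
    log_a = [-((x - u) ** 2) / (2 * t * t) for u, t in zip(optima, tols)]
    m = max(log_a)
    log_total = m + math.log(sum(math.exp(v - m) for v in log_a))  # log-sum-exp
    # Only relative abundances enter, so the sample total carries no climate signal.
    return sum(c * (la - log_total) for c, la in zip(counts, log_a))


def ml_estimate(counts, optima, tols, lo, hi, step=0.01):
    """Climate value on a regular grid that maximizes the log-likelihood."""
    grid = [lo + i * step for i in range(int(round((hi - lo) / step)) + 1)]
    return max(grid, key=lambda x: log_likelihood(x, counts, optima, tols))


def tol_weighted_wa(counts, optima, tols):
    """Weighted average of optima with weights count / tolerance**2."""
    w = [c / t ** 2 for c, t in zip(counts, tols)]
    return sum(wi * u for wi, u in zip(w, optima)) / sum(w)


# Hypothetical taxa: optima and tolerances in degrees C, plus pollen counts.
optima = [8.0, 14.0, 20.0]
tols = [2.0, 4.0, 3.0]
counts = [30, 50, 20]
print(ml_estimate(counts, optima, tols, lo=0.0, hi=30.0))
print(tol_weighted_wa(counts, optima, tols))
```

In symmetric cases the two estimates coincide; otherwise they generally differ somewhat, because the weighted average only approximates the maximum-likelihood value.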


The Holocene ◽  
2006 ◽  
Vol 16 (1) ◽  
pp. 105-117 ◽  
Author(s):  
Valenti Rull

The numerical relationship between modern pollen assemblages and altitude in high-mountain environments of the northern Andes is analysed in order to establish inference models for estimating palaeoaltitudes and palaeotemperatures from past pollen records. The calibration set (DM) consists of a 50-sample altitudinal transect between ~2300 and ~4600 m altitude. The overall and individual pollen responses to altitude were tested by correspondence analysis (CA), generalized linear regression (HOF) and weighted averaging (WA). Transfer functions were derived by weighted averaging partial least squares (WA-PLS) regression. Overall, altitude is the main factor controlling the composition of pollen assemblages, as shown by the high correlation between altitude and the first CA component (r = -0.88). Individually, around 35% of the 82 pollen taxa show a significant response to altitude through monotonic or unimodal functions. The best transfer function obtained has good statistical performance, with a jackknifed coefficient of determination (r²jack) of 0.78. Its predictive power, as measured by the root mean square error of prediction (RMSEP), is 256 m (12% of the total altitudinal gradient), equivalent to ~1.5 °C. These parameters fall within the performance range of inference models developed elsewhere using pollen and other biological proxies. It is concluded that the DM training set is useful for reconstructing Pleistocene and major Holocene palaeoclimatic trends. This study demonstrates that reliable transfer functions for palaeoclimatic estimation can be established in the highest altitudes of the tropical Andes, and encourages their continued improvement.
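The conversion from a 256 m altitude error to roughly 1.5 °C is consistent with a lapse rate of about 0.6 °C per 100 m. That rate is an assumption in this sketch, since the abstract does not state the value used; the function name is hypothetical.

```python
def altitude_error_as_temperature(rmsep_m, lapse_rate_c_per_100m=0.6):
    """Convert an altitude prediction error (m) to a temperature error (°C)
    using an assumed constant lapse rate."""
    return rmsep_m * lapse_rate_c_per_100m / 100.0


print(round(altitude_error_as_temperature(256), 2))  # 1.54
```

A different local lapse rate would scale the temperature equivalent proportionally.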


2007 ◽  
Vol 13 ◽  
pp. 13-31 ◽  
Author(s):  
Brian F. Cumming ◽  
Katrina A. Moser

Applications of commonly used numerical techniques in diatom-based paleoecology are reviewed, including: approaches used to model the responses of diatom taxa to important limnological variables; ordination and other commonly used multivariate approaches; and the many approaches now being explored to infer environmental variables from diatom assemblages.
Modelling the responses of individual diatom taxa to limnologically important variables is consistent with ecological theory and has largely been accomplished using approaches based on generalized linear models. These techniques have established that strong and significant relationships exist between the numerically dominant diatom taxa and important limnological variables (e.g., pH, nutrients, salinity). Null modelling approaches have also been used. However, the inclusion of rare taxa in null models results in high rates of type-II errors and, consequently, spurious claims that only a minority of diatoms have significant relationships to important limnological variables such as lakewater pH and nutrients.
A variety of ordination techniques are widely used in diatom-based paleolimnological studies to summarize the main directions of variation in diatom assemblages and to identify limnological variables that are strongly correlated with the diatom assemblages, both in time and in space. More advanced ordination techniques, such as partial ordinations, are increasingly being used to assess the shared and unique variance attributable to groups of important limnological variables. Further, diatom-based approaches built on experimental designs with control lakes and appropriate multivariate statistics are becoming increasingly common to assess, for example, the impact of forestry on water quality.
A number of diatom-based inference models, based on the present-day relationships between diatom assemblages and limnological variables, are now available for inferring important limnological variables. These range from simple approaches such as weighted averaging to more complex approaches involving curve fitting and maximum likelihood, neural networks, and Bayesian statistics. All of these approaches have been shown to produce strong inference models, each using different aspects of the ecological information available from the diatom assemblages.
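Unimodal species responses of the kind modelled with generalized linear models are often represented by a Gaussian logit curve, a binomial GLM with a quadratic term in the environmental variable. The sketch below assumes that model form and omits the fitting step; it only converts fitted coefficients into the ecologically interpretable optimum and tolerance. The coefficients are hypothetical.

```python
import math


def gaussian_logit(x, b0, b1, b2):
    """Probability of occurrence under logit(p) = b0 + b1*x + b2*x**2."""
    eta = b0 + b1 * x + b2 * x * x
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)  # avoids overflow for strongly negative eta
    return e / (1.0 + e)


def optimum_and_tolerance(b1, b2):
    """Optimum u = -b1 / (2*b2) and tolerance t = 1 / sqrt(-2*b2).
    Only defined for a unimodal response (b2 < 0)."""
    if b2 >= 0:
        raise ValueError("response is not unimodal: b2 must be negative")
    return -b1 / (2 * b2), 1 / math.sqrt(-2 * b2)


# Hypothetical coefficients for a diatom taxon along a lakewater pH gradient.
b0, b1, b2 = -98.0, 28.0, -2.0
print(optimum_and_tolerance(b1, b2))    # (7.0, 0.5)
print(gaussian_logit(7.0, b0, b1, b2))  # 0.5 at the optimum
```

A non-negative quadratic coefficient signals a monotonic or U-shaped response, for which an optimum and tolerance are not meaningful.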

