Adaptive multiscale principal components analysis for online monitoring of wastewater treatment

2002 ◽  
Vol 45 (4-5) ◽  
pp. 227-235 ◽  
Author(s):  
J. Lennox ◽  
C. Rosen

Fault detection and isolation (FDI) are important steps in the monitoring and supervision of industrial processes. Biological wastewater treatment (WWT) plants are difficult to model, and hence to monitor, because of the complexity of the biological reactions and because plant influent and disturbances are highly variable and/or unmeasured. Multivariate statistical models have been developed for a wide variety of situations over the past few decades, proving successful in many applications. In this paper we develop a new monitoring algorithm based on Principal Components Analysis (PCA). It can be seen equivalently as making Multiscale PCA (MSPCA) adaptive, or as a multiscale decomposition of adaptive PCA. Adaptive Multiscale PCA (AdMSPCA) exploits the changing multivariate relationships between variables at different time-scales. Adaptation of scale PCA models over time permits them to follow the evolution of the process, inputs or disturbances. Performance of AdMSPCA and adaptive PCA on a real WWT data set is compared and contrasted. The most significant difference observed was the ability of AdMSPCA to adapt to a much wider range of changes. This was mainly due to the flexibility afforded by allowing each scale model to adapt whenever it did not signal an abnormal event at that scale. Relative detection speeds were examined only summarily, but seemed to depend on the characteristics of the faults/disturbances. The results of the algorithms were similar for sudden changes, but AdMSPCA appeared more sensitive to slower changes.
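The multiscale idea described above — decompose each variable across time-scales, fit a separate PCA model per scale, and flag an abnormal event when a scale's residual grows — can be sketched in a few lines. This is a minimal illustration, not the authors' AdMSPCA implementation: it assumes a single-level Haar decomposition, one principal component per scale model, no adaptation step, and toy data with an even number of rows.

```python
import numpy as np

def haar_decompose(X):
    """One-level Haar transform along time: approximation and detail
    coefficients for each column of X (rows = time, cols = variables)."""
    even, odd = X[0::2], X[1::2]
    approx = (even + odd) / np.sqrt(2.0)
    detail = (even - odd) / np.sqrt(2.0)
    return approx, detail

def fit_scale_pca(X, n_components):
    """Fit a PCA model (mean + leading loadings) to one scale."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:n_components]            # rows of Vt are loadings

def spe(X, mu, P):
    """Squared prediction error (residual) of each row under the model."""
    R = (X - mu) - (X - mu) @ P.T @ P
    return np.sum(R**2, axis=1)

rng = np.random.default_rng(0)
train = rng.normal(size=(256, 4))                      # normal operation
train[:, 1] = 0.9 * train[:, 0] + 0.1 * train[:, 1]    # correlated variables

# One PCA model per scale, as in the multiscale scheme
models = {name: fit_scale_pca(S, 1)
          for name, S in zip(("approx", "detail"), haar_decompose(train))}

test = rng.normal(size=(64, 4))
test[:, 1] = -0.9 * test[:, 0]            # broken correlation: a "fault"
for name, S in zip(("approx", "detail"), haar_decompose(test)):
    mu, P = models[name]
    print(name, "mean SPE:", spe(S, mu, P).mean())
```

The faulty data violate the correlation structure the scale models were trained on, so their residual (SPE) rises above the normal-operation level — the detection signal a monitoring chart would threshold.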

2006 ◽  
Vol 53 (4) ◽  
pp. 427-437 ◽  
Author(s):  
Mirko Savic

Over the last decade, the employment structure has been one of the fastest-changing aspects of Eastern European economies. This paper explores the best methodology for comparing employment situations across the countries of this region. Multivariate statistical analyses are well suited to portraying the full picture of the problem. Principal components analysis is one of the simplest multivariate methods, and it can produce very useful information about Eastern European employment in an easy and understandable way.


2013 ◽  
Vol 17 (7) ◽  
pp. 1476-1485 ◽  
Author(s):  
Kate Northstone ◽  
Andrew DAC Smith ◽  
Victoria L Cribb ◽  
Pauline M Emmett

Abstract
Objective: To derive dietary patterns using principal components analysis from separate FFQs completed by mothers and their teenagers, and to assess associations with nutrient intakes and sociodemographic variables.
Design: Two distinct FFQs were completed by 13-year-olds and their mothers, with some overlap in the foods covered. A combined data set was obtained.
Setting: Avon Longitudinal Study of Parents and Children (ALSPAC), Bristol, UK.
Subjects: Teenagers (n 5334) with adequate dietary data.
Results: Four patterns were obtained using principal components analysis: a ‘Traditional/health-conscious’ pattern, a ‘Processed’ pattern, a ‘Snacks/sugared drinks’ pattern and a ‘Vegetarian’ pattern. The ‘Traditional/health-conscious’ pattern was the most nutrient-rich, having high positive correlations with many nutrients. The ‘Processed’ and ‘Snacks/sugared drinks’ patterns showed little association with important nutrients but were positively associated with energy, fats and sugars. There were clear gender and sociodemographic differences across the patterns: scores on the ‘Traditional/health-conscious’ and ‘Vegetarian’ patterns were lower in boys and in those with younger and less educated mothers, and higher in girls and in those whose mothers had higher levels of education.
Conclusions: It is important to establish healthy eating patterns by the teenage years. However, this is a time when it is difficult to establish dietary intake accurately from a single source, since teenagers consume increasing amounts of foods outside the home. Further dietary pattern studies should focus on teenagers, and the source of dietary data collection merits consideration.


1984 ◽  
Vol 18 (11) ◽  
pp. 2471-2478 ◽  
Author(s):  
J. Smeyers-Verbeke ◽  
J.C. Den Hartog ◽  
W.H. Dehker ◽  
D. Coomans ◽  
L. Buydens ◽  
...  

2006 ◽  
Vol 23 (3) ◽  
pp. 106-118 ◽  
Author(s):  
Gordon E. Sarty ◽  
Kinwah Wu

Abstract
The ratios of hydrogen Balmer emission line intensities in cataclysmic variables are signatures of the physical processes that produce them. To quantify those signatures relative to classifications of cataclysmic variable types, we applied the multivariate statistical analysis methods of principal components analysis and discriminant function analysis to the spectroscopic emission data set of Williams (1983). The two analysis methods reveal two different sources of variation in the ratios of the emission lines. The source of variation seen in the principal components analysis was shown to be correlated with the binary orbital period. The source of variation seen in the discriminant function analysis was shown to be correlated with the equivalent width of the Hβ line. Comparison of the data scatterplot with scatterplots of theoretical models shows that Balmer line emission from T CrB systems is consistent with the photoionization of a surrounding nebula. Otherwise, models that we considered do not reproduce the wide range of Balmer decrements, including ‘inverted’ decrements, seen in the data.


2019 ◽  
Author(s):  
Fred L. Bookstein

Abstract
Good empirical applications of geometric morphometrics (GMM) typically involve several times more variables than specimens, a situation the statistician refers to as “high p/n,” where p is the count of variables and n the count of specimens. This note calls your attention to two predictable catastrophic failures of one particular multivariate statistical technique, between-groups principal components analysis (bgPCA), in this high-p/n setting. The more obvious pathology is this: when applied to the patternless (null) model of p identically distributed Gaussians over groups of the same size, both bgPCA and its algebraic equivalent, partial least squares (PLS) analysis against group, necessarily generate the appearance of huge equilateral group separations that are actually fictitious (absent from the statistical model). When specimen counts by group vary greatly or when any group includes fewer than about ten specimens, an even worse failure of the technique obtains: the smaller the group, the more likely a bgPCA is to fictitiously identify that group as the end-member of one of its derived axes. For these two reasons, when used in GMM and other high-p/n settings the bgPCA method very often leads to invalid or insecure bioscientific inferences. This paper demonstrates and quantifies these and other pathological outcomes both for patternless models and for models with one or two valid factors, then offers suggestions for how GMM practitioners should protect themselves against the consequences for inference of these lamentably predictable misrepresentations. The bgPCA method should never be used unskeptically (it is never authoritative), and whenever it appears in partial support of any biological inference it must be accompanied by a wide range of diagnostic plots and other challenges, many of which are presented here for the first time.
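The first pathology is easy to reproduce numerically. Below is a minimal sketch under the null model the abstract describes (equal group sizes, identically distributed Gaussian variables, p much larger than n); the particular sizes and the seed are illustrative choices, not figures from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n_per_group, groups = 500, 20, 3      # many more variables than specimens
X = rng.normal(size=(groups * n_per_group, p))   # patternless null model
labels = np.repeat(np.arange(groups), n_per_group)

# between-groups PCA: eigen-analysis of the group means, then project everyone
means = np.array([X[labels == g].mean(axis=0) for g in range(groups)])
grand = means.mean(axis=0)
_, _, Vt = np.linalg.svd(means - grand, full_matrices=False)
scores = (X - grand) @ Vt[:2].T          # specimens on the first two bgPC axes

# "Separation" between two group centroids in score space, relative to
# within-group spread -- large despite there being no group structure at all
centroids = np.array([scores[labels == g].mean(axis=0) for g in range(groups)])
between = np.linalg.norm(centroids[0] - centroids[1])
within = scores[labels == 0].std()
print(f"centroid separation / within-group spread = {between / within:.2f}")
```

Because the bgPC axes are built from the group means themselves, random mean differences of order sqrt(p/n) are amplified into apparently decisive, roughly equilateral clusters — exactly the fictitious separation the abstract warns about.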


1983 ◽  
Vol 40 (10) ◽  
pp. 1752-1760 ◽  
Author(s):  
Michael A. Gates ◽  
Ann P. Zimmerman ◽  
W. Gary Sprules ◽  
Roy Knoechel

We introduce a method, based on principal components analysis, for studying temporal changes in biomass allocation among 16 size–category compartments of lake plankton. Applied to data from a series of 12 Ontario lakes over three sampling seasons, the technique provides a simple means of visualizing shifts in patterns of biomass allocation, and it allows comparative analyses of biomass fluctuations in different lakes. Each of the primary component axes is interpretable. Furthermore, a large proportion of the variance in both the mean position of a lake and its movement along these axes is interpreted as a function of lake physicochemistry. The analysis also provides weighted scores for use in hypothesis testing which are an improvement over mean biomass values alone, because they take into account the structure of variation in the data set.


2013 ◽  
Vol 7 (1) ◽  
pp. 19-24
Author(s):  
Kevin Blighe

Elaborate downstream methods are required to analyse large microarray data sets. At times, where the end goal is to look for relationships between (or patterns within) different subgroups or even individual samples, large data sets must first be filtered using statistical thresholds in order to reduce their overall volume. As an example, in anthropological microarray studies, such ‘dimension reduction’ techniques are essential to elucidate any links between polymorphisms and phenotypes for given populations. In such large data sets, a subset can first be taken to represent the larger data set, just as polling results taken during elections are used to infer the opinions of the population at large. However, what is the best and easiest method of capturing a subset of variation in a data set that can represent the overall portrait of variation? In this article, principal components analysis (PCA) is discussed in detail, including its history, the mathematics behind the process, and the ways in which it can be applied to modern large-scale biological data sets. New methods of analysis using PCA are also suggested, with tentative results outlined.
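The dimension-reduction step discussed above can be illustrated in a few lines with a generic SVD-based PCA (a textbook sketch, not the author's pipeline; the toy matrix stands in for a filtered microarray data set):

```python
import numpy as np

def pca(X, k):
    """Project X (samples x features) onto its first k principal components."""
    Xc = X - X.mean(axis=0)                # centre each feature
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)        # variance fraction per component
    return Xc @ Vt[:k].T, explained[:k]

# Toy stand-in for a filtered microarray matrix: 100 samples, 1000 probes,
# with one dominant axis of variation shared across probes
rng = np.random.default_rng(2)
signal = rng.normal(size=(100, 1))
X = signal @ rng.normal(size=(1, 1000)) + 0.1 * rng.normal(size=(100, 1000))

scores, explained = pca(X, 2)
print("variance explained by PC1, PC2:", np.round(explained, 3))
```

Each sample is reduced from 1000 correlated measurements to two scores, and the explained-variance fractions show how much of the overall portrait of variation that subset captures — here PC1 absorbs nearly all of it, because the toy data contain a single dominant factor.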


2020 ◽  
Vol 25 (1-2) ◽  
pp. 35-56
Author(s):  
Vasil Simeonov

Abstract
The present introductory course of lectures summarizes the principles and algorithms of several widely used multivariate statistical methods — cluster analysis, principal components analysis, principal components regression, N-way principal components analysis, partial least squares regression and self-organizing maps — with respect to their possible application in the intelligent analysis, classification, modelling and interpretation of environmental monitoring data. The target group of possible users is master's programme students (environmental chemistry, analytical chemistry, environmental modelling and risk assessment, etc.).


2019 ◽  
Vol 5 ◽  
pp. 237802311881872 ◽  
Author(s):  
Ryan Compton

Sociological research typically involves exploring theoretical relationships, but the emergence of “big data” enables alternative approaches. This work shows the promise of data-driven machine-learning techniques involving feature engineering and predictive model optimization to address a sociological data challenge. The author’s group develops improved generalizable models to identify at-risk families. Principal-components analysis and decision tree modeling are used to predict six main dependent variables in the Fragile Families Challenge, successfully modeling one binary variable but no continuous dependent variables in the diagnostic data set. This indicates that some binary dependent variables are more predictable using a reduced set of uncorrelated independent variables, and continuous dependent variables demand more complexity.
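The two-step recipe the abstract describes — PCA to obtain a reduced set of uncorrelated features, then a decision tree on those features — can be sketched on toy data. Everything below is an illustrative stand-in, not the Fragile Families pipeline: hypothetical data, a five-component cut-off, and a depth-1 tree (stump) in place of a full decision tree.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 400, 50
f = rng.normal(size=(n, 1))                            # one latent factor
X = f @ rng.normal(size=(1, p)) + 0.5 * rng.normal(size=(n, p))
y = (f[:, 0] > 0).astype(int)                          # binary outcome

# Step 1: PCA to a small set of uncorrelated features
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:5].T

# Step 2: a depth-1 decision tree (stump) on the reduced features
def fit_stump(Z, y):
    """Best single-feature threshold split, allowing either label polarity."""
    best = None
    for j in range(Z.shape[1]):
        for t in np.quantile(Z[:, j], [0.25, 0.5, 0.75]):
            pred = (Z[:, j] > t).astype(int)
            acc = max((pred == y).mean(), (pred != y).mean())
            if best is None or acc > best[0]:
                best = (acc, j, t)
    return best

acc, feature, threshold = fit_stump(Z, y)
print(f"stump on PC{feature + 1}: training accuracy {acc:.2f}")
```

When a binary outcome is driven by a dominant latent factor, that factor surfaces as a leading principal component and even a one-split tree recovers it — consistent with the abstract's finding that some binary variables are predictable from a reduced, uncorrelated feature set while continuous outcomes demand more model complexity.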

