Regional variability in water use intensity in China by clustering analysis

2016 ◽  
Vol 16 (4) ◽  
pp. 1102-1109
Author(s):  
Xinghua Fan ◽  
Huihui Xu ◽  
Cheng Ning ◽  
Liangjie Wu

Water use intensity (WUI) reveals water withdrawals with respect to economic output. Decomposing WUI into factors provides inner-system information affecting the indicator. The present study investigates variability in WUI among provinces in China by clustering the principal components of the decomposed factors. Motivated by the index decomposition method, the authors decomposed WUI into seven factors: water use in agricultural, industrial, household and ecological sectors, exploitation rate of water resources, per capita water resources and population intensity. Those seven factors condense into four principal components under application of principal component analysis. Comprehensive WUI is calculated by these four components. Then the cluster analysis is applied to get different patterns in WUI. The principal components and the comprehensive intensity are taken as cluster variables. The number of clusters is determined to be three by applying the k-means clustering method and the F-statistic value. Variability in WUI is detected by implementing three clustering algorithms, namely k-means, fuzzy c-means and the Gaussian mixture model. WUI in China is clustered into three clusters by the k-means clustering method. Characteristics of each cluster are analyzed.
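
How a cluster count of three might be selected can be illustrated with a minimal Python sketch. It assumes the F-statistic is the Calinski-Harabasz pseudo-F (between-cluster variance over within-cluster variance), which the abstract does not confirm. The function names `kmeans` and `pseudo_f`, the farthest-first initialisation, and the toy points are hypothetical stand-ins for the provinces' principal-component scores.

```python
import math

def kmeans(points, k, iters=50):
    """Lloyd's k-means with a deterministic farthest-first initialisation (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        # Add the point that lies farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center in place
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

def pseudo_f(points, labels, centers):
    """Calinski-Harabasz pseudo-F: (between SS / (k - 1)) / (within SS / (n - k))."""
    n, k = len(points), len(centers)
    grand = tuple(sum(x) / n for x in zip(*points))
    between = sum(labels.count(j) * math.dist(centers[j], grand) ** 2 for j in range(k))
    within = sum(math.dist(p, centers[l]) ** 2 for p, l in zip(points, labels))
    return (between / (k - 1)) / (within / (n - k))

# Hypothetical toy data: three well-separated groups in two dimensions.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]

scores = {}
for k in (2, 3, 4):
    labels, centers = kmeans(points, k)
    scores[k] = pseudo_f(points, labels, centers)

best_k = max(scores, key=scores.get)  # candidate k with the largest pseudo-F
print(best_k, scores)
```

In practice the same comparison would be run on the four principal components plus the comprehensive intensity, and the fuzzy c-means and Gaussian mixture results would be checked against the k-means partition.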

2021 ◽  
Vol 10 (4) ◽  
pp. 2170-2180
Author(s):  
Untari N. Wisesty ◽  
Tati Rajab Mengko

This paper analyzes SARS-CoV-2 genome variation by comparing the results of genome clustering across several clustering algorithms and the distribution of sequences in each cluster. The clustering algorithms used are K-means, Gaussian mixture models, agglomerative hierarchical clustering, mean-shift clustering, and DBSCAN. However, clustering algorithms are weak at grouping very high-dimensional data such as genome data, so a dimensionality reduction step is needed. In this research, dimensionality reduction was carried out using principal component analysis (PCA) and an autoencoder, with three models that produce 2, 10, and 50 features. The main contributions are the dimensionality reduction and clustering scheme for SARS-CoV-2 sequence data and the performance analysis of each experiment across schemes and hyperparameters for each method. In the experiments, PCA combined with DBSCAN achieves the highest silhouette score, 0.8770, with three clusters when using two features. However, dimensionality reduction using the autoencoder needs more iterations to converge. In testing with Indonesian sequence data, more than half of the sequences fall into one cluster and the rest are distributed across the other two clusters.
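
The silhouette score used to rank the schemes can be sketched in a few lines of Python. This is a minimal version assuming Euclidean distance and at least two clusters. It scores singleton clusters as 0 and does not treat DBSCAN noise points specially, both of which are assumptions rather than details from the paper. The name `silhouette_score` and the toy data are illustrative only.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical one-dimensional features with two obvious groups.
toy_points = [(0.0,), (1.0,), (10.0,), (11.0,)]
toy_labels = [0, 0, 1, 1]
toy_score = silhouette_score(toy_points, toy_labels)
print(round(toy_score, 4))
```

Values close to 1 indicate compact, well-separated clusters, which is why the score is a natural yardstick when comparing reduced feature sets of different sizes.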


2016 ◽  
Vol 51 (1) ◽  
pp. 74-81 ◽  
Author(s):  
Jacob A. Siedlik ◽  
Charles Bergeron ◽  
Michael Cooper ◽  
Russell Emmons ◽  
William Moreau ◽  
...  

Context: Analysis of injury and illness data collected at large international competitions provides the US Olympic Committee and the national governing bodies for each sport with information to best prepare for future competitions. Research in which authors have evaluated medical contacts to provide the expected level of medical care and sports medicine services at international competitions is limited. Objective: To analyze the medical-contact data for athletes, staff, and coaches who participated in the 2011 Pan American Games in Guadalajara, Mexico, using unsupervised modeling techniques to identify underlying treatment patterns. Design: Descriptive epidemiology study. Setting: Pan American Games. Patients or Other Participants: A total of 618 US athletes (337 males, 281 females) participated in the 2011 Pan American Games. Main Outcome Measure(s): Medical data were recorded from the injury-evaluation and injury-treatment forms used by clinicians assigned to the central US Olympic Committee Sport Medicine Clinic and satellite locations during the operational 17-day period of the 2011 Pan American Games. We used principal components analysis and agglomerative clustering algorithms to identify and define grouped modalities. Lift statistics were calculated for within-cluster subgroups. Results: Principal component analyses identified 3 components, accounting for 72.3% of the variability in datasets. Plots of the principal components showed that individual contacts focused on 4 treatment clusters: massage, paired manipulation and mobilization, soft tissue therapy, and general medical. Conclusions: Unsupervised modeling techniques were useful for visualizing complex treatment data and provided insights for improved treatment modeling in athletes. Given its ability to detect clinically relevant treatment pairings in large datasets, unsupervised modeling should be considered a feasible option for future analyses of medical-contact data from international competitions.
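
The lift statistic mentioned above can be illustrated with a small Python sketch. It assumes the usual association-rule definition, lift(A, B) = P(A and B) / (P(A) P(B)), which the abstract does not spell out. The record layout (one set of treatment modalities per medical contact) and the sample contacts are hypothetical.

```python
def lift(records, a, b):
    """Lift of modalities a and b co-occurring across contact records."""
    n = len(records)
    p_a = sum(a in r for r in records) / n
    p_b = sum(b in r for r in records) / n
    p_ab = sum(a in r and b in r for r in records) / n
    return p_ab / (p_a * p_b)

# Hypothetical contacts: each set lists the modalities delivered in one visit.
contacts = [
    {"manipulation", "mobilization"},
    {"manipulation", "mobilization"},
    {"massage"},
    {"massage"},
]

pair_lift = lift(contacts, "manipulation", "mobilization")
print(pair_lift)
```

A lift above 1 means the two modalities are delivered together more often than independence would predict, which is the kind of pairing the cluster analysis is meant to surface.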


Rheumatology ◽  
2019 ◽  
Vol 59 (5) ◽  
pp. 1066-1075
Author(s):  
Elham Rezaei ◽  
Daniel Hogan ◽  
Brett Trost ◽  
Anthony J Kusalik ◽  
Gilles Boire ◽  
...  

Abstract. Objective: To identify discrete clusters comprising clinical features and inflammatory biomarkers in children with JIA and to determine cluster alignment with JIA categories. Methods: A Canadian prospective inception cohort comprising 150 children with JIA was evaluated at baseline (visit 1) and after six months (visit 2). Data included clinical manifestations and inflammation-related biomarkers. Probabilistic principal component analysis identified sets of composite variables, or principal components, from 191 original variables. To discern new clinical-biomarker clusters (clusters), Gaussian mixture models were fit to the data. Newly defined clusters and JIA categories were compared. Agreement between the two was assessed using Kruskal–Wallis analyses and contingency plots. Results: Three principal components recovered 35% (three clusters) and 40% (five clusters) of the variance in patient profiles in visits 1 and 2, respectively. None of the clusters aligned precisely with any of the seven JIA categories but rather spanned multiple categories. Results demonstrated that the newly defined clinical-biomarker clusters are more homogeneous than JIA categories. Conclusion: Applying unsupervised data mining to clinical and inflammatory biomarker data discerns discrete clusters that intersect multiple JIA categories. Results suggest that certain groups of patients within different JIA categories are more aligned pathobiologically than their separate clinical categorizations suggest. Applying data mining analyses to complex datasets can generate insights into JIA pathogenesis and could contribute to biologically based refinements in JIA classification.


Water ◽  
2021 ◽  
Vol 13 (8) ◽  
pp. 1097
Author(s):  
Shaohong Han ◽  
Jizhi Zhou ◽  
Zeyuan Liu ◽  
Lijian Zhang ◽  
Xin Huang

A progressive price scheme (PPS) has been implemented in Shanghai since 2013. It takes residents’ ability to pay into account and charges according to their actual water consumption, in an effort to balance the rational allocation of water resources against the goal of saving water across rich and poor families. In the current work, the effect of the PPS on water use was evaluated based on the water use of 6661 households from 14 communities in Shanghai. A comparison of water consumption per household before and after the implementation of the PPS policy showed that the PPS did not reduce household water consumption. To investigate the weakness of the PPS, a principal component analysis (PCA) and a hierarchical cluster analysis (HCA) were conducted to assess the relationships between mean household water use and community factors such as housing price, management fees, and the number of parking sites. Moreover, a significant inverted U-shaped curve between housing price and water use was found, which demonstrates that rental households shared by several tenants were the main consumers of residential water, and that they were not sensitive to the water price increase under the PPS because the water bill is shared. Therefore, a proposal was made in this work to increase the proportion of water fee expenditure in the total household income and to use 3% as the benchmark for water affordability. Our results provide a new picture of residential water use in big cities and a method for saving and balancing urban water resources.
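
The arithmetic behind a progressive tariff and the 3% affordability benchmark can be sketched as follows. The tier boundaries, prices, and income figures are hypothetical placeholders and are not Shanghai's actual tariff. The function names `annual_bill` and `water_fee_share` are likewise illustrative.

```python
# Hypothetical tiers: (upper bound of the tier in m3 per year, price per m3).
TIERS = [(200.0, 3.0), (300.0, 5.0), (float("inf"), 6.0)]

def annual_bill(volume, tiers=TIERS):
    """Bill for a yearly volume, charging each tier's slice at that tier's price."""
    bill, lower = 0.0, 0.0
    for upper, price in tiers:
        if volume > lower:
            bill += (min(volume, upper) - lower) * price
        lower = upper
    return bill

def water_fee_share(volume, income, benchmark=0.03):
    """Return the bill's share of household income and whether it reaches the benchmark."""
    share = annual_bill(volume) / income
    return share, share >= benchmark

share, reaches_benchmark = water_fee_share(volume=250.0, income=85_000.0)
print(annual_bill(250.0), share, reaches_benchmark)
```

Under these made-up numbers the bill is a small fraction of income, which mirrors the abstract's point that a price signal far below the benchmark gives households little reason to change their consumption.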


2013 ◽  
Vol 38 (1) ◽  
pp. 39-45
Author(s):  
Peng Song ◽  
Li Zhao ◽  
Yongqiang Bao

Abstract. The Gaussian mixture model (GMM) method is popular and efficient for voice conversion (VC), but it is often subject to overfitting. In this paper, the principal component regression (PCR) method is adopted for the spectral mapping between source speech and target speech, and the number of principal components is adjusted to prevent overfitting. Then, in order to better model the nonlinear relationships between the source speech and target speech, the kernel principal component regression (KPCR) method is also proposed. Moreover, a method combining KPCR with GMM is further proposed to improve the accuracy of conversion. In addition, the discontinuity and oversmoothing problems of the traditional GMM method are also addressed. On the one hand, in order to solve the discontinuity problem, the adaptive median filter is adopted to smooth the posterior probabilities. On the other hand, the two mixture components with the highest posterior probabilities for each frame are chosen for VC to reduce the oversmoothing problem. Finally, objective and subjective experiments are carried out, and the results demonstrate that the proposed approach performs considerably better than the GMM method. In the objective tests, the proposed method shows lower cepstral distances and higher identification rates than the GMM method. In the subjective tests, the proposed method obtains higher scores for preference and perceptual quality.
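
The two posterior post-processing steps can be sketched in Python under loud assumptions. A fixed-window median filter stands in for the paper's adaptive median filter, and the two retained posteriors are renormalised to sum to one, which the abstract does not specify. The names `median_smooth` and `top_two` and the sample values are hypothetical.

```python
from statistics import median

def median_smooth(track, window=3):
    """Fixed-window median filter over one component's posterior across frames.

    This is a simplification: the paper uses an adaptive median filter.
    Windows are truncated at the sequence edges.
    """
    half = window // 2
    return [median(track[max(0, t - half): t + half + 1]) for t in range(len(track))]

def top_two(posteriors):
    """Keep the two largest posteriors of one frame and renormalise them (assumed step)."""
    keep = sorted(range(len(posteriors)), key=lambda m: posteriors[m], reverse=True)[:2]
    total = sum(posteriors[m] for m in keep)
    return [posteriors[m] / total if m in keep else 0.0 for m in range(len(posteriors))]

# One component's posterior over five frames, with an isolated spike at frame 1.
smoothed = median_smooth([0.1, 0.9, 0.1, 0.1, 0.2])
# Posteriors of three mixture components for a single frame.
sparse = top_two([0.5, 0.3, 0.2])
print(smoothed, sparse)
```

Smoothing removes frame-to-frame jumps in the component weights, while restricting each frame to two components limits the averaging over many Gaussians that tends to blur the converted spectrum.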


Water ◽  
2020 ◽  
Vol 12 (2) ◽  
pp. 525 ◽  
Author(s):  
Abdessamad Tiouiouine ◽  
Suzanne Yameogo ◽  
Vincent Valles ◽  
Laurent Barbiero ◽  
Fabrice Dassonville ◽  
...  

The SISE-Eaux database of water intended for human consumption, archived by the French Regional Health Agency (ARS) since 1990, is a rich source of information. However, more or less regular monitoring over almost 30 years and the multiplication of parameters lead to a sparse matrix (observations × parameters) and a high-dimensional data space. These characteristics make it difficult to exploit this database for a synthetic mapping of water quality, and to identify the processes responsible for its diversity in a complex geological context and anthropized environment. A 10-year period (2006–2016) was selected from the Provence-Alpes-Côte d’Azur region database (PACA, southeastern France). We extracted 5,295 water samples, each with 15 parameters. A principal component analysis (PCA) followed by orthomax rotation identifies and ranks six principal components (PCs) that together account for 75% of the initial information. The association of the parameters with the principal components and the regional distribution of the PCs make it possible to identify water-rock interactions, bacteriological contamination, redox processes and arsenic occurrence as the main sources of variability. However, the results also highlight a decrease in useful information, a constraint linked to the vast size and diversity of the study area. Developing a relevant tool for protecting and managing water resources will require identifying subsets based on functional landscape units or on the grouping of groundwater bodies.
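
How a retained-component count such as "six PCs totaling 75%" is typically read off a PCA can be sketched in Python. The eigenvalues below are hypothetical and are not taken from the study. The function name `components_for_share` is an illustrative choice.

```python
def components_for_share(eigenvalues, target):
    """Smallest number of leading components whose cumulative variance share reaches target."""
    total = sum(eigenvalues)
    ordered = sorted(eigenvalues, reverse=True)
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= target:
            return count, cumulative / total
    return len(ordered), 1.0  # target not reachable: keep every component

# Hypothetical eigenvalues of a correlation matrix with four parameters.
count, share = components_for_share([4.0, 3.0, 2.0, 1.0], target=0.75)
print(count, share)
```

With 15 parameters, as in the study, the same routine would be applied to the 15 eigenvalues before any rotation is performed on the retained components.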


2006 ◽  
Vol 27 (2) ◽  
pp. 87-92 ◽  
Author(s):  
Willem K.B. Hofstee ◽  
Dick P.H. Barelds ◽  
Jos M.F. Ten Berge

Hofstee and Ten Berge (2004a) have proposed a new look at personality assessment data, based on a bipolar proportional (-1, ..., 0, ..., +1) scale, a corresponding coefficient of raw-scores likeness L = ΣXY/N, and raw-scores principal component analysis. In a normal sample, the approach resulted in a structure dominated by a first principal component, according to which most people are faintly to mildly socially desirable. We hypothesized that a more differentiated structure would arise in a clinical sample. We analyzed the scores of 775 psychiatric clients on the 132 items of the Dutch Personality Questionnaire (NPV). In comparison to a normative sample (N = 3140), the eigenvalue for the first principal component appeared to be 1.7 times as small, indicating that such clients have less personality (social desirability) in common. Still, the match between the structures in the two samples was excellent after oblique rotation of the loadings. We applied the abridged m-dimensional circumplex design, by which persons are typed by their two highest scores on the principal components, to the scores on the first four principal components. We identified five types: Indignant (1-), Resilient (1-2+), Nervous (1-2-), Obsessive-Compulsive (1-3-), and Introverted (1-4-), covering 40% of the psychiatric sample. Some 26% of the individuals had negligible scores on all type vectors. We discuss the potential and the limitations of our approach in a clinical context.
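
The raw-scores likeness coefficient L = ΣXY/N can be computed directly, as in the minimal Python sketch below. It assumes X and Y are two score vectors of equal length N on the bipolar (-1 to +1) scale. The function name `likeness` and the sample scores are hypothetical.

```python
def likeness(x, y):
    """Raw-scores likeness L = sum(X * Y) / N, with no centering or standardising."""
    if len(x) != len(y):
        raise ValueError("score vectors must have the same length")
    return sum(a * b for a, b in zip(x, y)) / len(x)

# Hypothetical item scores on the bipolar proportional scale.
person_a = [1.0, -1.0, 0.5, 0.0]
person_b = [1.0, -1.0, 0.5, 0.0]

same = likeness(person_a, person_b)          # identical profiles
opposite = likeness([1.0, -1.0], [-1.0, 1.0])  # mirrored profiles
print(same, opposite)
```

Unlike a Pearson correlation, the coefficient keeps the scale's zero point, so two profiles that both sit on the positive side count as alike even when their deviations from the mean differ.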


Methodology ◽  
2016 ◽  
Vol 12 (1) ◽  
pp. 11-20 ◽  
Author(s):  
Gregor Sočan

Abstract. When principal component solutions are compared across two groups, a question arises whether the extracted components have the same interpretation in both populations. The problem can be approached by testing null hypotheses stating that the congruence coefficients between pairs of vectors of component loadings are equal to 1. Chan, Leung, Chan, Ho, and Yung (1999) proposed a bootstrap procedure for testing the hypothesis of perfect congruence between vectors of common factor loadings. We demonstrate that the procedure by Chan et al. is both theoretically and empirically inadequate for the application on principal components. We propose a modification of their procedure, which constructs the resampling space according to the characteristics of the principal component model. The results of a simulation study show satisfactory empirical properties of the modified procedure.
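
The congruence coefficient tested in this procedure is commonly Tucker's coefficient, the cosine between two loading vectors. A minimal Python sketch follows, assuming that definition. The bootstrap procedure itself is not reproduced here, and the loading values are hypothetical.

```python
import math

def congruence(x, y):
    """Tucker's congruence coefficient: sum(x * y) / sqrt(sum(x^2) * sum(y^2))."""
    if len(x) != len(y):
        raise ValueError("loading vectors must have the same length")
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return numerator / denominator

# Hypothetical loadings of one component in two groups.
group_1 = [0.8, 0.7, 0.6, 0.1]
group_2 = [0.7, 0.8, 0.5, 0.2]

phi = congruence(group_1, group_2)
print(round(phi, 3))
```

Because the coefficient is invariant to rescaling either vector, a value of 1 means the two components have proportional loadings, which is the null hypothesis the resampling test evaluates.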

