Clustering Methods for Defect Tracking in Order to Assess the Performance of a Porosity Inspection System

Manufacturing Science and Engineering, Parts A and B ◽

10.1115/msec2006-21135 ◽

2006 ◽

Author(s):

Rachel N. Rubin ◽

J. Patrick Spicer ◽

Reuven R. Katz

Keyword(s):

A Priori ◽

Computation Time ◽

Repeated Measurements ◽

Inspection System ◽

Surface Porosity ◽

Clustering Methods ◽

Agglomerative Clustering ◽

Hierarchical Agglomerative Clustering ◽

Dimensional Measurements ◽

Time Required

Surface porosity inspection is important for quality assurance of critical mating surfaces on machined components. An important metric for assessing the performance of an automated surface porosity inspection system is repeatability. Traditional gage repeatability analysis is well defined for dimensional measurements of machined part features. However, the analysis becomes more difficult for surface porosity inspection, because surface porosity appears in random sizes and in random locations. Repeatability analysis requires painstaking effort in tracking individual pores through repeated measurements. Therefore, this paper presents an automated approach for tracking porosity for the purpose of repeatability analysis. Two different algorithms are proposed and evaluated. The first is a tolerance-based method that uses pre-specified tolerances to determine if pores should be grouped together. The second algorithm is similar to hierarchical agglomerative clustering, using a similarity matrix to store differences between cluster centroids. However, this algorithm uses a training period to determine when to stop clustering instead of continuing until all pores are in one cluster. Experimental results describe differences in the accuracy of both approaches and in the effort required to obtain a solution. The computation time required for the first method is much shorter than that of the second method. However, the first algorithm requires a priori information to specify the tolerances, whereas the second algorithm requires no prior knowledge.
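The tolerance-based grouping described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `group_pores` function, the `(x, y, diameter)` record layout, the greedy assignment order, and the tolerance values are all assumptions made for illustration.

```python
from math import hypot

def group_pores(pores, pos_tol, size_tol):
    """Greedily group (x, y, diameter) detections from repeated scans.

    Hypothetical sketch: a detection joins the first group whose running
    mean position and diameter lie within the pre-specified tolerances.
    """
    groups = []  # each group holds detections judged to be the same pore
    for x, y, d in pores:
        for g in groups:
            # Running mean of the group's position and diameter.
            gx = sum(p[0] for p in g) / len(g)
            gy = sum(p[1] for p in g) / len(g)
            gd = sum(p[2] for p in g) / len(g)
            if hypot(x - gx, y - gy) <= pos_tol and abs(d - gd) <= size_tol:
                g.append((x, y, d))
                break
        else:
            # No existing group matched, so start a new one.
            groups.append([(x, y, d)])
    return groups

# Made-up detections of two pores seen across repeated measurements.
detections = [
    (10.0, 5.0, 0.50),
    (10.1, 5.05, 0.52),
    (30.0, 8.0, 0.30),
    (9.95, 4.98, 0.49),
    (30.1, 7.9, 0.31),
]
groups = group_pores(detections, pos_tol=0.5, size_tol=0.1)
```

As the abstract notes, a method of this kind depends on a priori knowledge: `pos_tol` and `size_tol` must be chosen before any grouping can be done.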

Research on NMF Based Hierarchical Clustering Methods

Key Engineering Materials ◽

10.4028/www.scientific.net/kem.439-440.1306 ◽

2010 ◽

Vol 439-440 ◽

pp. 1306-1311

Author(s):

Fang Li ◽

Qun Xiong Zhu

Keyword(s):

Hierarchical Clustering ◽

Clustering Algorithm ◽

Clustering Methods ◽

Agglomerative Clustering ◽

Clustering Method ◽

Hierarchical Agglomerative Clustering ◽

Hierarchical Clustering Methods

An LSI-based hierarchical agglomerative clustering algorithm is studied. To address the problems of the LSI-based hierarchical agglomerative clustering method, an NMF-based hierarchical clustering method is proposed and analyzed. Two ways of implementing the NMF-based method are introduced. Finally, the results of two groups of experiments on the TanCorp document corpus show that the proposed method is effective.

Machine Learning Applications and Optimization of Clustering Methods Improve the Selection of Descriptors in Blackberry Germplasm Banks

Plants ◽

10.3390/plants10020247 ◽

2021 ◽

Vol 10 (2) ◽

pp. 247

Author(s):

Juan Camilo Henao-Rojas ◽

María Gladis Rosero-Alpala ◽

Carolina Ortiz-Muñoz ◽

Carlos Enrique Velásquez-Arroyo ◽

William Alfonso Leon-Rueda ◽

...

Keyword(s):

Machine Learning ◽

Support Vector ◽

P Value ◽

Clustering Methods ◽

Agglomerative Clustering ◽

Discriminating Power ◽

Hierarchical Agglomerative Clustering ◽

Machine Learning Applications ◽

Germplasm Banks ◽

Selection Of

Machine learning (ML) and its multiple applications have comparative advantages for improving the interpretation of knowledge on different agricultural processes. However, there are challenges that impede proper usage, as can be seen in phenotypic characterizations of germplasm banks. The objective of this research was to test and optimize different analysis methods based on ML for the prioritization and selection of morphological descriptors of Rubus spp. A total of 55 descriptors were evaluated in 26 genotypes, and the weight and discriminating capacity of each one were determined. ML methods such as random forest (RF), support vector machines in the linear and radial forms, and neural networks were optimized and compared. Subsequently, the results were validated with two discriminating methods and their variants: hierarchical agglomerative clustering and K-means. The results indicated that RF presented the highest accuracy (0.768) of the methods evaluated, selecting 11 descriptors based on the purity (Gini index), importance, number of connected trees, and significance (p value < 0.05). Additionally, the K-means method with optimized descriptors based on RF had greater discriminating power on Rubus spp. accessions according to the evaluated statistics. This study presents one application of ML for the optimization of specific morphological variables for plant germplasm bank characterization.

Hierarchical Clustering Approach for Selecting Representative Skylines

Information ◽

10.3390/info10030096 ◽

2019 ◽

Vol 10 (3) ◽

pp. 96

Author(s):

Lkhagvadorj Battulga ◽

Aziz Nasridinov

Keyword(s):

Data Distribution ◽

Computation Time ◽

Agglomerative Clustering ◽

Skyline Query ◽

Big Data Applications ◽

Wide Range ◽

Hierarchical Agglomerative Clustering ◽

Data Points ◽

Low Dimensional ◽

Representative Skyline

Recently, the skyline query has attracted interest in a wide range of applications from recommendation systems to computer networks. The skyline query is useful to obtain the dominant data points from the given dataset. In a low-dimensional dataset, the skyline query may return a small number of skyline points. However, as the dimensionality of the dataset increases, the number of skyline points also increases. In other words, depending on the data distribution and dimensionality, most of the data points may become skyline points. With the emergence of big data applications, where the data distribution and dimensionality are a significant problem, obtaining representative skyline points among the resulting skyline points is necessary. There have been several methods that focused on extracting representative skyline points with varying success. However, existing methods have a problem of re-computation when the global threshold changes. Moreover, in certain cases, the resulting representative skyline points may not satisfy a user with multiple preferences. Thus, in this paper, we propose a new representative skyline query processing method, called representative skyline cluster (RSC), which solves the problems of the existing methods. Our method utilizes the hierarchical agglomerative clustering method to find the exact representative skyline points, which enables us to reduce the re-computation time significantly. We show the superiority of our proposed method over the existing state-of-the-art methods with various types of experiments.
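For readers unfamiliar with the skyline query itself, the following sketch shows the plain dominance test on which it rests. It does not implement the RSC method from the abstract. The `dominates` and `skyline` names, the convention that smaller values are better in every dimension, and the sample data are assumptions for illustration.

```python
def dominates(a, b):
    """True if a is no worse than b in every dimension and strictly
    better in at least one (assuming smaller values are better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points):
    """Naive O(n^2) skyline: keep points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Made-up (price, distance) pairs; lower is better for both.
hotels = [(50, 8), (60, 3), (80, 1), (70, 5), (90, 2)]
result = skyline(hotels)
```

Here `(70, 5)` is dominated by `(60, 3)` and `(90, 2)` by `(80, 1)`, so three points remain. With more dimensions fewer points are dominated, which is the growth in skyline size that motivates selecting representatives.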

Clustering Methods for Gene-Expression Data

Handbook of Research on Systems Biology Applications in Medicine ◽

10.4018/978-1-60566-076-9.ch011 ◽

2009 ◽

pp. 209-220

Author(s):

L.K. Flack

Keyword(s):

Expression Patterns ◽

Multivariate Normal ◽

Clustering Methods ◽

Agglomerative Clustering ◽

Self Organizing Maps ◽

Tissue Samples ◽

Normal Distributions ◽

Model Based ◽

Gene Profiles ◽

Hierarchical Agglomerative Clustering

Clustering methods are used to place items in natural patterns or convenient groups. They can be used to place genes into clusters that have similar expression patterns across the tissue samples of interest. They can also be used to cluster tissues into groups on the basis of their gene profiles. Examples of the methods used are hierarchical agglomerative clustering, k-means clustering, self-organizing maps, and model-based methods. The focus of this chapter is on using mixtures of multivariate normal distributions to provide model-based clusterings of tissue samples and of genes.

Clustering Techniques for Secondary Substations Siting

Energies ◽

10.3390/en14041028 ◽

2021 ◽

Vol 14 (4) ◽

pp. 1028

Author(s):

Silvia Corigliano ◽

Federico Rosato ◽

Carla Ortiz Dominguez ◽

Marco Merlo

Keyword(s):

Rural Areas ◽

Urban Areas ◽

Universal Access ◽

Distribution Networks ◽

Industrialized Countries ◽

Agglomerative Clustering ◽

Clustering Techniques ◽

Hierarchical Agglomerative Clustering ◽

Efficient Planning ◽

Target Set

The scientific community is active in developing new models and methods to help reach the ambitious target set by UN SDG 7: universal access to electricity by 2030. Efficient planning of distribution networks is a complex and multivariate task, which is usually split into multiple subproblems to reduce the number of variables. The present work addresses the problem of optimal secondary substation siting by means of different clustering techniques. In contrast with the majority of approaches found in the literature, which are devoted to the planning of MV grids in already electrified urban areas, this work focuses on greenfield planning in rural areas. The K-means algorithm, hierarchical agglomerative clustering, and a method based on optimal weighted tree partitioning are adapted to the problem and run on two real case studies with different population densities. The algorithms are compared in terms of different indicators useful to assess the feasibility of the solutions found. The algorithms have proven to be effective in addressing some of the crucial aspects of substation siting and to constitute relevant improvements to the classic K-means approach found in the literature. However, when the load density is very low, it is very challenging to reconcile an acceptable geographical span of the area served by a single substation with a substation power high enough to justify the installation. In other words, well-known standards adopted in industrialized countries do not fit developing countries’ requirements.

Identifying organ dysfunction trajectory-based subphenotypes in critically ill patients with COVID-19

Scientific Reports ◽

10.1038/s41598-021-95431-7 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Chang Su ◽

Zhenxing Xu ◽

Katherine Hoffman ◽

Parag Goyal ◽

Monika M. Safford ◽

...

Keyword(s):

New York ◽

Respiratory Failure ◽

Sofa Score ◽

Severity Of Illness ◽

Agglomerative Clustering ◽

Baseline Severity ◽

Organ Systems ◽

Hierarchical Agglomerative Clustering ◽

Dynamic Time ◽

Post Intubation

COVID-19-associated respiratory failure offers the unprecedented opportunity to evaluate the differential host response to a uniform pathogenic insult. Understanding whether there are distinct subphenotypes of severe COVID-19 may offer insight into its pathophysiology. The Sequential Organ Failure Assessment (SOFA) score is an objective and comprehensive measurement of the dysfunction severity of six organ systems, i.e., cardiovascular, central nervous system, coagulation, liver, renal, and respiration. Our aim was to identify and characterize distinct subphenotypes of COVID-19 critical illness defined by the post-intubation trajectory of SOFA score. Intubated COVID-19 patients at two hospitals in New York City were leveraged as development and validation cohorts. Patients were grouped into mild, intermediate, and severe strata by their baseline post-intubation SOFA. Hierarchical agglomerative clustering was performed within each stratum to detect subphenotypes based on similarities amongst SOFA score trajectories evaluated by Dynamic Time Warping. Distinct worsening and recovering subphenotypes were identified within each stratum, which had distinct 7-day post-intubation SOFA progression trends. Patients in the worsening subphenotypes had a higher mortality than those in the recovering subphenotypes within each stratum (mild stratum, 29.7% vs. 10.3%, p = 0.033; intermediate stratum, 29.3% vs. 8.0%, p = 0.002; severe stratum, 53.7% vs. 22.2%, p < 0.001). Pathophysiologic biomarkers associated with progression were distinct at each stratum, including findings suggestive of inflammation in low baseline severity of illness versus hemophagocytic lymphohistiocytosis in higher baseline severity of illness. The findings suggest that there are clear worsening and recovering subphenotypes of COVID-19 respiratory failure after intubation, which are more predictive of outcomes than baseline severity of illness. Distinct progression biomarkers at differential baseline severity of illness suggest a heterogeneous pathobiology in the progression of COVID-19 respiratory failure.
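The trajectory similarity measure named in this abstract, Dynamic Time Warping, can be sketched in a few lines. This is the textbook dynamic-programming form with an absolute-difference step cost, not the study's pipeline. The `dtw_distance` name and the 7-day score sequences are made up for illustration.

```python
def dtw_distance(a, b):
    """Classic DTW between two numeric sequences, using |a_i - b_j| as
    the local cost and allowing match, insertion, and deletion steps."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# Hypothetical 7-day score trajectories (illustrative values only).
worsening = [6, 7, 8, 9, 10, 11, 12]
worsening_late = [6, 6, 7, 8, 9, 10, 11]
recovering = [6, 5, 4, 3, 3, 2, 2]

d_similar = dtw_distance(worsening, worsening_late)
d_different = dtw_distance(worsening, recovering)
```

A pairwise matrix of such distances is the kind of input a hierarchical agglomerative clustering routine would consume. Because DTW tolerates shifts in timing, the two worsening trajectories score as close even though one starts rising a day later.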

Statistical Uncertainty of DNS in Geometries without Homogeneous Directions

Applied Sciences ◽

10.3390/app11041399 ◽

2021 ◽

Vol 11 (4) ◽

pp. 1399

Author(s):

Jure Oder ◽

Cédric Flageul ◽

Iztok Tiselj

Keyword(s):

Channel Flow ◽

A Priori ◽

Time Integration ◽

Navier Stokes ◽

Statistical Uncertainty ◽

Time Step ◽

Order Of Magnitude ◽

Averaging Time ◽

The Individual ◽

Time Required

In this paper, we present uncertainties of statistical quantities of direct numerical simulations (DNS) with small numerical errors. The uncertainties are analysed for channel flow and a flow separation case in a confined backward-facing step (BFS) geometry. The infinite channel flow case has two homogeneous directions, and this is usually exploited to speed up the convergence of the results. As we show, such a procedure reduces statistical uncertainties of the results by up to an order of magnitude. This effect is strongest in the near-wall regions. In the case of flow over a confined BFS, there are no such directions and thus very long integration times are required. The individual statistical quantities converge with the square root of the integration time, so in order to improve the uncertainty by a factor of two, the simulation has to be prolonged by a factor of four. We provide an estimator that can be used to evaluate a priori the DNS relative statistical uncertainties from results obtained with a Reynolds-averaged Navier–Stokes simulation. In the DNS, the estimator can be used to predict the averaging time and with it the simulation time required to achieve a certain relative statistical uncertainty of results. For accurate evaluation of averages and their uncertainties, it is not required to use every time step of the DNS. We observe that statistical uncertainty of the results is uninfluenced by reducing the number of samples to the point where the period between two consecutive samples measured in Courant–Friedrichs–Lewy (CFL) condition units is below one. Nevertheless, crossing this limit, the estimates of uncertainties start to exhibit significant growth.
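The factor-of-four statement follows directly from square-root convergence. As a short worked step, assume the statistical uncertainty of an averaged quantity scales as the inverse square root of the averaging time; the symbols \sigma, T, and the constant C are introduced here for illustration and do not come from the paper.

```latex
\[
\sigma(T) \approx \frac{C}{\sqrt{T}}
\quad\Longrightarrow\quad
\frac{\sigma(T_2)}{\sigma(T_1)} = \sqrt{\frac{T_1}{T_2}} = \frac{1}{2}
\quad\Longrightarrow\quad
T_2 = 4\,T_1 .
\]
```

Under the same assumption, a tenfold reduction in uncertainty would need a hundredfold longer averaging time.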

The Regularized Weak Functional Matching Pursuit for linear inverse problems

Journal of Inverse and Ill-Posed Problems ◽

10.1515/jiip-2018-0013 ◽

2019 ◽

Vol 27 (3) ◽

pp. 317-340 ◽

Cited By ~ 3

Author(s):

Max Kontak ◽

Volker Michel

Keyword(s):

Inverse Problems ◽

Matching Pursuit ◽

A Priori ◽

Computation Time ◽

Point Of View ◽

Computational Point ◽

Linear Inverse Problems ◽

Infinite Dimensional ◽

Frame Condition ◽

Ill Posed

In this work, we present the so-called Regularized Weak Functional Matching Pursuit (RWFMP) algorithm, which is a weak greedy algorithm for linear ill-posed inverse problems. In comparison to the Regularized Functional Matching Pursuit (RFMP), on which it is based, the RWFMP possesses an improved theoretical analysis including the guaranteed existence of the iterates, the convergence of the algorithm for inverse problems in infinite-dimensional Hilbert spaces, and a convergence rate, which is also valid for the particular case of the RFMP. Another improvement is the cancellation of the previously required and difficult-to-verify semi-frame condition. Furthermore, we provide an a priori parameter choice rule for the RWFMP, which yields a convergent regularization. Finally, we give a numerical example, which shows that the “weak” approach is also beneficial from the computational point of view. By applying an improved search strategy in the algorithm, which is motivated by the weak approach, we can save up to 90% of computation time in comparison to the RFMP, whereas the accuracy of the solution does not change as much.

Improving ozone profile retrieval from spaceborne UV backscatter spectrometers using convergence behaviour diagnostics

Atmospheric Measurement Techniques ◽

10.5194/amt-3-1555-2010 ◽

2010 ◽

Vol 3 (6) ◽

pp. 1555-1568 ◽

Cited By ~ 13

Author(s):

B. Mijling ◽

O. N. E. Tuinder ◽

R. F. van Oss ◽

R. J. van der A

Keyword(s):

Cross Sections ◽

A Priori ◽

Computation Time ◽

External Input ◽

Computational Time ◽

Ozone Profile ◽

Global Performance ◽

Convergence Behaviour ◽

Low Cloud ◽

Average Computation Time

The Ozone Profile Algorithm (OPERA), developed at KNMI, retrieves the vertical ozone distribution from nadir spectral satellite measurements of backscattered sunlight in the ultraviolet and visible wavelength range. To produce consistent global datasets the algorithm needs to have good global performance, while short computation time facilitates the use of the algorithm in near-real-time applications. To test the global performance of the algorithm we look at the convergence behaviour as a diagnostic tool of the ozone profile retrievals from the GOME instrument (on board ERS-2) for February and October 1998. In this way, we uncover different classes of retrieval problems, related to the South Atlantic Anomaly, low cloud fractions over deserts, desert dust outflow over the ocean, and the intertropical convergence zone. The influence of the first guess and the external input data, including the ozone cross-sections and the ozone climatologies, on the retrieval performance is also investigated. By using a priori ozone profiles which are selected on the expected total ozone column, retrieval problems due to anomalous ozone distributions (such as in the ozone hole) can be avoided. By applying the algorithm adaptations the convergence statistics improve considerably, not only increasing the number of successful retrievals, but also reducing the average computation time, due to fewer iteration steps per retrieval. For February 1998, non-convergence was brought down from 10.7% to 2.1%, while the mean number of iteration steps (which dominates the computational time) dropped 26% from 5.11 to 3.79.

Cluster analysis of the results of intraoperative optical spectroscopic diagnostics in brain glioma neurosurgery

Biomedical Photonics ◽

10.24931/2413-9432-2018-7-4-23-34 ◽

2019 ◽

Vol 7 (4) ◽

pp. 23-34

Author(s):

I. A. Osmakov ◽

T. A. Savelieva ◽

V. B. Loschenov ◽

S. A. Goryajnov ◽

A. A. Potapov

Keyword(s):

Cluster Analysis ◽

Optical Spectroscopy ◽

Protoporphyrin Ix ◽

Spectroscopy Data ◽

Clustering Methods ◽

Agglomerative Clustering ◽

Data Set ◽

Broadband Radiation ◽

Degree Of Malignancy ◽

Normal White Matter

The paper presents the results of a comparative study of methods of cluster analysis of optical intraoperative spectroscopy data during surgery of glial tumors with varying degrees of malignancy. The analysis was carried out both for individual patients and for the entire dataset. The data were obtained using a combined optical spectroscopy technique, which allowed simultaneous registration of diffuse reflectance spectra of broadband radiation in the 500–600 nm spectral range (for the analysis of tissue blood supply and the degree of hemoglobin oxygenation), fluorescence spectra of 5-ALA-induced protoporphyrin IX (Pp IX) (for analysis of the malignancy degree), and the signal of diffusely reflected laser light used to excite Pp IX fluorescence (to take into account the scattering properties of tissues). To determine the threshold values of these parameters for the tumor, the infiltration zone, and the normal white matter, we searched for the natural clusters in the available intraoperative optical spectroscopy data and compared them with the results of the pathomorphology. It was shown that, among the considered clustering methods, the EM algorithm and k-means methods are optimal for the considered data set and can be used to build a decision support system (DSS) for spectroscopic intraoperative navigation in neurosurgery. Clustering results relevant to the pathological studies were also obtained using the methods of spectral and agglomerative clustering. These methods can be used to postprocess combined spectroscopy data.
