Clustering Ensemble for Identifying Defective Wafer Bin Map in Semiconductor Manufacturing

2015 ◽  
Vol 2015 ◽  
pp. 1-11 ◽  
Author(s):  
Chia-Yu Hsu

A wafer bin map (WBM) represents specific defect patterns that provide information for diagnosing root causes of low yield in semiconductor manufacturing. In practice, most semiconductor engineers use subjective and time-consuming eyeball analysis to assess WBM patterns. Given shrinking feature sizes and increasing wafer sizes, various types of WBMs occur; thus, relying on human vision to judge defect patterns is complex, inconsistent, and unreliable. In this study, a clustering ensemble approach is proposed to bridge the gap, facilitating WBM pattern extraction and assisting engineers in recognizing systematic defect patterns efficiently. The clustering ensemble approach not only generates diverse clusters in data space, but also integrates them in label space. First, the mountain function is used to transform data by using pattern density. Subsequently, k-means and particle swarm optimization (PSO) clustering algorithms are used to generate diverse partitions and various label results. Finally, the adaptive resonance theory (ART) neural network is used to attain consensus partitions and integration. An experiment was conducted to evaluate the effectiveness of the proposed WBM clustering ensemble approach. Several criteria, namely sum of squared error, precision, recall, and F-measure, were used for evaluating clustering results. The numerical results showed that the proposed approach outperforms the individual clustering algorithms.
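The abstract names precision, recall, and F-measure as evaluation criteria but does not say how they are computed for cluster labels. One common convention is pair counting, where a pair of samples is a true positive if it shares a cluster in both the reference and the predicted labeling. The sketch below assumes that convention; the function name `pairwise_scores` and the toy labels are illustrative and are not taken from the paper.

```python
from itertools import combinations


def pairwise_scores(true_labels, pred_labels):
    """Pair-counting precision, recall, and F-measure (assumed convention)."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_pred and same_true:
            tp += 1  # pair grouped together in both labelings
        elif same_pred:
            fp += 1  # pair grouped together only in the prediction
        elif same_true:
            fn += 1  # pair grouped together only in the reference
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy example: four samples, reference has two clusters of two.
precision, recall, f_measure = pairwise_scores([0, 0, 1, 1], [0, 0, 0, 1])
```

Pair counting does not depend on how cluster identifiers are numbered, which is convenient when comparing partitions from different algorithms.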

Author(s):  
Tao Sun ◽  
Saeed Mashdour ◽  
Mohammad Reza Mahmoudi

Clustering ensemble is a new problem in which the aim is to extract a single clustering from a pool of base clusterings. The pool of base clusterings is sometimes referred to as an ensemble. An ensemble is considered suitable if its members are diverse and each of them has a minimum quality. The method that maps an ensemble into an output partition (also called the consensus partition) is named the consensus function. The consensus function should find a consensus partition that all ensemble members agree with as much as possible. In this paper, a novel clustering ensemble framework is introduced that guarantees generation of a pool of base clusterings satisfying both conditions (diversity among ensemble members and high-quality members). In view of its limitations, a novel consensus function is also introduced. We experimentally show that the proposed clustering ensemble framework is scalable, efficient and general. Using different base clustering algorithms, we show that our improved base clustering algorithm is better. Also, among different consensus functions, we show the effectiveness of our consensus function. Finally, compared with the state of the art, we find that the clustering ensemble framework is comparable or even better in terms of scalability and efficacy.
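To make the idea of a consensus function concrete, here is a minimal sketch of one well-known generic approach: count how often each pair of samples is co-clustered across the ensemble, then link pairs that agree in more than a chosen fraction of the base clusterings. This is not the novel consensus function the abstract refers to. The names `co_association` and `consensus` and the `threshold` parameter are assumptions made for illustration.

```python
def co_association(partitions):
    """Count, for every pair of samples, how many base clusterings group them."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return counts


def consensus(partitions, threshold=0.5):
    """Link pairs co-clustered in more than `threshold` of the ensemble.

    Connected components of the resulting graph form the consensus partition.
    """
    counts = co_association(partitions)
    m = len(partitions)
    n = len(counts)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first search over strongly agreeing pairs
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and counts[i][j] / m > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three base clusterings of four samples; label names differ between them.
ensemble = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]]
result = consensus(ensemble)
```

Because the co-association counts only record whether two samples share a cluster, the sketch is unaffected by the arbitrary numbering of clusters in each base clustering.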


Author(s):  
Katti Faceli ◽  
Andre C.P.L.F. de Carvalho ◽  
Marcilio C.P. de Souto

Clustering is an important tool for data exploration. Several clustering algorithms exist, and new algorithms are frequently proposed in the literature. These algorithms have been very successful in a large number of real-world problems. However, there is no clustering algorithm, optimizing only a single criterion, able to reveal all types of structures (homogeneous or heterogeneous) present in a dataset. In order to deal with this problem, several multi-objective clustering and cluster ensemble methods have been proposed in the literature, including our multi-objective clustering ensemble algorithm. In this chapter, we present an overview of these methods, which, to a great extent, are based on the combination of various aspects of traditional clustering algorithms.


Author(s):  
JUNHAO WEN ◽  
HONGYAN WU ◽  
ZHONGFU WU ◽  
YUANYAN TANG ◽  
GUANGHUI HE

Self-organizing feature maps (SOFM) can learn both the distribution and topology of the input vectors they are trained on. Based on this characteristic, we construct neural networks with a family of self-organizing feature maps to cluster the input data space. The algorithm proposed in this paper defines a novel similarity measure, topological similarity, and employs some new concepts, such as SOFM family and UsageFactor. The clustering algorithm handles clusters with arbitrary shapes and avoids the limitations of conventional clustering algorithms. We conclude our paper with several experiments on synthetic and standard data sets of different characteristics, which show good performance of the proposed algorithm.


2020 ◽  
Vol 10 (5) ◽  
pp. 1891 ◽  
Author(s):  
Huan Niu ◽  
Nasim Khozouie ◽  
Hamid Parvin ◽  
Hamid Alinejad-Rokny ◽  
Amin Beheshti ◽  
...  

Clustering ensemble refers to an approach in which a number of (usually weak) base clusterings are performed and their consensus clustering is used as the final clustering. Knowing that democratic decisions are better than dictatorial decisions, it seems clear and simple that ensemble (here, clustering ensemble) decisions are better than simple model (here, clustering) decisions. But it is not guaranteed that every ensemble is better than a simple model. An ensemble is considered to be a better ensemble if its members are valid or high-quality and if they participate according to their qualities in constructing the consensus clustering. In this paper, we propose a clustering ensemble framework that uses a simple clustering algorithm based on the k-medoids clustering algorithm. Our simple clustering algorithm guarantees that the discovered clusters are valid. In addition, our clustering ensemble framework is guaranteed to use a mechanism that makes use of each discovered cluster according to its quality. To implement this mechanism, an auxiliary ensemble named the reference set is created by running several k-means clustering algorithms.


Author(s):  
Mohana Priya K ◽  
Pooja Ragavi S ◽  
Krishna Priya G

Clustering is the process of grouping objects into subsets that have meaning in the context of a particular problem. It does not rely on predefined classes. It is referred to as an unsupervised learning method because no information is provided about the "right answer" for any of the objects. Many clustering algorithms have been proposed and are used in different applications. Sentence clustering is one of the best clustering techniques. A hierarchical clustering algorithm is applied at multiple levels for accuracy. For tagging, a POS tagger and the Porter stemmer are used. The WordNet dictionary is utilized for determining similarity by invoking the Jiang-Conrath and cosine similarity measures. Grouping is performed with respect to the highest similarity measure value with a mean threshold. This paper incorporates many parameters for finding similarity between words. In order to disambiguate words, sense identification is performed for the adjectives and a comparison is performed. SemCor and machine learning datasets are employed. Compared with previous results for word sense disambiguation (WSD), our work shows a substantial improvement, achieving 91.2%.


2015 ◽  
pp. 125-138 ◽  
Author(s):  
I. V. Goncharenko

In this article we propose a new method of non-hierarchical cluster analysis using a k-nearest-neighbor graph and discuss it with respect to vegetation classification. The method of k-nearest neighbor (k-NN) classification was originally developed in 1951 (Fix, Hodges, 1951). Later the term "k-NN graph" and a few algorithms of k-NN clustering appeared (Cover, Hart, 1967; Brito et al., 1997). In biology, k-NN is used in the analysis of protein structures and genome sequences. Most k-NN clustering algorithms first build an "excessive" graph, a so-called hypergraph, and then truncate it to subgraphs by partitioning and coarsening the hypergraph. We developed another strategy, "upward" clustering, which forms (assembles sequentially) one cluster after another. Until now, graph-based cluster analysis has not been considered for the classification of vegetation datasets.
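As an illustration of the k-NN graph that this abstract builds on, the following sketch connects each sample to its k closest neighbors. It assumes Euclidean distance and brute-force search, neither of which is stated in the abstract, and it does not reproduce the "upward" clustering strategy itself. The function name `knn_graph` and the sample points are hypothetical.

```python
import math


def knn_graph(points, k):
    """Map each point index to the indices of its k nearest other points.

    Uses Euclidean distance and an O(n^2) brute-force search (assumptions).
    """
    graph = {}
    for i, p in enumerate(points):
        # Sort all other points by distance to p; ties break on index.
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        graph[i] = [j for _, j in dists[:k]]
    return graph


# Two well-separated pairs of points on a line.
plots = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
neighbors = knn_graph(plots, k=1)
```

For vegetation data a different dissimilarity measure may be more appropriate than Euclidean distance; swapping the call to `math.dist` for another distance function is the only change needed in this sketch.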


Author(s):  
Yuancheng Li ◽  
Yaqi Cui ◽  
Xiaolong Zhang

Background: Advanced Metering Infrastructure (AMI) for the smart grid is growing rapidly, which results in exponential growth of the data collected and transmitted by the devices. Clustering this data can give the electricity company a better understanding of the personalized and differentiated needs of the user. Objective: The existing clustering algorithms for processing data generally have some problems, such as insufficient data utilization, high computational complexity and low accuracy of behavior recognition. Methods: In order to improve the clustering accuracy, this paper proposes a new clustering method based on the electrical behavior of the user. Starting with the analysis of user load characteristics, the user electricity data samples were constructed. The daily load characteristic curve was extracted through an improved extreme learning machine clustering algorithm and effective index criteria. Moreover, clustering analysis was carried out for different users from industrial areas, commercial areas and residential areas. The improved extreme learning machine algorithm, also called Unsupervised Extreme Learning Machine (US-ELM), is an extension and improvement of the original Extreme Learning Machine (ELM), which realizes the unsupervised clustering task on the basis of the original ELM. Results: Experiments were run on four different data sets and compared with other commonly used clustering algorithms using MATLAB. The experimental results show that the US-ELM algorithm has higher accuracy in processing power data. Conclusion: The unsupervised ELM algorithm can greatly reduce the time consumption and improve the effectiveness of clustering.


Author(s):  
M. Tanveer ◽  
Tarun Gupta ◽  
Miten Shah ◽  

Twin Support Vector Clustering (TWSVC) is a clustering algorithm inspired by the principles of Twin Support Vector Machine (TWSVM). TWSVC has already outperformed other traditional plane-based clustering algorithms. However, TWSVC uses hinge loss, which maximizes the shortest distance between clusters and hence suffers from noise sensitivity and low re-sampling stability. In this article, we propose Pinball loss Twin Support Vector Clustering (pinTSVC) as a clustering algorithm. The proposed pinTSVC model incorporates the pinball loss function in the plane clustering formulation. The pinball loss function introduces favorable properties such as noise insensitivity and re-sampling stability. The time complexity of the proposed pinTSVC remains equivalent to that of TWSVC. Extensive numerical experiments on noise-corrupted benchmark UCI and artificial datasets are provided. Results of the proposed pinTSVC model are compared with TWSVC, Twin Bounded Support Vector Clustering (TBSVC) and Fuzzy c-means clustering (FCM). Detailed and exhaustive comparisons demonstrate the better performance and generalization of the proposed pinTSVC for noise-corrupted datasets. Further experiments and analysis of the above-mentioned clustering algorithms on structural MRI (sMRI) images taken from the ADNI database, face clustering, and facial expression clustering demonstrate the effectiveness and feasibility of the proposed pinTSVC model.
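The contrast between hinge loss and pinball loss can be sketched as follows. The form used here is the one commonly seen in pinball-loss support vector machines, where `u` stands for the margin violation and `tau` controls the penalty on the non-violating side. This is an assumption, since the abstract does not give the exact formulation used in pinTSVC, and the function names are illustrative.

```python
def hinge_loss(u):
    """Hinge loss on the margin violation u: zero whenever u <= 0."""
    return max(0.0, u)


def pinball_loss(u, tau):
    """Pinball loss (assumed form): u for u >= 0, and -tau * u for u < 0.

    With tau = 0 this coincides with the hinge loss; with tau > 0,
    samples on the non-violating side also contribute a small penalty.
    """
    return u if u >= 0 else -tau * u


violating = pinball_loss(2.0, tau=0.5)       # same penalty as hinge loss
non_violating = pinball_loss(-2.0, tau=0.5)  # penalised, unlike hinge loss
```

Because every sample contributes to the pinball loss, the fitted plane depends on the whole distribution of distances and not only on the few closest samples, which is the intuition behind the noise insensitivity claimed in the abstract.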


Symmetry ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 596
Author(s):  
Krishna Kumar Sharma ◽  
Ayan Seal ◽  
Enrique Herrera-Viedma ◽  
Ondrej Krejcar

Calculating and monitoring customer churn metrics is important for companies to retain customers and earn more profit. In this study, a churn prediction framework is developed using modified spectral clustering (SC). The similarity measure plays an imperative role in clustering for predicting churn with better accuracy by analyzing industrial data. The linear Euclidean distance in the traditional SC is replaced by the non-linear S-distance (Sd). The Sd is deduced from the concept of S-divergence (SD). Several characteristics of Sd are discussed in this work. Experiments are conducted to validate the proposed clustering algorithm on four synthetic, eight UCI, and two industrial databases and one telecommunications database related to customer churn. Three existing clustering algorithms (k-means, density-based spatial clustering of applications with noise, and conventional SC) are also implemented on the above-mentioned 15 databases. The empirical outcomes show that the proposed clustering algorithm beats the three existing clustering algorithms in terms of Jaccard index, f-score, recall, precision and accuracy. Finally, we also test the significance of the clustering results using Wilcoxon's signed-rank test, Wilcoxon's rank-sum test, and sign tests. The comparative study shows that the outcomes of the proposed algorithm are interesting, especially in the case of clusters of arbitrary shape.


Mathematics ◽  
2021 ◽  
Vol 9 (4) ◽  
pp. 370
Author(s):  
Shuangsheng Wu ◽  
Jie Lin ◽  
Zhenyu Zhang ◽  
Yushu Yang

The fuzzy clustering algorithm has become a research hotspot in many fields because of its better clustering effect and data expression ability. However, little research focuses on the clustering of hesitant fuzzy linguistic term sets (HFLTSs). To fill in the research gaps, we extend the data type of clustering to hesitant fuzzy linguistic information. A kind of hesitant fuzzy linguistic agglomerative hierarchical clustering algorithm is proposed. Furthermore, we propose a hesitant fuzzy linguistic Boole matrix clustering algorithm and compare the two clustering algorithms. The proposed clustering algorithms are applied in the field of judicial execution, which provides decision support for the executive judge to determine the focus of the investigation and the control. A clustering example verifies the clustering algorithm’s effectiveness in the context of hesitant fuzzy linguistic decision information.

