Unsupervised Offline Changepoint Detection Ensembles

2021 ◽  
Vol 11 (9) ◽  
pp. 4280
Author(s):  
Iurii Katser ◽  
Viacheslav Kozitsin ◽  
Victor Lobachev ◽  
Ivan Maksimov

Offline changepoint detection (CPD) algorithms are used to segment a signal in an optimal way. Generally, these algorithms rely on the assumption that the statistical properties that change in the signal are known, so that appropriate models (metrics, cost functions) for changepoint detection can be used. Otherwise, the process of proper model selection can become laborious and time-consuming, with uncertain results. Although the ensemble approach is well known for increasing the robustness of individual algorithms and dealing with the challenges mentioned above, it is only weakly formalized and far less explored for CPD problems than for outlier detection or classification problems. This paper proposes an unsupervised CPD ensemble (CPDE) procedure, together with pseudocode for the proposed ensemble algorithms and a link to their Python implementation. The novelty of the approach lies in aggregating several cost functions before running the changepoint search procedure during offline analysis. The numerical experiment showed that the proposed CPDE outperforms non-ensemble CPD procedures. Additionally, we analyzed common CPD algorithms, scaling methods, and aggregation functions, comparing them in the numerical experiment. The results were obtained on two anomaly benchmarks that contain industrial faults and failures: the Tennessee Eastman Process (TEP) and the Skoltech Anomaly Benchmark (SKAB). One possible application of our research is the estimation of the failure time for fault identification and isolation problems in technical diagnostics.
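The central idea, aggregating several scaled cost functions into a single cost curve before running the changepoint search, can be illustrated with a minimal single-changepoint sketch; the two cost models, the min-max scaling and the function names below are illustrative assumptions rather than the authors' implementation:

```python
# A minimal sketch of cost-function aggregation for single-changepoint
# detection on a 1-D signal; not the authors' exact CPDE procedure.
import numpy as np

def segment_costs(x, cost):
    """Two-segment total cost for every candidate split point."""
    n = len(x)
    return np.array([cost(x[:t]) + cost(x[t:]) for t in range(2, n - 1)])

# Two simple cost models: squared deviation from the segment mean
# (sensitive to mean shifts) and log-variance (sensitive to scale shifts).
cost_l2  = lambda seg: np.sum((seg - seg.mean()) ** 2)
cost_var = lambda seg: len(seg) * np.log(seg.var() + 1e-12)

def minmax(c):
    """Scale a cost curve to [0, 1] so different costs are comparable."""
    return (c - c.min()) / (c.max() - c.min() + 1e-12)

def ensemble_changepoint(x, costs=(cost_l2, cost_var), aggregate=np.sum):
    curves = [minmax(segment_costs(x, c)) for c in costs]
    agg = aggregate(np.vstack(curves), axis=0)   # aggregate before the search
    return int(np.argmin(agg)) + 2               # best split index

# Usage: a signal with a mean shift at t = 100.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(3, 1, 100)])
print(ensemble_changepoint(x))   # expected to be close to 100
```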

Author(s):  
Ferdinand Bollwein ◽  
Stephan Westphal

Univariate decision tree induction methods for multiclass classification problems such as CART, C4.5 and ID3 continue to be very popular in the context of machine learning due to their major benefit of being easy to interpret. However, as these trees only consider a single attribute per node, they often get quite large, which lowers their explanatory value. Oblique decision tree building algorithms, which divide the feature space by multidimensional hyperplanes, often produce much smaller trees, but the individual splits are hard to interpret. Moreover, the effort of finding optimal oblique splits is so high that heuristics have to be applied, which yield only locally optimal solutions. In this work, we introduce an effective branch and bound procedure to determine globally optimal bivariate oblique splits for concave impurity measures. Decision trees based on these bivariate oblique splits remain fairly interpretable due to the restriction to two attributes per split. The resulting trees are significantly smaller and more accurate than their univariate counterparts due to their ability to adapt better to the underlying data and to capture interactions of attribute pairs. Moreover, our evaluation shows that our algorithm even outperforms algorithms based on heuristically obtained multivariate oblique splits, despite the fact that we focus on two attributes only.
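As a rough illustration of what a bivariate oblique split is and how a concave impurity measure scores it, the sketch below enumerates candidate hyperplanes through pairs of points on a single attribute pair and keeps the one with the lowest weighted Gini impurity; this naive enumeration stands in for the authors' branch and bound procedure, and all names are illustrative:

```python
# Scoring bivariate oblique splits (w . x <= t on two attributes) by
# weighted Gini impurity; candidate hyperplanes are enumerated naively.
import numpy as np
from itertools import combinations

def gini(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_impurity(X2, y, w, t):
    """Weighted Gini impurity of the split w . x <= t."""
    left = X2 @ w <= t
    nl, nr = left.sum(), (~left).sum()
    if nl == 0 or nr == 0:
        return np.inf
    n = len(y)
    return nl / n * gini(y[left]) + nr / n * gini(y[~left])

def best_bivariate_split(X, y, i, j):
    """Search oblique splits on attribute pair (i, j) over hyperplanes
    passing through pairs of data points (a simple heuristic search)."""
    X2 = X[:, [i, j]]
    best = (np.inf, None, None)
    for a, b in combinations(range(len(y)), 2):
        d = X2[b] - X2[a]
        w = np.array([d[1], -d[0]])      # normal to the segment a-b
        if not w.any():
            continue
        t = w @ X2[a]
        imp = split_impurity(X2, y, w, t)
        if imp < best[0]:
            best = (imp, w, t)
    return best                           # (impurity, weights, threshold)
```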


Author(s):  
Foued Theljani ◽  
Kaouther Laabidi ◽  
Salah Zidi ◽  
Moufida Ksouri

The support vector domain description (SVDD) is an efficient kernel method inspired by the SV machine (SVM) of Vapnik. It is commonly used for one-class classification problems or novelty detection. The training algorithm solves a constrained convex quadratic programming (QP) problem. This assumes dense prior sampling (offline training), and it requires large memory and enormous amounts of training time. In this paper, we propose a fast SVDD dedicated to multi-class classification problems. The proposed classifier deals with stationary as well as nonstationary (NS) data. The principle is based on the dynamic removal/insertion of information according to adequate rules. To ensure fast convergence, the algorithm considers in each run a limited frame of samples for the training process. These samples are selected according to approximations based on the Karush–Kuhn–Tucker (KKT) conditions. An additional merge mechanism is proposed to avoid the drawbacks of local optima and to improve performance. The developed method is assessed on synthetic data to demonstrate its effectiveness. Afterward, it is employed to solve a diagnosis and fault-detection problem. For this purpose, we considered a real industrial plant, the Tennessee Eastman process (TEP).
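For reference, the batch SVDD training problem that such fast variants accelerate is a small convex QP over dual variables; the following is a minimal sketch with an RBF kernel, where the solver choice, parameter values and helper names are illustrative assumptions, and the paper's incremental, KKT-based sample selection is not reproduced:

```python
# Standard (batch) SVDD dual with an RBF kernel, solved as a small QP.
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import cdist

def svdd_fit(X, C=0.1, gamma=1.0):
    K = np.exp(-gamma * cdist(X, X, "sqeuclidean"))   # RBF kernel matrix
    n = len(X)

    def neg_dual(a):                                  # minimize -(dual objective)
        return -(a @ np.diag(K)) + a @ K @ a

    cons = [{"type": "eq", "fun": lambda a: a.sum() - 1.0}]
    res = minimize(neg_dual, np.full(n, 1.0 / n),
                   bounds=[(0.0, C)] * n, constraints=cons, method="SLSQP")
    alpha = res.x
    const = alpha @ K @ alpha

    def dist2(Z):
        # Squared feature-space distance of points z to the sphere centre:
        # ||phi(z) - a||^2 = K(z,z) - 2 sum_i alpha_i K(z,x_i) + alpha' K alpha
        Kz = np.exp(-gamma * cdist(Z, X, "sqeuclidean"))
        return 1.0 - 2.0 * Kz @ alpha + const

    return alpha, dist2

# Usage: points far outside the training cloud get a larger distance.
rng = np.random.default_rng(0)
X = rng.normal(0, 1, (50, 2))
alpha, dist2 = svdd_fit(X)
print(dist2(np.array([[0.0, 0.0], [5.0, 5.0]])))
```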


1983 ◽  
Vol 15 (2) ◽  
pp. 331-348 ◽  
Author(s):  
Wagner De Souza Borges

A large deviation theorem of the Cramér–Petrov type and a ranking limit theorem of Loève are used to derive an approximation for the statistical distribution of the failure time of fibrous materials. For that, fibrous materials are modeled as a series of independent and identical bundles of parallel filaments, and the asymptotic distribution of their failure time is determined in terms of statistical characteristics of the individual filaments, as both the number of filaments in each bundle and the number of bundles in the chain grow large simultaneously. While keeping the number n of filaments in each bundle fixed and increasing only the chain length k leads to a Weibull limiting distribution for the failure time, letting both increase in such a way that log k(n) = o(n), we show that the limit distribution is … for … . Since fibrous materials that are both long and have many filaments prevail, the result is of importance in materials science, as refined approximations to failure-time distributions can be achieved.
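For orientation, the weakest-link calculation underlying such chain-of-bundles models explains the Weibull limit for fixed bundle size; the notation below is illustrative, and the paper's joint limit in n and k is not reproduced:

```latex
% Weakest-link relation for a chain of k i.i.d. bundles of fixed size n.
P(T_{\text{chain}} > t) = \bigl[P(T_{\text{bundle}} > t)\bigr]^{k}
                        = \bigl[1 - F_n(t)\bigr]^{k},
\qquad
F_n(t) \sim (t/\theta_n)^{\beta_n} \ (t \to 0)
\;\Longrightarrow\;
\lim_{k\to\infty} P\!\bigl(k^{1/\beta_n}\,T_{\text{chain}}/\theta_n > t\bigr)
  = e^{-t^{\beta_n}}.
```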


Author(s):  
N. E. Gorelova

The article highlights the concept of "time" as a most important axiom that forms part of the linguocultural worldview of different nations, which also allows us to consider this concept in terms of its culturally distinctive potential. In turn, defining the limits and possibilities of matching cultures by comparing their linguocultural axioms enables us to refine the boundaries of their communicative interactions, identifying areas of understanding/misunderstanding and of the most constructive interaction/potential proneness to conflict. Cultural differences in understanding time can be considered in two main directions: first, each culture performs the calculation of time in its own way; secondly, the way time is decomposed, its "segmentation", also contributes significantly to the originality of unique linguocultural pictures of the world. These features are interrelated and interdependent, so it is reasonable to combine them under the title "time attitude". It goes without saying that the attitude towards time in each linguoculture has formed over many centuries and is therefore historically conditioned. In this regard, the axiological aspect of the attitude towards time should be mentioned, which is associated with the reflection in this concept of socially approved values of different levels (ethical, aesthetic, religious, etc.). Consequently, we are talking about a concentrated expression, in the concept of "time", of important world paradigms that relate to value-sense orientation and segments of social psychology, which are projected onto individual consciousness in the process of enculturation. The specificity of Japanese culture is usually described within the overall paradigm of the relations between East and West, considered as synthetic and analytical ways of perceiving the world. These are relations between two independent, different systems of traditions, meanings and world-views. The question of their commensuration is one of the key questions for the theory and practice of intercultural communication, considering the influence of cultural differences on the course of a communicative event, its "success" or "failure". The "time attitude" is thus a most important linguocultural "axiom", allowing us not only to recognize the specificity of the world picture of representatives of Japanese culture, but also to highlight clear criteria for its possible comparison with counterparts in communication, and to describe the "points of intersection" and "points of mutual exclusion" that emerge from a comparison of the Japanese and Russian linguocultures and their peculiar "time attitudes".


Author(s):  
Artittayapron Rojarath ◽  
Wararat Songpan

Ensemble learning is an algorithm that utilizes various types of classification models. This algorithm can enhance the prediction efficiency of the component models. However, the efficiency of combining models typically depends on the diversity and accuracy of the predicted results of the ensemble members, and the problem of multi-class data is still encountered. In the proposed approach, cost-sensitive learning was implemented to evaluate the prediction accuracy for each class, which was used to construct a cost-sensitivity matrix of the true positive (TP) rate. This TP rate can be used as a weight value and combined with a probability value to drive ensemble learning for a specified class. We proposed an ensemble model of the heterogeneous type, namely, a combination of various individual classification models (support vector machine, Bayes, K-nearest neighbour, naïve Bayes, decision tree, and multi-layer perceptron), in experiments on 3-, 4-, 5- and 6-classifier models. The efficiencies of the proposed models were compared to those of the individual classifier models and of homogeneous models (Adaboost, bagging, stacking, voting, random forest, and random subspaces) on various multi-class data sets. The experimental results demonstrate that the cost-sensitive, probability-weighted voting ensemble model derived from 3 models provided the most accurate results in multi-class prediction. The objective of this study was to increase the efficiency of predicting classification results in multi-class classification tasks and to improve the classification results.
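A minimal sketch of the described mechanism, class-wise TP rates used as weights on each model's predicted class probabilities before voting, is shown below; the data set, the three base models and all names are illustrative assumptions, not the authors' full experimental setup:

```python
# Class-wise TP-rate (recall) weighted probability voting across
# heterogeneous classifiers.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)

models = [DecisionTreeClassifier(random_state=0), GaussianNB(),
          KNeighborsClassifier()]
weights = []                  # per-model vector of class-wise TP rates
for m in models:
    m.fit(X_tr, y_tr)
    # TP rate estimated on the training data for brevity; a held-out
    # validation split would give a less optimistic estimate.
    cm = confusion_matrix(y_tr, m.predict(X_tr))
    weights.append(np.diag(cm) / cm.sum(axis=1))   # recall of each class

# Weighted voting: scale each model's class probabilities by its
# class-wise TP rates, sum over models, and take the argmax.
scores = sum(w * m.predict_proba(X_te) for m, w in zip(models, weights))
y_pred = scores.argmax(axis=1)
print("accuracy:", (y_pred == y_te).mean())
```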


Imbalanced data classification is a critical and challenging problem in both data mining and machine learning. Imbalanced classification problems arise in many application areas, such as rare medical diagnosis, risk management, and fault detection. Traditional classification algorithms yield poor results on imbalanced classification problems. In this paper, a K-Means cluster-based undersampling ensemble algorithm is proposed to solve the imbalanced data classification problem. The proposed method combines K-Means cluster-based undersampling with a boosting method. The experimental results show that the proposed algorithm outperforms the sampling-based ensemble algorithms of previous studies.
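A minimal sketch of the combination described above, undersampling the majority class via K-Means and then boosting on the balanced subset, is given below; the cluster count, the choice of AdaBoost, and all names are illustrative assumptions rather than the paper's exact algorithm:

```python
# K-Means cluster-based undersampling of the majority class, then boosting.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import AdaBoostClassifier

def kmeans_undersample(X, y, majority_label, n_target, random_state=0):
    """Keep all minority samples; represent the majority class by the
    points closest to each of n_target K-Means cluster centres."""
    maj = np.where(y == majority_label)[0]
    km = KMeans(n_clusters=n_target, n_init=10,
                random_state=random_state).fit(X[maj])
    keep = []
    for c in range(n_target):
        members = maj[km.labels_ == c]
        if members.size == 0:          # guard against a rare empty cluster
            continue
        d = np.linalg.norm(X[members] - km.cluster_centers_[c], axis=1)
        keep.append(members[d.argmin()])
    idx = np.concatenate([np.where(y != majority_label)[0], np.array(keep)])
    return X[idx], y[idx]

# Usage: balance the data, then boost on the balanced subset.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (500, 5)), rng.normal(2, 1, (50, 5))])
y = np.array([0] * 500 + [1] * 50)
Xb, yb = kmeans_undersample(X, y, majority_label=0, n_target=50)
clf = AdaBoostClassifier(random_state=0).fit(Xb, yb)
print(clf.score(X, y))
```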


Author(s):  
Nataliya Weselovska ◽  
Sergey Shargorodskiy

The use of new energy-saving technologies has led to significant development of vibration machine designs and their widespread use. In the course of their operation, the question of the efficiency and reliability of this type of machine becomes rather acute, being related to the availability and possibility of using its operational reserves. Machines of this type must meet quality and reliability requirements in order to fulfil their intended purpose. Due to their design features and the complexity of the processes occurring during their operation, classical analytical calculations of durability and reliability are quite approximate in nature and do not provide the necessary accuracy, so the question of the reliability and durability of vibrating equipment is urgent. The reliability estimation of vibrating machines has been addressed by the following scientists: Iskovich-Lototsky RD, Obertyukh R., Sevastyanov IV, Kanarchuk VE, Dzhratratano D.J., etc. The techniques they offer are virtually indistinguishable from those adopted in general engineering. The publication proposes a technique for evaluating efficiency and reliability based on the use of quantitative characteristics of a probabilistic and statistical nature. Such quantitative reliability indicators are: probability of failure, failure rate, and failure time. These indicators are among the most important in the technical diagnostics of vibrating machines and the estimation of their residual life. The basic calculations, dependencies, and analysis of the governing laws are given.
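For reference, the probabilistic indicators named above have standard definitions in reliability theory; they are stated generically here and are not the article's specific calculation dependencies:

```latex
% Standard reliability indicators for a failure time T with cdf F and pdf f.
R(t) = P(T > t) = 1 - F(t), \qquad
\lambda(t) = \frac{f(t)}{R(t)}, \qquad
\mathrm{MTTF} = \int_0^{\infty} R(t)\,dt ;
\qquad\text{e.g. } \lambda(t) \equiv \lambda
\;\Rightarrow\; R(t) = e^{-\lambda t},\ \ \mathrm{MTTF} = 1/\lambda .
```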


2012 ◽  
Vol 8 (1) ◽  
pp. 1-23 ◽  
Author(s):  
Philicity K. Williams ◽  
Caio V. Soares ◽  
Juan E. Gilbert

Predictive models, such as rule-based classifiers, often have difficulty with incomplete data (e.g., erroneous or missing values). This work presents a technique that uses divisive data clustering to reduce the severity of the effects of missing data on the performance of rule-based classifiers. The Clustering Rule based Approach (CRA) clusters the original training data and builds a separate rule-based model on each cluster's data. The individual models are combined into a larger model and evaluated against test data. The effects of missing attribute information on ordered and unordered rule sets are evaluated, and experiments show that the performance of the collective model (CRA) is less affected than that of the traditional model when the test data has missing attribute values, thus making it more resilient and robust to missing data.
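The cluster-then-model structure can be sketched as follows; decision trees stand in for the rule-based learners, test points are routed to the model of their nearest cluster, and the class name, parameters and missing-value handling are illustrative assumptions rather than CRA's exact procedure:

```python
# Partition the training data with K-Means, fit one interpretable model
# per cluster, and route each test point to its nearest cluster's model.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

class ClusteredRuleModel:
    def __init__(self, n_clusters=3, random_state=0):
        self.km = KMeans(n_clusters=n_clusters, n_init=10,
                         random_state=random_state)
        self.models = {}

    def fit(self, X, y):
        labels = self.km.fit_predict(X)
        for c in np.unique(labels):
            m = DecisionTreeClassifier(random_state=0)
            self.models[c] = m.fit(X[labels == c], y[labels == c])
        return self

    def predict(self, X):
        labels = self.km.predict(X)        # nearest-cluster routing
        out = np.empty(len(X), dtype=int)  # assumes integer class labels
        for c in np.unique(labels):
            out[labels == c] = self.models[c].predict(X[labels == c])
        return out
```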

