Data Preprocessing Method and Fault Diagnosis Based on Evaluation Function of Information Contribution Degree

2018 ◽  
Vol 2018 ◽  
pp. 1-10
Author(s):  
Siyu Ji ◽  
Chenglin Wen

A neural network is a data-driven algorithm: building the network model requires a large amount of training data, so a significant amount of time is spent training the model's parameters. However, the system's modes update from time to time, and prediction with the original model parameters causes the model output to deviate greatly from the true value. Traditional methods such as gradient descent and least squares are centralized, making it difficult to update model parameters adaptively as the system changes. First, to update the network parameters adaptively, this paper introduces an evaluation function and gives a new method for estimating its parameters. The new method updates some parameters of the model in real time, without changing the others, to ensure the accuracy of the model. Then, based on the evaluation function, the Mean Impact Value (MIV) algorithm is used to calculate feature weights, and the weighted data are fed into the established fault diagnosis model for fault diagnosis. Finally, the validity of the algorithm is verified on the UCI Combined Cycle Power Plant (UCI-CCPP) standard data set.
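The MIV step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the trained network is replaced by a stand-in `predict` function, and the usual MIV convention of perturbing each feature by ±10% is assumed.

```python
import numpy as np

def mean_impact_values(predict, X, delta=0.10):
    """Mean Impact Value (MIV): perturb each feature by +/-delta
    and record the mean change in the model's output."""
    n_features = X.shape[1]
    miv = np.zeros(n_features)
    for j in range(n_features):
        X_up, X_dn = X.copy(), X.copy()
        X_up[:, j] *= (1 + delta)   # feature j increased by 10%
        X_dn[:, j] *= (1 - delta)   # feature j decreased by 10%
        miv[j] = np.mean(predict(X_up) - predict(X_dn))
    return miv

rng = np.random.default_rng(0)
X = rng.uniform(1.0, 2.0, size=(200, 3))
# Stand-in for a trained network: output depends strongly on feature 0,
# weakly on feature 1, and not at all on feature 2.
predict = lambda X: 5.0 * X[:, 0] + 0.5 * X[:, 1]

miv = mean_impact_values(predict, X)
weights = np.abs(miv) / np.abs(miv).sum()  # normalized feature weights
X_weighted = X * weights                   # weighted data for the diagnosis model
```

The resulting weights rank feature 0 highest and assign feature 2 zero weight, which is the behaviour the weighting scheme relies on before the data enter the fault diagnosis model.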

2018 ◽  
Vol 7 (04) ◽  
pp. 871-888 ◽  
Author(s):  
Sophie J. Lee ◽  
Howard Liu ◽  
Michael D. Ward

Improving geolocation accuracy in text data has long been a goal of automated text processing. We depart from the conventional method and introduce a two-stage supervised machine-learning algorithm that classifies each location mention as either correct or incorrect. We extract contextual information from texts, i.e., N-gram patterns for location words, mention frequency, and the context of sentences containing location words. We then estimate model parameters using a training data set and use this model to predict whether a location word in the test data set accurately represents the location of an event. We demonstrate these steps by constructing customized geolocation event data at the subnational level using news articles collected from around the world. The results show that the proposed algorithm outperforms existing geocoders even in a case added post hoc to test the generality of the developed algorithm.


2021 ◽  
Vol 16 (2) ◽  
pp. 145-160
Author(s):  
N. Agarwal ◽  
N. Shrivastava ◽  
M.K. Pradhan

Advanced modeling and optimization techniques are imperative today for dealing with complex machining processes such as electric discharge machining (EDM). In the present research, a titanium alloy was machined under different electrical input parameters to evaluate one of the important surface integrity (SI) parameters, the surface roughness Ra. First, response surface methodology (RSM) was adopted for the experimental design and for generating the training data set. An artificial neural network (ANN) model was developed and optimized for Ra with the same training data set. Finally, an adaptive neuro-fuzzy inference system (ANFIS) model was developed for Ra, and the developed ANFIS model was optimized by applying two recent optimization techniques, the Rao algorithm and the Jaya algorithm. Statistical measures such as the mean square error (MSE), the mean absolute error (MAE), the root mean square error (RMSE), the mean bias error (MBE) and the mean absolute percentage error (MAPE) show that the ANFIS model is better than the ANN model. Both optimization algorithms result in considerable improvement in the SI of the machined surface. Comparing the two, the Rao algorithm performs better than the Jaya algorithm.
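The Jaya algorithm used here is attractive precisely because it has no algorithm-specific tuning parameters: each candidate moves toward the current best solution and away from the worst. The sketch below is a generic, hedged illustration of that update rule on a toy objective (a sphere function standing in for the ANFIS roughness model), not the authors' code.

```python
import numpy as np

def jaya(f, bounds, pop_size=20, iters=200, seed=0):
    """Minimize f with the Jaya algorithm: move each candidate toward
    the best solution and away from the worst; greedy acceptance."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    dim = lo.shape[0]
    X = rng.uniform(lo, hi, size=(pop_size, dim))
    fit = np.apply_along_axis(f, 1, X)
    for _ in range(iters):
        best, worst = X[np.argmin(fit)], X[np.argmax(fit)]
        r1 = rng.random((pop_size, dim))
        r2 = rng.random((pop_size, dim))
        # Jaya update: attracted to best, repelled from worst.
        X_new = X + r1 * (best - np.abs(X)) - r2 * (worst - np.abs(X))
        X_new = np.clip(X_new, lo, hi)
        fit_new = np.apply_along_axis(f, 1, X_new)
        improved = fit_new < fit            # keep only improvements
        X[improved], fit[improved] = X_new[improved], fit_new[improved]
    return X[np.argmin(fit)], fit.min()

# Toy stand-in for the surface-roughness objective: minimum at the origin.
sphere = lambda x: float(np.sum(x ** 2))
lo, hi = -5.0 * np.ones(4), 5.0 * np.ones(4)
x_best, f_best = jaya(sphere, (lo, hi))
```

In the paper the objective would instead be the trained ANFIS prediction of Ra evaluated at candidate EDM parameter settings, with `bounds` taken from the feasible ranges of the electrical inputs.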


2021 ◽  
Author(s):  
Yuqi Wang ◽  
Tianyuan Liu ◽  
Di Zhang

Abstract The research on the supercritical carbon dioxide (S-CO2) Brayton cycle has gradually become a hot spot in recent years. The off-design performance of the turbine is an important reference for analyzing the variable operating conditions of the cycle. With the development of deep learning technology, research on neural-network-based surrogate models has received extensive attention. In order to improve the inefficiency of traditional off-design analyses, this research establishes a data-driven deep learning off-design aerodynamic prediction model for an S-CO2 centrifugal turbine, based on a deep convolutional neural network. The network can rapidly and adaptively provide aerodynamic performance predictions for varying blade profiles and operating conditions. Meanwhile, it can illustrate the mechanism behind the predicted aerodynamic performance through field reconstruction results. The training results show that the off-design aerodynamic prediction convolutional neural network (OAP-CNN) reduces the mean and maximum errors of efficiency prediction compared with traditional Gaussian process regression (GPR) and artificial neural network (ANN) models. For off-design conditions, pressure and temperature distributions with acceptable error can be obtained without a CFD calculation. Besides, the influence of off-design parameters on efficiency and power can be conveniently acquired, providing a reference for an optimized operation strategy. Analyzing the sensitivity of the OAP-CNN to training data set size shows that the prediction accuracy is acceptable when the fraction of training samples exceeds 50%; the minimum error appears at a training fraction of 0.8, where the mean and maximum errors are 1.46% and 6.42%, respectively. In summary, this research provides a precise and fast aerodynamic performance prediction model for off-design analyses of S-CO2 turbomachinery and the Brayton cycle.


Author(s):  
WENTAO MAO ◽  
JIUCHENG XU ◽  
SHENGJIE ZHAO ◽  
MEI TIAN

Recently, extreme learning machines (ELMs) have become a promising tool for a wide range of regression and classification applications. However, when modeling multiple related tasks in which only limited training data per task are available and the dimension is low, ELMs generally struggle to achieve impressive performance because they get little help from informative domain knowledge shared across tasks. To solve this problem, this paper extends the ELM to the multi-task learning (MTL) scenario. First, based on the assumption that the model parameters of related tasks are close to each other, a new regularization-based MTL algorithm for the ELM is proposed that learns related tasks jointly via a simple matrix inversion. To improve the learning performance, this algorithm is further formulated as a mixed integer program in order to identify the grouping structure in which some parameters are closer than others, and an alternating minimization method is presented to solve this optimization. Experiments on a toy problem as well as a real-life data set demonstrate the effectiveness of the proposed MTL algorithm compared to the classical ELM and the standard MTL algorithm.
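The "simple matrix inversion" that makes ELMs fast is worth seeing concretely. Below is a minimal single-task regularized ELM sketch (random hidden layer, ridge solve for the output weights) on a toy regression problem; the paper's MTL extension additionally couples the per-task output weights through a regularizer, which is not shown here.

```python
import numpy as np

def elm_fit(X, y, n_hidden=50, lam=1e-3, seed=0):
    """Regularized ELM: random, untrained hidden layer; output weights
    obtained in closed form by a single ridge solve."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights
    b = rng.normal(size=n_hidden)                 # random hidden biases
    H = np.tanh(X @ W + b)                        # hidden-layer output matrix
    # beta = (H^T H + lam*I)^{-1} H^T y  -- the "simple matrix inversion"
    beta = np.linalg.solve(H.T @ H + lam * np.eye(n_hidden), H.T @ y)
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy regression: fit y = sin(x) on [-3, 3].
X = np.linspace(-3, 3, 200).reshape(-1, 1)
y = np.sin(X).ravel()
W, b, beta = elm_fit(X, y)
rmse = np.sqrt(np.mean((elm_predict(X, W, b, beta) - y) ** 2))
```

Because only `beta` is learned, training cost is one linear solve per task, which is what makes the joint multi-task formulation in the paper tractable.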


2014 ◽  
Vol 977 ◽  
pp. 349-352 ◽  
Author(s):  
Gang Yu ◽  
Jian Kang

As one of the most important types of machinery, rotating machinery may malfunction for various reasons. Sometimes the fault is a single one, but sometimes the machine is in a multi-fault condition; this paper focuses mainly on the latter. First, the paper gives a brief introduction to the study of multi-fault diagnosis. It then introduces the mixture of alpha-stable distributions model and presents the model parameter estimation algorithm in detail. Finally, an SOM network is used to complete the pattern recognition. The results show that this modeling method is effective for multi-fault diagnosis in rotating machinery.


2019 ◽  
Vol 25 (5) ◽  
pp. 651-674 ◽  
Author(s):  
Katja Zupan ◽  
Nikola Ljubešić ◽  
Tomaž Erjavec

Abstract Part-of-speech (PoS) tagging of non-standard language with models developed for standard language is known to suffer from a significant decrease in accuracy. Two methods are typically used to improve it: word normalisation, which decreases the out-of-vocabulary rate of the PoS tagger, and domain adaptation where the tagger is made aware of the non-standard language variation, either through supervision via non-standard data being added to the tagger’s training set, or via distributional information calculated from raw texts. This paper investigates the two approaches, normalisation and domain adaptation, on carefully constructed data sets encompassing historical and user-generated Slovene texts, in particular focusing on the amount of labour necessary to produce the manually annotated data sets for each approach and comparing the resulting PoS accuracy. We give quantitative as well as qualitative analyses of the tagger performance in various settings, showing that on our data set closed and open class words exhibit significantly different behaviours, and that even small inconsistencies in the PoS tags in the data have an impact on the accuracy. We also show that to improve tagging accuracy, it is best to concentrate on obtaining manually annotated normalisation training data for short annotation campaigns, while manually producing in-domain training sets for PoS tagging is better when a more substantial annotation campaign can be undertaken. Finally, unsupervised adaptation via Brown clustering is similarly useful regardless of the size of the training data available, but improvements tend to be bigger when adaptation is performed via in-domain tagging data.


2012 ◽  
Vol 66 (2) ◽  
pp. 239-246
Author(s):  
Xu Hua ◽  
Xue Hengxin ◽  
Chen Zhiguo

To overcome the shortcoming that the solution may be trapped in a local minimum in traditional TSK (Takagi-Sugeno-Kang) fuzzy inference training, this paper considers a TSK fuzzy system modeling approach based on visual system principles and Weber's law. This approach not only exploits the human eye's strong capability for identifying objects, but also accounts for the distribution structure of the training data set during parameter regulation. To overcome the slow convergence of the gradient learning algorithm that it adopts, a novel visual TSK fuzzy system model based on evolutionary learning is proposed by introducing the particle swarm optimization (PSO) algorithm. The main advantages of this method are its very good optimization behaviour, strong noise immunity and good interpretability. The new method is applied to long-term hydrological forecasting examples. The simulation results show that the method is feasible and effective: the new method not only inherits the advantages of traditional visual TSK fuzzy models but also achieves better global convergence and accuracy than the traditional model.
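PSO is introduced here precisely because it searches globally without gradients. A minimal, generic sketch of the swarm update follows; the objective below is a shifted sphere function standing in for the TSK parameter-fitting error, not the paper's actual fuzzy-model objective.

```python
import numpy as np

def pso(f, bounds, n_particles=30, iters=150, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Particle swarm optimization: each particle is pulled toward its own
    best-known position (pbest) and the swarm's best position (gbest)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    dim = lo.shape[0]
    X = rng.uniform(lo, hi, size=(n_particles, dim))
    V = np.zeros_like(X)
    pbest, pbest_fit = X.copy(), np.apply_along_axis(f, 1, X)
    gbest = pbest[np.argmin(pbest_fit)]
    for _ in range(iters):
        r1, r2 = rng.random(X.shape), rng.random(X.shape)
        V = w * V + c1 * r1 * (pbest - X) + c2 * r2 * (gbest - X)
        X = np.clip(X + V, lo, hi)
        fit = np.apply_along_axis(f, 1, X)
        better = fit < pbest_fit
        pbest[better], pbest_fit[better] = X[better], fit[better]
        gbest = pbest[np.argmin(pbest_fit)]
    return gbest, pbest_fit.min()

# Toy stand-in for the TSK training error; true optimum at x = 1.2.
objective = lambda x: float(np.sum((x - 1.2) ** 2))
lo, hi = -5.0 * np.ones(3), 5.0 * np.ones(3)
x_best, f_best = pso(objective, (lo, hi))
```

In the paper's setting, each particle would encode the TSK antecedent/consequent parameters and `f` would be the forecasting error on the training set.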


2020 ◽  
Vol 36 (1) ◽  
pp. 89-115 ◽  
Author(s):  
Harvey Goldstein ◽  
Natalie Shlomo

Abstract The requirement to anonymise data sets that are to be released for secondary analysis should be balanced by the need to allow their analysis to provide efficient and consistent parameter estimates. The proposal in this article is to integrate the processes of anonymisation and data analysis. The first stage adds random noise with known distributional properties to some or all variables in a released (already pseudonymised) data set, in which the values of some identifying and sensitive variables for data subjects of interest are also available to an external 'attacker' who wishes to identify those data subjects in order to interrogate their records in the data set. The second stage consists of specifying the model of interest so that parameter estimation accounts for the added noise. Where the characteristics of the noise are made available to the analyst by the data provider, we propose a new method that allows a valid analysis. This is formally a measurement error model, and we describe a Bayesian MCMC algorithm that recovers consistent estimates of the true model parameters. A new method for handling categorical data is presented. The article shows how an appropriate noise distribution can be determined.
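The core measurement-error phenomenon is easy to demonstrate numerically. The sketch below uses a simple method-of-moments attenuation correction rather than the article's Bayesian MCMC approach, but it shows the same idea: a naive regression on the noise-added variable is biased toward zero, and knowing the noise variance lets the analyst recover a consistent estimate.

```python
import numpy as np

rng = np.random.default_rng(42)
n, true_beta, sigma_noise = 5000, 2.0, 1.0

x = rng.normal(0.0, 2.0, n)                       # true confidential variable
y = true_beta * x + rng.normal(0.0, 0.5, n)       # analyst's outcome of interest
x_released = x + rng.normal(0.0, sigma_noise, n)  # anonymised released variable

# Naive slope on the noisy release: attenuated toward zero by the
# reliability ratio var(x) / (var(x) + sigma_noise^2).
beta_naive = np.cov(x_released, y)[0, 1] / np.var(x_released, ddof=1)

# With the noise variance published by the data provider, a
# method-of-moments correction recovers a consistent estimate.
var_released = np.var(x_released, ddof=1)
reliability = (var_released - sigma_noise**2) / var_released
beta_corrected = beta_naive / reliability
```

Here var(x) = 4 and the noise variance is 1, so the naive slope is attenuated by a factor of roughly 0.8; the corrected estimate undoes that attenuation. The article's Bayesian treatment generalizes this to full models and categorical variables.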


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Haifan Du ◽  
Haiwen Duan

This paper combines domestic and international research results to analyze the differences between the attribute features of English phrase speech and noise. Short-time energy is enhanced to improve the sensitivity of threshold judgment, and noise is added to the discrepancy data set to improve recognition robustness. The backpropagation algorithm is improved to constrain the range of weight variation, avoid oscillation, and shorten training time. In a real English phrase recognition system, the very large-scale parameters of a convolutional neural network cause problems such as massive training data requirements and low training efficiency. To address these problems, the NWBP algorithm targets the oscillation that tends to occur when searching for the minimum error value late in the training of the network parameters: it uses the K-means algorithm to obtain seed nodes that approach the minimal error value, and uses a boundary-value rule to reduce the range of weight change and thus the oscillation, so that the network error converges as soon as possible and training efficiency improves. Simulation experiments show that, compared with other algorithms, the NWBP algorithm improves the degree of fitting and the convergence speed when training complex convolutional neural networks, reduces redundant computation, and shortens training time to a certain extent; compared with simple networks, the algorithm has the advantage of accelerating network convergence. A word-tree constraint and its efficient storage structure are introduced, which improves both the storage efficiency of the word-tree constraint and the retrieval efficiency of the English phrase recognition search.


2005 ◽  
Vol 23 (9) ◽  
pp. 2969-2974 ◽  
Author(s):  
N. Srivastava

Abstract. A logistic regression model is implemented for predicting the occurrence of intense/super-intense geomagnetic storms. A binary dependent variable, indicating the occurrence of intense/super-intense geomagnetic storms, is regressed against a series of independent model variables that define a number of solar and interplanetary properties of geo-effective CMEs. The model parameters (regression coefficients) are estimated from a training data set which was extracted from a dataset of 64 geo-effective CMEs observed during 1996-2002. The trained model is validated by predicting the occurrence of geomagnetic storms from a validation dataset, also extracted from the same data set of 64 geo-effective CMEs, recorded during 1996-2002, but not used for training the model. The model predicts 78% of the geomagnetic storms from the validation data set. In addition, the model predicts 85% of the geomagnetic storms from the training data set. These results indicate that logistic regression models can be effectively used for predicting the occurrence of intense geomagnetic storms from a set of solar and interplanetary factors.
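The modeling setup above (a binary outcome regressed on continuous predictors) can be sketched with a hand-rolled logistic regression. The two predictors below are hypothetical standardized stand-ins for solar/interplanetary properties (e.g. CME speed and southward IMF Bz); the data are synthetic, not the 64-CME set used in the paper.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Logistic regression fitted by batch gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))       # predicted probabilities
        grad_w = X.T @ (p - y) / len(y)              # gradient of log-loss
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

rng = np.random.default_rng(1)
n = 300
# Hypothetical standardized predictors of geo-effectiveness.
X = rng.normal(size=(n, 2))
logit = 2.0 * X[:, 0] + 1.5 * X[:, 1] - 0.5
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

w, b = fit_logistic(X, y)
pred = (1.0 / (1.0 + np.exp(-(X @ w + b)))) >= 0.5   # classify at p = 0.5
accuracy = np.mean(pred == y)
```

As in the paper, the fitted coefficients quantify how each property shifts the log-odds of a storm, and thresholding the predicted probability yields the binary occurrence forecast.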

