Data Preprocessing Method and Fault Diagnosis Based on Evaluation Function of Information Contribution Degree

2018 ◽  
Vol 2018 ◽  
pp. 1-10
Author(s):  
Siyu Ji ◽  
Chenglin Wen

A neural network is a data-driven algorithm: building the network model requires a large amount of training data, so a significant amount of time is spent training the model's parameters. However, the system's modes update from time to time, and prediction with the original model parameters causes the model output to deviate greatly from the true value. Traditional methods such as gradient descent and least squares are centralized, making it difficult to update model parameters adaptively as the system changes. First, to update the network parameters adaptively, this paper introduces an evaluation function and gives a new method for estimating its parameters. The new method updates some parameters of the model in real time, without changing the others, to ensure the accuracy of the model. Then, based on the evaluation function, the Mean Impact Value (MIV) algorithm is used to calculate feature weights, and the weighted data are fed into the established fault diagnosis model for fault diagnosis. Finally, the validity of the algorithm is verified on the UCI Combined Cycle Power Plant (UCI-CCPP) standard data set.
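The MIV step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the trained network is replaced by a stand-in `predict` function, and the usual MIV convention of perturbing each feature by ±10% is assumed.

```python
import numpy as np

def mean_impact_values(predict, X, delta=0.10):
    """Mean Impact Value (MIV): perturb each feature by +/-delta
    and record the mean change in the model's output."""
    n_features = X.shape[1]
    miv = np.zeros(n_features)
    for j in range(n_features):
        X_up, X_dn = X.copy(), X.copy()
        X_up[:, j] *= (1 + delta)   # feature j increased by 10%
        X_dn[:, j] *= (1 - delta)   # feature j decreased by 10%
        miv[j] = np.mean(predict(X_up) - predict(X_dn))
    return miv

rng = np.random.default_rng(0)
X = rng.uniform(1.0, 2.0, size=(200, 3))
# Stand-in for a trained network: output depends strongly on feature 0,
# weakly on feature 1, and not at all on feature 2.
predict = lambda X: 5.0 * X[:, 0] + 0.5 * X[:, 1]

miv = mean_impact_values(predict, X)
weights = np.abs(miv) / np.abs(miv).sum()  # normalized feature weights
X_weighted = X * weights                   # weighted data for the diagnosis model
```

The resulting weights rank feature 0 highest and assign feature 2 zero weight, which is the behaviour the weighting scheme relies on before the data enter the fault diagnosis model.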

2018 ◽  
Vol 7 (04) ◽  
pp. 871-888 ◽  
Author(s):  
Sophie J. Lee ◽  
Howard Liu ◽  
Michael D. Ward

Improving geolocation accuracy in text data has long been a goal of automated text processing. We depart from the conventional method and introduce a two-stage supervised machine-learning algorithm that classifies each location mention as either correct or incorrect. We extract contextual information from texts, i.e., N-gram patterns for location words, mention frequency, and the context of sentences containing location words. We then estimate model parameters using a training data set and use this model to predict whether a location word in the test data set accurately represents the location of an event. We demonstrate these steps by constructing customized geolocation event data at the subnational level using news articles collected from around the world. The results show that the proposed algorithm outperforms existing geocoders even in a case added post hoc to test the generality of the developed algorithm.


2021 ◽  
Vol 16 (2) ◽  
pp. 145-160
Author(s):  
N. Agarwal ◽  
N. Shrivastava ◽  
M.K. Pradhan

Advanced modeling and optimization techniques are imperative today for dealing with complex machining processes such as electric discharge machining (EDM). In the present research, a titanium alloy was machined under different electrical input parameters to evaluate one of the important surface integrity (SI) parameters, the surface roughness Ra. First, response surface methodology (RSM) was adopted for the experimental design and for generating the training data set. An artificial neural network (ANN) model was developed and optimized for Ra with the same training data set. Finally, an adaptive neuro-fuzzy inference system (ANFIS) model was developed for Ra, and the developed ANFIS model was optimized by applying two recent optimization techniques, the Rao algorithm and the Jaya algorithm. Statistical measures such as the mean square error (MSE), the mean absolute error (MAE), the root mean square error (RMSE), the mean bias error (MBE) and the mean absolute percentage error (MAPE) show that the ANFIS model is better than the ANN model. Both optimization algorithms result in considerable improvement in the SI of the machined surface. Comparing the two, the Rao algorithm performs better than the Jaya algorithm.
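The Jaya algorithm used here is attractive precisely because it has no algorithm-specific tuning parameters: each candidate moves toward the current best solution and away from the worst. The sketch below is a generic, hedged illustration of that update rule on a toy objective (a sphere function standing in for the ANFIS roughness model), not the authors' code.

```python
import numpy as np

def jaya(f, bounds, pop_size=20, iters=200, seed=0):
    """Minimize f with the Jaya algorithm: move each candidate toward
    the best solution and away from the worst; greedy acceptance."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    dim = lo.shape[0]
    X = rng.uniform(lo, hi, size=(pop_size, dim))
    fit = np.apply_along_axis(f, 1, X)
    for _ in range(iters):
        best, worst = X[np.argmin(fit)], X[np.argmax(fit)]
        r1 = rng.random((pop_size, dim))
        r2 = rng.random((pop_size, dim))
        # Jaya update: attracted to best, repelled from worst.
        X_new = X + r1 * (best - np.abs(X)) - r2 * (worst - np.abs(X))
        X_new = np.clip(X_new, lo, hi)
        fit_new = np.apply_along_axis(f, 1, X_new)
        improved = fit_new < fit            # keep only improvements
        X[improved], fit[improved] = X_new[improved], fit_new[improved]
    return X[np.argmin(fit)], fit.min()

# Toy stand-in for the surface-roughness objective: minimum at the origin.
sphere = lambda x: float(np.sum(x ** 2))
lo, hi = -5.0 * np.ones(4), 5.0 * np.ones(4)
x_best, f_best = jaya(sphere, (lo, hi))
```

In the paper the objective would instead be the trained ANFIS prediction of Ra evaluated at candidate EDM parameter settings, with `bounds` taken from the feasible ranges of the electrical inputs.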


2021 ◽  
Author(s):  
Yuqi Wang ◽  
Tianyuan Liu ◽  
Di Zhang

Abstract The research on the supercritical carbon dioxide (S-CO2) Brayton cycle has gradually become a hot spot in recent years. The off-design performance of the turbine is an important reference for analyzing the variable operating conditions of the cycle. With the development of deep learning technology, research on neural-network-based surrogate models has received extensive attention. In order to improve the inefficiency of traditional off-design analyses, this research establishes a data-driven deep learning off-design aerodynamic prediction model for an S-CO2 centrifugal turbine, based on a deep convolutional neural network. The network can rapidly and adaptively provide aerodynamic performance predictions for varying blade profiles and operating conditions. Meanwhile, it can illustrate the mechanism behind the predicted aerodynamic performance through field reconstruction results. The training results show that the off-design aerodynamic prediction convolutional neural network (OAP-CNN) reduces the mean and maximum errors of efficiency prediction compared with traditional Gaussian process regression (GPR) and artificial neural network (ANN) models. For off-design conditions, pressure and temperature distributions with acceptable error can be obtained without a CFD calculation. Besides, the influence of off-design parameters on efficiency and power can be conveniently acquired, providing a reference for an optimized operation strategy. Analyzing the sensitivity of the OAP-CNN to training data set size shows that the prediction accuracy is acceptable when the fraction of training samples exceeds 50%; the minimum error appears at a training fraction of 0.8, where the mean and maximum errors are 1.46% and 6.42%, respectively. In summary, this research provides a precise and fast aerodynamic performance prediction model for off-design analyses of S-CO2 turbomachinery and the Brayton cycle.


Author(s):  
WENTAO MAO ◽  
JIUCHENG XU ◽  
SHENGJIE ZHAO ◽  
MEI TIAN

Recently, extreme learning machines (ELMs) have become a promising tool for a wide range of regression and classification applications. However, when modeling multiple related tasks in which only limited training data per task are available and the dimension is low, ELMs generally struggle to achieve impressive performance because they get little help from informative domain knowledge shared across tasks. To solve this problem, this paper extends the ELM to the multi-task learning (MTL) scenario. First, based on the assumption that the model parameters of related tasks are close to each other, a new regularization-based MTL algorithm for the ELM is proposed that learns related tasks jointly via a simple matrix inversion. To improve the learning performance, this algorithm is further formulated as a mixed integer program in order to identify the grouping structure in which some parameters are closer than others, and an alternating minimization method is presented to solve this optimization. Experiments on a toy problem as well as a real-life data set demonstrate the effectiveness of the proposed MTL algorithm compared to the classical ELM and the standard MTL algorithm.
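The "simple matrix inversion" that makes ELMs fast is worth seeing concretely. Below is a minimal single-task regularized ELM sketch (random hidden layer, ridge solve for the output weights) on a toy regression problem; the paper's MTL extension additionally couples the per-task output weights through a regularizer, which is not shown here.

```python
import numpy as np

def elm_fit(X, y, n_hidden=50, lam=1e-3, seed=0):
    """Regularized ELM: random, untrained hidden layer; output weights
    obtained in closed form by a single ridge solve."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights
    b = rng.normal(size=n_hidden)                 # random hidden biases
    H = np.tanh(X @ W + b)                        # hidden-layer output matrix
    # beta = (H^T H + lam*I)^{-1} H^T y  -- the "simple matrix inversion"
    beta = np.linalg.solve(H.T @ H + lam * np.eye(n_hidden), H.T @ y)
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy regression: fit y = sin(x) on [-3, 3].
X = np.linspace(-3, 3, 200).reshape(-1, 1)
y = np.sin(X).ravel()
W, b, beta = elm_fit(X, y)
rmse = np.sqrt(np.mean((elm_predict(X, W, b, beta) - y) ** 2))
```

Because only `beta` is learned, training cost is one linear solve per task, which is what makes the joint multi-task formulation in the paper tractable.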


2014 ◽  
Vol 977 ◽  
pp. 349-352 ◽  
Author(s):  
Gang Yu ◽  
Jian Kang

As one of the most important types of machinery, rotating machinery may malfunction for various reasons. Sometimes the fault is a single one, but sometimes the machine is in a multi-fault condition; this paper focuses mainly on the latter. First, the paper gives a brief introduction to the study of multi-fault diagnosis. It then introduces the mixture of alpha-stable distributions model and presents the model parameter estimation algorithm in detail. Finally, an SOM network is used to complete the pattern recognition. The results show that this modeling method is effective for multi-fault diagnosis in rotating machinery.


2019 ◽  
Vol 25 (5) ◽  
pp. 651-674 ◽  
Author(s):  
Katja Zupan ◽  
Nikola Ljubešić ◽  
Tomaž Erjavec

Abstract Part-of-speech (PoS) tagging of non-standard language with models developed for standard language is known to suffer from a significant decrease in accuracy. Two methods are typically used to improve it: word normalisation, which decreases the out-of-vocabulary rate of the PoS tagger, and domain adaptation where the tagger is made aware of the non-standard language variation, either through supervision via non-standard data being added to the tagger’s training set, or via distributional information calculated from raw texts. This paper investigates the two approaches, normalisation and domain adaptation, on carefully constructed data sets encompassing historical and user-generated Slovene texts, in particular focusing on the amount of labour necessary to produce the manually annotated data sets for each approach and comparing the resulting PoS accuracy. We give quantitative as well as qualitative analyses of the tagger performance in various settings, showing that on our data set closed and open class words exhibit significantly different behaviours, and that even small inconsistencies in the PoS tags in the data have an impact on the accuracy. We also show that to improve tagging accuracy, it is best to concentrate on obtaining manually annotated normalisation training data for short annotation campaigns, while manually producing in-domain training sets for PoS tagging is better when a more substantial annotation campaign can be undertaken. Finally, unsupervised adaptation via Brown clustering is similarly useful regardless of the size of the training data available, but improvements tend to be bigger when adaptation is performed via in-domain tagging data.


2012 ◽  
Vol 66 (2) ◽  
pp. 239-246
Author(s):  
Xu Hua ◽  
Xue Hengxin ◽  
Chen Zhiguo

To overcome the shortcoming that the solution may be trapped in a local minimum in traditional TSK (Takagi-Sugeno-Kang) fuzzy inference training, this paper considers a TSK fuzzy system modeling approach based on visual system principles and Weber's law. This approach not only exploits the human eye's strong capability for identifying objects, but also accounts for the distribution structure of the training data set during parameter regulation. To overcome the slow convergence of the gradient learning algorithm that it adopts, a novel visual TSK fuzzy system model based on evolutionary learning is proposed by introducing the particle swarm optimization (PSO) algorithm. The main advantages of this method are its very good optimization behaviour, strong noise immunity and good interpretability. The new method is applied to long-term hydrological forecasting examples. The simulation results show that the method is feasible and effective: the new method not only inherits the advantages of traditional visual TSK fuzzy models but also achieves better global convergence and accuracy than the traditional model.
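PSO is introduced here precisely because it searches globally without gradients. A minimal, generic sketch of the swarm update follows; the objective below is a shifted sphere function standing in for the TSK parameter-fitting error, not the paper's actual fuzzy-model objective.

```python
import numpy as np

def pso(f, bounds, n_particles=30, iters=150, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Particle swarm optimization: each particle is pulled toward its own
    best-known position (pbest) and the swarm's best position (gbest)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    dim = lo.shape[0]
    X = rng.uniform(lo, hi, size=(n_particles, dim))
    V = np.zeros_like(X)
    pbest, pbest_fit = X.copy(), np.apply_along_axis(f, 1, X)
    gbest = pbest[np.argmin(pbest_fit)]
    for _ in range(iters):
        r1, r2 = rng.random(X.shape), rng.random(X.shape)
        V = w * V + c1 * r1 * (pbest - X) + c2 * r2 * (gbest - X)
        X = np.clip(X + V, lo, hi)
        fit = np.apply_along_axis(f, 1, X)
        better = fit < pbest_fit
        pbest[better], pbest_fit[better] = X[better], fit[better]
        gbest = pbest[np.argmin(pbest_fit)]
    return gbest, pbest_fit.min()

# Toy stand-in for the TSK training error; true optimum at x = 1.2.
objective = lambda x: float(np.sum((x - 1.2) ** 2))
lo, hi = -5.0 * np.ones(3), 5.0 * np.ones(3)
x_best, f_best = pso(objective, (lo, hi))
```

In the paper's setting, each particle would encode the TSK antecedent/consequent parameters and `f` would be the forecasting error on the training set.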


2020 ◽  
Vol 36 (1) ◽  
pp. 89-115 ◽  
Author(s):  
Harvey Goldstein ◽  
Natalie Shlomo

Abstract The requirement to anonymise data sets that are to be released for secondary analysis should be balanced by the need to allow their analysis to provide efficient and consistent parameter estimates. The proposal in this article is to integrate the processes of anonymisation and data analysis. The first stage adds random noise with known distributional properties to some or all variables in a released (already pseudonymised) data set, in which the values of some identifying and sensitive variables for data subjects of interest are also available to an external 'attacker' who wishes to identify those data subjects in order to interrogate their records in the data set. The second stage consists of specifying the model of interest so that parameter estimation accounts for the added noise. Where the characteristics of the noise are made available to the analyst by the data provider, we propose a new method that allows a valid analysis. This is formally a measurement error model, and we describe a Bayesian MCMC algorithm that recovers consistent estimates of the true model parameters. A new method for handling categorical data is presented. The article shows how an appropriate noise distribution can be determined.
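The core measurement-error phenomenon is easy to demonstrate numerically. The sketch below uses a simple method-of-moments attenuation correction rather than the article's Bayesian MCMC approach, but it shows the same idea: a naive regression on the noise-added variable is biased toward zero, and knowing the noise variance lets the analyst recover a consistent estimate.

```python
import numpy as np

rng = np.random.default_rng(42)
n, true_beta, sigma_noise = 5000, 2.0, 1.0

x = rng.normal(0.0, 2.0, n)                       # true confidential variable
y = true_beta * x + rng.normal(0.0, 0.5, n)       # analyst's outcome of interest
x_released = x + rng.normal(0.0, sigma_noise, n)  # anonymised released variable

# Naive slope on the noisy release: attenuated toward zero by the
# reliability ratio var(x) / (var(x) + sigma_noise^2).
beta_naive = np.cov(x_released, y)[0, 1] / np.var(x_released, ddof=1)

# With the noise variance published by the data provider, a
# method-of-moments correction recovers a consistent estimate.
var_released = np.var(x_released, ddof=1)
reliability = (var_released - sigma_noise**2) / var_released
beta_corrected = beta_naive / reliability
```

Here var(x) = 4 and the noise variance is 1, so the naive slope is attenuated by a factor of roughly 0.8; the corrected estimate undoes that attenuation. The article's Bayesian treatment generalizes this to full models and categorical variables.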


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Haifan Du ◽  
Haiwen Duan

This paper combines domestic and international research results to analyze the differences between the attribute features of English phrase speech and noise. Short-time energy is enhanced to improve the sensitivity of threshold judgment, and noise is added to the discrepancy data set to improve recognition robustness. The backpropagation algorithm is improved to constrain the range of weight variation, avoid oscillation, and shorten training time. In a real English phrase recognition system, the very large-scale parameters of a convolutional neural network cause problems such as massive training data requirements and low training efficiency. To address these problems, the NWBP algorithm targets the oscillation that tends to occur when searching for the minimum error value late in the training of the network parameters: it uses the K-means algorithm to obtain seed nodes that approach the minimal error value, and uses a boundary-value rule to reduce the range of weight change and thus the oscillation, so that the network error converges as soon as possible and training efficiency improves. Simulation experiments show that, compared with other algorithms, the NWBP algorithm improves the degree of fitting and the convergence speed when training complex convolutional neural networks, reduces redundant computation, and shortens training time to a certain extent; compared with simple networks, the algorithm has the advantage of accelerating network convergence. A word-tree constraint and its efficient storage structure are introduced, which improves both the storage efficiency of the word-tree constraint and the retrieval efficiency of the English phrase recognition search.


2005 ◽  
Vol 23 (9) ◽  
pp. 2969-2974 ◽  
Author(s):  
N. Srivastava

Abstract. A logistic regression model is implemented for predicting the occurrence of intense/super-intense geomagnetic storms. A binary dependent variable, indicating the occurrence of intense/super-intense geomagnetic storms, is regressed against a series of independent model variables that define a number of solar and interplanetary properties of geo-effective CMEs. The model parameters (regression coefficients) are estimated from a training data set which was extracted from a dataset of 64 geo-effective CMEs observed during 1996-2002. The trained model is validated by predicting the occurrence of geomagnetic storms from a validation dataset, also extracted from the same data set of 64 geo-effective CMEs, recorded during 1996-2002, but not used for training the model. The model predicts 78% of the geomagnetic storms from the validation data set. In addition, the model predicts 85% of the geomagnetic storms from the training data set. These results indicate that logistic regression models can be effectively used for predicting the occurrence of intense geomagnetic storms from a set of solar and interplanetary factors.
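The modeling setup above (a binary outcome regressed on continuous predictors) can be sketched with a hand-rolled logistic regression. The two predictors below are hypothetical standardized stand-ins for solar/interplanetary properties (e.g. CME speed and southward IMF Bz); the data are synthetic, not the 64-CME set used in the paper.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Logistic regression fitted by batch gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))       # predicted probabilities
        grad_w = X.T @ (p - y) / len(y)              # gradient of log-loss
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

rng = np.random.default_rng(1)
n = 300
# Hypothetical standardized predictors of geo-effectiveness.
X = rng.normal(size=(n, 2))
logit = 2.0 * X[:, 0] + 1.5 * X[:, 1] - 0.5
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

w, b = fit_logistic(X, y)
pred = (1.0 / (1.0 + np.exp(-(X @ w + b)))) >= 0.5   # classify at p = 0.5
accuracy = np.mean(pred == y)
```

As in the paper, the fitted coefficients quantify how each property shifts the log-odds of a storm, and thresholding the predicted probability yields the binary occurrence forecast.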

