Comparisons of forecasting for hepatitis in Guangxi Province, China by using three neural networks models

PeerJ, 2016, Vol. 4, pp. e2684
Author(s): Ruijing Gan, Ni Chen, Daizheng Huang

This study compares and evaluates the prediction of hepatitis in Guangxi Province, China using back-propagation neural networks optimized by a genetic algorithm (BPNN-GA), generalized regression neural networks (GRNN), and wavelet neural networks (WNN). To compare the forecasts, data from 2004 to 2013 were used as the modeling sample and data from 2014 as the forecasting sample. The results show that for a small hepatitis data set with seasonal fluctuation, BPNN-GA predicts better than the other two methods. WNN is better suited to large hepatitis data sets with seasonal fluctuation, and GRNN performs best when the data increase steadily.
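The GRNN in this comparison is, at its core, a kernel-weighted average of training targets (a Nadaraya-Watson estimator). A minimal sketch, with synthetic seasonal counts standing in for the hepatitis series (the data, lag structure, and bandwidth are illustrative assumptions, not the study's):

```python
import numpy as np

def grnn_predict(X_train, y_train, X_test, sigma=1.0):
    """GRNN: predict each test point as a Gaussian-kernel-weighted
    average of the training targets."""
    preds = []
    for x in X_test:
        d2 = np.sum((X_train - x) ** 2, axis=1)
        w = np.exp(-d2 / (2 * sigma ** 2))
        preds.append(np.dot(w, y_train) / (w.sum() + 1e-12))
    return np.array(preds)

# Synthetic monthly counts with a yearly cycle; predict each month
# from the previous 12 months.
rng = np.random.default_rng(0)
series = 100 + 30 * np.sin(np.arange(132) * 2 * np.pi / 12) + rng.normal(0, 5, 132)
X = np.array([series[i:i + 12] for i in range(120)])
y = series[12:]
print(grnn_predict(X[:108], y[:108], X[108:], sigma=20.0))
```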

2020, Vol. 39 (5), pp. 6419-6430
Author(s): Dusan Marcek

To forecast time series data, two methodological frameworks are considered: statistical and computational-intelligence modelling. The statistical approach is based on the theory of invertible ARIMA (Auto-Regressive Integrated Moving Average) models with Maximum Likelihood (ML) estimation. As a competitor to the statistical forecasting models, we use the popular classic neural network (NN) of perceptron type. To train the NN, the Back-Propagation (BP) algorithm and heuristics such as the genetic and micro-genetic algorithms (GA and MGA) are applied to a large data set. A comparative analysis of the selected learning methods is performed and evaluated. The experiments indicate that the optimal population size is likely 20, which gives the lowest training time of all the NNs trained by evolutionary algorithms; the prediction accuracy is somewhat lower, but still acceptable to managers.
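As a rough illustration of such evolutionary training, the sketch below evolves the flattened weight vector of a small perceptron-type network with a mutation-only GA (tournament selection plus elitism). The network size, mutation scale, and toy data are assumptions; only the population size of 20 comes from the abstract:

```python
import numpy as np

rng = np.random.default_rng(1)

def forecast_mse(w, X, y, h=8):
    """MSE of a one-hidden-layer perceptron whose weights are packed
    into the flat vector w (tanh hidden layer, linear output)."""
    n_in = X.shape[1]
    W1 = w[:n_in * h].reshape(n_in, h)
    b1 = w[n_in * h:n_in * h + h]
    W2 = w[n_in * h + h:n_in * h + 2 * h]
    b2 = w[-1]
    pred = np.tanh(X @ W1 + b1) @ W2 + b2
    return np.mean((pred - y) ** 2)

def ga_train(X, y, h=8, pop_size=20, gens=300, sigma=0.05):
    """Mutation-only GA with binary tournament selection and elitism."""
    dim = X.shape[1] * h + 2 * h + 1
    pop = rng.normal(0, 0.5, (pop_size, dim))
    for _ in range(gens):
        fit = np.array([forecast_mse(w, X, y, h) for w in pop])
        best = pop[np.argmin(fit)].copy()
        # each slot keeps the fitter of two randomly drawn parents
        pairs = rng.integers(0, pop_size, (pop_size, 2))
        winners = np.where(fit[pairs[:, 0]] < fit[pairs[:, 1]],
                           pairs[:, 0], pairs[:, 1])
        pop = pop[winners] + rng.normal(0, sigma, (pop_size, dim))
        pop[0] = best  # elitism: keep the incumbent best unmutated
    fit = np.array([forecast_mse(w, X, y, h) for w in pop])
    return pop[np.argmin(fit)]

X = rng.normal(0, 1, (64, 4))
y = np.sin(X.sum(axis=1))
w = ga_train(X, y)
print(forecast_mse(w, X, y))
```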


2020, Vol. 6
Author(s): Jaime de Miguel Rodríguez, Maria Eugenia Villafañe, Luka Piškorec, Fernando Sancho Caparrini

This work presents a methodology for the generation of novel 3D objects resembling wireframes of building types. These result from the reconstruction of interpolated locations within the learnt distribution of variational autoencoders (VAEs), a deep generative machine learning model based on neural networks. The data set used features a geometry representation scheme based on a ‘connectivity map’ that is especially suited to expressing the wireframe objects that compose it. Additionally, the input samples are generated through ‘parametric augmentation’, a strategy proposed in this study that creates coherent variations among data by enabling a set of parameters to alter representative features of a given building type. In the experiments described in this paper, more than 150 k input samples belonging to two building types were processed during the training of a VAE model. The main contribution of this paper is to explore parametric augmentation for the generation of large data sets of 3D geometries, showcasing its problems and limitations in the context of neural networks and VAEs. Results show that the generation of interpolated hybrid geometries is a challenging task; despite the difficulty of the endeavour, promising advances are presented.
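The “interpolated locations within the learnt distribution” amount to decoding points on a path between two latent codes. A minimal sketch, where `encode` and `decode` are hypothetical stand-ins for the trained VAE halves and linear interpolation is assumed (spherical interpolation is a common alternative):

```python
import numpy as np

def interpolate_latents(z_a, z_b, steps=8):
    """Linear interpolation between two latent codes; decoding each
    point yields the hybrid geometries described in the paper."""
    ts = np.linspace(0.0, 1.0, steps)
    return [(1 - t) * z_a + t * z_b for t in ts]

# Hypothetical usage: the inputs would be connectivity-map encodings
# of samples from two building types.
# z_a, z_b = encode(sample_type_a), encode(sample_type_b)
# hybrids = [decode(z) for z in interpolate_latents(z_a, z_b)]
```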


1995, Vol. 3 (3), pp. 133-142
Author(s): M. Hana, W.F. McClure, T.B. Whitaker, M. White, D.R. Bahler

Two artificial neural network models were used to estimate the nicotine content of tobacco: (i) a back-propagation network and (ii) a linear network. The back-propagation network consisted of an input layer, one hidden layer and an output layer; the linear network consisted of an input layer and an output layer. Both networks used the generalised delta rule for learning. The performance of both networks was compared with the multiple linear regression (MLR) method of calibration. The nicotine content in tobacco samples was estimated for two different data sets. Data set A contained 110 near infrared (NIR) spectra, each consisting of reflected energy at eight wavelengths. Data set B consisted of 200 NIR spectra, each with 840 spectral data points. A fast Fourier transform was applied to data set B in order to compress each spectrum into 13 Fourier coefficients. For data set A, the linear regression model gave the best results, followed by the back-propagation network and then the linear network; the true performance of the linear regression model was better than the back-propagation and linear networks by 14.0% and 18.1%, respectively. For data set B, the back-propagation network gave the best result, followed by MLR and the linear network, with the linear network and MLR models giving almost the same results. The true performance of the back-propagation network model was better than the MLR and linear network models by 35.14%.
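The compression of each 840-point spectrum into 13 Fourier coefficients can be sketched with an FFT that keeps only the lowest-frequency terms; whether the paper kept complex coefficients or magnitudes is not stated, so complex coefficients are assumed here:

```python
import numpy as np

def fourier_compress(spectrum, n_coeffs=13):
    """Keep the first n_coeffs low-frequency Fourier coefficients."""
    return np.fft.rfft(spectrum)[:n_coeffs]

def fourier_reconstruct(coeffs, length):
    """Zero-pad the kept coefficients and invert to approximate the spectrum."""
    full = np.zeros(length // 2 + 1, dtype=complex)
    full[:len(coeffs)] = coeffs
    return np.fft.irfft(full, n=length)

rng = np.random.default_rng(2)
spec = np.cumsum(rng.normal(0, 1, 840))   # stand-in for an 840-point NIR spectrum
approx = fourier_reconstruct(fourier_compress(spec), 840)
print(np.corrcoef(spec, approx)[0, 1])    # the smooth trend survives compression
```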


2019, Vol. 8 (2S11), pp. 3523-3526

This paper describes an efficient algorithm for classification in large data sets. While many classification algorithms exist, they are not suitable for larger volumes of data or varied data sets. Various ELM algorithms for working with large data sets are available in the literature; however, the existing algorithms use a fixed activation function, which can be a deficiency when working with large data. In this paper, we propose a novel ELM that employs a sigmoid activation function. Experimental evaluations demonstrate that our ELM-S algorithm performs better than ELM, SVM and other state-of-the-art algorithms on large data sets.
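The standard ELM recipe behind such variants is short enough to sketch: hidden-layer weights are drawn at random and never trained, and only the output weights are solved in closed form. A minimal version with the sigmoid activation the paper adopts (the ELM-S specifics beyond that are not given, so this is the generic algorithm):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ELM:
    """Extreme learning machine: random input weights, sigmoid hidden
    layer, output weights solved by least squares."""
    def __init__(self, n_hidden=100, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, Y):
        self.W = self.rng.normal(0, 1, (X.shape[1], self.n_hidden))
        self.b = self.rng.normal(0, 1, self.n_hidden)
        H = sigmoid(X @ self.W + self.b)
        self.beta = np.linalg.pinv(H) @ Y  # Moore-Penrose pseudo-inverse
        return self

    def predict(self, X):
        return sigmoid(X @ self.W + self.b) @ self.beta

# Toy usage: one-hot targets for a 2-class problem, argmax to decode.
X = np.random.default_rng(1).normal(0, 1, (200, 5))
Y = np.eye(2)[(X[:, 0] + X[:, 1] > 0).astype(int)]
model = ELM(n_hidden=50).fit(X, Y)
print((model.predict(X).argmax(1) == Y.argmax(1)).mean())
```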


2013, Vol. 5 (1), pp. 66-83
Author(s): Iman Rahimi, Reza Behmanesh, Rosnah Mohd. Yusuff

The objective of this article is to evaluate and assess the efficiency of poultry meat farms, as a case study, using a new method. The poultry farming industry is one of the most important agricultural sub-sectors. The purpose of this study is to predict and assess the efficiency of poultry farms as decision-making units (DMUs). Although several methods have been proposed for this problem, a methodology that discriminates performance more powerfully is needed. The proposed methodology combines data envelopment analysis with data mining techniques such as artificial neural networks (ANN), decision trees (DT), and cluster analysis (CA). As a case study, data were collected from 22 poultry companies in Iran. Moreover, because the data set is small and data mining techniques require larger samples, k-fold cross-validation was employed to validate the model. After assessing the efficiency of each DMU, clustering the DMUs, applying the model, and deriving decision rules, the result is a precise and accurate optimization technique.
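k-fold cross-validation stretches a small sample (here, 22 farms) by rotating which slice is held out. A minimal sketch of the fold construction:

```python
import numpy as np

def k_fold_indices(n, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation,
    so every observation serves once as a held-out test case."""
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test

for train, test in k_fold_indices(22, k=5):
    print(len(train), len(test))
```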


2020, pp. 1-11
Author(s): Erjia Yan, Zheng Chen, Kai Li

Citation sentiment plays an important role in citation analysis and scholarly communication research, but prior citation sentiment studies have used small data sets and relied largely on manual annotation. This paper uses a large data set of PubMed Central (PMC) full-text publications and analyzes citation sentiment in more than 32 million citances within PMC, revealing citation sentiment patterns at the journal and discipline levels. This paper finds a weak relationship between a journal’s citation impact (as measured by CiteScore) and the average sentiment score of citances to its publications. When journals are aggregated into quartiles based on citation impact, we find that journals in higher quartiles are cited more favorably than those in the lower quartiles. Further, social science journals are found to be cited with higher sentiment, followed by engineering and natural science and biomedical journals, respectively. This result may be attributed to disciplinary discourse patterns in which social science researchers tend to use more subjective terms to describe others’ work than do natural science or biomedical researchers.
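The quartile analysis reduces to averaging citance sentiment per journal and bucketing journals by CiteScore. A toy sketch with illustrative column names and numbers (not the PMC data):

```python
import pandas as pd

# Hypothetical citance table: one row per citing sentence, with the
# cited journal's CiteScore and a sentiment score in [-1, 1].
df = pd.DataFrame({
    "journal":   ["A", "A", "B", "B", "C", "C", "D", "D"],
    "citescore": [12.0, 12.0, 6.0, 6.0, 3.0, 3.0, 1.0, 1.0],
    "sentiment": [0.40, 0.10, 0.30, 0.20, 0.05, 0.15, 0.00, 0.10],
})

per_journal = df.groupby("journal").agg(
    citescore=("citescore", "first"),
    mean_sentiment=("sentiment", "mean"),
)
# Bucket journals into impact quartiles, then average sentiment per quartile.
per_journal["quartile"] = pd.qcut(per_journal["citescore"], 4,
                                  labels=["Q4", "Q3", "Q2", "Q1"])
print(per_journal.groupby("quartile", observed=True)["mean_sentiment"].mean())
```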


Author(s): Brian Hoeschen, Darcy Bullock, Mark Schlappi

Historically, stopped delay was used to characterize the operation of intersection movements because it was relatively easy to measure. During the past decade, the traffic engineering community has moved away from using stopped delay and now uses control delay. That measurement is more precise but quite difficult to extract from large data sets if strict definitions are used to derive the data. This paper evaluates two procedures for estimating control delay. The first is based on a historical approximation that control delay is 30% larger than stopped delay. The second is new and based on segment delay. The procedures are applied to a diverse data set collected in Phoenix, Arizona, and compared with control delay calculated by using the formal definition. The new approximation was observed to be better than the historical stopped delay procedure; it provided an accurate prediction of control delay. Because it is an approximation, this methodology would be most appropriately applied to large data sets collected from travel time studies for ranking and prioritizing intersections for further analysis.
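Both estimation procedures are one-liners; a sketch with illustrative numbers (the segment-delay procedure is read here as segment travel time minus free-flow time, which is an assumption about the paper's formulation):

```python
def control_delay_from_stopped(stopped_delay_s):
    """Historical approximation: control delay is ~30% larger than stopped delay."""
    return 1.3 * stopped_delay_s

def control_delay_from_segment(segment_time_s, free_flow_time_s):
    """Segment-delay approximation: observed segment travel time minus
    the free-flow travel time for the same segment."""
    return segment_time_s - free_flow_time_s

print(control_delay_from_stopped(24.0))        # -> 31.2 s
print(control_delay_from_segment(95.0, 62.0))  # -> 33.0 s
```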


2014, Vol. 26 (01), pp. 1450001
Author(s): Chao-Yi Huang, Jong-Chen Chen

Recently, many models applying artificial intelligence (AI) techniques to the analysis of clinical data have been proposed. Unfortunately, most models provide little help when a specific "cause–effect" relation in the data is not available, or even known. In this paper, an innovative method called closest reasonable centroids (CRC) is proposed to address this issue. The application domain was a clinical data set of the weight changes of 274 prematurely born babies who had nutritional deficiencies and were given total parenteral nutrition (TPN) treatments to meet their nutritional needs. Experimental results show that CRC's differentiability is comparable to that of back-propagation neural networks (BPN) and better than that of a statistical method. Also, from the babies' health conditions and their nutritional treatments, the proposed method can roughly predict their weight changes and suggest feasible formulas. All of the above results were double-checked by the clinicians, indicating that CRC could be used as an assistive tool.
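The abstract does not specify the CRC algorithm itself. For orientation only, the sketch below shows the plain nearest-centroid classifier that the name evokes; the authors' method necessarily differs in how "reasonable" centroids are selected:

```python
import numpy as np

class NearestCentroid:
    """Generic centroid-based classifier, shown only to illustrate the
    family of methods; this is not the authors' CRC algorithm."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0)
                                    for c in self.classes_])
        return self

    def predict(self, X):
        # distance from every sample to every class centroid
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[np.argmin(d, axis=1)]
```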


Author(s): V. Jinubala, P. Jeyakumar

Data mining is an emerging research field in the analysis of agricultural data. One of the most important problems in extracting knowledge from agricultural data is missing attribute values in the selected data set. Such deficiencies must be cleaned during preprocessing in order to obtain a usable data set. The main objective of this paper is to analyse the effectiveness of various imputation methods in producing a complete data set suitable for data mining techniques, and to present a comparative analysis of imputation methods for handling missing values. A pest data set for the rice crop, collected throughout Maharashtra state under the Crop Pest Surveillance and Advisory Project (CROPSAP) during 2009-2013, was used for the analysis. Methodologies including deletion of rows, mean and median imputation, linear regression, and predictive mean matching were analysed. The comparative analysis shows that predictive mean matching was better than the other methods and is effective for imputing missing values in large data sets.
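The compared imputation strategies are easy to contrast on a toy table; the data and the simple single-donor variant of predictive mean matching below are illustrative assumptions:

```python
import numpy as np
import pandas as pd

# Illustrative pest-surveillance-style table with missing rainfall values.
df = pd.DataFrame({
    "rainfall":   [12.0, 30.5, np.nan, 22.1, 18.0, np.nan, 25.3],
    "pest_count": [40, 95, 70, 78, 60, 55, 88],
})

# 1. Deletion of rows with any missing value.
dropped = df.dropna()

# 2-3. Mean / median imputation.
mean_imp   = df.fillna(df.mean(numeric_only=True))
median_imp = df.fillna(df.median(numeric_only=True))

# 4. Linear-regression imputation: predict rainfall from the complete column.
obs = df.dropna()
slope, intercept = np.polyfit(obs["pest_count"], obs["rainfall"], 1)
mask = df["rainfall"].isna()
reg_imp = df.copy()
reg_imp.loc[mask, "rainfall"] = slope * df.loc[mask, "pest_count"] + intercept

# 5. Predictive mean matching: predict as in (4), then borrow the observed
#    donor value whose own prediction is closest.
pred_obs = slope * obs["pest_count"] + intercept
pmm_imp = df.copy()
for i in df.index[mask]:
    pred_i = slope * df.loc[i, "pest_count"] + intercept
    donor = (pred_obs - pred_i).abs().idxmin()
    pmm_imp.loc[i, "rainfall"] = obs.loc[donor, "rainfall"]
print(pmm_imp)
```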

