A WEIGHTED COMBINER OF STACKING BASED METHODS

2012 ◽  
Vol 21 (06) ◽  
pp. 1250040
Author(s):  
NIALL ROONEY

In this paper we present a novel method that forms a weighted combination of a range of stacking-based methods for regression problems, without adding any major computational overhead compared to stacking itself. The technique is intended to exploit the variation in performance that individual stacking methods exhibit across different data sets, in order to provide a more robust technique overall. We detail an empirical analysis of the technique, referred to as the weighted Meta-Combiner (wMetaComb), and compare its performance to that of its underlying techniques.
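
A minimal sketch of the combining idea, assuming an inverse-MSE weighting rule and scikit-learn estimators chosen for illustration; this is not Rooney's exact wMetaComb formulation, only the general pattern of weighting several stacking ensembles by their cross-validated error:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=10, noise=5.0, random_state=0)

base = [("knn", KNeighborsRegressor()),
        ("tree", DecisionTreeRegressor(random_state=0))]
# Two stacking variants that differ only in their meta-learner.
stackers = [StackingRegressor(estimators=base, final_estimator=LinearRegression()),
            StackingRegressor(estimators=base, final_estimator=Ridge(alpha=1.0))]

# Weight each stacking method by its inverse cross-validated MSE, so the
# variant that performs better on this data set contributes more.
mse = np.array([-cross_val_score(s, X, y, cv=5,
                                 scoring="neg_mean_squared_error").mean()
                for s in stackers])
weights = (1.0 / mse) / (1.0 / mse).sum()

for s in stackers:
    s.fit(X, y)
combined = sum(w * s.predict(X) for w, s in zip(weights, stackers))
```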

2021 ◽  
Vol 13 (9) ◽  
pp. 4648
Author(s):  
Rana Muhammad Adnan ◽  
Kulwinder Singh Parmar ◽  
Salim Heddam ◽  
Shamsuddin Shahid ◽  
Ozgur Kisi

The accurate estimation of suspended sediments (SSs) is important for determining dam storage volume, river carrying capacity, pollution susceptibility, soil erosion potential, aquatic ecological impacts, and the design and operation of hydraulic structures. The presented study proposes a new method for accurately estimating daily SSs using antecedent discharge and sediment information. The novel method is developed by hybridizing the multivariate adaptive regression spline (MARS) and the K-means clustering algorithm (MARS-KM). The proposed method's efficacy is established by comparing its performance with the adaptive neuro-fuzzy system (ANFIS), MARS, and M5 tree (M5Tree) models in predicting SSs at two stations on the Yangtze River of China, according to three assessment measures: RMSE, MAE, and NSE. Two modeling scenarios are employed: in the first, the data are split 50-50% between model training and testing; in the second, the training and test data sets are swapped. At Guangyuan Station, MARS-KM improved on the ANFIS, MARS, and M5Tree methods in terms of RMSE by 39%, 30%, and 18% in the first scenario and by 24%, 22%, and 8% in the second scenario, respectively, while at Beibei Station the corresponding improvements over ANFIS, MARS, and M5Tree were 34%, 26%, and 27% in the first scenario and 7%, 16%, and 6% in the second scenario. Additionally, the MARS-KM models provided much more satisfactory estimates using only discharge values as inputs.
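
A minimal sketch of the MARS-KM hybrid as described: K-means partitions the input patterns and one MARS model is fitted per cluster. The third-party py-earth package is assumed to provide the MARS implementation, and building the lag features from antecedent discharge and sediment values is left to the caller:

```python
import numpy as np
from sklearn.cluster import KMeans
from pyearth import Earth  # third-party MARS implementation (assumed available)

def fit_mars_km(X, y, n_clusters=3, seed=0):
    """Cluster the inputs, then fit one MARS model per cluster."""
    km = KMeans(n_clusters=n_clusters, random_state=seed).fit(X)
    models = {c: Earth().fit(X[km.labels_ == c], y[km.labels_ == c])
              for c in range(n_clusters)}
    return km, models

def predict_mars_km(km, models, X):
    """Route each test pattern to its cluster's MARS model."""
    labels = km.predict(X)
    return np.array([models[c].predict(x[None, :])[0]
                     for c, x in zip(labels, X)])
```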


2018 ◽  
Vol 6 ◽  
pp. 451-465 ◽  
Author(s):  
Daniela Gerz ◽  
Ivan Vulić ◽  
Edoardo Ponti ◽  
Jason Naradowsky ◽  
Roi Reichart ◽  
...  

Neural architectures are prominent in the construction of language models (LMs). However, word-level prediction is typically agnostic of subword-level information (characters and character sequences) and operates over a closed vocabulary consisting of a limited word set. Indeed, while subword-aware models boost performance across a variety of NLP tasks, previous work did not evaluate the ability of these models to assist next-word prediction in language modeling tasks. Such subword-informed models should be particularly effective for morphologically rich languages (MRLs), which exhibit high type-to-token ratios. In this work, we present a large-scale LM study on 50 typologically diverse languages covering a wide variety of morphological systems, and offer new LM benchmarks to the community, while considering subword-level information. The main technical contribution of our work is a novel method for injecting subword-level information into semantic word vectors, integrated into the neural language modeling training, to facilitate word-level prediction. We conduct experiments in the LM setting, where the number of infrequent words is large, and demonstrate strong perplexity gains across our 50 languages, especially for MRLs. Our code and data sets are publicly available.
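
The paper's injection method is integrated into LM training; as a rough, fastText-style stand-in, the sketch below composes a word vector from hashed character n-gram embeddings, which is one common way subword information reaches word-level representations. Dimensions, n-gram range, and the hashing scheme are assumptions:

```python
import numpy as np

DIM, BUCKETS = 64, 2 ** 16
rng = np.random.default_rng(0)
ngram_table = rng.normal(scale=0.1, size=(BUCKETS, DIM))  # shared n-gram embeddings

def char_ngrams(word, n_min=3, n_max=5):
    padded = f"<{word}>"  # boundary markers distinguish prefixes and suffixes
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]

def subword_vector(word):
    rows = [hash(g) % BUCKETS for g in char_ngrams(word)]
    return ngram_table[rows].mean(axis=0)

# Morphological variants share most of their n-grams, so even rare inflected
# forms (frequent in high type-to-token-ratio languages) get informative vectors.
v_sing, v_plur = subword_vector("sediment"), subword_vector("sediments")
```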


Author(s):  
Chuang Sun ◽  
Zhousuo Zhang ◽  
Zhengjia He ◽  
Zhongjie Shen ◽  
Binqiang Chen ◽  
...  

Bearing performance degradation assessment is meaningful for maintaining mechanical reliability and safety. For this purpose, a novel method based on kernel locality preserving projection is proposed in this article. Kernel locality preserving projection extends the traditional locality preserving projection into a non-linear form by using a kernel function, making it more appropriate for exploring the non-linear information hidden in data sets. Accordingly, kernel locality preserving projection is used to generate a non-linear subspace from the normal bearing data. The test data are then projected onto this subspace to obtain an index for assessing bearing degradation degrees. The degradation index, expressed in the form of an inner product, indicates the similarity between the normal data and the test data. Validation with monitoring data from two experiments shows the effectiveness of the proposed method.
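
A minimal sketch of the assessment pipeline, assuming an RBF kernel, a dense heat-kernel adjacency, and a small ridge term for numerical stability; the neighborhood construction and parameter choices in the article may differ:

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

def rbf(A, B, gamma=0.1):
    return np.exp(-gamma * cdist(A, B, "sqeuclidean"))

def fit_klpp(X_normal, n_components=2, gamma=0.1, t=1.0):
    K = rbf(X_normal, X_normal, gamma)
    W = np.exp(-cdist(X_normal, X_normal, "sqeuclidean") / t)  # adjacency weights
    D = np.diag(W.sum(axis=1))
    L = D - W                                                  # graph Laplacian
    reg = 1e-8 * np.eye(len(K))
    # Generalized eigenproblem K L K a = lambda K D K a; the smallest
    # eigenvalues give the locality-preserving directions.
    _, vecs = eigh(K @ L @ K + reg, K @ D @ K + reg)
    return vecs[:, :n_components], K

X_normal = np.random.default_rng(0).normal(size=(100, 8))  # stand-in normal data
A, K = fit_klpp(X_normal)
center = (K @ A).mean(axis=0)  # mean projection of the normal data

def degradation_index(x_test, gamma=0.1):
    """Inner-product similarity between a test projection and the normal data;
    lower values suggest a larger departure from the normal subspace."""
    z = rbf(x_test[None, :], X_normal, gamma) @ A
    return float(z[0] @ center)
```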


2021 ◽  
pp. gr.273631.120
Author(s):  
Xinhao Liu ◽  
Huw A Ogilvie ◽  
Luay Nakhleh

Coalescent methods are proven and powerful tools for population genetics, phylogenetics, epidemiology, and other fields. A promising avenue for the analysis of large genomic alignments, which are increasingly common, is coalescent hidden Markov model (coalHMM) methods, but these methods have lacked general usability and flexibility. We introduce a novel method for automatically learning a coalHMM and inferring the posterior distributions of evolutionary parameters using black-box variational inference, with the transition rates between local genealogies derived empirically by simulation. This derivation enables our method to work directly with three or four taxa, and with more taxa through a divide-and-conquer approach. Using a simulated data set resembling a human-chimp-gorilla scenario, we show that our method is comparable or superior in accuracy to previous coalHMM methods. Both species divergence times and population sizes were accurately inferred. The method also infers local genealogies, and we report on their accuracy. Furthermore, we discuss a potential direction for scaling the method to larger data sets through a divide-and-conquer approach. This accuracy means our method is useful now, and by deriving transition rates by simulation it is flexible enough to enable future implementations of all kinds of population models.
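
A minimal sketch of the two ingredients named above: transition probabilities between local genealogies estimated empirically from simulated genealogy paths, and a standard scaled HMM forward pass over per-column emission probabilities. The emission model and the simulated paths are placeholders, not the paper's machinery:

```python
import numpy as np

def empirical_transitions(sim_paths, n_states):
    """Count genealogy switches along simulated paths (add-one smoothed)."""
    T = np.ones((n_states, n_states))
    for path in sim_paths:
        for a, b in zip(path[:-1], path[1:]):
            T[a, b] += 1
    return T / T.sum(axis=1, keepdims=True)

def forward_loglik(emissions, T, pi):
    """emissions[t, s] = P(alignment column t | local genealogy s)."""
    alpha = pi * emissions[0]
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for e in emissions[1:]:
        alpha = (alpha @ T) * e
        s = alpha.sum()
        loglik += np.log(s)
        alpha /= s
    return loglik
```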


2016 ◽  
Author(s):  
Kassian Kobert ◽  
Alexandros Stamatakis ◽  
Tomáš Flouri

The phylogenetic likelihood function is the major computational bottleneck in several applications of evolutionary biology, such as phylogenetic inference, species delimitation, model selection, and divergence time estimation. Given the alignment, a tree, and the evolutionary model parameters, the likelihood function computes the conditional likelihood vectors for every node of the tree. Vector entries for which all input data are identical result in redundant likelihood operations which, in turn, yield identical conditional values. Such operations can be omitted to improve run time and, using appropriate data structures, to reduce memory usage. We present a fast, novel method for identifying and omitting such redundant operations in phylogenetic likelihood calculations, and assess the performance improvement and memory savings attained by our method. Using empirical and simulated data sets, we show that a prototype implementation of our method yields up to 10-fold speedups and uses up to 78% less memory than one of the fastest and most highly tuned implementations of the phylogenetic likelihood function currently available. Our method is generic and can seamlessly be integrated into any phylogenetic likelihood implementation.
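
A minimal sketch of the repeat-identification idea: per alignment site, each node receives a signature built from its children's signatures, and sites that share a signature at a node would produce identical conditional likelihood vectors, so the computation can be done once and reused. The tree encoding and toy alignment are illustrative assumptions:

```python
def site_signatures(node, alignment):
    """node: leaf name (str) or (left, right) pair; returns one signature id
    per site. Equal ids at a node mark redundant likelihood operations."""
    if isinstance(node, str):
        return list(alignment[node])          # a leaf's signature is its state
    left = site_signatures(node[0], alignment)
    right = site_signatures(node[1], alignment)
    seen, sigs = {}, []
    for pair in zip(left, right):
        if pair not in seen:
            seen[pair] = len(seen)            # first occurrence: compute the CLV here
        sigs.append(seen[pair])               # repeats reuse the stored result
    return sigs

alignment = {"A": "ACGA", "B": "ACTA", "C": "GCTG"}
tree = (("A", "B"), "C")
print(site_signatures(tree, alignment))       # [0, 1, 2, 0]: first and last sites repeat
```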


2009 ◽  
Vol 21 (7) ◽  
pp. 2082-2103 ◽  
Author(s):  
Shirish Shevade ◽  
S. Sundararajan

Gaussian processes (GPs) are promising Bayesian methods for classification and regression problems. Designing a GP classifier and making predictions with it are, however, computationally demanding, especially when the training set is large. Sparse GP classifiers are known to overcome this limitation. In this letter, we propose and study a validation-based method for sparse GP classifier design. The proposed method uses a negative log predictive (NLP) loss measure, which is easy to compute for GP models. We use this measure for both basis vector selection and hyperparameter adaptation. Experimental results on several real-world benchmark data sets show better or comparable generalization performance relative to existing methods.
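
A minimal sketch of the NLP loss and of greedy basis selection driven by it, assuming a user-supplied fit_and_predict callback that returns validation-set probabilities P(y = +1 | x) after adding a candidate basis vector; the sparse GP machinery itself is not shown:

```python
import numpy as np

def nlp_loss(p_pos, y):
    """Negative log predictive loss; y in {-1, +1}, p_pos = P(y = +1 | x)."""
    p = np.where(y > 0, p_pos, 1.0 - p_pos)
    return -np.log(np.clip(p, 1e-12, None)).mean()

def select_basis(candidates, fit_and_predict, X_val, y_val):
    """Pick the candidate basis vector whose inclusion minimizes the NLP loss."""
    scores = [nlp_loss(fit_and_predict(c, X_val), y_val) for c in candidates]
    return int(np.argmin(scores))
```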


Author(s):  
K. Abumani ◽  
R. Nedunchezhian

Data mining techniques have been widely used for extracting non-trivial information from massive amounts of data, supporting strategic decision-making and many other applications. However, data mining also has drawbacks alongside its usefulness: sensitive information contained in a database may be exposed by data mining tools. Different approaches have been developed to hide such sensitive information. The work proposed in this article applies a novel method to access the generating transactions with minimum effort from the transactional database, which reduces the time complexity of any hiding algorithm. Theoretical and empirical analysis of the algorithm shows that hiding data with the proposed method performs association rule hiding more quickly than other algorithms.
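
One plausible way to reach the generating transactions cheaply, sketched under the assumption of an in-memory transaction list: an inverted index from item to transaction ids, intersected across the items of a sensitive itemset. The hiding algorithm itself is not shown:

```python
from collections import defaultdict

def build_index(transactions):
    """Inverted index: item -> set of transaction ids containing it."""
    index = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            index[item].add(tid)
    return index

def generating_transactions(index, itemset):
    """Ids of transactions that contain every item of the sensitive itemset."""
    tid_sets = [index[item] for item in itemset]
    return set.intersection(*tid_sets) if tid_sets else set()

transactions = [{"a", "b", "c"}, {"a", "c"}, {"b", "d"}, {"a", "b"}]
index = build_index(transactions)
print(generating_transactions(index, {"a", "b"}))  # {0, 3}
```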


2008 ◽  
pp. 3235-3251
Author(s):  
Yongqiao Xiao ◽  
Jenq-Foung Yao ◽  
Guizhen Yang

Recent years have witnessed a surge of research interest in knowledge discovery from data domains with complex structures, such as trees and graphs. In this paper, we address the problem of mining maximal frequent embedded subtrees, which is motivated by such important applications as mining "hot" spots of Web sites from Web usage logs and discovering significant "deep" structures from tree-like bioinformatic data. One major challenge arises from the fact that embedded subtrees are no longer ordinary subtrees, but preserve only part of the ancestor-descendant relationships in the original trees. To solve the embedded subtree mining problem, we propose a novel algorithm, called TreeGrow, which is optimized in two important respects. First, it obtains frequency counts of root-to-leaf paths through efficient compression of trees, thereby being able to quickly grow an embedded subtree pattern path by path instead of node by node. Second, candidate subtree generation is highly localized so as to avoid unnecessary computational overhead. Experimental results on benchmark synthetic data sets show that our algorithm can outperform unoptimized methods by up to 20 times.
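
A minimal sketch of the first optimization's starting point: counting root-to-leaf label paths across the tree database so that frequent paths can seed path-by-path pattern growth. The (label, children) tree encoding is an illustrative assumption, and the compression scheme is not reproduced:

```python
from collections import Counter

def root_to_leaf_paths(tree, prefix=()):
    """tree = (label, [child, ...]); yields each root-to-leaf label path."""
    label, children = tree
    path = prefix + (label,)
    if not children:
        yield path
    for child in children:
        yield from root_to_leaf_paths(child, path)

def frequent_paths(trees, min_support):
    counts = Counter()
    for t in trees:
        counts.update(set(root_to_leaf_paths(t)))  # count each path once per tree
    return {p: c for p, c in counts.items() if c >= min_support}

trees = [("root", [("a", [("b", [])]), ("c", [])]),
         ("root", [("a", [("b", [])])])]
print(frequent_paths(trees, min_support=2))  # {('root', 'a', 'b'): 2}
```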


2013 ◽  
Vol 427-429 ◽  
pp. 1606-1609 ◽  
Author(s):  
Tao Chen ◽  
Hui Fang Deng

In this paper, we propose a novel method for image retrieval based on multi-instance learning with relevance feedback. The method proceeds in three main steps. First, it segments each image into a number of regions, treating images as bags and regions as instances. Second, it constructs an objective function for multi-instance learning from the query images, which is used to rank the images in a large digital repository according to the distance between the nearest region vector of each image and the maximum of the objective function. Third, based on the user's relevance feedback, several rounds may be needed to refine the output images and their ranks, until a satisfactory set of images is returned to the user. Experimental results on the COREL image data sets demonstrate the effectiveness of the proposed approach.
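
A simplified stand-in for the ranking and feedback loop, assuming images are already segmented into region feature vectors (bags of instances); the paper's multi-instance objective function is replaced here by a single target concept point that feedback nudges toward the relevant images:

```python
import numpy as np

def nearest_instance(bag, target):
    """The region vector in a bag closest to the target, and its distance."""
    d = np.linalg.norm(bag - target, axis=1)
    return bag[d.argmin()], d.min()

def rank_images(bags, target):
    """Indices of bags ordered by nearest-instance distance (best first)."""
    return sorted(range(len(bags)),
                  key=lambda i: nearest_instance(bags[i], target)[1])

def refine_target(target, relevant_bags, step=0.5):
    """One relevance-feedback round: move the target toward the nearest
    instances of the images the user marked as relevant."""
    nearest = np.array([nearest_instance(b, target)[0] for b in relevant_bags])
    return (1 - step) * target + step * nearest.mean(axis=0)
```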

