A WEIGHTED COMBINER OF STACKING BASED METHODS

2012 ◽  
Vol 21 (06) ◽  
pp. 1250040
Author(s):  
NIALL ROONEY

In this paper we present a novel method that forms a weighted combination of a range of stacking-based methods for regression problems, without adding any major computational overhead compared to stacking itself. The technique is intended to exploit the variation in performance that individual stacking methods exhibit across different data sets, in order to provide a more robust technique overall. We detail an empirical analysis of the technique, referred to as the weighted Meta-Combiner (wMetaComb), and compare its performance to that of its underlying techniques.
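
A minimal sketch of the combining idea, assuming an inverse-MSE weighting rule and scikit-learn estimators chosen for illustration; this is not Rooney's exact wMetaComb formulation, only the general pattern of weighting several stacking ensembles by their cross-validated error:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=10, noise=5.0, random_state=0)

base = [("knn", KNeighborsRegressor()),
        ("tree", DecisionTreeRegressor(random_state=0))]
# Two stacking variants that differ only in their meta-learner.
stackers = [StackingRegressor(estimators=base, final_estimator=LinearRegression()),
            StackingRegressor(estimators=base, final_estimator=Ridge(alpha=1.0))]

# Weight each stacking method by its inverse cross-validated MSE, so the
# variant that performs better on this data set contributes more.
mse = np.array([-cross_val_score(s, X, y, cv=5,
                                 scoring="neg_mean_squared_error").mean()
                for s in stackers])
weights = (1.0 / mse) / (1.0 / mse).sum()

for s in stackers:
    s.fit(X, y)
combined = sum(w * s.predict(X) for w, s in zip(weights, stackers))
```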

2021 ◽  
Vol 13 (9) ◽  
pp. 4648
Author(s):  
Rana Muhammad Adnan ◽  
Kulwinder Singh Parmar ◽  
Salim Heddam ◽  
Shamsuddin Shahid ◽  
Ozgur Kisi

The accurate estimation of suspended sediments (SSs) is important for determining dam storage volume, river carrying capacity, pollution susceptibility, soil erosion potential, aquatic ecological impacts, and the design and operation of hydraulic structures. The presented study proposes a new method for accurately estimating daily SSs using antecedent discharge and sediment information. The novel method is developed by hybridizing the multivariate adaptive regression spline (MARS) and the K-means clustering algorithm (MARS-KM). The proposed method's efficacy is established by comparing its performance with the adaptive neuro-fuzzy system (ANFIS), MARS, and M5 tree (M5Tree) models in predicting SSs at two stations on the Yangtze River of China, according to three assessment measures: RMSE, MAE, and NSE. Two modeling scenarios are employed: in the first, the data are split 50-50% between model training and testing; in the second, the training and test data sets are swapped. At Guangyuan Station, MARS-KM improved on the ANFIS, MARS, and M5Tree methods in terms of RMSE by 39%, 30%, and 18% in the first scenario and by 24%, 22%, and 8% in the second scenario, respectively, while at Beibei Station the corresponding improvements over ANFIS, MARS, and M5Tree were 34%, 26%, and 27% in the first scenario and 7%, 16%, and 6% in the second scenario. Additionally, the MARS-KM models provided much more satisfactory estimates using only discharge values as inputs.
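
A minimal sketch of the MARS-KM hybrid as described: K-means partitions the input patterns and one MARS model is fitted per cluster. The third-party py-earth package is assumed to provide the MARS implementation, and building the lag features from antecedent discharge and sediment values is left to the caller:

```python
import numpy as np
from sklearn.cluster import KMeans
from pyearth import Earth  # third-party MARS implementation (assumed available)

def fit_mars_km(X, y, n_clusters=3, seed=0):
    """Cluster the inputs, then fit one MARS model per cluster."""
    km = KMeans(n_clusters=n_clusters, random_state=seed).fit(X)
    models = {c: Earth().fit(X[km.labels_ == c], y[km.labels_ == c])
              for c in range(n_clusters)}
    return km, models

def predict_mars_km(km, models, X):
    """Route each test pattern to its cluster's MARS model."""
    labels = km.predict(X)
    return np.array([models[c].predict(x[None, :])[0]
                     for c, x in zip(labels, X)])
```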


2018 ◽  
Vol 6 ◽  
pp. 451-465 ◽  
Author(s):  
Daniela Gerz ◽  
Ivan Vulić ◽  
Edoardo Ponti ◽  
Jason Naradowsky ◽  
Roi Reichart ◽  
...  

Neural architectures are prominent in the construction of language models (LMs). However, word-level prediction is typically agnostic of subword-level information (characters and character sequences) and operates over a closed vocabulary consisting of a limited word set. Indeed, while subword-aware models boost performance across a variety of NLP tasks, previous work did not evaluate the ability of these models to assist next-word prediction in language modeling tasks. Such subword-informed models should be particularly effective for morphologically rich languages (MRLs), which exhibit high type-to-token ratios. In this work, we present a large-scale LM study on 50 typologically diverse languages covering a wide variety of morphological systems, and offer new LM benchmarks to the community, while considering subword-level information. The main technical contribution of our work is a novel method for injecting subword-level information into semantic word vectors, integrated into the neural language modeling training, to facilitate word-level prediction. We conduct experiments in the LM setting, where the number of infrequent words is large, and demonstrate strong perplexity gains across our 50 languages, especially for MRLs. Our code and data sets are publicly available.
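
The paper's injection method is integrated into LM training; as a rough, fastText-style stand-in, the sketch below composes a word vector from hashed character n-gram embeddings, which is one common way subword information reaches word-level representations. Dimensions, n-gram range, and the hashing scheme are assumptions:

```python
import numpy as np

DIM, BUCKETS = 64, 2 ** 16
rng = np.random.default_rng(0)
ngram_table = rng.normal(scale=0.1, size=(BUCKETS, DIM))  # shared n-gram embeddings

def char_ngrams(word, n_min=3, n_max=5):
    padded = f"<{word}>"  # boundary markers distinguish prefixes and suffixes
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]

def subword_vector(word):
    rows = [hash(g) % BUCKETS for g in char_ngrams(word)]
    return ngram_table[rows].mean(axis=0)

# Morphological variants share most of their n-grams, so even rare inflected
# forms (frequent in high type-to-token-ratio languages) get informative vectors.
v_sing, v_plur = subword_vector("sediment"), subword_vector("sediments")
```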


Author(s):  
Chuang Sun ◽  
Zhousuo Zhang ◽  
Zhengjia He ◽  
Zhongjie Shen ◽  
Binqiang Chen ◽  
...  

Bearing performance degradation assessment is meaningful for maintaining mechanical reliability and safety. For this purpose, a novel method based on kernel locality preserving projection is proposed in this article. Kernel locality preserving projection extends the traditional locality preserving projection into a non-linear form by using a kernel function, making it more appropriate for exploring the non-linear information hidden in data sets. Accordingly, kernel locality preserving projection is used to generate a non-linear subspace from the normal bearing data. The test data are then projected onto this subspace to obtain an index for assessing bearing degradation degrees. The degradation index, expressed in the form of an inner product, indicates the similarity between the normal data and the test data. Validation with monitoring data from two experiments shows the effectiveness of the proposed method.
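
A minimal sketch of the assessment pipeline, assuming an RBF kernel, a dense heat-kernel adjacency, and a small ridge term for numerical stability; the neighborhood construction and parameter choices in the article may differ:

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

def rbf(A, B, gamma=0.1):
    return np.exp(-gamma * cdist(A, B, "sqeuclidean"))

def fit_klpp(X_normal, n_components=2, gamma=0.1, t=1.0):
    K = rbf(X_normal, X_normal, gamma)
    W = np.exp(-cdist(X_normal, X_normal, "sqeuclidean") / t)  # adjacency weights
    D = np.diag(W.sum(axis=1))
    L = D - W                                                  # graph Laplacian
    reg = 1e-8 * np.eye(len(K))
    # Generalized eigenproblem K L K a = lambda K D K a; the smallest
    # eigenvalues give the locality-preserving directions.
    _, vecs = eigh(K @ L @ K + reg, K @ D @ K + reg)
    return vecs[:, :n_components], K

X_normal = np.random.default_rng(0).normal(size=(100, 8))  # stand-in normal data
A, K = fit_klpp(X_normal)
center = (K @ A).mean(axis=0)  # mean projection of the normal data

def degradation_index(x_test, gamma=0.1):
    """Inner-product similarity between a test projection and the normal data;
    lower values suggest a larger departure from the normal subspace."""
    z = rbf(x_test[None, :], X_normal, gamma) @ A
    return float(z[0] @ center)
```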


2021 ◽  
pp. gr.273631.120
Author(s):  
Xinhao Liu ◽  
Huw A Ogilvie ◽  
Luay Nakhleh

Coalescent methods are proven and powerful tools for population genetics, phylogenetics, epidemiology, and other fields. A promising avenue for the analysis of large genomic alignments, which are increasingly common, is coalescent hidden Markov model (coalHMM) methods, but these methods have lacked general usability and flexibility. We introduce a novel method for automatically learning a coalHMM and inferring the posterior distributions of evolutionary parameters using black-box variational inference, with the transition rates between local genealogies derived empirically by simulation. This derivation enables our method to work directly with three or four taxa, and with more taxa through a divide-and-conquer approach. Using a simulated data set resembling a human-chimp-gorilla scenario, we show that our method is comparable or superior in accuracy to previous coalHMM methods. Both species divergence times and population sizes were accurately inferred. The method also infers local genealogies, and we report on their accuracy. Furthermore, we discuss a potential direction for scaling the method to larger data sets through a divide-and-conquer approach. This accuracy means our method is useful now, and by deriving transition rates by simulation it is flexible enough to enable future implementations of all kinds of population models.
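
A minimal sketch of the two ingredients named above: transition probabilities between local genealogies estimated empirically from simulated genealogy paths, and a standard scaled HMM forward pass over per-column emission probabilities. The emission model and the simulated paths are placeholders, not the paper's machinery:

```python
import numpy as np

def empirical_transitions(sim_paths, n_states):
    """Count genealogy switches along simulated paths (add-one smoothed)."""
    T = np.ones((n_states, n_states))
    for path in sim_paths:
        for a, b in zip(path[:-1], path[1:]):
            T[a, b] += 1
    return T / T.sum(axis=1, keepdims=True)

def forward_loglik(emissions, T, pi):
    """emissions[t, s] = P(alignment column t | local genealogy s)."""
    alpha = pi * emissions[0]
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for e in emissions[1:]:
        alpha = (alpha @ T) * e
        s = alpha.sum()
        loglik += np.log(s)
        alpha /= s
    return loglik
```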


2016 ◽  
Author(s):  
Kassian Kobert ◽  
Alexandros Stamatakis ◽  
Tomáš Flouri

The phylogenetic likelihood function is the major computational bottleneck in several applications of evolutionary biology, such as phylogenetic inference, species delimitation, model selection, and divergence time estimation. Given the alignment, a tree, and the evolutionary model parameters, the likelihood function computes the conditional likelihood vectors for every node of the tree. Vector entries for which all input data are identical result in redundant likelihood operations which, in turn, yield identical conditional values. Such operations can be omitted to improve run time and, using appropriate data structures, to reduce memory usage. We present a fast, novel method for identifying and omitting such redundant operations in phylogenetic likelihood calculations, and assess the performance improvement and memory savings attained by our method. Using empirical and simulated data sets, we show that a prototype implementation of our method yields up to 10-fold speedups and uses up to 78% less memory than one of the fastest and most highly tuned implementations of the phylogenetic likelihood function currently available. Our method is generic and can seamlessly be integrated into any phylogenetic likelihood implementation.
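
A minimal sketch of the repeat-identification idea: per alignment site, each node receives a signature built from its children's signatures, and sites that share a signature at a node would produce identical conditional likelihood vectors, so the computation can be done once and reused. The tree encoding and toy alignment are illustrative assumptions:

```python
def site_signatures(node, alignment):
    """node: leaf name (str) or (left, right) pair; returns one signature id
    per site. Equal ids at a node mark redundant likelihood operations."""
    if isinstance(node, str):
        return list(alignment[node])          # a leaf's signature is its state
    left = site_signatures(node[0], alignment)
    right = site_signatures(node[1], alignment)
    seen, sigs = {}, []
    for pair in zip(left, right):
        if pair not in seen:
            seen[pair] = len(seen)            # first occurrence: compute the CLV here
        sigs.append(seen[pair])               # repeats reuse the stored result
    return sigs

alignment = {"A": "ACGA", "B": "ACTA", "C": "GCTG"}
tree = (("A", "B"), "C")
print(site_signatures(tree, alignment))       # [0, 1, 2, 0]: first and last sites repeat
```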


2009 ◽  
Vol 21 (7) ◽  
pp. 2082-2103 ◽  
Author(s):  
Shirish Shevade ◽  
S. Sundararajan

Gaussian processes (GPs) are promising Bayesian methods for classification and regression problems. Designing a GP classifier and making predictions with it are, however, computationally demanding, especially when the training set is large. Sparse GP classifiers are known to overcome this limitation. In this letter, we propose and study a validation-based method for sparse GP classifier design. The proposed method uses a negative log predictive (NLP) loss measure, which is easy to compute for GP models. We use this measure for both basis vector selection and hyperparameter adaptation. Experimental results on several real-world benchmark data sets show better or comparable generalization performance relative to existing methods.
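
A minimal sketch of the NLP loss and of greedy basis selection driven by it, assuming a user-supplied fit_and_predict callback that returns validation-set probabilities P(y = +1 | x) after adding a candidate basis vector; the sparse GP machinery itself is not shown:

```python
import numpy as np

def nlp_loss(p_pos, y):
    """Negative log predictive loss; y in {-1, +1}, p_pos = P(y = +1 | x)."""
    p = np.where(y > 0, p_pos, 1.0 - p_pos)
    return -np.log(np.clip(p, 1e-12, None)).mean()

def select_basis(candidates, fit_and_predict, X_val, y_val):
    """Pick the candidate basis vector whose inclusion minimizes the NLP loss."""
    scores = [nlp_loss(fit_and_predict(c, X_val), y_val) for c in candidates]
    return int(np.argmin(scores))
```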


Author(s):  
K. Abumani ◽  
R. Nedunchezhian

Data mining techniques have been widely used for extracting non-trivial information from massive amounts of data, supporting strategic decision-making and many other applications. However, data mining also has drawbacks alongside its usefulness: sensitive information contained in a database may be exposed by data mining tools. Different approaches have been developed to hide such sensitive information. The work proposed in this article applies a novel method to access the generating transactions with minimum effort from the transactional database, which reduces the time complexity of any hiding algorithm. Theoretical and empirical analysis of the algorithm shows that hiding data with the proposed method performs association rule hiding more quickly than other algorithms.
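
One plausible way to reach the generating transactions cheaply, sketched under the assumption of an in-memory transaction list: an inverted index from item to transaction ids, intersected across the items of a sensitive itemset. The hiding algorithm itself is not shown:

```python
from collections import defaultdict

def build_index(transactions):
    """Inverted index: item -> set of transaction ids containing it."""
    index = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            index[item].add(tid)
    return index

def generating_transactions(index, itemset):
    """Ids of transactions that contain every item of the sensitive itemset."""
    tid_sets = [index[item] for item in itemset]
    return set.intersection(*tid_sets) if tid_sets else set()

transactions = [{"a", "b", "c"}, {"a", "c"}, {"b", "d"}, {"a", "b"}]
index = build_index(transactions)
print(generating_transactions(index, {"a", "b"}))  # {0, 3}
```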


2008 ◽  
pp. 3235-3251
Author(s):  
Yongqiao Xiao ◽  
Jenq-Foung Yao ◽  
Guizhen Yang

Recent years have witnessed a surge of research interest in knowledge discovery from data domains with complex structures, such as trees and graphs. In this paper, we address the problem of mining maximal frequent embedded subtrees, which is motivated by such important applications as mining "hot" spots of Web sites from Web usage logs and discovering significant "deep" structures from tree-like bioinformatic data. One major challenge arises from the fact that embedded subtrees are no longer ordinary subtrees, but preserve only part of the ancestor-descendant relationships in the original trees. To solve the embedded subtree mining problem, we propose a novel algorithm, called TreeGrow, which is optimized in two important respects. First, it obtains frequency counts of root-to-leaf paths through efficient compression of trees, thereby being able to quickly grow an embedded subtree pattern path by path instead of node by node. Second, candidate subtree generation is highly localized so as to avoid unnecessary computational overhead. Experimental results on benchmark synthetic data sets show that our algorithm can outperform unoptimized methods by up to 20 times.
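
A minimal sketch of the first optimization's starting point: counting root-to-leaf label paths across the tree database so that frequent paths can seed path-by-path pattern growth. The (label, children) tree encoding is an illustrative assumption, and the compression scheme is not reproduced:

```python
from collections import Counter

def root_to_leaf_paths(tree, prefix=()):
    """tree = (label, [child, ...]); yields each root-to-leaf label path."""
    label, children = tree
    path = prefix + (label,)
    if not children:
        yield path
    for child in children:
        yield from root_to_leaf_paths(child, path)

def frequent_paths(trees, min_support):
    counts = Counter()
    for t in trees:
        counts.update(set(root_to_leaf_paths(t)))  # count each path once per tree
    return {p: c for p, c in counts.items() if c >= min_support}

trees = [("root", [("a", [("b", [])]), ("c", [])]),
         ("root", [("a", [("b", [])])])]
print(frequent_paths(trees, min_support=2))  # {('root', 'a', 'b'): 2}
```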


2013 ◽  
Vol 427-429 ◽  
pp. 1606-1609 ◽  
Author(s):  
Tao Chen ◽  
Hui Fang Deng

In this paper, we propose a novel method for image retrieval based on multi-instance learning with relevance feedback. The method proceeds in three main steps. First, it segments each image into a number of regions, treating images as bags and regions as instances. Second, it constructs an objective function for multi-instance learning from the query images, which is used to rank the images in a large digital repository according to the distance between the nearest region vector of each image and the maximum of the objective function. Third, based on the user's relevance feedback, several rounds may be needed to refine the output images and their ranks, until a satisfactory set of images is returned to the user. Experimental results on the COREL image data sets demonstrate the effectiveness of the proposed approach.
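
A simplified stand-in for the ranking and feedback loop, assuming images are already segmented into region feature vectors (bags of instances); the paper's multi-instance objective function is replaced here by a single target concept point that feedback nudges toward the relevant images:

```python
import numpy as np

def nearest_instance(bag, target):
    """The region vector in a bag closest to the target, and its distance."""
    d = np.linalg.norm(bag - target, axis=1)
    return bag[d.argmin()], d.min()

def rank_images(bags, target):
    """Indices of bags ordered by nearest-instance distance (best first)."""
    return sorted(range(len(bags)),
                  key=lambda i: nearest_instance(bags[i], target)[1])

def refine_target(target, relevant_bags, step=0.5):
    """One relevance-feedback round: move the target toward the nearest
    instances of the images the user marked as relevant."""
    nearest = np.array([nearest_instance(b, target)[0] for b in relevant_bags])
    return (1 - step) * target + step * nearest.mean(axis=0)
```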

