Fast Performance Modeling across Different Database Versions Using Partitioned Co-Kriging

2021 ◽  
Vol 11 (20) ◽  
pp. 9669
Author(s):  
Rong Cao ◽  
Liang Bao ◽  
Shouxin Wei ◽  
Jiarui Duan ◽  
Xi Wu ◽  
...  

Database systems have a large number of configuration parameters that control functional and non-functional properties (e.g., performance and cost). Different configurations may lead to different performance values. To understand and predict the effect of configuration parameters on system performance, several learning-based strategies have recently been proposed. However, existing approaches usually assume a fixed database version, so learning has to be repeated once the database version changes. Repeating measurement and learning for each version is expensive and often practically infeasible. Instead, we propose the Partitioned Co-Kriging (PCK) approach, which transfers knowledge from an older database version (source domain) to quickly learn a reliable performance prediction model for a newer database version (target domain). Our method is based on the key observation that performance responses typically exhibit similarities across different database versions. We conducted extensive experiments on five database systems with different versions to demonstrate the superiority of PCK. Experimental results show that PCK outperforms six state-of-the-art baseline algorithms in terms of prediction accuracy and measurement effort.

2014 ◽  
Vol 13 (9) ◽  
pp. 4859-4867
Author(s):  
Khaled Saleh Maabreh

Distributed database management systems manage a huge amount of data as well as a large and growing number of users through different types of queries. Efficient methods for accessing these data volumes are therefore required to provide an acceptably high level of system performance. Data in these systems vary in type, from text to images, audio and video, and must be available through an optimized level of replication. Distributed database systems have many parameters, such as the degree of data distribution, the operation mode, the number of sites and replication. These parameters play a major role in any performance evaluation study. This paper investigates the main parameters that may affect system performance, which may help in configuring a distributed database system to enhance overall performance.


Author(s):  
Masoumeh Zareapoor ◽  
Jie Yang

Image-to-image translation aims to learn a mapping of images from a source domain to a target domain. However, three main challenges are associated with this problem and need to be dealt with: the lack of paired datasets, multimodality, and diversity. Convolutional neural networks (CNNs), despite their strong performance in many computer vision tasks, fail to detect the hierarchy of spatial relationships between different parts of an object and thus do not form the ideal representative model we look for. This article presents a new variation of generative models that aims to remedy this problem. We use a trainable transformer, which explicitly allows the spatial manipulation of data during training. This differentiable module can be inserted into the convolutional layers of the generative model, and it allows the generated distributions to be freely altered for image-to-image translation. To reap the benefits of the proposed module in the generative model, our architecture incorporates a new loss function to facilitate effective end-to-end generative learning for image-to-image translation. The proposed model is evaluated through comprehensive experiments on image synthesis and image-to-image translation, along with comparisons with several state-of-the-art algorithms.


2021 ◽  
Vol 11 (5) ◽  
pp. 603
Author(s):  
Chunlei Shi ◽  
Xianwei Xin ◽  
Jiacai Zhang

Machine learning methods are widely used in autism spectrum disorder (ASD) diagnosis. Due to the lack of labelled ASD data, multisite data are often pooled together to expand the sample size. However, the heterogeneity that exists among different sites leads to the degradation of machine learning models. Herein, three-way decision theory is introduced into unsupervised domain adaptation for the first time and applied to optimize the pseudolabels of the target domain/site from functional magnetic resonance imaging (fMRI) features related to ASD patients. The experimental results using multisite fMRI data show that our method not only narrows the gap of the sample distribution among domains but is also superior to the state-of-the-art domain adaptation methods in ASD recognition. Specifically, the ASD recognition accuracy of the proposed method is improved on all six tasks, by 70.80%, 75.41%, 69.91%, 72.13%, 71.01% and 68.85%, respectively, compared with the existing methods.


1982 ◽  
Vol 104 (2) ◽  
pp. 84-88
Author(s):  
J. L. Tangler

The purpose of this work was to evaluate the state of the art of performance prediction for small horizontal-axis wind turbines. This effort was undertaken because few of the existing performance methods used to predict rotor power output have been validated with reliable test data. The program involved evaluating several existing performance models from four contractors by comparing their predictions for two wind turbines with actual test data. Test data were acquired by Rocky Flats Test and Development Center and furnished to the contractors after submission of their prediction reports. The results of the correlation study will help identify areas in which existing rotor performance models are inadequate and, where possible, the reasons for the models' shortcomings. In addition, several problems associated with obtaining accurate test data will be discussed.


2021 ◽  
Author(s):  
Arabzadehghahyazi Negar

Pre-retrieval Query Performance Prediction (QPP) methods are oblivious to the performance of the retrieval model, as they predict query difficulty prior to observing the set of documents retrieved for the query. Among pre-retrieval query performance predictors, specificity-based metrics investigate how corpus, query and corpus-query level statistics can be used to predict the performance of the query. In this thesis, we explore how neural embeddings can be utilized to define corpus-independent and semantics-aware specificity metrics. Our metrics are based on the intuition that a term that is closely surrounded by other terms in the embedding space is more likely to be specific, while a term surrounded by less closely related terms is more likely to be generic. On this basis, we leverage geometric properties between embedded terms to define four groups of metrics: (1) neighborhood-based, (2) graph-based, (3) cluster-based and (4) vector-based metrics. Moreover, we employ learning-to-rank techniques to analyze the importance of individual specificity metrics. To evaluate the proposed metrics, we have curated and publicly shared a test collection of term specificity measurements defined based on the Wikipedia category hierarchy and the DMOZ taxonomy. We report on our extensive experiments on the effectiveness of our metrics through metric comparison, an ablation study and comparison against state-of-the-art baselines. We show that our proposed set of pre-retrieval QPP metrics, based on the properties of pre-trained neural embeddings, is more effective for performance prediction than state-of-the-art methods. We report our findings based on the Robust04, ClueWeb09 and Gov2 corpora and their associated TREC topics.
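
The intuition behind a neighborhood-based specificity metric can be illustrated with a minimal sketch. This is not the thesis's actual metric definition: the function name `neighborhood_specificity`, the choice of mean cosine similarity to the `k` nearest terms, and the toy two-dimensional vectors are all assumptions made for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def neighborhood_specificity(term, embeddings, k=2):
    """Mean cosine similarity between `term` and its k most similar terms.

    Hypothetical scoring rule: a higher value means the term is more
    tightly surrounded in the embedding space, which the abstract's
    intuition associates with a more specific term.
    """
    sims = sorted(
        (cosine(embeddings[term], vec)
         for other, vec in embeddings.items() if other != term),
        reverse=True,
    )
    top = sims[:k]
    return sum(top) / len(top)


# Toy vectors invented for this example; real use would load
# pre-trained embeddings instead.
toy_embeddings = {
    "poodle": [1.0, 0.1],
    "beagle": [0.9, 0.2],
    "terrier": [1.0, 0.0],
    "thing": [0.0, 1.0],
    "object": [-1.0, 0.0],
}

specific_score = neighborhood_specificity("poodle", toy_embeddings)
generic_score = neighborhood_specificity("thing", toy_embeddings)
```

In this toy space, "poodle" sits close to its neighbors and scores higher than "thing", whose nearest neighbors are only loosely related.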


Author(s):  
Старовойтенко Олексій Володимирович

Due to the growth of data and the number of computational tasks, it is necessary to ensure the required level of system performance. Performance can be improved by scaling the system horizontally or vertically, but even increasing the amount of computing resources does not solve all the problems. For example, a complex computational problem should be decomposed into smaller subtasks whose computation time is much shorter. However, the number of such tasks may grow constantly, so that processing on the services is delayed or certain messages are not processed at all. In many cases, message processing must be coordinated; for example, message A should be processed only after messages B and C. Given the problems of processing a large number of subtasks, we aim in this work to design a mechanism for effective distributed scheduling through message queues. As services, we use Amazon Web Services offerings such as Amazon EC2, SQS and DynamoDB. Our FlexQueue solution can compete with state-of-the-art systems such as Sparrow and MATRIX. Distributed systems are quite complex and require complex algorithms and control units, so the solution of this problem requires detailed research.
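
The coordination constraint in the abstract (message A processed only after messages B and C) can be sketched as a dependency ordering. This is an illustrative single-process sketch using Python's standard `graphlib` module, not the FlexQueue mechanism itself; the function name `processing_order` and the dictionary layout are assumptions.

```python
from graphlib import TopologicalSorter


def processing_order(dependencies):
    """Return one valid processing order for the given messages.

    `dependencies` maps each message id to the set of message ids that
    must be processed before it (a hypothetical structure chosen for
    this example). Raises graphlib.CycleError if the constraints are
    circular.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Message A may only be processed after messages B and C.
deps = {"A": {"B", "C"}, "B": set(), "C": set()}
order = processing_order(deps)
```

Any order returned places B and C before A; a distributed scheduler would additionally have to track completion state across workers, which this sketch leaves out.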


2018 ◽  
Vol 113 ◽  
pp. 270-278
Author(s):  
Yuyun Zeng ◽  
Jingquan Liu ◽  
Kaichao Sun ◽  
Lin-wen Hu
