Tourism Information Data Processing Method Based on Multi-Source Data Fusion

2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
YaoGuang Li ◽  
HeChi Gan

As urban social civilization and residents' quality of life gradually improve, the leisure tourism industry continues to grow in scale. This paper constructs a multi-source data fusion model based on an ensemble learning algorithm, trains it on the Ctrip 2020 open data set, and obtains tourism information data processing and prediction results. The model trained on the Ctrip data is then compared with data from tunic and Feizhu. Sensor detection technology is used to analyze many famous scenic spots in China, including tourist type, gender, and location. According to the multi-source fusion of tourism information, the extracted tourism features are consistent with data from trending flying bamboo, tunics, and other websites. For the first half of 2020, the prediction accuracy of the model after data processing is about 62%; the accuracy is low because of the epidemic. For the second half of the year, the prediction accuracy is 78%, and the model can fuse tourism information in a short time. The data therefore suggest that the model has strong learning and trend-prediction ability in tourism data processing and can provide necessary information support for tourists.

2019 ◽  
Vol 11 (3) ◽  
pp. 346-356
Author(s):  
Juan José Fernández-Muñoz ◽  
Javier M. Moguerza ◽  
Clara Martin Duque ◽  
Diana Gomez Bruna

Purpose: This paper studies the effect of imbalanced data in tourism quality models. It demonstrates that this imbalance strongly affects the accuracy of tourism prediction models for hotel recommendation.
Design/methodology/approach: A questionnaire was used to survey 83,740 clients of hotels rated from five stars down to two stars or fewer, and a binary logistic model was used. The data correspond to a sample of 87 hotels from all around the world (120 countries from America, Africa, Asia, Europe and Australia).
Findings: The results suggest that the imbalance in the data affects the prediction accuracy of the models used, especially for the recommendations given by unsatisfied clients, who tend to be classified as satisfied customers.
Practical implications: Special attention should be given to unsatisfied clients or, at least, the models should include safeguards against the effect of data imbalance.
Social implications: In the tourism industry, the strong imbalance between satisfied and unsatisfied customers produces misleading prediction results. This could affect hoteliers' quality policies.
Originality/value: Focusing on tourism data, this work shows that the imbalance strongly affects the prediction accuracy of the models used, especially for the recommendations given by unsatisfied customers, who tend to be classified as satisfied. A methodological approach based on balancing the data set used to build the models is proposed to improve the prediction accuracy for unsatisfied customers in traditional service quality models.
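The effect described above can be illustrated with a small sketch. It is not the authors' model: the counts (950 satisfied, 50 unsatisfied), the function names, and the choice of random oversampling as the balancing step are all assumptions made for illustration. It shows how overall accuracy can look high while no unsatisfied client is identified, and one simple way to balance a training set.

```python
import random
from collections import Counter

def accuracy(y_true, y_pred):
    """Share of all cases predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, label):
    """Share of cases with the given true label that were predicted as that label."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    total = sum(1 for t in y_true if t == label)
    return hits / total

def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate rows of smaller classes until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, n in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == label]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(label)
    return out_rows, out_labels

# Hypothetical survey: 950 satisfied (1) and 50 unsatisfied (0) clients.
y_true = [1] * 950 + [0] * 50
# A model that leans on the majority class and labels everyone "satisfied".
y_pred = [1] * 1000

overall = accuracy(y_true, y_pred)               # high, despite the failure below
unsat_recall = recall(y_true, y_pred, label=0)   # no unsatisfied client is found

rows = list(range(1000))  # placeholder feature rows (assumption)
balanced_rows, balanced_labels = oversample_minority(rows, y_true)
balanced_counts = Counter(balanced_labels)
```

Reporting recall per class alongside overall accuracy makes this failure visible. Oversampling is only one balancing option; undersampling the majority class or class weights are common alternatives.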


Genetics ◽  
2021 ◽  
Author(s):  
Marco Lopez-Cruz ◽  
Gustavo de los Campos

Genomic prediction uses DNA sequences and phenotypes to predict genetic values. In homogeneous populations, theory indicates that the accuracy of genomic prediction increases with sample size. However, differences in allele frequencies and in linkage disequilibrium patterns can lead to heterogeneity in SNP effects. In this context, calibrating genomic predictions using a large, potentially heterogeneous, training data set may not lead to optimal prediction accuracy. Some studies have tried to address this sample size/homogeneity trade-off using training set optimization algorithms; however, this approach assumes that a single training data set is optimal for all individuals in the prediction set. Here, we propose an approach that identifies, for each individual in the prediction set, a subset of the training data (i.e., a set of support points) from which predictions are derived. The proposed methodology is a Sparse Selection Index (SSI) that integrates Selection Index methodology with sparsity-inducing techniques commonly used for high-dimensional regression. The sparsity of the resulting index is controlled by a regularization parameter (λ); G-BLUP (the prediction method most commonly used in plant and animal breeding) appears as the special case λ = 0. In this study, we present the methodology and demonstrate (using two wheat data sets with phenotypes collected in ten different environments) that the SSI can achieve significant gains (between 5% and 10%) in prediction accuracy relative to G-BLUP.
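The link between a sparse selection index and G-BLUP can be sketched as follows. The notation is an assumption made for illustration and is not taken from the abstract: G is the genomic relationship matrix among training individuals, g_i is the vector of genomic relationships between prediction individual i and the training set, y holds the centered training phenotypes, and σ_u², σ_e² are the genetic and error variances. A selection index predicts the genetic value of individual i as a weighted sum of training phenotypes, and an L1 penalty on the weights makes the index sparse:

```latex
\hat{\boldsymbol{\beta}}_i(\lambda)
  = \arg\min_{\boldsymbol{\beta}_i}
    \left\{
      \tfrac{1}{2}\,\boldsymbol{\beta}_i^{\top}
      \left(\mathbf{G} + \tfrac{\sigma_e^2}{\sigma_u^2}\,\mathbf{I}\right)
      \boldsymbol{\beta}_i
      - \mathbf{g}_i^{\top}\boldsymbol{\beta}_i
      + \lambda \,\lVert \boldsymbol{\beta}_i \rVert_1
    \right\},
\qquad
\hat{u}_i = \hat{\boldsymbol{\beta}}_i(\lambda)^{\top}\mathbf{y}
```

Under these assumptions, setting λ = 0 gives weights (G + (σ_e²/σ_u²) I)⁻¹ g_i, which are the usual G-BLUP weights. Larger values of λ set more weights exactly to zero, so each individual is predicted from a smaller set of support points.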


2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Sven Lißner ◽  
Stefan Huber

Background: GPS-based cycling data are increasingly available for traffic planning. However, the recorded data often contain more than bicycle trips alone. Examples include GPS tracks recorded while using modes of transport other than the bicycle, or during long periods at work locations while tracking is still running. Collected bicycle GPS data therefore need to be processed adequately before they are used for transportation planning.
Results: The article presents a multi-level approach to bicycle-specific data processing. The data processing model comprises several steps (data filtering, smoothing, trip segmentation, transport mode recognition, driving mode detection) to obtain a correct data set that contains only bicycle trips. The validation reveals a sound accuracy of the model in its current state (82–88%).
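Two of the steps listed above, data filtering and trip segmentation, can be sketched as below. This is a minimal illustration and not the authors' pipeline: the point layout (timestamp in seconds, latitude, longitude), the 20 m/s speed limit, the 300 s gap threshold, and all function names are assumptions. Smoothing, transport mode recognition, and driving mode detection are not shown.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def drop_speed_outliers(points, max_speed_ms=20.0):
    """Keep a point only if the speed from the last kept point is plausible."""
    kept = []
    for t, lat, lon in points:
        if kept:
            t0, lat0, lon0 = kept[-1]
            dt = t - t0
            if dt <= 0 or haversine_m(lat0, lon0, lat, lon) / dt > max_speed_ms:
                continue  # non-increasing timestamp or implausible jump
        kept.append((t, lat, lon))
    return kept

def segment_trips(points, max_gap_s=300):
    """Start a new trip whenever the time gap between fixes exceeds max_gap_s."""
    trips = []
    for p in points:
        if trips and p[0] - trips[-1][-1][0] <= max_gap_s:
            trips[-1].append(p)
        else:
            trips.append([p])
    return trips

# Hypothetical track: one GPS jump and one long recording gap.
track = [
    (0, 51.0500, 13.7400),
    (5, 51.0501, 13.7400),
    (10, 51.0600, 13.7400),    # about 1.1 km in 5 s: treated as an outlier
    (15, 51.0502, 13.7400),
    (1015, 51.0502, 13.7400),  # 1000 s after the previous fix: new trip
    (1020, 51.0503, 13.7400),
]
clean = drop_speed_outliers(track)
trips = segment_trips(clean)
trip_sizes = [len(trip) for trip in trips]
```

The filter trusts the first point of a track, so a bad first fix would cause later good points to be rejected; a real pipeline would need a more robust rule.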


2021 ◽  
Vol 4 (1) ◽  
pp. 251524592092800
Author(s):  
Erin M. Buchanan ◽  
Sarah E. Crain ◽  
Ari L. Cunningham ◽  
Hannah R. Johnson ◽  
Hannah Stash ◽  
...  

As researchers embrace open and transparent data sharing, they will need to provide information about their data that effectively helps others understand their data sets’ contents. Without proper documentation, data stored in online repositories such as OSF will often be rendered unfindable and unreadable by other researchers and indexing search engines. Data dictionaries and codebooks provide a wealth of information about variables, data collection, and other important facets of a data set. This information, called metadata, provides key insights into how the data might be further used in research and facilitates search-engine indexing to reach a broader audience of interested parties. This Tutorial first explains terminology and standards relevant to data dictionaries and codebooks. Accompanying information on OSF presents a guided workflow of the entire process from source data (e.g., survey answers on Qualtrics) to an openly shared data set accompanied by a data dictionary or codebook that follows an agreed-upon standard. Finally, we discuss freely available Web applications to assist this process of ensuring that psychology data are findable, accessible, interoperable, and reusable.
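To make the idea of a data dictionary concrete, the sketch below derives one entry per variable from a small CSV. It is an illustration only and does not reproduce the tutorial's workflow or any particular metadata standard: the column names, the type labels, the entry fields, and the function names are all hypothetical.

```python
import csv
import io

def infer_type(values):
    """Guess a coarse type label from the non-missing string values of a column."""
    present = [v for v in values if v != ""]
    if not present:
        return "empty"
    try:
        for v in present:
            int(v)
        return "integer"
    except ValueError:
        pass
    try:
        for v in present:
            float(v)
        return "number"
    except ValueError:
        return "text"

def build_data_dictionary(csv_text, descriptions=None):
    """Return one entry per column: name, type, missing count, examples, description."""
    descriptions = descriptions or {}
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames:
        values = [row[name] for row in rows]
        entries.append({
            "name": name,
            "type": infer_type(values),
            "n_missing": sum(1 for v in values if v == ""),
            "examples": sorted({v for v in values if v != ""})[:3],
            "description": descriptions.get(name, ""),
        })
    return entries

# Hypothetical survey export with two missing cells.
sample = (
    "participant_id,age,condition,rt_ms\n"
    "1,23,control,512.5\n"
    "2,,treatment,498.0\n"
    "3,31,control,\n"
)
dictionary = build_data_dictionary(
    sample,
    descriptions={"rt_ms": "Response time in milliseconds"},
)
```

Automatically derived fields such as type and missing count only go so far; the human-written descriptions are what make a shared data set understandable to others.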


2010 ◽  
Vol 26-28 ◽  
pp. 620-624 ◽  
Author(s):  
Zhan Wei Du ◽  
Yong Jian Yang ◽  
Yong Xiong Sun ◽  
Chi Jun Zhang ◽  
Tuan Liang Li

This paper presents a modified ant colony algorithm (ACA) called the route-update ant colony algorithm (RUACA). The research focuses on improving computational efficiency for the traveling salesman problem (TSP). A new impact factor is introduced and shown to reduce the convergence time of RUACA. To assess RUACA's performance, a simply supported data set of cities, which served as the source data in previous research using the traditional ACA and a genetic algorithm (GA), is chosen as a benchmark case study. Compared with the ACA and GA results, the presented RUACA is shown to solve the TSP successfully. The results of the proposed algorithm are found to be satisfactory.
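For orientation, a basic ant colony algorithm for the TSP can be sketched as below. This is a generic baseline (ant system), not RUACA: the abstract does not define the route-update rule or the impact factor, so neither is reproduced here. The parameter names and values (alpha, beta, rho, q, number of ants and iterations) and the four-city example are assumptions for illustration.

```python
import math
import random

def tour_length(tour, dist):
    """Length of the closed tour that returns to its starting city."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def ant_colony_tsp(cities, n_ants=10, n_iter=30, alpha=1.0, beta=2.0,
                   rho=0.5, q=1.0, seed=0):
    """Basic ant system for the TSP; assumes all cities are distinct points."""
    rng = random.Random(seed)
    n = len(cities)
    dist = [[math.dist(a, b) for b in cities] for a in cities]
    tau = [[1.0] * n for _ in range(n)]  # pheromone on each edge
    best_tour, best_len = None, float("inf")
    for _ in range(n_iter):
        tours = []
        for _ in range(n_ants):
            tour = [rng.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                i = tour[-1]
                candidates = sorted(unvisited)
                # Prefer edges with more pheromone and shorter distance.
                weights = [tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta
                           for j in candidates]
                nxt = rng.choices(candidates, weights=weights)[0]
                tour.append(nxt)
                unvisited.remove(nxt)
            length = tour_length(tour, dist)
            tours.append((tour, length))
            if length < best_len:
                best_tour, best_len = tour, length
        # Evaporate pheromone, then deposit in proportion to tour quality.
        for i in range(n):
            for j in range(n):
                tau[i][j] *= 1.0 - rho
        for tour, length in tours:
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                tau[a][b] += q / length
                tau[b][a] += q / length
    return best_tour, best_len

# Four cities on the corners of a unit square; the shortest tour is the perimeter.
square = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
best_tour, best_len = ant_colony_tsp(square)
```

Convergence time in such a scheme depends mainly on how quickly pheromone concentrates on good edges, which is where a modified update rule would take effect.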

