Tourism Information Data Processing Method Based on Multi-Source Data Fusion

Journal of Sensors ◽

10.1155/2021/7047119 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

YaoGuang Li ◽

HeChi Gan

Keyword(s):

Data Fusion ◽

Data Processing ◽

Prediction Accuracy ◽

Tourism Industry ◽

Learning Ability ◽

Information Support ◽

Data Set ◽

Prediction Ability ◽

Source Data ◽

Tourism Information

As urban social civilization and residents' quality of life gradually improve, the scale of the leisure tourism industry has been growing. This paper constructs a multi-source data fusion model based on an ensemble learning algorithm, trains it on the Ctrip 2020 open data set, and obtains tourism information data processing and prediction results. The Ctrip data serve as the training set, and the trained model is compared against data from Tuniu and Feizhu. Sensor detection technology is used to analyze many famous scenic spots in China, including tourist type, gender, and location. According to the multi-source fusion of tourism information, the tourism feature extraction results are consistent with data from Feizhu, Tuniu, and other websites. For the first half of 2020, the prediction accuracy of the model after data processing is about 62%; the accuracy is low because of the epidemic. For the second half of the year, the prediction accuracy is 78%, and the model can fuse tourism information in a short time. The data therefore show that the model has high learning ability and high trend prediction ability in tourism data processing and can provide necessary information support for tourists.

A study on the effect of imbalanced data in tourism recommendation models

International Journal of Quality and Service Sciences ◽

10.1108/ijqss-05-2018-0050 ◽

2019 ◽

Vol 11 (3) ◽

pp. 346-356

Author(s):

Juan José Fernández-Muñoz ◽

Javier M. Moguerza ◽

Clara Martin Duque ◽

Diana Gomez Bruna

Keyword(s):

Prediction Accuracy ◽

Prediction Models ◽

Methodological Approach ◽

Imbalanced Data ◽

Tourism Industry ◽

Data Set ◽

Quality Models ◽

Content Type ◽

Binary Logistic Model ◽

Quality Policy

Purpose: This paper studies the effect of imbalanced data in tourism quality models. It demonstrates that this imbalance strongly affects the accuracy of tourism prediction models for hotel recommendation. Design/methodology/approach: A questionnaire surveyed 83,740 clients of hotels rated from five stars down to two stars or fewer, and the responses were analyzed with a binary logistic model. The data correspond to a sample of 87 hotels from all around the world (120 countries from America, Africa, Asia, Europe and Australia). Findings: The results suggest that the imbalance in the data affects the prediction accuracy of the models used, especially for the predictions concerning unsatisfied clients, who tend to be classified as satisfied customers. Practical implications: Special attention should be given to unsatisfied clients or, at least, the models should include safeguards against the effect of data imbalance. Social implications: In the tourism industry, the strong imbalance between satisfied and unsatisfied customers produces misleading prediction results. This could affect the quality policy of hoteliers. Originality/value: Focusing on tourism data, this work shows that the imbalance strongly affects the prediction accuracy of the models used, especially for the recommendation predicted for unsatisfied customers, who tend to be classified as satisfied; a methodological approach based on balancing the data set used to build the models is proposed to improve prediction accuracy for unsatisfied customers in traditional service quality models.
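The effect described above can be illustrated with a small sketch. It is not the authors' method: the 90/10 class split, the field name `satisfied`, and the choice of random undersampling as the balancing step are all assumptions made for illustration. It shows why overall accuracy can look good while every unsatisfied client is misclassified.

```python
import random

def undersample(rows, label_key="satisfied", seed=0):
    """Randomly drop majority-class rows until both classes are the same size."""
    rng = random.Random(seed)
    satisfied = [r for r in rows if r[label_key]]
    unsatisfied = [r for r in rows if not r[label_key]]
    minority, majority = sorted((satisfied, unsatisfied), key=len)
    return minority + rng.sample(majority, len(minority))

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, cls):
    """Fraction of true members of class `cls` that were predicted as `cls`."""
    relevant = [p for t, p in zip(y_true, y_pred) if t == cls]
    return sum(p == cls for p in relevant) / len(relevant)

# Hypothetical survey: 10 unsatisfied clients (ids 0-9) and 90 satisfied ones.
rows = [{"id": i, "satisfied": i >= 10} for i in range(100)]
y_true = [r["satisfied"] for r in rows]

# A degenerate model that always predicts "satisfied".
y_pred = [True] * len(rows)
baseline_accuracy = accuracy(y_true, y_pred)        # high, driven by the majority class
unsatisfied_recall = recall(y_true, y_pred, False)  # no unsatisfied client is detected

# Balanced training set: as many satisfied as unsatisfied rows.
balanced = undersample(rows)
```

Undersampling discards data; oversampling the minority class or class weighting are common alternatives, and the abstract does not say which balancing scheme the authors used.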

A multi-source data fusion approach to assess spatial-temporal variability and delineate homogeneous zones: A use case in a table grape vineyard in Greece

The Science of The Total Environment ◽

10.1016/j.scitotenv.2019.05.324 ◽

2019 ◽

Vol 684 ◽

pp. 155-163 ◽

Cited By ~ 2

Author(s):

Evangelos Anastasiou ◽

Annamaria Castrignanò ◽

Konstantinos Arvanitis ◽

Spyros Fountas

Keyword(s):

Data Fusion ◽

Temporal Variability ◽

Table Grape ◽

Use Case ◽

Source Data ◽

Fusion Approach

Optimal breeding-value prediction using a Sparse Selection Index

Genetics ◽

10.1093/genetics/iyab030 ◽

2021 ◽

Author(s):

Marco Lopez-Cruz ◽

Gustavo de los Campos

Keyword(s):

Sample Size ◽

Dna Sequences ◽

Genomic Prediction ◽

Prediction Accuracy ◽

Regularization Parameter ◽

Selection Index ◽

Prediction Method ◽

Training Data ◽

Breeding Value ◽

Data Set

Abstract: Genomic prediction uses DNA sequences and phenotypes to predict genetic values. In homogeneous populations, theory indicates that the accuracy of genomic prediction increases with sample size. However, differences in allele frequencies and in linkage disequilibrium patterns can lead to heterogeneity in SNP effects. In this context, calibrating genomic predictions using a large, potentially heterogeneous, training data set may not lead to optimal prediction accuracy. Some studies have tried to address this sample size/homogeneity trade-off using training set optimization algorithms; however, this approach assumes that a single training data set is optimal for all individuals in the prediction set. Here, we propose an approach that identifies, for each individual in the prediction set, a subset of the training data (i.e., a set of support points) from which predictions are derived. The methodology that we propose is a Sparse Selection Index (SSI) that integrates Selection Index methodology with sparsity-inducing techniques commonly used for high-dimensional regression. The sparsity of the resulting index is controlled by a regularization parameter (λ); the G-BLUP (the prediction method most commonly used in plant and animal breeding) arises as the special case λ = 0. In this study, we present the methodology and demonstrate (using two wheat data sets with phenotypes collected in ten different environments) that the SSI can achieve significant gains (between 5% and 10%) in prediction accuracy relative to the G-BLUP.
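One way to write a penalized selection index that is consistent with this description is sketched below. The notation is assumed here and is not given in the abstract: G is a genomic relationship matrix among training individuals, g_i holds the relationships between prediction individual i and the training individuals, θ is a noise-to-signal variance ratio, and y is the vector of centered training phenotypes.

```latex
\hat{\boldsymbol{\beta}}_i(\lambda)
  = \arg\min_{\boldsymbol{\beta}}\;
    \tfrac{1}{2}\,\boldsymbol{\beta}^{\top}\!\left(\mathbf{G} + \theta\,\mathbf{I}\right)\boldsymbol{\beta}
    - \mathbf{g}_i^{\top}\boldsymbol{\beta}
    + \lambda\,\lVert \boldsymbol{\beta} \rVert_1,
\qquad
\hat{u}_i = \hat{\boldsymbol{\beta}}_i(\lambda)^{\top}\mathbf{y}

% Special case \lambda = 0: the penalty vanishes and the minimizer is the
% dense BLUP-type index
\hat{\boldsymbol{\beta}}_i(0) = \left(\mathbf{G} + \theta\,\mathbf{I}\right)^{-1}\mathbf{g}_i
```

Under this sketch, the training individuals with non-zero entries in the estimated index play the role of the support points for individual i, and larger λ yields fewer of them.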

A novel multi-source data fusion method based on Bayesian inference for accurate estimation of chlorophyll-a concentration over eutrophic lakes

Environmental Modelling & Software ◽

10.1016/j.envsoft.2021.105057 ◽

2021 ◽

pp. 105057

Author(s):

Cheng Chen ◽

Qiuwen Chen ◽

Gang Li ◽

Mengnan He ◽

Jianwei Dong ◽

...

Keyword(s):

Bayesian Inference ◽

Data Fusion ◽

Chlorophyll A ◽

Accurate Estimation ◽

Fusion Method ◽

Eutrophic Lakes ◽

Chlorophyll A Concentration ◽

Source Data

Facing the needs for clean bicycle data – a bicycle-specific approach of GPS data processing

European Transport Research Review ◽

10.1186/s12544-020-00462-2 ◽

2021 ◽

Vol 13 (1) ◽

Author(s):

Sven Lißner ◽

Stefan Huber

Keyword(s):

Data Processing ◽

Gps Data ◽

Data Set ◽

Specific Data ◽

Driving Mode ◽

Mode Detection ◽

Current State ◽

Mode Recognition ◽

Recorded Data ◽

Gps Tracks

Abstract. Background: GPS-based cycling data are increasingly available for traffic planning. However, the recorded data often contain more than bicycle trips alone. Examples include GPS tracks recorded while using modes of transport other than the bicycle, and long stationary periods at work locations while tracking is still running. Collected bicycle GPS data therefore need to be processed adequately before they can be used for transportation planning. Results: The article presents a multi-level approach to bicycle-specific data processing. The data processing model comprises several steps (data filtering, smoothing, trip segmentation, transport mode recognition, driving mode detection) to obtain a correct data set that contains only bicycle trips. The validation reveals a sound accuracy of the model in its current state (82–88%).
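Three of the processing steps named above (filtering, trip segmentation, and a crude transport mode check) can be sketched as follows. This is a minimal illustration, not the authors' model: the `Fix` record, the function names, and every threshold (40 m/s outlier cut-off, 300 s gap, 2 to 10 m/s mean cycling speed) are hypothetical values chosen for the example.

```python
from dataclasses import dataclass

@dataclass
class Fix:
    t: float      # timestamp in seconds
    speed: float  # speed in metres per second

def filter_fixes(fixes, max_speed=40.0):
    """Data filtering: drop fixes with physically implausible speeds (GPS jumps)."""
    return [f for f in fixes if 0.0 <= f.speed <= max_speed]

def segment_trips(fixes, max_gap=300.0):
    """Trip segmentation: start a new trip when the time gap exceeds max_gap."""
    trips = []
    for f in fixes:
        if trips and f.t - trips[-1][-1].t <= max_gap:
            trips[-1].append(f)
        else:
            trips.append([f])
    return trips

def looks_like_cycling(trip, low=2.0, high=10.0):
    """Crude mode check: mean speed falls inside an assumed cycling range."""
    mean_speed = sum(f.speed for f in trip) / len(trip)
    return low <= mean_speed <= high

# Hypothetical recording: a bike ride with one GPS jump, then a car ride.
raw = [Fix(0, 5.0), Fix(10, 6.0), Fix(20, 80.0), Fix(30, 5.5),
       Fix(1000, 20.0), Fix(1010, 22.0)]

trips = segment_trips(filter_fixes(raw))
bike_trips = [trip for trip in trips if looks_like_cycling(trip)]
```

A real pipeline would also smooth the track and use richer features than mean speed for mode recognition; the sketch only shows how the steps chain together.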

Getting Started Creating Data Dictionaries: How to Create a Shareable Data Set

Advances in Methods and Practices in Psychological Science ◽

10.1177/2515245920928007 ◽

2021 ◽

Vol 4 (1) ◽

pp. 251524592092800

Author(s):

Erin M. Buchanan ◽

Sarah E. Crain ◽

Ari L. Cunningham ◽

Hannah R. Johnson ◽

Hannah Stash ◽

...

Keyword(s):

Data Collection ◽

Data Sharing ◽

Search Engine ◽

Web Applications ◽

Data Sets ◽

Data Dictionary ◽

Data Set ◽

Entire Process ◽

Shared Data ◽

Source Data

As researchers embrace open and transparent data sharing, they will need to provide information about their data that effectively helps others understand their data sets’ contents. Without proper documentation, data stored in online repositories such as OSF will often be rendered unfindable and unreadable by other researchers and indexing search engines. Data dictionaries and codebooks provide a wealth of information about variables, data collection, and other important facets of a data set. This information, called metadata, provides key insights into how the data might be further used in research and facilitates search-engine indexing to reach a broader audience of interested parties. This Tutorial first explains terminology and standards relevant to data dictionaries and codebooks. Accompanying information on OSF presents a guided workflow of the entire process from source data (e.g., survey answers on Qualtrics) to an openly shared data set accompanied by a data dictionary or codebook that follows an agreed-upon standard. Finally, we discuss freely available Web applications to assist this process of ensuring that psychology data are findable, accessible, interoperable, and reusable.

What to look for in distributed (source) data processing

10.1145/1499402.1499577 ◽

1977 ◽

Author(s):

W. Harry Vickers

Keyword(s):

Data Processing ◽

Distributed Source ◽

Source Data

Research on Equipment Situation Display Based on Multi-source Data Fusion

2020 International Conference on Computer Engineering and Intelligent Control (ICCEIC) ◽

10.1109/icceic51584.2020.00048 ◽

2020 ◽

Author(s):

Cai-sen Chen ◽

Hai-rong Hu ◽

Lu-lu Fang ◽

Yang-xia Xiang

Keyword(s):

Data Fusion ◽

Source Data

Open-source Data Processing in Stable Isotope Ratio Mass Spectrometry: New Software Packages for Efficient, Transparent and Reproducible IRMS Data Reduction

10.1002/essoar.10500605.1 ◽

2019 ◽

Author(s):

Sebastian Kopf

Keyword(s):

Mass Spectrometry ◽

Stable Isotope ◽

Data Processing ◽

Data Reduction ◽

Isotope Ratio ◽

Isotope Ratio Mass Spectrometry ◽

Stable Isotope Ratio ◽

Software Packages ◽

Open Source Data ◽

Source Data

An Improved Ant Colony Optimization Algorithm for Solving the TSP Problem

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.26-28.620 ◽

2010 ◽

Vol 26-28 ◽

pp. 620-624 ◽

Cited By ~ 3

Author(s):

Zhan Wei Du ◽

Yong Jian Yang ◽

Yong Xiong Sun ◽

Chi Jun Zhang ◽

Tuan Liang Li

Keyword(s):

Genetic Algorithm ◽

Ant Colony Algorithm ◽

Ant Colony ◽

Convergence Time ◽

Ant Colony Optimization Algorithm ◽

Data Set ◽

Research Attention ◽

Simply Supported ◽

Source Data

This paper presents a modified Ant Colony Algorithm (ACA) called the route-update ant colony algorithm (RUACA). The research focuses on improving computational efficiency for the TSP. A new impact factor is introduced and shown to be effective in reducing the convergence time of the RUACA. To assess the RUACA's performance, a simply supported data set of cities, which was taken as the source data in previous research using the traditional ACA and a genetic algorithm (GA), is chosen as a benchmark case study. Compared with the ACA and GA results, the presented RUACA is shown to solve the TSP successfully. The results of the proposed algorithm are found to be satisfactory.
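For orientation, the baseline that the RUACA modifies can be sketched as a plain Ant System for the TSP. This is the textbook algorithm, not the RUACA: the abstract does not define the route-update rule or the impact factor, so neither appears here. The function names and parameter defaults (`alpha`, `beta`, `rho`, `q`) are illustrative assumptions, and the cities are assumed to have distinct coordinates.

```python
import math
import random

def tour_length(tour, dist):
    """Length of the closed tour that returns to its starting city."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def ant_system(coords, n_ants=10, n_iters=50, alpha=1.0, beta=2.0,
               rho=0.5, q=1.0, seed=0):
    """Basic Ant System: probabilistic tour construction plus pheromone update."""
    rng = random.Random(seed)
    n = len(coords)
    dist = [[math.dist(a, b) for b in coords] for a in coords]
    tau = [[1.0] * n for _ in range(n)]  # pheromone level on each edge
    best_tour, best_len = None, float("inf")

    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):
            tour = [rng.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                i = tour[-1]
                candidates = sorted(unvisited)
                # Edge attractiveness: pheromone^alpha * (1 / distance)^beta.
                weights = [tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta
                           for j in candidates]
                nxt = rng.choices(candidates, weights=weights)[0]
                tour.append(nxt)
                unvisited.remove(nxt)
            tours.append((tour_length(tour, dist), tour))

        # Evaporation on every edge, then deposit proportional to tour quality.
        for i in range(n):
            for j in range(n):
                tau[i][j] *= 1.0 - rho
        for length, tour in tours:
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                tau[a][b] += q / length
                tau[b][a] += q / length
            if length < best_len:
                best_len, best_tour = length, tour

    return best_tour, best_len

# Hypothetical instance: four cities on the corners of a unit square.
cities = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
best_tour, best_len = ant_system(cities)
```

The convergence-time improvement claimed in the abstract would come from changing the pheromone update or the selection rule; the sketch only fixes the reference point against which such a change is measured.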
