Application of Dempster–Shafer Data Fusion Technique in Support of Decision Making with Big Data

Author(s):  
Ping Yi ◽  
Songling Zhang

This paper introduces applications of the Dempster–Shafer (D-S) data fusion technique in transportation system decision making. D-S inference is a statistics-based data classification technique, and it can be used when data sources contribute discontinuous and incomplete information and no single data source can produce an overwhelmingly high probability of certainty for identifying the most probable event. The technique captures and combines the information contributed by the data sources by using Dempster's rule to find the conjunction of the events and to determine the highest associated probability. The D-S theory is explained and its implementation described through numerical examples of a ride-hailing service and of crowd management at a subway station. Results from the applications have shown that the technique is very effective in dealing with incomplete information and multiple data sources in the era of big data.
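Dempster's rule of combination referenced above can be sketched in a few lines of Python. This is a generic textbook implementation, not the paper's code, and the two sensor mass functions below are illustrative values, not taken from the paper's examples:

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    m1, m2: dicts mapping frozenset hypotheses to basic probability masses.
    Mass on non-empty intersections is accumulated; mass on empty
    intersections is conflict K, and the result is normalized by 1 - K.
    """
    combined = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb  # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("Sources are completely conflicting")
    return {h: m / (1.0 - conflict) for h, m in combined.items()}

# Two sensors reporting beliefs over events {A, B};
# frozenset("AB") is the "A or B" (uncertainty) hypothesis.
m_sensor1 = {frozenset("A"): 0.6, frozenset("AB"): 0.4}
m_sensor2 = {frozenset("A"): 0.5, frozenset("B"): 0.3, frozenset("AB"): 0.2}
fused = combine(m_sensor1, m_sensor2)
```

After fusion, the hypothesis with the highest combined mass (here A) would be selected as the most probable event, exactly the decision step the paper describes.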

2021 ◽  
Vol 8 ◽  
Author(s):  
Fernanda C. Dórea ◽  
Crawford W. Revie

The biggest change brought about by the “era of big data” to health in general, and epidemiology in particular, relates arguably not to the volume of data encountered, but to its variety. An increasing number of new data sources, including many not originally collected for health purposes, are now being used for epidemiological inference and contextualization. Combining evidence from multiple data sources presents significant challenges, but discussions around this subject often confuse issues of data access and privacy, with the actual technical challenges of data integration and interoperability. We review some of the opportunities for connecting data, generating information, and supporting decision-making across the increasingly complex “variety” dimension of data in population health, to enable data-driven surveillance to go beyond simple signal detection and support an expanded set of surveillance goals.


2021 ◽  
Vol 37 (1) ◽  
pp. 161-169
Author(s):  
Dominik Rozkrut ◽  
Olga Świerkot-Strużewska ◽  
Gemma Van Halderen

Never has there been a more exciting time to be an official statistician. The data revolution is responding to the demands of the COVID-19 pandemic and a complex sustainable development agenda to improve how data is produced and used, to close data gaps to prevent discrimination, to build capacity and data literacy, to modernize data collection systems and to liberate data to promote transparency and accountability. But can all data be liberated in the production and communication of official statistics? This paper explores the UN Fundamental Principles of Official Statistics in the context of eight new and big data sources. The paper concludes that each data source can be used for the production of official statistics in adherence with the Fundamental Principles, and argues these data sources should be used if National Statistical Systems are to adhere to the first Fundamental Principle of compiling and making available official statistics that honor citizens' entitlement to public information.


2021 ◽  
pp. 1-11
Author(s):  
Yanan Huang ◽  
Yuji Miao ◽  
Zhenjing Da

Methods for multi-modal English event detection from a single data source, and for transfer-learning-based isomorphic event detection across different English data sources, still need improvement. To improve the efficiency of English event detection across data sources, this paper proposes, based on a transfer learning algorithm, multi-modal event detection for a single data source and isomorphic event detection for different data sources. By stacking multiple classification models, the individual features are merged with one another, and adversarial training driven by the discrepancy between two classifiers makes the distributions of the different source data more similar. In addition, to validate the proposed algorithm, a multi-source English event detection dataset is collected. Finally, this dataset is used to verify the proposed method and to compare it against the current mainstream transfer learning methods. Experimental analysis, convergence analysis, visual analysis, and parameter evaluation demonstrate the effectiveness of the proposed algorithm.


Omega ◽  
2021 ◽  
pp. 102479
Author(s):  
Zhongbao Zhou ◽  
Meng Gao ◽  
Helu Xiao ◽  
Rui Wang ◽  
Wenbin Liu

2019 ◽  
Vol 11 (11) ◽  
pp. 1387
Author(s):  
Yuan Zhuang ◽  
Qin Wang ◽  
You Li ◽  
Zhouzheng Gao ◽  
Bingpeng Zhou ◽  
...  

Visible Light Positioning (VLP) has become one of the most popular positioning and navigation systems in this decade. Filter-based VLP systems can provide real-time solutions but have limited accuracy. In contrast, fixed-interval smoothers can help VLP achieve higher accuracy but require post-processing. In this article, a trade-off solution, the Fixed-Lag Ensemble Kalman Smoother (FLEnKS), is proposed for VLP to achieve a semi-real-time and accurate positioning solution. The forward part of the FLEnKS is based on the Ensemble Kalman Filter (EnKF), which uses stochastic sampling with ensemble members and better reflects the features of nonlinear systems. The backward filter in the FLEnKS compensates for the estimation error from the forward filter using a linearization based on error states, further reducing the estimation error. Furthermore, data from both a photodiode (PD) and a camera are fused in the proposed FLEnKS, which further improves the accuracy of conventional single-source VLP. Preliminary field test results show that the proposed FLEnKS provides a semi-real-time positioning solution with an average 3D positioning accuracy of 15.63 cm in dynamic tests.
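The stochastic-sampling measurement update at the heart of the EnKF forward pass can be illustrated for a scalar state. This is a minimal textbook sketch, not the paper's multi-sensor FLEnKS implementation, and all numbers are illustrative:

```python
import random

random.seed(0)

def enkf_update(ensemble, y, obs_var):
    """One stochastic EnKF measurement update for a scalar state.

    Each ensemble member is nudged toward a perturbed copy of the
    observation y, weighted by the Kalman gain estimated from the
    ensemble spread -- the "stochastic sampling with ensemble
    members" idea mentioned in the abstract.
    """
    n = len(ensemble)
    mean = sum(ensemble) / n
    var = sum((x - mean) ** 2 for x in ensemble) / (n - 1)  # ensemble covariance
    gain = var / (var + obs_var)                            # scalar Kalman gain
    return [x + gain * (y + random.gauss(0.0, obs_var ** 0.5) - x)
            for x in ensemble]

# Prior ensemble centered near 0; observation of 1.0 with variance 0.25
prior = [random.gauss(0.0, 1.0) for _ in range(500)]
posterior = enkf_update(prior, 1.0, obs_var=0.25)
```

The posterior ensemble mean moves toward the observation and its spread shrinks; a fixed-lag smoother such as FLEnKS would additionally run a backward pass over a short window of past states.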


2019 ◽  
Vol 253 ◽  
pp. 403-411 ◽  
Author(s):  
YuJie Ben ◽  
FuJun Ma ◽  
Hao Wang ◽  
Muhammad Azher Hassan ◽  
Romanenko Yevheniia ◽  
...  

2012 ◽  
Vol 241-244 ◽  
pp. 3085-3091
Author(s):  
Jian Gong ◽  
Cui Hong Lv ◽  
Lin Hai Qi ◽  
Su Xia Ma

The calculation subsystems of a power quality intelligent information system must handle many types of monitoring data sources. When different data sources supply data to a calculation subsystem, the algorithm itself does not need to change; only the way the required data is obtained changes. Keeping the calculation subsystem unaffected by changes of data provider therefore becomes a necessary requirement. To address this problem, this paper puts forward a set of solutions based on dependency injection to support multiple data sources in the calculation subsystem.
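The dependency-injection idea can be sketched as follows: the calculation subsystem depends only on an abstract data-source interface, so swapping the provider requires no change to the algorithm. All class and method names below are hypothetical, not taken from the paper:

```python
from abc import ABC, abstractmethod

class DataSource(ABC):
    """Interface every monitoring data provider must implement."""
    @abstractmethod
    def read_samples(self):
        ...

class DatabaseSource(DataSource):
    def read_samples(self):
        return [220.1, 219.8, 220.4]   # e.g. voltage readings from a database

class FileSource(DataSource):
    def read_samples(self):
        return [219.9, 220.2]          # the same kind of readings from a file

class PowerQualityCalculator:
    """Calculation subsystem: the algorithm never changes; the injected
    source alone decides how the data is obtained."""
    def __init__(self, source: DataSource):   # constructor injection
        self.source = source

    def mean_voltage(self):
        samples = self.source.read_samples()
        return sum(samples) / len(samples)

calc = PowerQualityCalculator(DatabaseSource())
```

Replacing `DatabaseSource()` with `FileSource()` at the injection point changes the data provider without touching `PowerQualityCalculator`, which is exactly the decoupling the paper argues for.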


2017 ◽  
pp. 41-64
Author(s):  
Marta Padilla-Ruiz ◽  
Carlos López-Vázquez

We are immersed in the Big Data era, with large amounts of heterogeneous data across both temporal and spatial scales. These data are beginning to be streamed in real time from different devices and sensors, as well illustrated by the new concept of Smart Cities. Conflation processes, defined as procedures for combining and integrating different data sources to improve the level of information of the result, play an important role in this scenario. Conflation also makes it possible to update geographical databases (GDB) by conflating different kinds of sources where one is more accurate or more up to date than the other. In geometric conflation, the procedure involves transforming features from one data source to another so as to minimize the geometric discrepancies between them. Accuracy has to be taken into account in these processes, and the results need to be measured and evaluated in order to better understand product quality. In this paper, the conflation evaluation process is described along with the different metrics and approaches used to assess its accuracy.
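As a minimal illustration of geometric conflation and its accuracy evaluation, the sketch below fits a least-squares translation between matched control points and reports the RMSE before and after. Real conflation workflows use richer transformations (affine, Helmert, rubber-sheeting), and the coordinates here are invented:

```python
import math

def conflate_translation(src_pts, ref_pts):
    """Estimate the translation minimizing the mean squared discrepancy
    between matched control points (least squares), then report the
    RMSE accuracy metric before and after the transformation."""
    n = len(src_pts)
    dx = sum(r[0] - s[0] for s, r in zip(src_pts, ref_pts)) / n
    dy = sum(r[1] - s[1] for s, r in zip(src_pts, ref_pts)) / n
    moved = [(x + dx, y + dy) for x, y in src_pts]

    def rmse(pts):
        return math.sqrt(sum((p[0] - r[0]) ** 2 + (p[1] - r[1]) ** 2
                             for p, r in zip(pts, ref_pts)) / n)

    return moved, rmse(src_pts), rmse(moved)

src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]          # features in one source
ref = [(2.1, 3.0), (3.0, 2.9), (1.9, 4.1)]          # same features in the reference GDB
moved, rmse_before, rmse_after = conflate_translation(src, ref)
```

The drop in RMSE after the fit is the kind of accuracy measure a conflation evaluation process would report.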


2021 ◽  
Vol 18 (6) ◽  
pp. 8661-8682
Author(s):  
Vishnu Vandana Kolisetty ◽  
Dharmendra Singh Rajput

Big data has attracted much attention across many domain sectors. The volume of digital data generated today in every domain is enormous, and the need to acquire such information for analysis and decision making is growing just as fast in every field, so it is important to integrate related information based on similarity. Existing integration techniques, however, suffer from processing and time complexity and face constraints when interconnecting multiple data sources. Because this information comes from many different, complexly distributed sources, it is difficult to determine the relationships between the data and to derive common data structures that allow effective access or retrieval for different data analysis needs. This paper proposes an integration approach for big data based on the computation of attribute conditional dependency (ACD) and a similarity index (SI), termed ACD-SI. The ACD-SI mechanism uses an improved Bayesian mechanism to analyze the distribution of attributes in a document as dependencies on possible attributes. It also uses attribute conversion and selection mechanisms to map and group data for integration, and applies methods such as latent semantic analysis (LSA) to the content of data attributes to extract relevant and accurate data. A series of experiments measures the overall purity and normalization of the data integration on a large dataset of bibliographic data from various publications. The obtained purity and NMI ratios confirm the relevance of the clustered data, and the precision, recall, and accuracy measures justify the improvement of the proposal compared with existing approaches.
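As a toy illustration of a similarity index over textual attributes, cosine similarity between term-frequency vectors can be sketched in pure Python. The paper's SI computation is more elaborate and combined with ACD and LSA; the record strings below are invented:

```python
import math
from collections import Counter

def similarity_index(doc_a, doc_b):
    """Cosine similarity between two attribute strings, used here as a
    simple stand-in similarity index for grouping related records."""
    va = Counter(doc_a.lower().split())
    vb = Counter(doc_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)          # Counter returns 0 for missing terms
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

# Invented bibliographic title attributes
rec1 = "big data integration with similarity index"
rec2 = "similarity index for big data sources"
rec3 = "visible light positioning"
```

Records whose index exceeds a threshold would be grouped into the same cluster for integration; here `rec1` and `rec2` are similar while `rec3` is unrelated.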


Author(s):  
Trung Le ◽  
Quan Hoang ◽  
Hung Vu ◽  
Tu Dinh Nguyen ◽  
Hung Bui ◽  
...  

Generative Adversarial Networks (GANs) are a powerful class of deep generative models. In this paper, we extend GAN to the problem of generating data that are not only close to a primary data source but also required to be different from auxiliary data sources. For this problem, we enrich both GANs' formulations and applications by introducing pushing forces that thrust generated samples away from given auxiliary data sources. We term our method Push-and-Pull GAN (P2GAN). We conduct extensive experiments to demonstrate the merit of P2GAN in two applications: generating data with constraints and addressing the mode collapsing problem. We use CIFAR-10, STL-10, and ImageNet datasets and compute Fréchet Inception Distance to evaluate P2GAN's effectiveness in addressing the mode collapsing problem. The results show that P2GAN outperforms the state-of-the-art baselines. For the problem of generating data with constraints, we show that P2GAN can successfully avoid generating specific features such as black hair.

