Application of Dempster–Shafer Data Fusion Technique in Support of Decision Making with Big Data

Author(s):  
Ping Yi ◽  
Songling Zhang

This paper introduces applications of the Dempster–Shafer (D-S) data fusion technique in transportation system decision making. D-S inference is a statistics-based data classification technique, and it can be used when data sources contribute discontinuous and incomplete information and no single data source can produce an overwhelmingly high probability of certainty for identifying the most probable event. The technique captures and combines the information contributed by the data sources by using Dempster's rule to find the conjunction of the events and to determine the highest associated probability. The D-S theory is explained and its implementation described through numerical examples of a ride-hailing service and of crowd management at a subway station. Results from the applications have shown that the technique is very effective in dealing with incomplete information and multiple data sources in the era of big data.
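Dempster's rule of combination referenced above can be sketched in a few lines of Python. This is a generic textbook implementation, not the paper's code, and the two sensor mass functions below are illustrative values, not taken from the paper's examples:

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    m1, m2: dicts mapping frozenset hypotheses to basic probability masses.
    Mass on non-empty intersections is accumulated; mass on empty
    intersections is conflict K, and the result is normalized by 1 - K.
    """
    combined = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb  # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("Sources are completely conflicting")
    return {h: m / (1.0 - conflict) for h, m in combined.items()}

# Two sensors reporting beliefs over events {A, B};
# frozenset("AB") is the "A or B" (uncertainty) hypothesis.
m_sensor1 = {frozenset("A"): 0.6, frozenset("AB"): 0.4}
m_sensor2 = {frozenset("A"): 0.5, frozenset("B"): 0.3, frozenset("AB"): 0.2}
fused = combine(m_sensor1, m_sensor2)
```

After fusion, the hypothesis with the highest combined mass (here A) would be selected as the most probable event, exactly the decision step the paper describes.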

2021 ◽  
Vol 8 ◽  
Author(s):  
Fernanda C. Dórea ◽  
Crawford W. Revie

The biggest change brought about by the “era of big data” to health in general, and epidemiology in particular, relates arguably not to the volume of data encountered, but to its variety. An increasing number of new data sources, including many not originally collected for health purposes, are now being used for epidemiological inference and contextualization. Combining evidence from multiple data sources presents significant challenges, but discussions around this subject often confuse issues of data access and privacy, with the actual technical challenges of data integration and interoperability. We review some of the opportunities for connecting data, generating information, and supporting decision-making across the increasingly complex “variety” dimension of data in population health, to enable data-driven surveillance to go beyond simple signal detection and support an expanded set of surveillance goals.


2021 ◽  
Vol 37 (1) ◽  
pp. 161-169
Author(s):  
Dominik Rozkrut ◽  
Olga Świerkot-Strużewska ◽  
Gemma Van Halderen

Never has there been a more exciting time to be an official statistician. The data revolution is responding to the demands of the COVID-19 pandemic and a complex sustainable development agenda to improve how data is produced and used, to close data gaps to prevent discrimination, to build capacity and data literacy, to modernize data collection systems and to liberate data to promote transparency and accountability. But can all data be liberated in the production and communication of official statistics? This paper explores the UN Fundamental Principles of Official Statistics in the context of eight new and big data sources. The paper concludes that each data source can be used for the production of official statistics in adherence with the Fundamental Principles, and argues these data sources should be used if National Statistical Systems are to adhere to the first Fundamental Principle of compiling and making available official statistics that honor citizens' entitlement to public information.


2021 ◽  
pp. 1-11
Author(s):  
Yanan Huang ◽  
Yuji Miao ◽  
Zhenjing Da

Methods for multi-modal English event detection from a single data source, and for transfer-learning-based isomorphic event detection across different English data sources, still need improvement. To improve the efficiency of English event detection across data sources, this paper proposes, based on a transfer learning algorithm, multi-modal event detection for a single data source and isomorphic event detection for different data sources. By stacking multiple classification models, the individual features are merged with one another, and adversarial training driven by the discrepancy between two classifiers makes the distributions of the different source data more similar. In addition, to validate the proposed algorithm, a multi-source English event detection dataset is collected. Finally, this dataset is used to verify the proposed method and to compare it against the current mainstream transfer learning methods. Experimental analysis, convergence analysis, visual analysis, and parameter evaluation demonstrate the effectiveness of the proposed algorithm.


Omega ◽  
2021 ◽  
pp. 102479
Author(s):  
Zhongbao Zhou ◽  
Meng Gao ◽  
Helu Xiao ◽  
Rui Wang ◽  
Wenbin Liu

2019 ◽  
Vol 11 (11) ◽  
pp. 1387
Author(s):  
Yuan Zhuang ◽  
Qin Wang ◽  
You Li ◽  
Zhouzheng Gao ◽  
Bingpeng Zhou ◽  
...  

Visible Light Positioning (VLP) has become one of the most popular positioning and navigation systems in this decade. Filter-based VLP systems can provide real-time solutions but have limited accuracy. In contrast, fixed-interval smoothers can help VLP achieve higher accuracy but require post-processing. In this article, a trade-off solution, the Fixed-Lag Ensemble Kalman Smoother (FLEnKS), is proposed for VLP to achieve a semi-real-time and accurate positioning solution. The forward part of the FLEnKS is based on the Ensemble Kalman Filter (EnKF), which uses stochastic sampling with ensemble members and better reflects the features of nonlinear systems. The backward filter in the FLEnKS compensates for the estimation error from the forward filter using a linearization based on error states, further reducing the estimation error. Furthermore, data from both a photodiode (PD) and a camera are fused in the proposed FLEnKS, which further improves the accuracy of conventional single-source VLP. Preliminary field test results show that the proposed FLEnKS provides a semi-real-time positioning solution with an average 3D positioning accuracy of 15.63 cm in dynamic tests.
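The stochastic-sampling measurement update at the heart of the EnKF forward pass can be illustrated for a scalar state. This is a minimal textbook sketch, not the paper's multi-sensor FLEnKS implementation, and all numbers are illustrative:

```python
import random

random.seed(0)

def enkf_update(ensemble, y, obs_var):
    """One stochastic EnKF measurement update for a scalar state.

    Each ensemble member is nudged toward a perturbed copy of the
    observation y, weighted by the Kalman gain estimated from the
    ensemble spread -- the "stochastic sampling with ensemble
    members" idea mentioned in the abstract.
    """
    n = len(ensemble)
    mean = sum(ensemble) / n
    var = sum((x - mean) ** 2 for x in ensemble) / (n - 1)  # ensemble covariance
    gain = var / (var + obs_var)                            # scalar Kalman gain
    return [x + gain * (y + random.gauss(0.0, obs_var ** 0.5) - x)
            for x in ensemble]

# Prior ensemble centered near 0; observation of 1.0 with variance 0.25
prior = [random.gauss(0.0, 1.0) for _ in range(500)]
posterior = enkf_update(prior, 1.0, obs_var=0.25)
```

The posterior ensemble mean moves toward the observation and its spread shrinks; a fixed-lag smoother such as FLEnKS would additionally run a backward pass over a short window of past states.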


2019 ◽  
Vol 253 ◽  
pp. 403-411 ◽  
Author(s):  
YuJie Ben ◽  
FuJun Ma ◽  
Hao Wang ◽  
Muhammad Azher Hassan ◽  
Romanenko Yevheniia ◽  
...  

2012 ◽  
Vol 241-244 ◽  
pp. 3085-3091
Author(s):  
Jian Gong ◽  
Cui Hong Lv ◽  
Lin Hai Qi ◽  
Su Xia Ma

The calculation subsystems of a power quality intelligent information system must handle many types of monitoring data sources. When different data sources supply data to a calculation subsystem, the algorithm itself does not need to change; only the way the required data is obtained changes. Keeping the calculation subsystem unaffected by changes of data provider therefore becomes a necessary requirement. To address this problem, this paper puts forward a set of solutions based on dependency injection to support multiple data sources in the calculation subsystem.
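The dependency-injection idea can be sketched as follows: the calculation subsystem depends only on an abstract data-source interface, so swapping the provider requires no change to the algorithm. All class and method names below are hypothetical, not taken from the paper:

```python
from abc import ABC, abstractmethod

class DataSource(ABC):
    """Interface every monitoring data provider must implement."""
    @abstractmethod
    def read_samples(self):
        ...

class DatabaseSource(DataSource):
    def read_samples(self):
        return [220.1, 219.8, 220.4]   # e.g. voltage readings from a database

class FileSource(DataSource):
    def read_samples(self):
        return [219.9, 220.2]          # the same kind of readings from a file

class PowerQualityCalculator:
    """Calculation subsystem: the algorithm never changes; the injected
    source alone decides how the data is obtained."""
    def __init__(self, source: DataSource):   # constructor injection
        self.source = source

    def mean_voltage(self):
        samples = self.source.read_samples()
        return sum(samples) / len(samples)

calc = PowerQualityCalculator(DatabaseSource())
```

Replacing `DatabaseSource()` with `FileSource()` at the injection point changes the data provider without touching `PowerQualityCalculator`, which is exactly the decoupling the paper argues for.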


2017 ◽  
pp. 41-64
Author(s):  
Marta Padilla-Ruiz ◽  
Carlos López-Vázquez

We are immersed in the Big Data era, with large amounts of heterogeneous data across both temporal and spatial scales. These data are beginning to be streamed in real time from different devices and sensors, as well illustrated by the new concept of Smart Cities. Conflation processes, defined as procedures for combining and integrating different data sources to improve the level of information of the result, play an important role in this scenario. Conflation also makes it possible to update geographical databases (GDB) by conflating different kinds of sources where one is more accurate or more up to date than the other. In geometric conflation, the procedure involves transforming features from one data source to another so as to minimize the geometric discrepancies between them. Accuracy has to be taken into account in these processes, and the results need to be measured and evaluated in order to better understand product quality. In this paper, the conflation evaluation process is described along with the different metrics and approaches used to assess its accuracy.
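As a minimal illustration of geometric conflation and its accuracy evaluation, the sketch below fits a least-squares translation between matched control points and reports the RMSE before and after. Real conflation workflows use richer transformations (affine, Helmert, rubber-sheeting), and the coordinates here are invented:

```python
import math

def conflate_translation(src_pts, ref_pts):
    """Estimate the translation minimizing the mean squared discrepancy
    between matched control points (least squares), then report the
    RMSE accuracy metric before and after the transformation."""
    n = len(src_pts)
    dx = sum(r[0] - s[0] for s, r in zip(src_pts, ref_pts)) / n
    dy = sum(r[1] - s[1] for s, r in zip(src_pts, ref_pts)) / n
    moved = [(x + dx, y + dy) for x, y in src_pts]

    def rmse(pts):
        return math.sqrt(sum((p[0] - r[0]) ** 2 + (p[1] - r[1]) ** 2
                             for p, r in zip(pts, ref_pts)) / n)

    return moved, rmse(src_pts), rmse(moved)

src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]          # features in one source
ref = [(2.1, 3.0), (3.0, 2.9), (1.9, 4.1)]          # same features in the reference GDB
moved, rmse_before, rmse_after = conflate_translation(src, ref)
```

The drop in RMSE after the fit is the kind of accuracy measure a conflation evaluation process would report.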


2021 ◽  
Vol 18 (6) ◽  
pp. 8661-8682
Author(s):  
Vishnu Vandana Kolisetty ◽  
Dharmendra Singh Rajput

Big data has attracted much attention across many domain sectors. The volume of digital data generated today in every domain is enormous, and the need to acquire such information for analysis and decision making is growing just as fast in every field, so it is important to integrate related information based on similarity. Existing integration techniques, however, suffer from processing and time complexity and face constraints when interconnecting multiple data sources. Because this information comes from many different, complexly distributed sources, it is difficult to determine the relationships between the data and to derive common data structures that allow effective access or retrieval for different data analysis needs. This paper proposes an integration approach for big data based on the computation of attribute conditional dependency (ACD) and a similarity index (SI), termed ACD-SI. The ACD-SI mechanism uses an improved Bayesian mechanism to analyze the distribution of attributes in a document as dependencies on possible attributes. It also uses attribute conversion and selection mechanisms to map and group data for integration, and applies methods such as latent semantic analysis (LSA) to the content of data attributes to extract relevant and accurate data. A series of experiments measures the overall purity and normalization of the data integration on a large dataset of bibliographic data from various publications. The obtained purity and NMI ratios confirm the relevance of the clustered data, and the precision, recall, and accuracy measures justify the improvement of the proposal compared with existing approaches.
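As a toy illustration of a similarity index over textual attributes, cosine similarity between term-frequency vectors can be sketched in pure Python. The paper's SI computation is more elaborate and combined with ACD and LSA; the record strings below are invented:

```python
import math
from collections import Counter

def similarity_index(doc_a, doc_b):
    """Cosine similarity between two attribute strings, used here as a
    simple stand-in similarity index for grouping related records."""
    va = Counter(doc_a.lower().split())
    vb = Counter(doc_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)          # Counter returns 0 for missing terms
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

# Invented bibliographic title attributes
rec1 = "big data integration with similarity index"
rec2 = "similarity index for big data sources"
rec3 = "visible light positioning"
```

Records whose index exceeds a threshold would be grouped into the same cluster for integration; here `rec1` and `rec2` are similar while `rec3` is unrelated.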


Author(s):  
Trung Le ◽  
Quan Hoang ◽  
Hung Vu ◽  
Tu Dinh Nguyen ◽  
Hung Bui ◽  
...  

Generative Adversarial Networks (GANs) are a powerful class of deep generative models. In this paper, we extend GAN to the problem of generating data that are not only close to a primary data source but also required to be different from auxiliary data sources. For this problem, we enrich both GANs' formulations and applications by introducing pushing forces that thrust generated samples away from given auxiliary data sources. We term our method Push-and-Pull GAN (P2GAN). We conduct extensive experiments to demonstrate the merit of P2GAN in two applications: generating data with constraints and addressing the mode collapsing problem. We use CIFAR-10, STL-10, and ImageNet datasets and compute Fréchet Inception Distance to evaluate P2GAN's effectiveness in addressing the mode collapsing problem. The results show that P2GAN outperforms the state-of-the-art baselines. For the problem of generating data with constraints, we show that P2GAN can successfully avoid generating specific features such as black hair.

