Predicting Drug Side Effects Using Data Analytics and the Integration of Multiple Data Sources

Real-time Big Data Analytics Framework with Data Blending Approach for Multiple Data sources in Smart City Applications

Scalable Computing Practice and Experience ◽

10.12694/scpe.v21i4.1759 ◽

2020 ◽

Vol 21 (4) ◽

pp. 611-623

Author(s):

Manjunatha S ◽

Annappa B

Keyword(s):

Real Time ◽

Smart City ◽

Data Analytics ◽

Public Safety ◽

Big Data Analytics ◽

Data Sources ◽

Data Driven ◽

The Public ◽

Multiple Data Sources ◽

Multiple Data

Advances in Information and Communication Technology (ICT) and the Internet of Things (IoT) have led to the continuous generation of large amounts of data. Smart city projects are being implemented in various parts of the world, where analysis of public data helps provide a better quality of life. Data analytics plays a vital role in many such data-driven applications. Real-time analytics that finds valuable insights in smart city data at the right time is crucial for making appropriate decisions in city administration. Using multiple data sources as input for the analysis is essential for reaching more accurate data-driven solutions and making appropriate decisions. Public safety is one of the major concerns in any smart city project, and real-time analytics is particularly useful there for the early detection of valuable data patterns. Predicting crime-related incidents early and generating emergency alerts are crucial for making appropriate decisions that keep people secure and city infrastructure safe. This paper discusses a proposed real-time big data analytics framework that uses a data blending approach over multiple data sources for smart city applications. Analytics using multiple data sources for a specific data-driven solution helps find more data patterns, which in turn increases the accuracy of the analytics results. Data preprocessing is a challenging phase of data analytics when data is ingested continuously in real time into the analytics system. The proposed system supports preprocessing of real-time data by blending the multiple data sources used in the analytics. The proposed framework is beneficial when data from multiple sources is ingested in real time as input, and it is flexible enough to accommodate any additional data source of interest.
The experimental work, carried out with the proposed framework using multiple data sources to find crime-related insights in real time, supports public safety solutions in the smart city. The experimental outcome shows a significant increase in the number of useful data patterns identified as the number of data sources increases. A real-time emergency alert system supporting the public safety solution is implemented using a machine learning-based classification algorithm with the proposed framework. The experiment is carried out with different classification algorithms, and the results show that Naive Bayes classification performs better in generating emergency alerts.

An Efficient Multiple Data Sources Selection Algorithm in Data-Sharing Environments

Journal of Software ◽

10.3724/sp.j.1001.2008.00314 ◽

2008 ◽

Vol 19 (2) ◽

pp. 314-322 ◽

Cited By ~ 1

Author(s):

Xiao-Qing WANG

Keyword(s):

Data Sharing ◽

Data Sources ◽

Selection Algorithm ◽

Multiple Data Sources ◽

Multiple Data

Integration of statistical and administrative agricultural data from Namibia

Statistical Journal of the IAOS ◽

10.3233/sji-200634 ◽

2021 ◽

pp. 1-22

Author(s):

Emily Berg ◽

Jongho Im ◽

Zhengyuan Zhu ◽

Colin Lewis-Beck ◽

Jie Li

Keyword(s):

Measurement Error ◽

Data Collection ◽

Administrative Data ◽

Statistical Data ◽

Data Sources ◽

Extension Programs ◽

Multiple Data ◽

Administrative Agencies ◽

Crop Area ◽

Using Data

Statistical and administrative agencies often collect information on related parameters. Discrepancies between estimates from distinct data sources can arise due to differences in definitions, reference periods, and data collection protocols. Integrating statistical data with administrative data is appealing for saving data collection costs, reducing respondent burden, and improving the coherence of estimates produced by statistical and administrative agencies. Model-based techniques for combining multiple data sources, such as small area estimation and measurement error models, offer transparency, reproducibility, and the ability to provide an estimated uncertainty. Issues associated with integrating statistical data with administrative data are discussed in the context of data from Namibia. The national statistical agency in Namibia produces estimates of crop area using data from probability samples. Simultaneously, the Namibia Ministry of Agriculture, Water, and Forestry obtains crop area estimates through extension programs. We illustrate the use of a structural measurement error model for the purpose of synthesizing the administrative and survey data to form a unified estimate of crop area. Limitations on the available data preclude us from conducting a genuine, thorough application. Nonetheless, our illustration of methodology holds potential use for a general practitioner.

Examining Deep Learning Models with Multiple Data Sources for COVID-19 Forecasting

2020 IEEE International Conference on Big Data (Big Data) ◽

10.1109/bigdata50022.2020.9377904 ◽

2020 ◽

Author(s):

Lijing Wang ◽

Aniruddha Adiga ◽

Srinivasan Venkatramanan ◽

Jiangzhuo Chen ◽

Bryan Lewis ◽

...

Keyword(s):

Deep Learning ◽

Data Sources ◽

Learning Models ◽

Multiple Data Sources ◽

Multiple Data

Organizing Multiple Data Sources for Developing Intelligent e-Business Portals

Data Mining and Knowledge Discovery ◽

10.1007/s10618-005-0018-2 ◽

2006 ◽

Vol 12 (2-3) ◽

pp. 127-150 ◽

Cited By ~ 18

Author(s):

Jia Hu ◽

Ning Zhong

Keyword(s):

Data Sources ◽

Multiple Data Sources ◽

Multiple Data

Significance of integration and use of multiple data sources for understanding substance use and mental health disorders

Addiction ◽

10.1111/add.15562 ◽

2021 ◽

Author(s):

Krishnan Radhakrishnan

Keyword(s):

Mental Health ◽

Substance Use ◽

Mental Health Disorders ◽

Data Sources ◽

Multiple Data Sources ◽

Multiple Data ◽

Health Disorders

Big data and portfolio optimization: A novel approach integrating DEA with multiple data sources

Omega ◽

10.1016/j.omega.2021.102479 ◽

2021 ◽

pp. 102479

Author(s):

Zhongbao Zhou ◽

Meng Gao ◽

Helu Xiao ◽

Rui Wang ◽

Wenbin Liu

Keyword(s):

Big Data ◽

Portfolio Optimization ◽

Data Sources ◽

Multiple Data Sources ◽

Multiple Data ◽

Novel Approach

A dataset on affiliation of venture capitalists in China between 2000 and 2016

Scientific Data ◽

10.1038/s41597-021-00993-w ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Jin Chen ◽

Tianyuan Chen ◽

Yifei Song ◽

Bin Hao ◽

Ling Ma

Keyword(s):

Data Sources ◽

Venture Capitalists ◽

High Quality ◽

Public Agency ◽

Multiple Data Sources ◽

Multiple Data ◽

Multi Stage ◽

The World ◽

Prior Literature ◽

Innovation And Entrepreneurship

Prior literature emphasizes the distinct roles of differently affiliated venture capitalists (VCs) in nurturing innovation and entrepreneurship. Although China has become the second largest VC market in the world, the unavailability of high-quality datasets on VC affiliation in China’s market hinders such research efforts. To fill this important gap, we compiled a new panel dataset of VC affiliation in China’s market from multiple data sources. Specifically, we drew on a list of 6,553 VCs that invested in China between 2000 and 2016 from the CVSource database, collected the VCs’ shareholder information from public sources, and developed a multi-stage procedure to label each VC as one of the following types: GVC (public agency-affiliated, state-owned enterprise-affiliated), CVC (corporate VC), IVC (independent VC), BVC (bank-affiliated VC), FVC (financial/non-bank-affiliated VC), UVC (university endowment/spin-out unit), and PenVC (pension-affiliated VC). We also denoted whether a VC has a foreign background. This dataset helps researchers conduct more nuanced investigations into the investment behaviors of different VCs and their distinct impacts on innovation and entrepreneurship in China’s context.

Combining multiple data sources in species distribution models while accounting for spatial dependence and overfitting with combined penalized likelihood maximization

Methods in Ecology and Evolution ◽

10.1111/2041-210x.13297 ◽

2019 ◽

Vol 10 (12) ◽

pp. 2118-2128

Author(s):

Ian W. Renner ◽

Julie Louvrier ◽

Olivier Gimenez

Keyword(s):

Species Distribution ◽

Spatial Dependence ◽

Species Distribution Models ◽

Penalized Likelihood ◽

Data Sources ◽

Distribution Models ◽

Multiple Data Sources ◽

Multiple Data

Modeling the Reliability of Complex Systems with Multiple Data Sources: A Case Study on Making Statistical Tools Accessible to Engineers

Quality Engineering ◽

10.1080/08982112.2012.641152 ◽

2012 ◽

Vol 24 (2) ◽

pp. 280-291 ◽

Cited By ~ 6

Author(s):

Christine M. Anderson-Cook ◽

Richard M. Klamann ◽

Jerome Morzinski

Keyword(s):

Complex Systems ◽

Data Sources ◽

Statistical Tools ◽

Multiple Data Sources ◽

Multiple Data
