Supporting large scale connected vehicle data analysis using HIVE

Mining Connected Vehicle Data for Beneficial Patterns in Dubai Taxi Operations

Journal of Advanced Transportation ◽

10.1155/2018/8963234 ◽

2018 ◽

Vol 2018 ◽

pp. 1-8 ◽

Cited By ~ 2

Author(s):

Raj Bridgelall ◽

Pan Lu ◽

Denver D. Tolliver ◽

Tai Xu

Keyword(s):

Large Scale ◽

Statistical Tests ◽

Poor Quality ◽

Connected Vehicle ◽

Cleaning Method ◽

Mobility Data ◽

Automated Method ◽

Vehicle Data ◽

Chi Squared ◽

Trip Production

On-demand shared mobility services such as Uber and microtransit are steadily penetrating the worldwide market for traditional dispatched taxi services. Hence, taxi companies are seeking ways to compete. This study mined large-scale mobility data from connected taxis to discover beneficial patterns that may inform strategies for improving the dispatched taxi business. It is not practical to manually clean and filter large-scale mobility data that contains GPS information. Therefore, this research contributes and demonstrates an automated method of data cleaning and filtering that is suitable for such datasets. The cleaning method defines three filter variables and applies a layered statistical filtering technique to eliminate outlier records that do not contribute to distributions matching the expected theoretical distributions of the variables. Chi-squared statistical tests evaluate the quality of the cleaned data by comparing the distribution of the three variables with their expected distributions. The overall cleaning method removed approximately 5% of the data, which consisted of obvious errors and poor-quality outliers. Subsequently, mining the cleaned data revealed that trip production in Dubai peaks when only the same two drivers operate the same taxi. This finding would not have been possible without access to proprietary data that contains unique identifiers for both drivers and taxis. Datasets that identify individual drivers are not publicly available.
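The cleaning idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the three filter variables, so the fields `distance_km`, `duration_min`, and `speed_kmh`, the numeric bounds, and the function names below are all hypothetical. Simple range checks stand in for the paper's statistical filters, and the chi-squared statistic is the standard Pearson form.

```python
def layered_filter(records, bounds):
    """Apply one range filter per variable, in order.

    Each layer only sees records that survived the previous layers.
    `bounds` maps a (hypothetical) field name to an inclusive (low, high) range.
    """
    kept = list(records)
    for field, (low, high) in bounds.items():
        kept = [r for r in kept if low <= r[field] <= high]
    return kept


def chi_squared_statistic(observed, expected):
    """Pearson chi-squared statistic for paired observed/expected bin counts."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of bins")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical trip records; two of them contain obvious errors.
trips = [
    {"distance_km": 5.0, "duration_min": 12.0, "speed_kmh": 25.0},
    {"distance_km": 900.0, "duration_min": 10.0, "speed_kmh": 5400.0},  # GPS glitch
    {"distance_km": 8.0, "duration_min": 20.0, "speed_kmh": 24.0},
    {"distance_km": 3.0, "duration_min": -4.0, "speed_kmh": 0.0},  # negative duration
]

# Assumed plausibility bounds, chosen only for illustration.
bounds = {
    "distance_km": (0.1, 200.0),
    "duration_min": (1.0, 300.0),
    "speed_kmh": (1.0, 150.0),
}

cleaned = layered_filter(trips, bounds)

# Compare binned counts of one variable with the counts an assumed
# theoretical distribution would predict for the same bins.
statistic = chi_squared_statistic([12, 18, 30], [10.0, 20.0, 30.0])
```

In practice the statistic would be compared with the chi-squared critical value for the chosen significance level and degrees of freedom; a small statistic indicates that the cleaned data are consistent with the expected distribution.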


Emerging geo-data sources to reveal human mobility dynamics during COVID-19 pandemic: opportunities and challenges

Computational Urban Science ◽

10.1007/s43762-021-00022-x ◽

2021 ◽

Vol 1 (1) ◽

Author(s):

Xiao Li ◽

Haowen Xu ◽

Xiao Huang ◽

Chenxiao Guo ◽

Yuhao Kang ◽

...

Keyword(s):

Data Privacy ◽

Large Scale ◽

Human Mobility ◽

Data Sources ◽

Data Uncertainty ◽

Connected Vehicle ◽

Research Experience ◽

Mobility Data ◽

Vehicle Data ◽

Spatial Coverage

Effectively monitoring the dynamics of human mobility is of great importance in urban management, especially during the COVID-19 pandemic. Traditionally, the human mobility data is collected by roadside sensors, which have limited spatial coverage and are insufficient in large-scale studies. With the maturing of mobile sensing and Internet of Things (IoT) technologies, various crowdsourced data sources are emerging, paving the way for monitoring and characterizing human mobility during the pandemic. This paper presents the authors’ opinions on three types of emerging mobility data sources, including mobile device data, social media data, and connected vehicle data. We first introduce each data source’s main features and summarize their current applications within the context of tracking mobility dynamics during the COVID-19 pandemic. Then, we discuss the challenges associated with using these data sources. Based on the authors’ research experience, we argue that data uncertainty, big data processing problems, data privacy, and theory-guided data analytics are the most common challenges in using these emerging mobility data sources. Last, we share experiences and opinions on potential solutions to address these challenges and possible research directions associated with acquiring, discovering, managing, and analyzing big mobility data.


Review of Usage of Real-World Connected Vehicle Data

Transportation Research Record Journal of the Transportation Research Board ◽

10.1177/0361198120940996 ◽

2020 ◽

Vol 2674 (10) ◽

pp. 939-950

Author(s):

Yun Zhou ◽

Raj Bridgelall

Keyword(s):

Real World ◽

Large Scale ◽

The United States ◽

Connected Vehicles ◽

United States Department ◽

Connected Vehicle ◽

Pilot Model ◽

Vehicle Data ◽

World Information ◽

Vehicle Technologies

GPS loggers and cameras aboard connected vehicles can produce vast amounts of data. Analysts can mine such data to decipher patterns in vehicle trajectories and driver–vehicle interactions. The ability to process such large-scale data in real time can inform strategies to reduce crashes, improve traffic flow, enhance system operational efficiencies, and reduce environmental impacts. However, connected vehicle technologies are in the very early phases of deployment. Therefore, related datasets are extremely scarce, and the utility of such emerging datasets is largely unknown. This paper provides a comprehensive review of studies that used large-scale connected vehicle data from the United States Department of Transportation Connected Vehicle Safety Pilot Model Deployment program. It is the first and only such dataset available to the public. The data contains real-world information about the operation of connected vehicles that organizations are testing. The paper summarizes the available datasets and their organization, as well as the overall structure and other characteristics of the data captured during pilot deployments. Usage of the data is then classified into three categories: driving pattern identification, development of surrogate safety measures, and improvements in the operation of signalized intersections. Finally, some limitations experienced with the existing datasets are identified.


Integrative Data Analysis from a Unifying Research Synthesis Perspective

10.1093/oso/9780190676001.003.0020 ◽

2018 ◽

Author(s):

Eun-Young Mun ◽

Anne E. Ray

Keyword(s):

Data Analysis ◽

Large Scale ◽

Research Synthesis ◽

Alcohol Intervention ◽

Data Set ◽

Integrative Data Analysis ◽

Level Data ◽

Model Complex ◽

Wide Range ◽

Individual Participant

Integrative data analysis (IDA) is a promising new approach in psychological research and has been well received in the field of alcohol research. This chapter provides a larger unifying research synthesis framework for IDA. Major advantages of IDA of individual participant-level data include better and more flexible ways to examine subgroups, model complex relationships, deal with methodological and clinical heterogeneity, and examine infrequently occurring behaviors. However, between-study heterogeneity in measures, designs, and samples and systematic study-level missing data are significant barriers to IDA and, more broadly, to large-scale research synthesis. Based on the authors’ experience working on the Project INTEGRATE data set, which combined individual participant-level data from 24 independent college brief alcohol intervention studies, it is also recognized that IDA investigations require a wide range of expertise and considerable resources and that some minimum standards for reporting IDA studies may be needed to improve transparency and quality of evidence.


Cyberstalking Victimization Model Using Criminological Theory: A Systematic Literature Review, Taxonomies, Applications, Tools, and Validations

Electronics ◽

10.3390/electronics10141670 ◽

2021 ◽

Vol 10 (14) ◽

pp. 1670

Author(s):

Waheeb Abu-Ulbeh ◽

Maryam Altalhi ◽

Laith Abualigah ◽

Abdulwahab Ali Almazroi ◽

Putra Sumari ◽

...

Keyword(s):

Data Analysis ◽

Structural Equation ◽

Large Scale ◽

Review Paper ◽

Essential Element ◽

Routine Activities ◽

Criminological Theory ◽

Equation Modeling ◽

Future Research ◽

Proposed Model

Cyberstalking is a growing antisocial problem that is being transformed on a large scale and in various forms. Cyberstalking detection has become increasingly popular in recent years and has been investigated technically by many researchers. However, cyberstalking victimization, an essential part of cyberstalking, has received less empirical attention from the research community. This paper attempts to address this gap by developing a model to understand and estimate the prevalence of cyberstalking victimization. The model draws on routine activities and lifestyle exposure theories and includes eight hypotheses. The data were collected from 757 respondents at Jordanian universities. This review paper takes a quantitative approach and uses structural equation modeling for data analysis. The results revealed a modest prevalence range that is more dependent on the cyberstalking type. The results also indicated that proximity to motivated offenders, suitable targets, and digital guardians significantly influence cyberstalking victimization. The outcome of moderation hypothesis testing demonstrated that age and residence have a significant effect on cyberstalking victimization. The proposed model is an essential element for assessing cyberstalking victimization among societies and provides a valuable understanding of its prevalence. This can assist researchers and practitioners in future research on cyberstalking victimization.


Machine learning based real-time vehicle data analysis for safe driving modeling

Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing - SAC '19 ◽

10.1145/3297280.3297584 ◽

2019 ◽

Cited By ~ 2

Author(s):

Pamul Yadav ◽

Sangsu Jung ◽

Dhananjay Singh

Keyword(s):

Machine Learning ◽

Data Analysis ◽

Real Time ◽

Safe Driving ◽

Vehicle Data


Microcomputers in Political Science

News for Teachers of Political Science ◽

10.1017/s0197901900005079 ◽

1983 ◽

Vol 38 ◽

pp. 1-9

Author(s):

Herbert F. Weisberg

Keyword(s):

Data Analysis ◽

Political Science ◽

Large Scale ◽

Turnaround Time ◽

General Purpose ◽

Batch Mode ◽

New Era ◽

Large Scale Data ◽

The Social ◽

Frequency Counts

We are now entering a new era of computing in political science. The first era was marked by punched-card technology. Initially, the most sophisticated analyses possible were frequency counts and tables produced on a counter-sorter, a machine that specialized in chewing up data cards. By the early 1960s, batch processing on large mainframe computers became the predominant mode of data analysis, with turnaround time of up to a week. By the late 1960s, turnaround time was cut down to a matter of a few minutes and OSIRIS and then SPSS (and more recently SAS) were developed as general-purpose data analysis packages for the social sciences. Even today, use of these packages in batch mode remains one of the most efficient means of processing large-scale data analysis.


When didactics meet data science: process data analysis in large-scale mathematics assessment in France

Large-scale Assessments in Education ◽

10.1186/s40536-020-00085-y ◽

2020 ◽

Vol 8 (1) ◽

Author(s):

Franck Salles ◽

Reinaldo Dos Santos ◽

Saskia Keskpaik

Keyword(s):

Data Analysis ◽

Large Scale ◽

Data Science ◽

Mathematics Assessment ◽

Process Data ◽

Meet Data


The Gut Microbiota of Healthy Aged Chinese Is Similar to That of the Healthy Young

mSphere ◽

10.1128/msphere.00327-17 ◽

2017 ◽

Vol 2 (5) ◽

Cited By ~ 65

Author(s):

Gaorui Bian ◽

Gregory B. Gloor ◽

Aihua Gong ◽

Changsheng Jia ◽

Wei Zhang ◽

...

Keyword(s):

Data Analysis ◽

Gut Microbiota ◽

Large Scale ◽

Compositional Data ◽

Healthy Lifestyle ◽

Compositional Data Analysis ◽

Surprising Result ◽

Microbiota Composition ◽

Cross Sectional ◽

Age Cohorts

ABSTRACT The microbiota of the aged is variously described as being more or less diverse than that of younger cohorts, but the comparison groups used and the definitions of the aged population differ between experiments. The differences are often described by null hypothesis statistical tests, which are notoriously irreproducible when dealing with large multivariate samples. We collected and examined the gut microbiota of a cross-sectional cohort of more than 1,000 very healthy Chinese individuals who spanned ages from 3 to over 100 years. The analysis of 16S rRNA gene sequencing results used a compositional data analysis paradigm coupled with measures of effect size, where ordination, differential abundance, and correlation can be explored and analyzed in a unified and reproducible framework. Our analysis showed several surprising results compared to other cohorts. First, the overall microbiota composition of the healthy aged group was similar to that of people decades younger. Second, the major differences between groups in the gut microbiota profiles were found before age 20. Third, the gut microbiota differed little between individuals from the ages of 30 to >100. Fourth, the gut microbiota of males appeared to be more variable than that of females. Taken together, the present findings suggest that the microbiota of the healthy aged in this cross-sectional study differ little from that of the healthy young in the same population, although the minor variations that do exist depend upon the comparison cohort.

IMPORTANCE We report the large-scale use of compositional data analysis to establish a baseline microbiota composition in an extremely healthy cohort of the Chinese population. This baseline will serve for comparison for future cohorts with chronic or acute disease. In addition to the expected difference in the microbiota of children and adults, we found that the microbiota of the elderly in this population was similar in almost all respects to that of healthy people in the same population who are scores of years younger. We speculate that this similarity is a consequence of an active healthy lifestyle and diet, although cause and effect cannot be ascribed in this (or any other) cross-sectional design. One surprising result was that the gut microbiota of persons in their 20s was distinct from those of other age cohorts, and this result was replicated, suggesting that it is a reproducible finding and distinct from those of other populations.
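As a rough illustration of what a compositional data analysis paradigm involves, the sketch below applies the centered log-ratio (CLR) transform, a widely used first step in compositional analysis of sequencing counts. The abstract does not specify the authors' exact pipeline, so the function name `clr`, the pseudocount of 0.5, and the sample counts are assumptions made only for this example.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's counts.

    Each value becomes log(count) minus the mean of the logs, which is the
    log of the count divided by the sample's geometric mean. The pseudocount
    (an assumed value) avoids taking log(0) for taxa that were not observed.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical read counts for four taxa in one sample.
sample = [10, 0, 90, 400]
transformed = clr(sample)
```

Because CLR values depend only on ratios between taxa, multiplying every count in a sample by the same factor leaves the result unchanged when no pseudocount is added. This makes samples with different sequencing depths comparable.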
