Demographic Characterization of Anonymous Trace Travel Data

Joshua Auld; Abolfazl (Kouros) Mohammadian; Marcelo Simas Oliveira; Jean Wolf; William Bachman

doi:10.3141/2526-03

Demographic Characterization of Anonymous Trace Travel Data

Transportation Research Record: Journal of the Transportation Research Board ◽

10.3141/2526-03 ◽

2015 ◽

Vol 2526 (1) ◽

pp. 19-28 ◽

Cited By ~ 2

Author(s):

Joshua Auld ◽

Abolfazl (Kouros) Mohammadian ◽

Marcelo Simas Oliveira ◽

Jean Wolf ◽

William Bachman

Keyword(s):

Test Data ◽

Large Scale ◽

Travel Demand ◽

Null Model ◽

Demographic Characteristics ◽

Travel Pattern ◽

Data Set ◽

Trace Data ◽

Demographic Models ◽

Pattern Information

Research was undertaken to determine whether demographic characteristics of individual travelers could be derived from travel pattern information when no information about the individual was available. This question is relevant in the context of anonymously collected travel information, such as cell phone traces, when used for travel demand modeling. Determining the demographics of a traveler from such data could partially obviate the need for large-scale collection of travel survey data, depending on the purpose for which the data were to be used. This research complements methodologies used to identify activity stops, purposes, and mode types from raw trace data and presumes that such methods exist and are available. The paper documents the development of procedures for taking raw activity streams estimated from GPS trace data and converting these into activity travel pattern characteristics that are then combined with basic land use information and used to estimate various models of demographic characteristics. The work status, education level, age, and license possession of individuals and the presence of children in their households were all estimated successfully with substantial increases in performance versus null model expectations for both training and test data sets. The gender, household size, and number of vehicles proved more difficult to estimate, and performance was lower on the test data set; these aspects indicate overfitting in these models. Overall, the demographic models appear to have potential for characterizing anonymous data streams, which could extend the usability and applicability of such data sources to the travel demand context.
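The comparison against "null model expectations" on training and test data can be illustrated with a minimal sketch. It assumes the null model is a majority-class predictor, which the abstract does not confirm; the function names and the worker/non-worker labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def null_model_accuracy(train_labels, eval_labels):
    """Accuracy of always predicting the most common training label.

    Assumption: the null model is a majority-class predictor.
    """
    majority = Counter(train_labels).most_common(1)[0][0]
    return sum(1 for y in eval_labels if y == majority) / len(eval_labels)


def accuracy(predicted, actual):
    """Share of predictions that match the observed labels."""
    return sum(1 for p, a in zip(predicted, actual) if p == a) / len(actual)


def lift_over_null(predicted, actual, train_labels):
    """Accuracy gain of a model over the null model on the same data."""
    return accuracy(predicted, actual) - null_model_accuracy(train_labels, actual)


# Hypothetical work-status labels, for illustration only.
train_labels = ["worker", "worker", "worker", "non-worker"]
test_actual = ["worker", "non-worker", "worker", "non-worker"]
test_predicted = ["worker", "non-worker", "worker", "worker"]

test_lift = lift_over_null(test_predicted, test_actual, train_labels)
```

A lift that is large on the training set but shrinks on the test set is the pattern the abstract reads as overfitting.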

Impact of Major Road Supply on Individual Travel Time Expenditure: An Exploration with a 30-Year Variation of Infrastructure and Travel

Transportation Research Record: Journal of the Transportation Research Board ◽

10.1177/0361198118791866 ◽

2018 ◽

Vol 2672 (3) ◽

pp. 56-68

Author(s):

Ryosuke Abe ◽

Kay W. Axhausen

Keyword(s):

Travel Time ◽

Large Scale ◽

Travel Demand ◽

Road Traffic ◽

Land Development ◽

Transportation Infrastructure ◽

Major Road ◽

Birth Cohorts ◽

Data Set ◽

The Impact

This study estimates the impact of major road supply on individual travel time expenditures (TTEs) using data that cover 30-year variations in transportation infrastructure and travel behavior. The impacts of the supply of road and rail infrastructure are estimated with a data set that combines records of large-scale household travel surveys in the Tokyo metropolitan area conducted in 1978, 1988, 1998, and 2008. Linear and Tobit models of individual TTEs are estimated by following the behavior of birth cohorts over the 30-year period. The models incorporate the changes in transportation infrastructure, measured as lane kilometers of two levels of major road stock and vehicle kilometers of urban rail service. The results show significant negative effects of lane kilometers for higher-level and lower-level major roads on the TTEs for all travel purposes and for commuting, after controlling for socioeconomic backgrounds and generations of individuals. The study argues that, in Tokyo, the estimated effect is more likely to reflect the effect of the major road network per se on individual TTEs than the (indirect) effect of major road supply working through land development activities (i.e., induced car travel demand). The caveat is that actual road investment decisions still need to consider the induced component of road traffic in addition to the (direct) effect estimated in this study.

Large-scale test data set for location problems

Data in Brief ◽

10.1016/j.dib.2018.01.008 ◽

2018 ◽

Vol 17 ◽

pp. 267-274 ◽

Cited By ~ 2

Author(s):

Matej Cebecauer ◽

Ľuboš Buzna

Keyword(s):

Test Data ◽

Large Scale ◽

Location Problems ◽

Data Set ◽

Scale Test

Revealing Spatial-Temporal Characteristics and Patterns of Urban Travel: A Large-Scale Analysis and Visualization Study with Taxi GPS Data

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi8060257 ◽

2019 ◽

Vol 8 (6) ◽

pp. 257 ◽

Cited By ~ 8

Author(s):

Huihui Wang ◽

Hong Huang ◽

Xiaoyong Ni ◽

Weihua Zeng

Keyword(s):

Large Scale ◽

Travel Demand ◽

Chord Diagram ◽

Future Research ◽

Trajectory Data ◽

Temporal Scales ◽

Data Set ◽

Ring Road ◽

Temporal Characteristics ◽

Taxi Trajectory

Mobility and spatial interaction data have become increasingly available due to the widespread adoption of location-aware technologies. Examples of mobile data include human daily activities, vehicle trajectories, and animal movements. In this study we focus on a special type of mobility data, i.e., origin–destination (OD) pairs, and propose a new adapted chord diagram plot to reveal the urban human travel spatial-temporal characteristics and patterns of a seven-day taxi trajectory data set collected in Beijing; this large-scale data set includes approximately 88.5 million trips of anonymous customers. The spatial distribution patterns of the pick-up points (PUPs) and the drop-off points (DOPs) on weekdays and weekends are analyzed first. The morning and evening peaks occur at 8:00–10:00 and 17:00–19:00, respectively. The morning peaks of taxis are delayed by 0.5–1 h compared with the commuting morning peaks. Second, travel demand, intensity, time, and distance on weekdays and weekends are analyzed to explore human mobility. The travel demand and high-intensity travel of residents in Beijing are mainly concentrated within the 6th Ring Road. The residents who travel long distances (>10 km) and for a long time (>60 min) mainly come from outside the 6th Ring Road and the surrounding new towns of Beijing. The circular structure of the travel distance distribution also confirms the single-center urban structure of Beijing. Finally, a new adapted chord diagram plot is proposed to achieve the spatial-temporal scale visualization of taxi trajectory origin–destination (OD) flows. The method can characterize the volume, direction, and properties of OD flows in multiple spatial-temporal scales; it is implemented using a circular visualization package in R (circlize). Through the visualization experiment of taxi GPS trajectory data in Beijing, the results show that the proposed visualization technology is able to characterize the spatial-temporal patterns of trajectory OD flows in multiple spatial-temporal scales. These results are expected to enhance current urban mobility research and suggest some interesting avenues for future research.
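A chord diagram of OD flows needs trips aggregated into origin–destination counts per time window. The sketch below shows one way that aggregation might look. It is written in Python for illustration, whereas the study itself used the R package circlize; the zone names, the `od_flow_matrix` helper, and the trip records are hypothetical, and only the peak windows come from the abstract.

```python
from collections import defaultdict


def od_flow_matrix(trips, hour_bins):
    """Count trips per (origin, destination) pair within each time bin.

    trips: iterable of (origin_zone, dest_zone, pickup_hour) tuples.
    hour_bins: dict of bin name -> (start_hour, end_hour), end exclusive.
    Returns {bin_name: {(origin, dest): count}}.
    """
    flows = {name: defaultdict(int) for name in hour_bins}
    for origin, dest, hour in trips:
        for name, (start, end) in hour_bins.items():
            if start <= hour < end:
                flows[name][(origin, dest)] += 1
    return {name: dict(counts) for name, counts in flows.items()}


# Peak windows as reported in the abstract; trips are made-up examples.
hour_bins = {"morning_peak": (8, 10), "evening_peak": (17, 19)}
trips = [("A", "B", 8), ("A", "B", 9), ("B", "A", 17), ("A", "A", 12)]

flows = od_flow_matrix(trips, hour_bins)
```

Each per-bin dictionary is a sparse OD matrix: the keys give flow direction and the counts give flow volume, the two quantities the chord diagram is said to characterize.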

ProGen: Provenance database generator for large-scale data set

Journal of Computer Applications ◽

10.3724/sp.j.1087.2008.02737 ◽

2009 ◽

Vol 28 (11) ◽

pp. 2737-2740

Author(s):

Xiao ZHANG ◽

Shan WANG ◽

Na LIAN

Keyword(s):

Large Scale ◽

Data Set ◽

Large Scale Data ◽

Scale Data

Integrative Data Analysis from a Unifying Research Synthesis Perspective

10.1093/oso/9780190676001.003.0020 ◽

2018 ◽

Author(s):

Eun-Young Mun ◽

Anne E. Ray

Keyword(s):

Data Analysis ◽

Large Scale ◽

Research Synthesis ◽

Alcohol Intervention ◽

Data Set ◽

Integrative Data Analysis ◽

Level Data ◽

Model Complex ◽

Wide Range ◽

Individual Participant

Integrative data analysis (IDA) is a promising new approach in psychological research and has been well received in the field of alcohol research. This chapter provides a larger unifying research synthesis framework for IDA. Major advantages of IDA of individual participant-level data include better and more flexible ways to examine subgroups, model complex relationships, deal with methodological and clinical heterogeneity, and examine infrequently occurring behaviors. However, between-study heterogeneity in measures, designs, and samples and systematic study-level missing data are significant barriers to IDA and, more broadly, to large-scale research synthesis. Based on the authors’ experience working on the Project INTEGRATE data set, which combined individual participant-level data from 24 independent college brief alcohol intervention studies, it is also recognized that IDA investigations require a wide range of expertise and considerable resources and that some minimum standards for reporting IDA studies may be needed to improve transparency and quality of evidence.

Financial distress determinants among SMEs: empirical evidence from Sweden

Journal of Economic Studies ◽

10.1108/jes-01-2019-0030 ◽

2020 ◽

Vol 47 (3) ◽

pp. 547-560 ◽

Cited By ~ 1

Author(s):

Darush Yazdanfar ◽

Peter Öhman

Keyword(s):

Financial Crisis ◽

Financial Distress ◽

Large Scale ◽

Global Financial Crisis ◽

Binary Logistic Regression ◽

Data Availability ◽

Cross Sectional ◽

Data Set ◽

Content Type ◽

The Global Financial Crisis

Purpose: The purpose of this study is to empirically investigate determinants of financial distress among small and medium-sized enterprises (SMEs) during the global financial crisis and post-crisis periods. Design/methodology/approach: Several statistical methods, including multiple binary logistic regression, were used to analyse a longitudinal cross-sectional panel data set of 3,865 Swedish SMEs operating in five industries over the 2008–2015 period. Findings: The results suggest that financial distress is influenced by macroeconomic conditions (i.e. the global financial crisis) and, in particular, by various firm-specific characteristics (i.e. performance, financial leverage and financial distress in previous year). However, firm size and industry affiliation have no significant relationship with financial distress. Research limitations: Due to data availability, this study is limited to a sample of Swedish SMEs in five industries covering eight years. Further research could examine the generalizability of these findings by investigating other firms operating in other industries and other countries. Originality/value: This study is the first to examine determinants of financial distress among SMEs operating in Sweden using data from a large-scale longitudinal cross-sectional database.

How to Assess Prognostic Models for Survival Data: A Case Study in Oncology

Methods of Information in Medicine ◽

10.1055/s-0038-1634384 ◽

2003 ◽

Vol 42 (05) ◽

pp. 564-571 ◽

Cited By ~ 23

Author(s):

M. Schumacher ◽

E. Graf ◽

T. Gerds

Keyword(s):

Test Data ◽

Survival Data ◽

Prediction Error ◽

Classification Scheme ◽

Neural Nets ◽

Brier Score ◽

Data Set ◽

Independent Test ◽

Artificial Neural

Summary Objectives: A lack of generally applicable tools for the assessment of predictions for survival data has to be recognized. Prediction error curves based on the Brier score that have been suggested as a sensible approach are illustrated by means of a case study. Methods: The concept of predictions made in terms of conditional survival probabilities given the patient’s covariates is introduced. Such predictions are derived from various statistical models for survival data including artificial neural networks. The idea of how the prediction error of a prognostic classification scheme can be followed over time is illustrated with the data of two studies on the prognosis of node positive breast cancer patients, one of them serving as an independent test data set. Results and Conclusions: The Brier score as a function of time is shown to be a valuable tool for assessing the predictive performance of prognostic classification schemes for survival data incorporating censored observations. Comparison with the prediction based on the pooled Kaplan Meier estimator yields a benchmark value for any classification scheme incorporating patient’s covariate measurements. The problem of an overoptimistic assessment of prediction error caused by data-driven modelling as it is, for example, done with artificial neural nets can be circumvented by an assessment in an independent test data set.
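The Brier score as a function of time can be sketched under a strong simplifying assumption: no censored observations. The paper's version handles censoring, which this sketch does not attempt. Without censoring, the pooled Kaplan-Meier estimate at time t reduces to the observed fraction of patients still event-free, which serves as the benchmark here. All function names and data values are hypothetical.

```python
def brier_score(predicted_survival, event_times, t):
    """Mean squared gap between observed status at t and predicted S(t).

    Assumption: every event time is fully observed (no censoring).
    """
    total = 0.0
    for s_hat, event_time in zip(predicted_survival, event_times):
        event_free = 1.0 if event_time > t else 0.0
        total += (event_free - s_hat) ** 2
    return total / len(event_times)


def pooled_benchmark(event_times, t):
    """Brier score when every patient gets the same pooled survival fraction.

    Without censoring this fraction equals the Kaplan-Meier estimate at t.
    """
    pooled = sum(1 for e in event_times if e > t) / len(event_times)
    return brier_score([pooled] * len(event_times), event_times, t)


# Made-up event times (in years) and model-predicted survival at t = 6.
event_times = [2, 5, 7, 10]
model_predictions = [0.1, 0.3, 0.8, 0.9]

model_score = brier_score(model_predictions, event_times, t=6)
benchmark_score = pooled_benchmark(event_times, t=6)
```

Evaluating both scores over a grid of time points would give prediction error curves; a covariate-based scheme is informative where its curve lies below the benchmark, ideally on an independent test data set.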

Relationship between large-scale ionospheric field-aligned currents and electron/ion precipitations: DMSP observations

Earth Planets and Space ◽

10.1186/s40623-020-01286-z ◽

2020 ◽

Vol 72 (1) ◽

Author(s):

Chao Xiong ◽

Claudia Stolle ◽

Patrick Alken ◽

Jan Rauberg

Keyword(s):

Time Distribution ◽

Large Scale ◽

Particle Flux ◽

Particle Energy ◽

Lower Latitude ◽

Particle Precipitation ◽

Data Set ◽

The Mean ◽

Two Parameters ◽

Meteorological Satellite

In this study, we have derived field-aligned currents (FACs) from magnetometers onboard the Defense Meteorological Satellite Program (DMSP) satellites. The magnetic latitude versus local time distribution of FACs from DMSP shows comparable dependences with previous findings on the intensity and orientation of interplanetary magnetic field (IMF) By and Bz components, which confirms the reliability of the DMSP FAC data set. With simultaneous measurements of precipitating particles from DMSP, we further investigate the relation between large-scale FACs and precipitating particles. Our results show that precipitating electron and ion fluxes both increase in magnitude and extend to lower latitude for enhanced southward IMF Bz, which is similar to the behavior of FACs. Under weak northward and southward Bz conditions, the locations of the R2 current maxima, at both dusk and dawn sides and in both hemispheres, are found to be close to the maxima of the particle energy fluxes; while for the same IMF conditions, R1 currents are displaced further to the respective particle flux peaks. The largest displacement (about 3.5°) is found between the downward R1 current and ion flux peak at the dawn side. Our results suggest that there exist systematic differences in the locations of electron/ion precipitation and large-scale upward/downward FACs. As outlined by the statistical mean of these two parameters, the FAC peaks enclose the particle energy flux peaks in an auroral band at both dusk and dawn sides. Our comparisons also found that particle precipitation at dawn and dusk and in both hemispheres maximizes near the mean R2 current peaks. The particle precipitation flux maxima closer to the R1 current peaks are lower in magnitude. This is opposite to the known feature that R1 currents are on average stronger than R2 currents.

Galaxy spin direction distribution in HST and SDSS show similar large-scale asymmetry

Publications of the Astronomical Society of Australia ◽

10.1017/pasa.2020.46 ◽

2020 ◽

Vol 37 ◽

Author(s):

Lior Shamir

Keyword(s):

Large Scale ◽

Spiral Galaxies ◽

Hubble Space Telescope ◽

Gravitational Interaction ◽

Large Data ◽

Sloan Digital Sky Survey ◽

Data Sets ◽

Dipole Axis ◽

Data Set ◽

The Asymmetry

Several recent observations using large data sets of galaxies showed non-random distribution of the spin directions of spiral galaxies, even when the galaxies are too far from each other to have gravitational interaction. Here, a data set of $\sim 8.7\cdot10^3$ spiral galaxies imaged by Hubble Space Telescope (HST) is used to test and profile a possible asymmetry between galaxy spin directions. The asymmetry between galaxies with opposite spin directions is compared to the asymmetry of galaxies from the Sloan Digital Sky Survey (SDSS). The two data sets contain different galaxies at different redshift ranges, and each data set was annotated using a different annotation method. The results show that both data sets show a similar asymmetry in the COSMOS field, which is covered by both telescopes. Fitting the asymmetry of the galaxies to cosine dependence shows a dipole axis with probabilities of $\sim 2.8\sigma$ and $\sim 7.38\sigma$ in HST and SDSS, respectively. The most likely dipole axis identified in the HST galaxies is at $(\alpha=78^\circ,\delta=47^\circ)$ and is well within the $1\sigma$ error range compared to the location of the most likely dipole axis in the SDSS galaxies with $z>0.15$, identified at $(\alpha=71^\circ,\delta=61^\circ)$.

COVIDSenti: A Large-Scale Benchmark Twitter Data Set for COVID-19 Sentiment Analysis

IEEE Transactions on Computational Social Systems ◽

10.1109/tcss.2021.3051189 ◽

2021 ◽

pp. 1-13

Author(s):

Usman Naseem ◽

Imran Razzak ◽

Matloob Khushi ◽

Peter W. Eklund ◽

Jinman Kim

Keyword(s):

Sentiment Analysis ◽

Large Scale ◽

Data Set ◽

Twitter Data
