Real-Time Piecewise Regression

Author(s):  
Yang Zhang ◽  
Yuandong Liu ◽  
Lee D. Han

Ubiquitous sensing technologies make big data a trendy topic and a favored approach in transportation studies and applications, but the increasing volume of data sets presents remarkable challenges to data collection, storage, transfer, visualization, and processing. Fundamental aspects of big data in transportation are discussed, including how much data to collect and how to collect data effectively and economically. The focus is GPS trajectory data, which are used widely in this domain. An incremental piecewise regression algorithm is used to evaluate and compress GPS locations as they are produced. Row-wise QR decomposition and singular value decomposition are shown to be valid numerical algorithms for incremental regression. Sliding window–based piecewise regression can subsample the GPS data stream instantaneously, preserving only the points of interest. Algorithm performance is evaluated comprehensively in terms of accuracy and compression power. A procedure is presented for users to choose the best parameter value for their GPS devices. Results of experiments with real-world trajectory data indicate that, when the proper parameter value is selected, the proposed method achieves significant compression power (more than 10 times), maintains acceptable accuracy (less than 5 m), and always outperforms the fixed-rate sampling approach.
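The sliding-window idea can be sketched as follows: grow a window over the incoming stream, fit a line by least squares, and emit a breakpoint once any residual exceeds the error bound. A minimal single-coordinate sketch, in which the function name, the 5 m default bound, and the breakpoint rule are illustrative assumptions rather than the paper's exact algorithm:

```python
import numpy as np

def compress_trajectory(points, max_error=5.0):
    """Sliding-window piecewise linear compression of a GPS stream.

    points: sequence of (t, x) samples (one coordinate shown for brevity;
    run per coordinate or on projected distance). Keeps only the window
    boundary points whose linear fit stays within max_error of every
    sample in the window.
    """
    points = list(points)
    kept = [points[0]]
    start = 0
    for end in range(2, len(points) + 1):
        t = np.array([p[0] for p in points[start:end]], dtype=float)
        x = np.array([p[1] for p in points[start:end]], dtype=float)
        # Least-squares line fit over the current window
        A = np.vstack([t, np.ones_like(t)]).T
        coef, *_ = np.linalg.lstsq(A, x, rcond=None)
        resid = np.abs(A @ coef - x)
        if resid.max() > max_error:
            # Close the window at the previous point, start a new one there
            kept.append(points[end - 2])
            start = end - 2
    kept.append(points[-1])
    return kept
```

On a track that is nearly piecewise linear, only the endpoints and the points near direction changes survive, which is where the compression power reported in the abstract comes from.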

Author(s):  
Adham Kalila ◽  
Zeyad Awwad ◽  
Riccardo Di Clemente ◽  
Marta C. González

Falling oil revenues and rapid urbanization are putting a strain on the budgets of oil-producing nations, which often subsidize domestic fuel consumption. A direct way to decrease the impact of subsidies is to reduce fuel consumption by reducing congestion and car trips. As fuel consumption models have started to incorporate data sources from ubiquitous sensing devices, an opportunity has emerged to develop comprehensive models at urban scale by leveraging sources such as Global Positioning System (GPS) data and Call Detail Records. This paper combines these big data sets in a novel method to model fuel consumption within a city and estimate how it may change under different scenarios. To do so, a fuel consumption model was calibrated for use with any car fleet's fuel economy distribution and applied in Riyadh, Saudi Arabia. The proposed model, based on speed profiles, was then used to test the effects on fuel consumption of reducing flow, both randomly and by targeting the most fuel-inefficient trips in the city. The estimates considerably improve on baseline methods based on average speeds, showing the benefit of the information added by the GPS data fusion. The presented method can be adapted to measure emissions as well. The results constitute a clear application of data analysis tools to help decision makers compare policies aimed at achieving economic and environmental goals.
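To see why a speed-profile model differs from an average-speed baseline, consider a convex consumption-per-distance curve: by Jensen's inequality, a trip split between slow and fast segments burns more fuel than a steady trip at the same average speed. A toy sketch, in which the quadratic fuel-rate curve and all constants are illustrative assumptions, not the paper's calibrated model:

```python
def fuel_rate_l_per_100km(v_kmh):
    """Illustrative convex fuel-economy curve (not the paper's calibrated
    model): consumption per distance is high in stop-and-go traffic and
    at very high speed, lowest near 70 km/h."""
    return 4.0 + 0.002 * (v_kmh - 70.0) ** 2

def trip_fuel_liters(speed_profile_kmh, dt_h):
    """Integrate fuel use over a sampled speed profile with step dt_h hours."""
    total = 0.0
    for v in speed_profile_kmh:
        distance_km = v * dt_h
        total += fuel_rate_l_per_100km(v) / 100.0 * distance_km
    return total
```

For example, a trip spending equal time at 30 and 110 km/h consumes noticeably more than a constant 70 km/h trip of the same duration, a difference an average-speed baseline cannot see.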


2018 ◽  
Vol 2018 ◽  
pp. 1-9 ◽  
Author(s):  
Zhihan Liu ◽  
Yi Jia ◽  
Xiaolu Zhu

Car sharing is a type of car rental service in which consumers rent cars for short periods of time, often charged by the hour. Analysis of urban traffic big data is of great importance for determining depot locations in a car-sharing system. Taxi OD (origin–destination) data are a typical urban traffic dataset, but their volume is so large that traditional data processing applications do not work well. In this paper, an optimization method is presented that determines depot locations by clustering taxi OD points with the AP (Affinity Propagation) clustering algorithm. By analyzing the characteristics of the AP algorithm, the clustering has been optimized hierarchically based on administrative region segmentation, and, in view of the sparse similarity matrix of taxi OD points, the input parameters of AP clustering have been adapted. In the case study, we choose OD pair information from Beijing's taxi GPS trajectory data; the number and locations of depots are determined by clustering the OD points with the optimized AP algorithm. We describe experimental results of our approach and compare it with the standard K-means method using quantitative and stationarity indices. Experiments on the real datasets show that the proposed method for determining car-sharing depots has superior performance.
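For readers unfamiliar with AP, a compact sketch of the standard message-passing updates follows. This is plain AP on a dense similarity matrix; the paper's hierarchical, region-segmented variant and its sparse-matrix parameter adaptations are not shown:

```python
import numpy as np

def affinity_propagation(S, max_iter=200, damping=0.9):
    """Minimal Affinity Propagation (Frey & Dueck, 2007) on a similarity
    matrix S whose diagonal holds the 'preference' values. Returns the
    exemplar indices and a cluster label (an exemplar index) per point."""
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities
    A = np.zeros((n, n))  # availabilities
    rows = np.arange(n)
    for _ in range(max_iter):
        # Responsibility: r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        idx = AS.argmax(axis=1)
        first = AS[rows, idx]
        AS[rows, idx] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, idx] = S[rows, idx] - second
        R = damping * R + (1 - damping) * R_new
        # Availability: a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())
        col = Rp.sum(axis=0)
        A_new = np.minimum(0, col[None, :] - Rp)
        np.fill_diagonal(A_new, col - Rp.diagonal())
        A = damping * A + (1 - damping) * A_new
    exemplars = np.where(R.diagonal() + A.diagonal() > 0)[0]
    labels = exemplars[np.argmax(S[:, exemplars], axis=1)]
    labels[exemplars] = exemplars
    return exemplars, labels
```

In a depot-siting setting, the preference values on the diagonal control how many depots emerge: larger (less negative) preferences yield more exemplars, which is one of the inputs the paper adapts.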


Informatica ◽  
2019 ◽  
Vol 30 (1) ◽  
pp. 33-52 ◽  
Author(s):  
Pengfei HAO ◽  
Chunlong YAO ◽  
Qingbin MENG ◽  
Xiaoqiang YU ◽  
Xu LI

2014 ◽  
Author(s):  
Pankaj K. Agarwal ◽  
Thomas Moelhave
Keyword(s):  
Big Data ◽  

2020 ◽  
Vol 13 (4) ◽  
pp. 790-797
Author(s):  
Gurjit Singh Bhathal ◽  
Amardeep Singh Dhiman

Background: In the current internet scenario, large amounts of data are generated and processed. The Hadoop framework is widely used to store and process big data in a highly distributed manner, but it is argued that the framework is not mature enough to deal with current cyberattacks on the data. Objective: The main objective of the proposed work is to provide a complete security approach comprising authorisation and authentication for the users and the Hadoop cluster nodes, and to secure the data both at rest and in transit. Methods: The proposed algorithm uses the Kerberos network authentication protocol for authorisation and authentication and to validate the users and the cluster nodes. Ciphertext-Policy Attribute-Based Encryption (CP-ABE) is used for data at rest and data in transit. Users encrypt files under their own set of attributes and store them on the Hadoop Distributed File System; only intended users with matching attributes can decrypt a file. Results: The proposed algorithm was implemented with data sets of different sizes, processed both with and without encryption. The results show little difference in processing time: performance was affected in the range of 0.8% to 3.1%, which also includes the impact of other factors, such as system configuration, the number of parallel jobs running, and the virtual environment. Conclusion: The solutions available for handling the big data security problems faced in the Hadoop framework are inefficient or incomplete. A complete security framework is proposed for the Hadoop environment, and the solution is experimentally shown to have little effect on system performance for datasets of different sizes.
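As a concept illustration only (real CP-ABE relies on pairing-based cryptography, which this toy does not attempt), the access model can be viewed as an OR of AND-clauses over attributes: a ciphertext carries a policy, and a key decrypts only if its attribute set satisfies some clause. All attribute names below are hypothetical examples:

```python
def satisfies(user_attrs, policy):
    """Toy illustration of the CP-ABE access model (no cryptography):
    a policy is a list of attribute sets interpreted as an OR of
    AND-clauses; a user's key 'decrypts' iff their attributes cover
    at least one clause."""
    user_attrs = set(user_attrs)
    return any(clause <= user_attrs for clause in policy)

# Hypothetical policy: (research analyst) OR (admin)
policy = [{"dept:research", "role:analyst"}, {"role:admin"}]
```

The appeal for HDFS is that this check is enforced by the mathematics of the ciphertext itself, so the storage layer never needs to be trusted with access decisions.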


2021 ◽  
Author(s):  
Chao Chen ◽  
Daqing Zhang ◽  
Yasha Wang ◽  
Hongyu Huang

2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Hossein Ahmadvand ◽  
Fouzhan Foroutan ◽  
Mahmood Fathy

Abstract Data variety is one of the most important features of Big Data. Data variety is the result of aggregating data from multiple sources and of the uneven distribution of data. This feature of Big Data causes high variation in the consumption of processing resources such as CPU. This issue has been overlooked in previous works. To overcome this problem, in the present work we use Dynamic Voltage and Frequency Scaling (DVFS) to reduce the energy consumption of computation. To this end, we consider two types of deadlines as our constraint. Before applying the DVFS technique to compute nodes, we estimate the processing time and the frequency needed to meet the deadline. In the evaluation phase, we used a set of data sets and applications. The experimental results show that our proposed approach surpasses the other scenarios in processing real datasets. Based on the experimental results in this paper, DV-DVFS can achieve up to a 15% improvement in energy consumption.
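The core deadline-driven step can be sketched as follows: given an estimate of the remaining work and the deadline, select the lowest available frequency that still finishes in time, since for fixed work the dynamic energy grows roughly with the square of the frequency. The estimator, function name, and frequency list below are illustrative assumptions, not the paper's implementation:

```python
def pick_frequency(est_cycles, deadline_s, freqs_hz):
    """Pick the lowest available CPU frequency that still meets the
    deadline, given an estimate of the work in cycles (a simplified
    sketch of deadline-driven DVFS; the workload estimator is assumed
    to exist). Returns None if even the top frequency misses."""
    for f in sorted(freqs_hz):
        if est_cycles / f <= deadline_s:
            return f
    return None
```

Running at the lowest feasible frequency rather than the maximum is what converts slack before the deadline into energy savings.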


2020 ◽  
Vol 70 (1) ◽  
pp. 145-161 ◽  
Author(s):  
Marnus Stoltz ◽  
Boris Baeumer ◽  
Remco Bouckaert ◽  
Colin Fox ◽  
Gordon Hiscott ◽  
...  

Abstract We describe a new and computationally efficient Bayesian methodology for inferring species trees and demographics from unlinked binary markers. Likelihood calculations are carried out using diffusion models of allele frequency dynamics combined with novel numerical algorithms. The diffusion approach allows for analysis of data sets containing hundreds or thousands of individuals. The method, which we call Snapper, has been implemented as part of the BEAST2 package. We conducted simulation experiments to assess numerical error, computational requirements, and accuracy in recovering known model parameters. A reanalysis of soybean SNP data demonstrates that the models implemented in Snapp and Snapper can be difficult to distinguish in practice, a characteristic which we tested with further simulations. We demonstrate the scale of analysis possible using a SNP data set sampled from 399 freshwater turtles in 41 populations. [Bayesian inference; diffusion models; multi-species coalescent; SNP data; species trees; spectral methods.]

