Real-Time Piecewise Regression

Author(s):  
Yang Zhang ◽  
Yuandong Liu ◽  
Lee D. Han

Ubiquitous sensing technologies make big data a trendy topic and a favored approach in transportation studies and applications, but the increasing volume of data sets presents remarkable challenges to data collection, storage, transfer, visualization, and processing. Fundamental aspects of big data in transportation are discussed, including how much data to collect and how to collect data effectively and economically. The focus is GPS trajectory data, which are used widely in this domain. An incremental piecewise regression algorithm is used to evaluate and compress GPS locations as they are produced. Row-wise QR decomposition and singular value decomposition are shown to be valid numerical algorithms for incremental regression. Sliding window–based piecewise regression can subsample the GPS data stream instantaneously, preserving only the points of interest. Algorithm performance is evaluated comprehensively in terms of accuracy and compression power. A procedure is presented for users to choose the best parameter value for their GPS devices. Results of experiments with real-world trajectory data indicate that, when the proper parameter value is selected, the proposed method achieves significant compression power (more than 10 times), maintains acceptable accuracy (less than 5 m), and always outperforms the fixed-rate sampling approach.
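The sliding-window idea can be sketched as follows: grow a window over the incoming stream, fit a line by least squares, and emit a breakpoint once any residual exceeds the error bound. A minimal single-coordinate sketch, in which the function name, the 5 m default bound, and the breakpoint rule are illustrative assumptions rather than the paper's exact algorithm:

```python
import numpy as np

def compress_trajectory(points, max_error=5.0):
    """Sliding-window piecewise linear compression of a GPS stream.

    points: sequence of (t, x) samples (one coordinate shown for brevity;
    run per coordinate or on projected distance). Keeps only the window
    boundary points whose linear fit stays within max_error of every
    sample in the window.
    """
    points = list(points)
    kept = [points[0]]
    start = 0
    for end in range(2, len(points) + 1):
        t = np.array([p[0] for p in points[start:end]], dtype=float)
        x = np.array([p[1] for p in points[start:end]], dtype=float)
        # Least-squares line fit over the current window
        A = np.vstack([t, np.ones_like(t)]).T
        coef, *_ = np.linalg.lstsq(A, x, rcond=None)
        resid = np.abs(A @ coef - x)
        if resid.max() > max_error:
            # Close the window at the previous point, start a new one there
            kept.append(points[end - 2])
            start = end - 2
    kept.append(points[-1])
    return kept
```

On a track that is nearly piecewise linear, only the endpoints and the points near direction changes survive, which is where the compression power reported in the abstract comes from.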

Author(s):  
Adham Kalila ◽  
Zeyad Awwad ◽  
Riccardo Di Clemente ◽  
Marta C. González

Falling oil revenues and rapid urbanization are putting a strain on the budgets of oil-producing nations, which often subsidize domestic fuel consumption. A direct way to decrease the impact of subsidies is to reduce fuel consumption by reducing congestion and car trips. As fuel consumption models have started to incorporate data sources from ubiquitous sensing devices, an opportunity has emerged to develop comprehensive models at urban scale by leveraging sources such as Global Positioning System (GPS) data and Call Detail Records. This paper combines these big data sets in a novel method to model fuel consumption within a city and estimate how it may change under different scenarios. To do so, a fuel consumption model was calibrated for use with any car fleet's fuel economy distribution and applied in Riyadh, Saudi Arabia. The proposed model, based on speed profiles, was then used to test the effects on fuel consumption of reducing flow, both randomly and by targeting the most fuel-inefficient trips in the city. The estimates considerably improve on baseline methods based on average speeds, showing the benefit of the information added by the GPS data fusion. The presented method can be adapted to measure emissions as well. The results constitute a clear application of data analysis tools to help decision makers compare policies aimed at achieving economic and environmental goals.
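To see why a speed-profile model differs from an average-speed baseline, consider a convex consumption-per-distance curve: by Jensen's inequality, a trip split between slow and fast segments burns more fuel than a steady trip at the same average speed. A toy sketch, in which the quadratic fuel-rate curve and all constants are illustrative assumptions, not the paper's calibrated model:

```python
def fuel_rate_l_per_100km(v_kmh):
    """Illustrative convex fuel-economy curve (not the paper's calibrated
    model): consumption per distance is high in stop-and-go traffic and
    at very high speed, lowest near 70 km/h."""
    return 4.0 + 0.002 * (v_kmh - 70.0) ** 2

def trip_fuel_liters(speed_profile_kmh, dt_h):
    """Integrate fuel use over a sampled speed profile with step dt_h hours."""
    total = 0.0
    for v in speed_profile_kmh:
        distance_km = v * dt_h
        total += fuel_rate_l_per_100km(v) / 100.0 * distance_km
    return total
```

For example, a trip spending equal time at 30 and 110 km/h consumes noticeably more than a constant 70 km/h trip of the same duration, a difference an average-speed baseline cannot see.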


2018 ◽  
Vol 2018 ◽  
pp. 1-9 ◽  
Author(s):  
Zhihan Liu ◽  
Yi Jia ◽  
Xiaolu Zhu

Car sharing is a type of car rental service in which consumers rent cars for short periods of time, often charged by the hour. Analysis of urban traffic big data is of great importance for determining depot locations in a car-sharing system. Taxi OD (origin–destination) data are a typical urban traffic dataset, but their volume is so large that traditional data processing applications do not work well. In this paper, an optimization method is presented that determines depot locations by clustering taxi OD points with the AP (Affinity Propagation) clustering algorithm. By analyzing the characteristics of the AP algorithm, the clustering has been optimized hierarchically based on administrative region segmentation, and, in view of the sparse similarity matrix of taxi OD points, the input parameters of AP clustering have been adapted. In the case study, we choose OD pair information from Beijing's taxi GPS trajectory data; the number and locations of depots are determined by clustering the OD points with the optimized AP algorithm. We describe experimental results of our approach and compare it with the standard K-means method using quantitative and stationarity indices. Experiments on the real datasets show that the proposed method for determining car-sharing depots has superior performance.
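For readers unfamiliar with AP, a compact sketch of the standard message-passing updates follows. This is plain AP on a dense similarity matrix; the paper's hierarchical, region-segmented variant and its sparse-matrix parameter adaptations are not shown:

```python
import numpy as np

def affinity_propagation(S, max_iter=200, damping=0.9):
    """Minimal Affinity Propagation (Frey & Dueck, 2007) on a similarity
    matrix S whose diagonal holds the 'preference' values. Returns the
    exemplar indices and a cluster label (an exemplar index) per point."""
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities
    A = np.zeros((n, n))  # availabilities
    rows = np.arange(n)
    for _ in range(max_iter):
        # Responsibility: r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        idx = AS.argmax(axis=1)
        first = AS[rows, idx]
        AS[rows, idx] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, idx] = S[rows, idx] - second
        R = damping * R + (1 - damping) * R_new
        # Availability: a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())
        col = Rp.sum(axis=0)
        A_new = np.minimum(0, col[None, :] - Rp)
        np.fill_diagonal(A_new, col - Rp.diagonal())
        A = damping * A + (1 - damping) * A_new
    exemplars = np.where(R.diagonal() + A.diagonal() > 0)[0]
    labels = exemplars[np.argmax(S[:, exemplars], axis=1)]
    labels[exemplars] = exemplars
    return exemplars, labels
```

In a depot-siting setting, the preference values on the diagonal control how many depots emerge: larger (less negative) preferences yield more exemplars, which is one of the inputs the paper adapts.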


Informatica ◽  
2019 ◽  
Vol 30 (1) ◽  
pp. 33-52 ◽  
Author(s):  
Pengfei HAO ◽  
Chunlong YAO ◽  
Qingbin MENG ◽  
Xiaoqiang YU ◽  
Xu LI

2014 ◽  
Author(s):  
Pankaj K. Agarwal ◽  
Thomas Moelhave
Keyword(s):  
Big Data ◽  

2020 ◽  
Vol 13 (4) ◽  
pp. 790-797
Author(s):  
Gurjit Singh Bhathal ◽  
Amardeep Singh Dhiman

Background: In the current internet scenario, large amounts of data are generated and processed. The Hadoop framework is widely used to store and process big data in a highly distributed manner, but it is argued that the framework is not mature enough to deal with current cyberattacks on the data. Objective: The main objective of the proposed work is to provide a complete security approach comprising authorisation and authentication for the users and the Hadoop cluster nodes, and to secure the data both at rest and in transit. Methods: The proposed algorithm uses the Kerberos network authentication protocol for authorisation and authentication and to validate the users and the cluster nodes. Ciphertext-Policy Attribute-Based Encryption (CP-ABE) is used for data at rest and data in transit. Users encrypt files under their own set of attributes and store them on the Hadoop Distributed File System; only intended users with matching attributes can decrypt a file. Results: The proposed algorithm was implemented with data sets of different sizes, processed both with and without encryption. The results show little difference in processing time: performance was affected in the range of 0.8% to 3.1%, which also includes the impact of other factors, such as system configuration, the number of parallel jobs running, and the virtual environment. Conclusion: The solutions available for handling the big data security problems faced in the Hadoop framework are inefficient or incomplete. A complete security framework is proposed for the Hadoop environment, and the solution is experimentally shown to have little effect on system performance for datasets of different sizes.
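As a concept illustration only (real CP-ABE relies on pairing-based cryptography, which this toy does not attempt), the access model can be viewed as an OR of AND-clauses over attributes: a ciphertext carries a policy, and a key decrypts only if its attribute set satisfies some clause. All attribute names below are hypothetical examples:

```python
def satisfies(user_attrs, policy):
    """Toy illustration of the CP-ABE access model (no cryptography):
    a policy is a list of attribute sets interpreted as an OR of
    AND-clauses; a user's key 'decrypts' iff their attributes cover
    at least one clause."""
    user_attrs = set(user_attrs)
    return any(clause <= user_attrs for clause in policy)

# Hypothetical policy: (research analyst) OR (admin)
policy = [{"dept:research", "role:analyst"}, {"role:admin"}]
```

The appeal for HDFS is that this check is enforced by the mathematics of the ciphertext itself, so the storage layer never needs to be trusted with access decisions.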


2021 ◽  
Author(s):  
Chao Chen ◽  
Daqing Zhang ◽  
Yasha Wang ◽  
Hongyu Huang

2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Hossein Ahmadvand ◽  
Fouzhan Foroutan ◽  
Mahmood Fathy

Abstract Data variety is one of the most important features of Big Data. Data variety is the result of aggregating data from multiple sources and of the uneven distribution of data. This feature of Big Data causes high variation in the consumption of processing resources such as CPU. This issue has been overlooked in previous works. To overcome this problem, in the present work we use Dynamic Voltage and Frequency Scaling (DVFS) to reduce the energy consumption of computation. To this end, we consider two types of deadlines as our constraint. Before applying the DVFS technique to compute nodes, we estimate the processing time and the frequency needed to meet the deadline. In the evaluation phase, we used a set of data sets and applications. The experimental results show that our proposed approach surpasses the other scenarios in processing real datasets. Based on the experimental results in this paper, DV-DVFS can achieve up to a 15% improvement in energy consumption.
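The core deadline-driven step can be sketched as follows: given an estimate of the remaining work and the deadline, select the lowest available frequency that still finishes in time, since for fixed work the dynamic energy grows roughly with the square of the frequency. The estimator, function name, and frequency list below are illustrative assumptions, not the paper's implementation:

```python
def pick_frequency(est_cycles, deadline_s, freqs_hz):
    """Pick the lowest available CPU frequency that still meets the
    deadline, given an estimate of the work in cycles (a simplified
    sketch of deadline-driven DVFS; the workload estimator is assumed
    to exist). Returns None if even the top frequency misses."""
    for f in sorted(freqs_hz):
        if est_cycles / f <= deadline_s:
            return f
    return None
```

Running at the lowest feasible frequency rather than the maximum is what converts slack before the deadline into energy savings.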


2020 ◽  
Vol 70 (1) ◽  
pp. 145-161 ◽  
Author(s):  
Marnus Stoltz ◽  
Boris Baeumer ◽  
Remco Bouckaert ◽  
Colin Fox ◽  
Gordon Hiscott ◽  
...  

Abstract We describe a new and computationally efficient Bayesian methodology for inferring species trees and demographics from unlinked binary markers. Likelihood calculations are carried out using diffusion models of allele frequency dynamics combined with novel numerical algorithms. The diffusion approach allows for analysis of data sets containing hundreds or thousands of individuals. The method, which we call Snapper, has been implemented as part of the BEAST2 package. We conducted simulation experiments to assess numerical error, computational requirements, and accuracy in recovering known model parameters. A reanalysis of soybean SNP data demonstrates that the models implemented in Snapp and Snapper can be difficult to distinguish in practice, a characteristic which we tested with further simulations. We demonstrate the scale of analysis possible using a SNP data set sampled from 399 freshwater turtles in 41 populations. [Bayesian inference; diffusion models; multi-species coalescent; SNP data; species trees; spectral methods.]

