A Cloud-Based Framework for Large-Scale Log Mining through Apache Spark and Elasticsearch

2019 ◽  
Vol 9 (6) ◽  
pp. 1114 ◽  
Author(s):  
Yun Li ◽  
Yongyao Jiang ◽  
Juan Gu ◽  
Mingyue Lu ◽  
Manzhu Yu ◽  
...  

The volume, variety, and velocity of different data, e.g., simulation data, observation data, and social media data, are growing ever faster, posing grand challenges for data discovery. An increasing trend in data discovery is to mine hidden relationships among users and metadata from web usage logs to support the data discovery process. Web usage log mining is the process of reconstructing sessions from raw logs and finding interesting patterns or implicit linkages. The mining results play an important role in improving the quality of search-related components, e.g., ranking, query suggestion, and recommendation. While research has been done in the data discovery domain, collecting and analyzing logs efficiently remains a challenge because (1) the volume of web usage logs continues to grow as long as users access the data; (2) the dynamic volume of logs requires on-demand computing resources for mining tasks; and (3) the mining process is compute-intensive and time-intensive. To speed up the mining process, we propose a cloud-based log-mining framework using Apache Spark and Elasticsearch. In addition, a data partition paradigm, logPartitioner, is designed to solve the data imbalance problem in data parallelism. As a proof of concept, oceanographic data search and access logs are chosen to validate the performance of the proposed parallel log-mining framework.
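Sessionization, the first step the abstract describes, can be sketched in plain Python. The 30-minute timeout and the (user, timestamp) record layout are illustrative assumptions, not details from the paper, whose framework runs this step in parallel on Apache Spark:

```python
from datetime import datetime, timedelta

# Common heuristic: a gap longer than 30 minutes starts a new session.
SESSION_TIMEOUT = timedelta(minutes=30)

def reconstruct_sessions(entries):
    """Group (user, timestamp) log entries into per-user sessions.

    A new session starts whenever the gap between consecutive requests
    from the same user exceeds SESSION_TIMEOUT.
    """
    sessions = {}
    for user, ts in sorted(entries, key=lambda e: (e[0], e[1])):
        user_sessions = sessions.setdefault(user, [])
        if user_sessions and ts - user_sessions[-1][-1] <= SESSION_TIMEOUT:
            user_sessions[-1].append(ts)   # continue the current session
        else:
            user_sessions.append([ts])     # open a new session
    return sessions

logs = [
    ("alice", datetime(2019, 6, 1, 10, 0)),
    ("alice", datetime(2019, 6, 1, 10, 10)),
    ("alice", datetime(2019, 6, 1, 12, 0)),   # >30 min gap: new session
    ("bob",   datetime(2019, 6, 1, 10, 5)),
]
sessions = reconstruct_sessions(logs)
print(len(sessions["alice"]))  # 2
```

In a Spark setting the same per-user grouping would be expressed as a key-by-user shuffle followed by a per-key fold, which is exactly where a skew-aware partitioner such as the paper's logPartitioner matters: a few heavy users can otherwise dominate one partition.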

Universe ◽  
2021 ◽  
Vol 7 (7) ◽  
pp. 220
Author(s):  
Emil Khalikov

The intrinsic spectra of some distant blazars known as “extreme TeV blazars” have shown hints of anomalous hardening in the TeV energy region. Several extragalactic propagation models have been proposed to explain this possible excess transparency of the Universe to gamma-rays, starting from a model that assumes the existence of so-called axion-like particles (ALPs) and the new process of gamma-ALP oscillations. Alternative models suppose that some of the observable gamma-rays are produced in intergalactic cascades. This work focuses on investigating the spectral and angular features of one of the cascade models, the Intergalactic Hadronic Cascade Model (IHCM), in contemporary astrophysical models of the Extragalactic Magnetic Field (EGMF). For the IHCM, the EGMF largely determines the deflection of primary cosmic rays and electrons of intergalactic cascades and is, thus, of vital importance. Contemporary Hackstein models are considered in this paper and compared to the model of Dolag. The models assumed are based on simulations of the local part of the large-scale structure of the Universe and differ in the assumptions for the seed field. This work provides spectral energy distributions (SEDs) and angular extensions of two extreme TeV blazars, 1ES 0229+200 and 1ES 0414+009. It is demonstrated that observable SEDs inside a typical point spread function of imaging atmospheric Cherenkov telescopes (IACTs) for the IHCM would exhibit a characteristic high-energy attenuation compared to the ones obtained in hadronic models that do not consider the EGMF, which makes it possible to distinguish among these models. At the same time, the spectra for the IHCM would have longer high-energy tails than some available spectra for the ALP models and the universal spectra for the Electromagnetic Cascade Model (ECM). The analysis of the IHCM observable angular extensions shows that the sources would likely be identified by most IACTs not as point sources but rather as extended ones. These spectra could later be compared with future observation data of such instruments as the Cherenkov Telescope Array (CTA) and LHAASO.


Author(s):  
Ruiyang Song ◽  
Kuang Xu

We propose and analyze a temporal concatenation heuristic for solving large-scale finite-horizon Markov decision processes (MDPs), which divides the MDP into smaller sub-problems along the time horizon and generates an overall solution by simply concatenating the optimal solutions from these sub-problems. As a “black box” architecture, temporal concatenation works with a wide range of existing MDP algorithms. Our main results characterize the regret of temporal concatenation compared to the optimal solution. We provide upper bounds for general MDP instances, as well as a family of MDP instances in which the upper bounds are shown to be tight. Together, our results demonstrate temporal concatenation's potential for substantial speed-up at the expense of some performance degradation.
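The heuristic can be sketched on a small tabular MDP. The NumPy implementation below is an illustrative reconstruction from the abstract's description, not the authors' code; since the dynamics and rewards here are identical at every stage, each sub-problem is the same, so one sub-policy is solved and repeated:

```python
import numpy as np

def backward_induction(P, R, H, V_terminal):
    """Solve a finite-horizon MDP exactly by backward induction.

    P: (A, S, S) transition tensor, R: (A, S) stage rewards, H: horizon.
    Returns (policy, V0) where policy[t] is the decision rule at stage t.
    """
    V = V_terminal.copy()
    policy = []
    for _ in range(H):
        Q = R + P @ V                 # (A, S): one-step lookahead values
        policy.append(Q.argmax(axis=0))
        V = Q.max(axis=0)
    policy.reverse()                  # built backward in time; reorder
    return policy, V

def temporal_concatenation(P, R, H, k=2):
    """Split the horizon into k equal pieces, solve each sub-MDP with a
    zero terminal value, and concatenate the optimal sub-policies."""
    S = R.shape[1]
    sub_policy, _ = backward_induction(P, R, H // k, np.zeros(S))
    return sub_policy * k             # stationary data: sub-policies repeat

rng = np.random.default_rng(1)
A, S, H = 3, 4, 10
P = rng.dirichlet(np.ones(S), size=(A, S))   # row-stochastic transitions
R = rng.uniform(0.0, 1.0, (A, S))

opt_policy, opt_V = backward_induction(P, R, H, np.zeros(S))
tc_policy = temporal_concatenation(P, R, H, k=2)
print(len(tc_policy))  # 10: a full-horizon policy
```

The regret comes from the zero terminal value inside each piece: near a sub-horizon boundary the sub-policy ignores rewards beyond it, which is precisely the loss the paper's upper bounds quantify.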


Land ◽  
2021 ◽  
Vol 10 (8) ◽  
pp. 792
Author(s):  
Shukun Wang ◽  
Dengwang Li ◽  
Tingting Li ◽  
Changquan Liu

Land fragmentation (LF) is widespread worldwide and affects farmers’ decision-making and, thus, farm performance. We used detailed household survey data at the crop level from ten provinces in China to construct four LF indicators and six farm performance indicators. We ran a set of regression models using ordinary least squares (OLS) to analyse the relationship between LF and farm performance. The results showed that (1) LF increased the input of production material and labour costs; (2) LF reduced farmers’ purchasing of mechanical services and the efficiency of ploughing; and (3) LF may increase technical efficiency (this result, however, was not sufficiently robust, and LF had no effect on yield). Generally speaking, LF was negatively related to farm performance. To improve farm performance, it is recommended that decision-makers speed up land transfer and land consolidation, stabilise land property rights, establish land-transfer intermediary organisations and promote large-scale production.
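The kind of OLS regression described can be sketched on synthetic data. The variables below (plot count as an LF indicator, labour cost as a performance indicator) and the planted coefficient of 12 are hypothetical, chosen only to mimic a fragmentation-raises-costs relationship like result (1):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical LF indicator: number of plots per farm (more = fragmented)
plots = rng.integers(1, 15, size=n)
farm_size = rng.uniform(0.2, 3.0, size=n)          # hectares (control)

# Synthetic outcome: labour cost rises with fragmentation (true slope 12)
labour_cost = 100.0 + 12.0 * plots + 20.0 * farm_size + rng.normal(0, 10, n)

# OLS via least squares: columns are intercept, LF indicator, control
X = np.column_stack([np.ones(n), plots, farm_size])
beta, *_ = np.linalg.lstsq(X, labour_cost, rcond=None)
print(beta[1])  # estimated marginal labour cost per extra plot, near 12
```

A study like this one would additionally report standard errors and run the regression separately for each of the six performance indicators, which is how non-robust results such as (3) are flagged.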


2018 ◽  
Vol 7 (12) ◽  
pp. 472 ◽  
Author(s):  
Bo Wan ◽  
Lin Yang ◽  
Shunping Zhou ◽  
Run Wang ◽  
Dezhi Wang ◽  
...  

The road-network matching method is an effective tool for map integration, fusion, and update. Due to the complexity of road networks in the real world, matching methods often contain a series of complicated processes to identify homonymous roads and deal with their intricate relationships. However, traditional road-network matching algorithms, which are mainly central processing unit (CPU)-based approaches, may face performance bottlenecks when handling big data. We developed a particle-swarm optimization (PSO)-based parallel road-network matching method on the graphics processing unit (GPU). Based on the characteristics of the two main stages (similarity computation and matching-relationship identification), data-partition and task-partition strategies were utilized, respectively, to make full use of GPU threads. Experiments were conducted on datasets at 14 different scales. Results indicate that the parallel PSO-based matching algorithm (PSOM) could correctly identify most matching relationships with an average accuracy of 84.44%, which was at the same level as the accuracy of a benchmark, the probability-relaxation-matching (PRM) method. The PSOM approach significantly reduced the road-network matching time in dealing with large amounts of data in comparison with the PRM method. This paper provides a common parallel algorithm framework for road-network matching algorithms and contributes to the integration and updating of large-scale road networks.
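A minimal serial PSO kernel (global-best variant) illustrates the optimizer underlying PSOM. This is a generic sketch on a toy objective, not the paper's GPU-parallel, road-matching-specific implementation; the inertia and acceleration weights are common textbook defaults:

```python
import numpy as np

def pso(objective, dim, n_particles=30, iters=200, seed=0):
    """Minimal global-best particle-swarm optimization (minimization)."""
    rng = np.random.default_rng(seed)
    w, c1, c2 = 0.7, 1.5, 1.5            # inertia, cognitive, social weights
    x = rng.uniform(-5.0, 5.0, (n_particles, dim))
    v = np.zeros_like(x)
    pbest = x.copy()                      # each particle's best position
    pbest_f = np.array([objective(p) for p in x])
    g = pbest[pbest_f.argmin()].copy()    # swarm-wide best position
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = x + v
        f = np.array([objective(p) for p in x])
        better = f < pbest_f
        pbest[better], pbest_f[better] = x[better], f[better]
        g = pbest[pbest_f.argmin()].copy()
    return g, pbest_f.min()

# Toy objective: the sphere function, minimized at the origin.
best, fmin = pso(lambda p: float(np.sum(p**2)), dim=3)
```

The per-particle fitness evaluations in the inner loop are independent, which is what makes the similarity-computation stage a natural fit for a data-partition strategy across GPU threads.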


2011 ◽  
Vol 12 (1) ◽  
pp. 27-44 ◽  
Author(s):  
Michael Kunz

Simulations of orographic precipitation over the low mountain ranges of southwestern Germany and eastern France with two different physics-based linear precipitation models are presented. Both models are based on 3D airflow dynamics from linear theory and consider advection of condensed water and leeside drying. Sensitivity studies for idealized conditions and a real case study show that the amount and spatial distribution of orographic precipitation are strongly controlled by characteristic time scales for cloud and hydrometeor advection and by background precipitation due to large-scale lifting. These parameters are estimated by adjusting the model results on a 2.5-km grid to observed precipitation patterns for a sample of 40 representative orography-dominated stratiform events (24 h) during a calibration period (1971–80). In general, the best results in terms of lowest RMSE and bias are obtained for characteristic time scales of 1600 s and background precipitation of 0.4 mm h⁻¹. Model simulations of a sample of 84 events during an application period (1981–2000) with fixed parameters demonstrate that both models are able to quantitatively reproduce precipitation patterns obtained from observations and reanalyses from a numerical model [Consortium for Small-scale Modeling (COSMO)]. Combining model results with observation data shows that heavy precipitation over mountains is restricted to situations with strong atmospheric forcing in terms of synoptic-scale lifting, horizontal wind speed, and moisture content.


2018 ◽  
Vol 8 (1) ◽  
pp. 18
Author(s):  
Kees Bourgonje ◽  
Hubert J. Veringa ◽  
David M.J. Smeulders ◽  
Jeroen A. van Oijen

To speed up the torrefaction process in traditional torrefaction reactors, in particular auger reactors, the temperature of the reactor is kept substantially higher than the required torrefaction process temperature. This is due to the low heat conductivity of biomass. Unfortunately, the off-gas characteristics of biomass are very sensitive in the temperature window of 180–300 °C, which can cause a thermal runaway situation in which the process temperature exceeds the intended level. Due to this very sensitive temperature dependence of biomass pyrolysis and its accompanying gas production, a potential solution is to inject small amounts of air directly into the torrefaction reactor. It is found experimentally that this air injection can regulate the temperature of the biomass very rapidly compared to traditional temperature regulation by changing the reactor wall temperature. With this new torrefaction temperature control method, thermal runaway situations can be avoided and the temperature of the biomass in the reactor can be regulated better. Experiments with large beech wood samples show that the torrefaction reaction rate and the temperature in the core of the sample depend on the amount of injected air. Since the flow of combustible gases (torr-gas) originating from the torrefaction process is very sensitive to temperature, the heat production from combusting the torr-gas can be controlled to some extent. This will result in both a more homogeneous torrefied product and more stable processing of varying biomass types in large-scale torrefaction systems.


2018 ◽  
Vol 16 (06) ◽  
pp. 1850052
Author(s):  
Y. H. Lee ◽  
M. Khalil-Hani ◽  
M. N. Marsono

While physical realization of practical large-scale quantum computers is still ongoing, theoretical research of quantum computing applications is facilitated on classical computing platforms through simulation and emulation methods. Nevertheless, the exponential increase in resource requirements with the number of qubits is an inherent issue in classical modeling of quantum systems. In an effort to alleviate the critical scalability issue in existing FPGA emulation works, a novel FPGA-based quantum circuit emulation framework based on the Heisenberg representation is proposed in this paper. Unlike previous works that are restricted to emulations of quantum circuits of small qubit sizes, the proposed FPGA emulation framework can scale up to 120 qubits on an Altera Stratix IV FPGA for the stabilizer circuit case study while providing notable speed-up over the equivalent simulation model.
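The scalability gain comes from the Heisenberg (stabilizer) representation: instead of 2^n state amplitudes, a Clifford circuit on n qubits is tracked as n Pauli generators, each updated in linear time per gate. The sketch below is a minimal software tableau following the standard binary-symplectic update rules, not the paper's FPGA design:

```python
import numpy as np

class Stabilizer:
    """Track stabilizer generators (Heisenberg picture) as a binary
    tableau: one row per generator, x/z bits per qubit, plus a sign bit."""

    def __init__(self, n):
        self.n = n
        self.x = np.zeros((n, n), dtype=bool)
        self.z = np.eye(n, dtype=bool)        # |0...0>: generators Z_i
        self.sign = np.zeros(n, dtype=bool)

    def h(self, q):
        # Hadamard swaps X and Z on qubit q; Y picks up a sign.
        self.sign ^= self.x[:, q] & self.z[:, q]
        self.x[:, q], self.z[:, q] = self.z[:, q].copy(), self.x[:, q].copy()

    def cnot(self, c, t):
        # CNOT maps X_c -> X_c X_t and Z_t -> Z_c Z_t.
        self.sign ^= self.x[:, c] & self.z[:, t] & ~(self.x[:, t] ^ self.z[:, c])
        self.x[:, t] ^= self.x[:, c]
        self.z[:, c] ^= self.z[:, t]

    def paulis(self):
        """Render each generator as a signed Pauli string."""
        out = []
        for i in range(self.n):
            s = "".join("IXZY"[2 * self.z[i, q] + self.x[i, q]]
                        for q in range(self.n))
            out.append(("-" if self.sign[i] else "+") + s)
        return out

s = Stabilizer(2)
s.h(0)
s.cnot(0, 1)      # Bell-state preparation circuit
print(s.paulis())  # ['+XX', '+ZZ']
```

The same row-wise bit operations map naturally onto FPGA logic, since every gate touches only one or two columns of the tableau regardless of n, which is what lets an emulator of this kind scale to over a hundred qubits.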

