Optimal Representation of Large-Scale Graph Data Based on Grid Clustering and K2-Tree

Mathematical Problems in Engineering ◽

10.1155/2020/2354875 ◽

2020 ◽

Vol 2020 ◽

pp. 1-8

Author(s):

Fengying Li ◽

Enyi Yang ◽

Anqiao Ma ◽

Rongsheng Dong

Keyword(s):

Adjacency Matrix ◽

Large Scale ◽

Compact Representation ◽

Graph Data ◽

Storage Overhead ◽

Time Space ◽

Query Algorithm ◽

Representation Scheme ◽

The Given ◽

Density Threshold

The application of appropriate graph data compression technology to store and manipulate graph data with tens of thousands of nodes and edges is a prerequisite for analyzing large-scale graph data. The traditional K2-tree representation scheme mechanically partitions the adjacency matrix, which splits dense intervals and results in additional storage overhead. As the size of the graph data increases, the query time of the K2-tree also continues to increase. To address these problems, we propose a compact representation scheme for graph data based on grid clustering and K2-tree. First, we divide the adjacency matrix into several grids of the same size. Then, we repeatedly filter and merge these grids until the grid density satisfies the given density threshold. Finally, each large grid that meets the density threshold is compactly represented with a K2-tree. On this basis, we further give the corresponding node neighbor query algorithm. The experimental results show that, compared with the current best K2-BDC algorithm, our scheme can achieve a better time/space tradeoff.
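
As background for the K2-tree mentioned in this abstract, the following is a minimal sketch of the plain K2-tree (with k = 2) built from a small adjacency matrix, together with a direct-neighbor query. It does not implement the grid clustering, density threshold, or K2-BDC schemes from the paper; the function names, the list-based bitmaps, and the toy matrix are illustrative assumptions.

```python
def build_k2_tree(matrix, k=2):
    """Build level-order bitmaps (T, L) for a square 0/1 matrix.

    Assumes the side length is a power of k. T holds the bits of the
    internal levels, L the bits of the last level (individual cells).
    """
    n = len(matrix)
    levels = []
    frontier = [(0, 0, n)]  # submatrices (row, col, size) still to be split
    while frontier:
        size = frontier[0][2] // k  # side length of the children at this level
        bits, next_frontier = [], []
        for r0, c0, _ in frontier:
            for i in range(k):
                for j in range(k):
                    r, c = r0 + i * size, c0 + j * size
                    has_one = any(
                        matrix[x][y]
                        for x in range(r, r + size)
                        for y in range(c, c + size)
                    )
                    bits.append(1 if has_one else 0)
                    # Only non-empty submatrices larger than one cell are split further.
                    if has_one and size > 1:
                        next_frontier.append((r, c, size))
        levels.append(bits)
        frontier = next_frontier
    T = [b for level in levels[:-1] for b in level]
    L = levels[-1]
    return T, L


def neighbors(T, L, n, p, k=2):
    """Return the columns q with matrix[p][q] == 1 by top-down traversal."""
    # rank[i] = number of 1 bits in T[0:i]; a real implementation would use
    # a succinct rank structure instead of this plain prefix-sum list.
    rank = [0]
    for b in T:
        rank.append(rank[-1] + b)
    result = []

    def visit(size, row, col_offset, x):
        # x is a position in the concatenation T + L, or -1 for the root.
        if x >= len(T):
            if L[x - len(T)]:
                result.append(col_offset)
            return
        if x == -1 or T[x]:
            child = size // k
            ones_before = 0 if x == -1 else rank[x + 1]
            first = ones_before * k * k + k * (row // child)
            for j in range(k):
                visit(child, row % child, col_offset + child * j, first + j)

    visit(n, p, 0, -1)
    return result


# Toy 4x4 adjacency matrix (hypothetical data).
adjacency = [
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
]
T, L = build_k2_tree(adjacency)
print(T)                       # [1, 1, 1, 0]
print(neighbors(T, L, 4, 1))   # [2, 3]
```

The empty bottom-right quadrant costs a single 0 bit in T, which is where the space saving of the structure comes from on sparse matrices.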


Pattern Synthesis for Large-Scale Pattern Recognition

Encyclopedia of Data Warehousing and Mining ◽

10.4018/978-1-59140-557-3.ch170 ◽

2011 ◽

pp. 902-905

Author(s):

P. Viswanath ◽

M. Narasimha Murty ◽

Shalabh Bhatnagar

Keyword(s):

Pattern Recognition ◽

Large Scale ◽

Nearest Neighbor ◽

Curse Of Dimensionality ◽

Compact Representation ◽

Pattern Synthesis ◽

Approximate Methods ◽

Compact Representations ◽

The Given ◽

Neighbor Classifier

Two major problems in applying any pattern recognition technique for large and high-dimensional data are (a) high computational requirements and (b) curse of dimensionality (Duda, Hart, & Stork, 2000). Algorithmic improvements and approximate methods can solve the first problem, whereas feature selection (Guyon & Elisseeff, 2003), feature extraction (Terabe, Washio, Motoda, Katai, & Sawaragi, 2002), and bootstrapping techniques (Efron, 1979; Hamamoto, Uchimura, & Tomita, 1997) can tackle the second problem. We propose a novel and unified solution for these problems by deriving a compact and generalized abstraction of the data. By this term, we mean a compact representation of the given patterns from which one can retrieve not only the original patterns but also some artificial patterns. The compactness of the abstraction reduces the computational requirements, and its generalization reduces the curse of dimensionality effect. Pattern synthesis techniques accompanied with compact representations attempt to derive compact and generalized abstractions of the data. These techniques are applied with nearest neighbor classifier (NNC), which is a popular nonparametric classifier used in many fields, including data mining, since its conception in the early 1950s (Dasarathy, 2002).
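
For readers unfamiliar with the nearest neighbor classifier (NNC) named in this abstract, a minimal sketch follows. It shows only plain 1-nearest-neighbor classification with Euclidean distance; it does not implement pattern synthesis or any compact representation, and the function name and toy data are illustrative assumptions.

```python
import math


def nearest_neighbor_classify(train, query):
    """Return the label of the training pattern closest to `query`.

    `train` is a list of (feature_vector, label) pairs. Every training
    pattern is scanned, which is the computational cost that motivates
    compact representations for large data sets.
    """
    _, label = min(train, key=lambda item: math.dist(item[0], query))
    return label


# Hypothetical two-class toy data.
training_patterns = [
    ((0.0, 0.0), "a"),
    ((1.0, 0.5), "a"),
    ((5.0, 5.0), "b"),
    ((6.0, 4.5), "b"),
]
print(nearest_neighbor_classify(training_patterns, (1.0, 1.0)))  # a
print(nearest_neighbor_classify(training_patterns, (4.0, 6.0)))  # b
```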


Marbor: A novel large-scale graph data storage and processing framework

2014 IEEE 33rd International Performance Computing and Communications Conference (IPCCC) ◽

10.1109/pccc.2014.7017031 ◽

2014 ◽

Author(s):

Wei Zhou ◽

Yun Gao ◽

Jizhong Han ◽

Zhiyong Xu

Keyword(s):

Data Storage ◽

Large Scale ◽

Graph Data ◽

Processing Framework


Neural mechanisms of context-dependent segmentation tested on large-scale recording data

10.1101/2021.04.25.441363 ◽

2021 ◽

Author(s):

Toshitake Asabuki ◽

Tomoki Fukai

Keyword(s):

Cortical Neurons ◽

Large Scale ◽

Neural Mechanism ◽

Imaging Data ◽

Unsupervised Segmentation ◽

Spike Sequences ◽

Multiple Cell ◽

Context Dependent ◽

Current Flows ◽

The Given

The brain performs various cognitive functions by learning the spatiotemporal salient features of the environment. This learning likely requires unsupervised segmentation of hierarchically organized spike sequences, but the underlying neural mechanism is only poorly understood. Here, we show that a recurrent gated network of neurons with dendrites can context-dependently solve difficult segmentation tasks. Dendrites in this model learn to predict somatic responses in a self-supervising manner while recurrent connections learn a context-dependent gating of dendro-somatic current flows to minimize a prediction error. These connections select particular information suitable for the given context from input features redundantly learned by the dendrites. The model selectively learned salient segments in complex synthetic sequences. Furthermore, the model was also effective for detecting multiple cell assemblies repeating in large-scale calcium imaging data of more than 6,500 cortical neurons. Our results suggest that recurrent gating and dendrites are crucial for cortical learning of context-dependent segmentation tasks.


Algorithm for Counting Cars in Large-scale Video Surveillance Systems

Proceedings of the 30th International Conference on Computer Graphics and Machine Vision (GraphiCon 2020). Part 1 ◽

10.51130/graphicon-2020-1-100-108 ◽

2020 ◽

pp. 100-108

Author(s):

Arsenii Shirokov ◽

Denis Kuplyakov ◽

Anton Konushin

Keyword(s):

Video Surveillance ◽

Large Scale ◽

Surveillance Systems ◽

Motion Model ◽

People Tracking ◽

Detection Frequency ◽

Computational Resources ◽

Sparse Set ◽

The Given ◽

Distributed Tracking

The article deals with the problem of counting cars in large-scale video surveillance systems. The proposed method is based on car tracking and counting the number of tracks intersecting the given signal line. We use a distributed tracking algorithm. It reduces the amount of necessary computational resources and increases performance up to real time by detecting vehicles in a sparse set of frames. We adapted and modified the approach previously proposed for people tracking. The proposed improvement of the speed estimation module and refinement of the motion model reduced the detection frequency by a factor of 3. The experimental evaluation shows that the proposed algorithm reaches an acceptable counting quality with a detection frequency of 3 Hz.
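
The counting step described in this abstract (counting tracks that intersect a signal line) can be sketched as a segment-intersection test. The sketch below assumes each track is a list of 2D points and the signal line is a segment; it covers only the geometric count, not the detection, distributed tracking, or motion model from the paper, and all names and coordinates are hypothetical.

```python
def _orient(a, b, c):
    """Signed area test: > 0 if c is left of the directed line a->b, < 0 if right."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segments_cross(p1, p2, q1, q2):
    """True if segment p1-p2 strictly crosses segment q1-q2.

    Touching at an endpoint or overlapping collinear segments are not
    counted as a crossing in this simplified test.
    """
    d1 = _orient(q1, q2, p1)
    d2 = _orient(q1, q2, p2)
    d3 = _orient(p1, p2, q1)
    d4 = _orient(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0


def count_crossing_tracks(tracks, line_start, line_end):
    """Count tracks with at least one step crossing the signal line."""
    count = 0
    for track in tracks:
        steps = zip(track, track[1:])
        if any(segments_cross(a, b, line_start, line_end) for a, b in steps):
            count += 1
    return count


# Hypothetical image-plane coordinates.
signal_line = ((0, 5), (10, 5))
tracks = [
    [(2, 0), (2, 4), (2, 8)],    # crosses the line
    [(5, 0), (6, 2), (7, 4)],    # stops before the line
    [(12, 0), (12, 10)],         # passes beside the end of the line
    [(8, 9), (8, 1)],            # crosses in the opposite direction
]
print(count_crossing_tracks(tracks, *signal_line))  # 2
```

Each track is counted at most once here, even if it crosses the line several times; a direction-aware count would additionally inspect the sign of the orientation test.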


Evaluating the parameters of a mobile maize dryer in practice

Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis ◽

10.11118/actaun201361061779 ◽

2013 ◽

Vol 61 (6) ◽

pp. 1779-1784

Author(s):

Josef Los ◽

Jiří Fryč ◽

Zdeněk Konrád

Keyword(s):

Large Scale ◽

Electric Energy ◽

Operating Parameters ◽

The Czech Republic ◽

Maize Hybrids ◽

Post Harvest ◽

Second Stage ◽

Energy Intensiveness ◽

Two Stages ◽

The Given

The method of drying maize for grain has been recently employed on a large scale in the Czech Republic not only thanks to new maize hybrids but also thanks to the existence of new models of drying plants. One of the new post-harvest lines is a plant in Lipoltice (mobile dryer installed in 2010, storage base in 2012) where basic operational measurements were made of the energy intensiveness of drying and operating parameters of the maize dryer were evaluated. The process of maize drying had two stages, i.e. pre-drying from the initial average grain humidity of 28.55% to 19.6% in the first stage, and the additional drying from 16.7% to a final storage grain humidity of 13.7%. Mean volumes of natural gas consumed per 1 t% for drying in the first and second stage amounted to 1.275 m³ and 1.56 m³, respectively. The total mean consumption of electric energy per 1 t% was calculated to be 1.372 kWh for the given configuration of the post-harvest line.


Improving COVID-19 Testing Efficiency using Guided Agglomerative Sampling

10.1101/2020.04.13.039792 ◽

2020 ◽

Author(s):

Fayyaz Minhas ◽

Dimitris Grammatopoulos ◽

Lawrence Young ◽

Imran Amin ◽

David Snead ◽

...

Keyword(s):

Large Scale ◽

Resource Constraints ◽

Large Population ◽

Sampling Strategy ◽

Population Surveillance ◽

Large Scale Testing ◽

Scale Population ◽

Simulation Results ◽

The Given ◽

Test Outcomes

One of the challenges in the current COVID-19 crisis is the time and cost of performing tests, especially for large-scale population surveillance. Since the probability of testing positive in large population studies is expected to be small (<15%), most of the test outcomes will be negative. Here, we propose the use of agglomerative sampling, which can prune out multiple negative cases in a single test by intelligently combining samples from different individuals. The proposed scheme builds on the assumption that samples from the population may not be independent of each other. Our simulation results show that the proposed sampling strategy can significantly increase testing capacity under resource constraints: on average, a saving of ~40% of tests can be expected assuming a positive test probability of 10% across the given samples. The proposed scheme can also be used in conjunction with heuristic or Machine Learning guided clustering to further improve the efficiency of large-scale testing. The code for generating the simulation results for this work is available here: https://github.com/foxtrotmike/AS.
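
To illustrate how a single pooled test can clear several negative samples at once, here is a minimal sketch of classic adaptive binary splitting. It is a generic pooled-testing scheme, not necessarily the agglomerative sampling algorithm of the paper (whose code is at the URL given in the abstract); the function name, the assumption of an error-free test, and the sample data are illustrative.

```python
def pooled_tests(statuses):
    """Identify positives by recursive pooling.

    `statuses` is a list of 0/1 ground-truth values (1 = positive).
    A pool tests positive if any sample in it is positive; the test is
    assumed to be error-free. Returns (tests_used, positive_indices).
    """
    tests = 0
    positives = []

    def test_pool(lo, hi):
        nonlocal tests
        tests += 1  # one laboratory test on the combined pool
        if not any(statuses[lo:hi]):
            return  # the whole pool is cleared by this single test
        if hi - lo == 1:
            positives.append(lo)
            return
        mid = (lo + hi) // 2
        test_pool(lo, mid)
        test_pool(mid, hi)

    test_pool(0, len(statuses))
    return tests, positives


# Hypothetical batch of 16 samples with one positive at index 5.
samples = [0] * 16
samples[5] = 1
print(pooled_tests(samples))  # (9, [5])
```

In this toy batch, 9 pooled tests replace 16 individual ones. The saving shrinks as the positive rate grows, which is why such schemes target low-prevalence surveillance.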


Exemplar Guided Neural Dialogue Generation

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/498 ◽

2020 ◽

Author(s):

Hengyi Cai ◽

Hongshen Chen ◽

Yonghao Song ◽

Xiaofang Zhao ◽

Dawei Yin

Keyword(s):

Large Scale ◽

State Of The Art ◽

Training Data ◽

Small Subset ◽

Generation Model ◽

Retrieval Model ◽

Training Set ◽

Dialogue Model ◽

Quantitative Metrics ◽

The Given

Humans benefit from previous experiences when taking actions. Similarly, related examples from the training data also provide exemplary information for neural dialogue models when responding to a given input message. However, effectively fusing such exemplary information into dialogue generation is non-trivial: useful exemplars are required to be not only literally similar to, but also topically related to, the given context. Noisy exemplars impair the neural dialogue model's understanding of the conversation topics and even corrupt the response generation. To address these issues, we propose an exemplar guided neural dialogue generation model in which exemplar responses are retrieved in terms of both text similarity and topic proximity through a two-stage exemplar retrieval model. In the first stage, a small subset of conversations is retrieved from a training set given a dialogue context. These candidate exemplars are then finely ranked by topical proximity to choose the best-matched exemplar response. To further induce the neural dialogue generation model to consult the exemplar response and the conversation topics more faithfully, we introduce a multi-source sampling mechanism to provide the dialogue model with both local exemplary semantics and global topical guidance during decoding. Empirical evaluations on a large-scale conversation dataset show that the proposed approach significantly outperforms the state of the art in terms of both quantitative metrics and human evaluations.


IDENTITY IMPLICATIONS: «FROM THE COUNTRY OF RICE AND OPIUM» BY SOFIIA YABLONSKA

Ukraine: Cultural Heritage, National Identity, Statehood ◽

10.33402/ukr.2018-31-251-266 ◽

2018 ◽

Vol 31 ◽

pp. 251-266

Author(s):

Tymofii HAVRYLIV

Keyword(s):

Travel Literature ◽

Spatial Dimension ◽

Interdisciplinary Studies ◽

Complex Nature ◽

Time Space ◽

The World ◽

Self Knowledge ◽

The Given ◽

First Time ◽

Ukrainian Society

This article is one of the first scholarly attempts to analyze the creative work of Ukrainian filmmaker and traveler Sofiia Yablonska-Uden. For the first time in Ukrainian and world literary studies, identity implications are analyzed in «From the Country of Rice and Opium» by S. Yablonska. The purpose of the article is to highlight the complex nature of identity issues in travel literature. In terms of identity, the journey performs two fundamental, closely interconnected tasks: knowledge of the other and self-knowledge. Hermeneutic approaches are used in the article. The main results can be summarized as follows: 1) the journey has its own time-spatial dimension, consisting of two disproportionate moments, preparation for travel and travel itself, and begins literally and symbolically with the overcoming, or the crossing, of the border; 2) the intention of the trip contains an identity challenge that affects the preparation, organization, and realization of the travel, as well as the way and the content of documenting impressions; 3) such parameters of travel as accident, adventure, and game, which formed the world of the traveler's impressions, are subordinated to the identity problem in the given work; 4) the essay character of the book makes it possible to talk about implications as a response to an identity challenge. The book of travel essays «From the Country of Rice and Opium» by S. Yablonska-Uden is an example of a successful combination of the business and private aspects of travel, intentions of knowledge and self-knowledge, poetry and faculty; learning about other peoples and countries, the writer learns a great deal about herself. Travel literature is an important study object of Ukrainian writing, which opens the prospects for further interdisciplinary studies. The study of travel literature, an identity issue, is extremely relevant both for the development of Ukrainian society and for the formation of optimal responses to the challenges of our time.
Keywords: travel, travel literature, identity, identity implications, time-space disposition.


Efficient Indexing RDF Query Algorithm for Big Data

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.441.691 ◽

2013 ◽

Vol 441 ◽

pp. 691-694

Author(s):

Yi Qun Zeng ◽

Jing Bin Wang

Keyword(s):

Large Scale ◽

Rapid Development ◽

Large Data ◽

Index Structure ◽

Data Query ◽

Large Scale Data ◽

Tree Index ◽

Rdf Data ◽

Query Algorithm ◽

Scale Data

With the rapid development of information technology, data is growing explosively, and how to deal with large-scale data is becoming more and more important. Based on the characteristics of RDF data, we propose to compress RDF data. We construct an index structure called PAR-Tree Index, then execute queries based on the MapReduce parallel computing framework and the PAR-Tree Index. Experimental results show that the algorithm can improve the efficiency of large data query.


Time–Space Characteristics of Diurnal Rainfall over Borneo and Surrounding Oceans as Observed by TRMM-PR

Journal of Climate ◽

10.1175/jcli3714.1 ◽

2006 ◽

Vol 19 (7) ◽

pp. 1238-1260 ◽

Cited By ~ 106

Author(s):

Hiroki Ichikawa ◽

Tetsuzo Yasunari

Keyword(s):

Diurnal Cycle ◽

Large Scale ◽

Intraseasonal Variability ◽

Phase Speed ◽

Tropical Rainfall Measuring Mission ◽

Heavy Precipitation ◽

Circulation Pattern ◽

Leeward Side ◽

Convective Rainfall ◽

Time Space

Five years of Tropical Rainfall Measuring Mission (TRMM) Precipitation Radar (PR) data were used to investigate the time and space characteristics of the diurnal cycle of rainfall over and around Borneo, an island in the Maritime Continent. The diurnal cycle shows a systematic modulation that is associated with intraseasonal variability in the large-scale circulation pattern, with regimes associated with low-level easterlies or westerlies over the island. The lower-tropospheric westerly (easterly) components correspond to periods of active (inactive) convection over the island that are associated with the passage of intraseasonal atmospheric disturbances related to the Madden–Julian oscillation. A striking feature is that rainfall activity propagates to the leeward side of the island between midnight and morning. The inferred phase speed of the propagation is about 3 m s⁻¹ in the easterly regime and 7 m s⁻¹ in the westerly regime. Propagation occurs over the entire island, causing a leeward enhancement of rainfall. The vertical structure of the developed convection/rainfall system differs remarkably between the two regimes. In the easterly regime, stratiform rains are widespread over the island at midnight, whereas in the westerly regime, local convective rainfall dominates. Over offshore regions, convective rainfall initially dominates then gradually decreases in both regimes, while the storms develop into deeper convective systems in the easterly regime. Aside from leeward rainfall propagation, shallow storms develop over the South China Sea region during the westerly regime, resulting in heavy precipitation from midnight through morning.
