Ranking and semi-supervised classification on large scale graphs using map-reduce

Association rule mining is one of the primary tasks in data mining; it discovers correlations among items in a transactional database. Most vertical and horizontal association rule mining algorithms have been developed to improve the frequent-item discovery step, which places high demands on training time and memory usage, particularly when the input database is very large. In this paper, we address the problem of mining very large data sets by proposing a new parallel Map-Reduce (MR) association rule mining technique called MR-ARM, which uses a hybrid data transformation format to quickly find frequent items and generate rules. The MR programming paradigm is becoming popular for large-scale, data-intensive distributed applications because of its efficiency, simplicity and ease of use; the proposed algorithm therefore develops a fast, parallel, distributed batch set-intersection method for finding frequent items. Two implementations (Weka, Hadoop) of the proposed MR association rule algorithm have been developed, and a number of experiments on small, medium and large data collections have been conducted. The comparisons are based on the time the algorithm requires for data initialisation, frequent-item discovery, rule generation, etc. The results show that MR-ARM is a very useful tool for mining association rules from large datasets in a distributed environment.
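
The abstract above does not give MR-ARM's internals, so the following is only a minimal single-process sketch of the general idea it names: a map step that emits (item, transaction id) pairs, a reduce step that groups them into per-item transaction-id sets (a vertical layout), and frequent-itemset discovery by intersecting those sets. The function names `map_phase`, `reduce_phase` and `frequent_itemsets`, the chunking scheme, and the absolute `min_support` count are all assumptions made for illustration; they are not taken from the paper or from Hadoop.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(chunk):
    """Emit (item, transaction_id) pairs for one chunk of the database."""
    for tid, items in chunk:
        for item in set(items):
            yield item, tid


def reduce_phase(pairs):
    """Group transaction ids by item, giving a vertical (tid-set) layout."""
    tidsets = defaultdict(set)
    for item, tid in pairs:
        tidsets[item].add(tid)
    return tidsets


def frequent_itemsets(chunks, min_support):
    """Return {itemset: support count} for itemsets seen in >= min_support transactions."""
    pairs = (pair for chunk in chunks for pair in map_phase(chunk))
    tidsets = reduce_phase(pairs)

    # Level 1: frequent single items.
    level = {frozenset([item]): tids
             for item, tids in tidsets.items() if len(tids) >= min_support}
    result = dict(level)

    # Level k+1: join two frequent k-itemsets that differ in one item and
    # intersect their tid-sets to get the support of the union.
    while level:
        next_level = {}
        for (a, tids_a), (b, tids_b) in combinations(level.items(), 2):
            candidate = a | b
            if len(candidate) == len(a) + 1 and candidate not in next_level:
                tids = tids_a & tids_b
                if len(tids) >= min_support:
                    next_level[candidate] = tids
        result.update(next_level)
        level = next_level

    return {itemset: len(tids) for itemset, tids in result.items()}


# Hypothetical toy database split into two chunks, as a mapper might see it.
chunks = [
    [(0, ["bread", "milk"]), (1, ["bread", "butter"])],
    [(2, ["bread", "milk", "butter"]), (3, ["milk"])],
]
supports = frequent_itemsets(chunks, min_support=2)
```

In a real Map-Reduce deployment the map and reduce steps would run on separate workers and the intersections would be batched across nodes; here everything runs in one process purely to show the data flow.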

Map reduce for optimizing a large-scale dynamic network — the Internet of hearts

2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) ◽

10.1109/embc.2016.7591351 ◽

2016 ◽

Author(s):

Chen Kan ◽

Fabio M. Leonelli ◽

Hui Yang

Keyword(s):

Large Scale ◽

Dynamic Network ◽

Map Reduce ◽

The Internet

Local learning integrating global structure for large scale semi-supervised classification

2012 8th International Conference on Natural Computation ◽

10.1109/icnc.2012.6234597 ◽

2012 ◽

Author(s):

Guangchao Wu ◽

Yuhan Li ◽

Jianqing Xi ◽

Xiaowei Yang ◽

Xiaolan Liu

Keyword(s):

Supervised Classification ◽

Large Scale ◽

Global Structure ◽

Local Learning

Nakamoto Consensus to Accelerate Supervised Classification Algorithms for Multiparty Computing

Security and Communication Networks ◽

10.1155/2021/6629433 ◽

2021 ◽

Vol 2021 ◽

pp. 1-11

Author(s):

Zhen Zhang ◽

Bing Guo ◽

Yan Shen ◽

Chengjie Li ◽

Xinhua Suo ◽

...

Keyword(s):

Distributed Computing ◽

Supervised Classification ◽

Large Scale ◽

Heterogeneous Data ◽

Classification Algorithms ◽

Distributed Data ◽

Data Intensive ◽

Private Data ◽

Cooperation Mechanism ◽

Mathematical Formulas

Bitcoin mining consumes tremendous amounts of electricity to solve the hash problem. At the same time, large-scale applications of artificial intelligence (AI) require efficient and secure computing. There are many computing devices in use, and the hardware resources are highly heterogeneous. This means a cooperation mechanism is needed to realize cooperation among computing devices, and a good calculation structure is required in the case of data dispersion. In this paper, we propose an architecture where devices (also called nodes) can reach a consensus on task results using off-chain smart contracts and private data. The proposed distributed computing architecture can accelerate computing-intensive and data-intensive supervised classification algorithms with limited resources. This architecture can significantly increase privacy protection and prevent leakage of distributed data. Our proposed architecture can support heterogeneous data, making computing on each device more efficient. We used mathematical formulas to prove the correctness and robustness of our system and deduced the condition to stop a given task. In the experiments, we transformed Bitcoin hash collision into distributed computing on several nodes and evaluated the training and prediction accuracy for handwritten digit images (MNIST). The experimental results demonstrate the effectiveness of the proposed method.

Map-Reduce based Link Prediction for Large Scale Social Network

Proceedings of the 29th International Conference on Software Engineering and Knowledge Engineering ◽

10.18293/seke2017-100 ◽

2017 ◽

Cited By ~ 6

Author(s):

Ranjan Kumar Behera ◽

Abhishek Sai Sukla ◽

Sambit Mahapatra ◽

Santanu Kumar Rath ◽

Bibhudatta Sahoo ◽

...

Keyword(s):

Social Network ◽

Link Prediction ◽

Large Scale ◽

Map Reduce

Accelerated low-rank representation for subspace clustering and semi-supervised classification on large-scale data

Neural Networks ◽

10.1016/j.neunet.2018.01.014 ◽

2018 ◽

Vol 100 ◽

pp. 39-48 ◽

Cited By ~ 7

Author(s):

Jicong Fan ◽

Zhaoyang Tian ◽

Mingbo Zhao ◽

Tommy W.S. Chow

Keyword(s):

Supervised Classification ◽

Large Scale ◽

Subspace Clustering ◽

Low Rank ◽

Large Scale Data ◽

Low Rank Representation ◽

Scale Data

SYNERGISTIC USE OF SENTINEL-1 AND SENTINEL-2 TIME SERIES FOR POPLAR PLANTATIONS MONITORING AT LARGE SCALE

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xliii-b3-2020-1457-2020 ◽

2020 ◽

Vol XLIII-B3-2020 ◽

pp. 1457-1461

Author(s):

Y. Hamrouni ◽

É. Paillassa ◽

V. Chéret ◽

C. Monteil ◽

D. Sheeren

Keyword(s):

Time Series ◽

Active Learning ◽

Supervised Classification ◽

Large Scale ◽

Training Data ◽

Passive Learning ◽

Training Samples ◽

Poplar Plantations ◽

Annual Means ◽

Sentinel 2

Abstract. The current context of availability of Earth Observation satellite data at high spatial and temporal resolutions makes it possible to map large areas. Although supervised classification is the most widely adopted approach, its performance is highly dependent on the availability and the quality of training data. However, gathering samples from field surveys or through photo interpretation is often expensive and time-consuming especially when the area to be classified is large. In this paper we propose the use of an active learning-based technique to address this issue by reducing the labelling effort required for supervised classification while increasing the generalisation capabilities of the classifier across space. Experiments were conducted to identify poplar plantations in three different sites in France using Sentinel-2 time series. In order to characterise the age of the identified poplar stands, temporal means of Sentinel-1 backscatter coefficients were computed. The results are promising and show the good capacities of the active learning-based approach to achieve similar performance (Poplar F-score ≥ 90%) to traditional passive learning (i.e. with random selection of samples) with up to 50% fewer training samples. Sentinel-1 annual means have demonstrated their potential to differentiate two stand ages with an overall accuracy of 83% regardless of the cultivar considered.
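
The abstract contrasts active learning with passive (random) sample selection but does not say which query strategy was used. As an illustration only, the sketch below shows one common strategy, margin-based uncertainty sampling: the samples whose two highest predicted class probabilities are closest are sent for labelling first. The function name `select_queries` and the toy probabilities are hypothetical, and this should not be read as the authors' method.

```python
def select_queries(probabilities, k):
    """Return indices of the k samples with the smallest top-two probability margin.

    `probabilities` is a list of per-sample class-probability lists, each with
    at least two classes. A small margin means the classifier is unsure, so
    labelling that sample is expected to be more informative.
    """
    def margin(class_probs):
        top = sorted(class_probs, reverse=True)
        return top[0] - top[1]

    ranked = sorted(range(len(probabilities)),
                    key=lambda i: margin(probabilities[i]))
    return ranked[:k]


# Hypothetical classifier outputs for four unlabelled samples (two classes).
probs = [[0.9, 0.1], [0.55, 0.45], [0.6, 0.4], [0.5, 0.5]]
to_label = select_queries(probs, k=2)
```

Passive learning, by comparison, would pick the `k` indices at random regardless of the classifier's confidence.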
