Parallel Distributed Patterns Mining Using Hadoop MapReduce Framework
The treatment of large data is proving more difficult in different axes, but the arrival of the framework MapReduce is a solution of this problem. With it we can analyze and process vast amounts of data. It does this by distributing the computational work across a cluster of virtual servers running in a cloud or large set of machines while process mining provides an important bridge between data mining and business process analysis. The process mining techniques allow for extracting information from event logs. In general, there are two steps in process mining: correlation definition or discovery and process inference or composition. Firstly, the authors' work consists to mine small patterns from a log traces. Those patterns are the representation of the traces execution from a log file of a business process. In this step, they use existing techniques. The patterns are represented by finite state automaton or their regular expression. The final model is the combination of only two types of small patterns whom are represented by the regular expressions (ab)* and (ab*c)*. Secondly, the authors compute these patterns in parallel, and then combine those small patterns using the MapReduce framework. They have two parties: the first is the Map Step in which they mine patterns from execution traces; the second is the combination of these small patterns as reduce step. The authors' results are promising in that they show that their approach is scalable, general, and precise. It minimizes the execution time by the use of the MapReduce framework.