A Comparative Study of Process Mining Tools

FlexFringe, ProM, MINT and PRINS

More Info
expand_more

Abstract

Nowadays, software is an integral part of many companies. However, the codebase can grow large and complicated and is often insufficiently documented. To gain insight, tools have been made to infer state machines and process models from software logs. These tools produce different types of models such as automata and Petri nets. The main objective of this research is to determine which tool is the optimal choice for inferring a readable and correct model within reasonable time. Currently, Petri nets and automata are not compared to each other and not all key performance indicators are applicable to both model types. To compare these different concepts, suitable metrics must be identified.

For this work, 8 configurations of 4 programs will be compared. Finite State Machines (FSMs) will be inferred with FlexFringe (AIC), MINT and PRINS (using MINT internally). Petri nets will be mined with ProM using the Inductive Miner, Inductive Miner Infrequent - All Operators, Hybrid-ILP and the Directly Follows miner. The configurations will use 5-folds cross validation to infer models on 9 software logs. Negative traces were synthesised as they were not available. The quality of the models will be measured though inference time, complexity, F2-score, balanced accuracy, fitness and perplexity.

Some of the used metrics were adequate, but others were not suitable. Inference time, F2-score, balanced accuracy could be measured for both FSMs and Petri nets. The complexity was measured with the Petri net $eCFC$ metric and the Cyclomatic complexity CC. The eCFC does not properly express complexity on FSMs. Furthermore, Petri nets can model parallelism, which introduces extra complexity compared to an FSM. This was not adequately expressed by either of these metrics. To measure fitness, both token-based replay and alignment fitness were used. FSMs were converted to Petri nets. Token-based replay (TBR) fitness was not an expressive metric for the FSMs, as the concept of tokens did not carry over well. In addition to this, the external implementation of TBR fitness was flawed for the specific structure of the converted FSMs. Alignment-based fitness is the superior fitness metric as it does not rely on the notion of tokens, which the FSMs do not have. Unfortunately, the time and memory needed for alignment computations was too large for some models. Lastly, the perplexity FSM metric was successfully adapted for Petri nets. It expresses the difference in structure and could be tailored even further for the purposes of comparison by adjusting its parameters.

The results of the comparison showed that almost all configurations could complete inference in feasible time and memory. The time out was set at 4 hours and the available memory was \texttt{16GB}. The MINT and PRINS ran out of memory on one of the larger logs and timed out for one other set. Hybrid-ILP timed out for 2 sets. All other configurations completed inference for all data sets in under 40 seconds. PRINS and MINT boasted excellent performance across all correctness metrics, and were only outperformed by FlexFringe and the Directly Follows miner on perplexity. PRINS and MINT were most suitable for modelling traces of data sets with a low trace similarity and generalising to identify new traces. However, MINT models were a lot larger than those of any other configurations and PRINS models are generally many times larger than MINT models. So, if complexity of models is a big concern, FlexFringe and the Directly Follows miner offer the smaller models, at the cost of a small amount of performance for most sets. However, these two tools perform poorly on sets that are both incomplete and contain dissimilar traces. If time is of the essence, one should use FlexFringe, the Directly Follows miner of one of the Inductive Miners. The Petri net miners used, were not designed to introduce new behaviour in a model. Therefore, FlexFringe is preferable for modelling an incomplete log.