Comparing Spreadsheets

A New Strategy for Analysis, Detection and Aggregation of Spreadsheet Differences

Abstract

Comparing spreadsheet files is a new, unexplored research domain in computer science. Methods for regular file comparison are not straightforwardly applicable to spreadsheet files, because spreadsheets are fundamentally different from the files those methods were designed for. Spreadsheets are binary files, their structure is two-dimensional, they contain both data and calculations, and their content exists at different abstraction levels. Fundamental challenges in the spreadsheet comparison problem include change propagation, performance, 2D alignment, the grouping of data, and movement detection. One simple modification can change the whole structure and model of the spreadsheet. The aim of a good file comparison method is to show the actual changes made by an end-user, not the propagated changes. In this thesis, a new pipeline-based approach for comparing two spreadsheet files is proposed. The comparison is carried out in three phases: (1) structure analysis, (2) change detection, and (3) change aggregation. The essential element in this approach is passing on information from the structure analysis to the change detection. In addition, the detected changes are forwarded to the aggregation phase so that they are converted into clear, understandable differences. The final comparison result, therefore, provides information at different levels of abstraction. New solutions such as cell hashing, an optimized 2D alignment using longest common subsequences, and different algorithms for comparing worksheets, defined names, rows, columns and cells have resulted in a state-of-the-art spreadsheet comparison approach. In addition, this thesis presents a prototype demonstrating the proposed concepts in practice. This tool, called CompareXL, is a stand-alone application that can compare two spreadsheet files. The user interface is built with a multi-level interaction design, showing the comparison results intuitively to the end-user.
Spreadsheets are widely used, and many problems arise from spreadsheet versioning issues. Spreadsheet users have needs related to overview, validation, completeness, error resolution, visualization, and evolution. A spreadsheet comparison program helps address these needs. This research shows that it is beneficial to involve users in the exploration of solutions, because comparing spreadsheets is not only a technical problem but also a user problem. The outcome of this research offers multiple directions for future work, with the ultimate goal that all spreadsheet version problems will soon be a thing of the past.
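
The abstract names cell hashing and alignment by longest common subsequence (LCS) without giving details. The following is only a minimal sketch of how those two ideas can combine to align the rows of two worksheet versions; it is not the algorithm used in the thesis or in CompareXL. The function names (row_hash, lcs_pairs, diff_rows), the use of SHA-1, and the representation of a worksheet as a list of rows are all assumptions made for illustration.

```python
import hashlib


def row_hash(row):
    """Collapse one row of cell contents into a single digest (illustrative choice: SHA-1)."""
    # Join cells with a separator unlikely to occur in cell text; empty cells become "".
    joined = "\x1f".join("" if cell is None else str(cell) for cell in row)
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


def lcs_pairs(a, b):
    """Return index pairs (i, j) forming one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the start to recover the matched index pairs.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def diff_rows(old, new):
    """Align two versions of a worksheet by row; report matched, deleted and inserted rows."""
    old_hashes = [row_hash(r) for r in old]
    new_hashes = [row_hash(r) for r in new]
    matched = lcs_pairs(old_hashes, new_hashes)
    matched_old = {i for i, _ in matched}
    matched_new = {j for _, j in matched}
    deleted = [i for i in range(len(old)) if i not in matched_old]
    inserted = [j for j in range(len(new)) if j not in matched_new]
    return matched, deleted, inserted
```

With an old sheet of three rows (a header, "Apple", "Pear") and a new sheet in which a "Banana" row is inserted after the header, diff_rows pairs the three unchanged rows and reports new row index 1 as inserted, rather than flagging every row below the insertion as changed. Hashing rows first keeps each comparison inside the LCS table cheap; a full 2D alignment would apply the same idea to columns as well.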

Files

Comparing_Spreadsheets.pdf
(.pdf | 2.77 Mb)
- Embargo expired on 24-12-2020