Copy-Paste Detection in Spreadsheets

Master Thesis (2013)
Author(s)

B.M.W. Sedee

Contributor(s)

F. Hermans – Mentor

M. Pinzger – Mentor

A. Van Deursen – Mentor

Copyright
© 2013 Sedee, B.M.W.
Publication Year
2013
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

When a company needs a reporting tool, it most often chooses Excel. In fact, over 90% of the world’s companies base their decisions on reports made using Excel. This shows that the number of spreadsheet designers, or end-user programmers, is large: it has been estimated to be five times as large as the number of software programmers in the traditional sense. This is one of the reasons spreadsheets are error-prone, possibly leading to erroneous decisions. One cause of problems within spreadsheets is the prevalence of copy-pasting. In this thesis we study this problem and present an algorithm to detect data clones within spreadsheets: formulas whose values are copied to a different location. In addition to this algorithm, which we based on existing algorithms for code clone detection in software engineering, we present a classification scheme for the data clones found. We evaluated both the algorithm and the classification using the EUSES corpus and conclude that data clones in spreadsheets are as common as code clones in source code. We also show that we are able to detect these data clones with precision rates similar to those achieved by state-of-the-art code clone detection algorithms.
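To make the notion of a data clone concrete, the following is a minimal sketch in Python, not the algorithm from the thesis (which the abstract does not spell out). It assumes a hypothetical in-memory sheet, a dict from cell address to a `Cell` holding a value and an optional formula, and simply reports constant cells whose value equals the result of a formula elsewhere. The names `Cell` and `find_data_clone_candidates` are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional, Tuple


class Cell(NamedTuple):
    """Hypothetical cell model: a computed value plus an optional formula."""
    value: float
    formula: Optional[str] = None  # None means the cell holds a plain constant


def find_data_clone_candidates(sheet: Dict[str, Cell]) -> List[Tuple[str, str]]:
    """Return (formula_address, constant_address) pairs with equal values.

    Naive illustration only: it matches single cells by exact value, so it
    will also report coincidental matches that are not real copies.
    """
    # Index every formula result by its value.
    formula_values: Dict[float, List[str]] = defaultdict(list)
    for address, cell in sheet.items():
        if cell.formula is not None:
            formula_values[cell.value].append(address)

    # A constant whose value also appears as a formula result is a candidate.
    candidates: List[Tuple[str, str]] = []
    for address, cell in sorted(sheet.items()):
        if cell.formula is None and cell.value in formula_values:
            for source in formula_values[cell.value]:
                candidates.append((source, address))
    return candidates


# Assumed example data: C7 holds a pasted copy of the result of A3.
sheet = {
    "A1": Cell(10.0),
    "A2": Cell(32.0),
    "A3": Cell(42.0, "=A1+A2"),
    "C7": Cell(42.0),
    "C8": Cell(7.0),
}

print(find_data_clone_candidates(sheet))
```

Matching single cells by value produces many coincidental hits, so a practical detector would presumably look for larger groups of neighbouring matching cells; how the thesis does this is not stated in the abstract.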

Files

Thesis.pdf
(pdf | 0.566 Mb)
License info not available