Title: Copy-Paste Detection in Spreadsheets
Author: Sedee, B.M.W.
Contributor: Hermans, F. (mentor); Pinzger, M. (mentor); Van Deursen, A. (mentor)
Faculty: Electrical Engineering, Mathematics and Computer Science
Department: Software Technology
Programme: Software Engineering Research Group
Date: 2013-02-13

Abstract: When a company needs a reporting tool, it most commonly chooses Excel. In fact, over 90% of the world’s companies base their decisions on a report made using Excel. This shows that the number of spreadsheet designers, or end-user programmers, is large; it has been estimated to be 5 times as large as the number of software programmers in the traditional sense. This is one of the reasons spreadsheets are error-prone, possibly leading to erroneous decisions. One of the causes of problems within spreadsheets is the prevalence of copy-pasting. In this thesis we study this problem and present an algorithm to detect data clones within spreadsheets: formulas whose values are copied to a different location. Aside from this algorithm, which we based on existing algorithms for code clone detection in software engineering, we present a classification scheme for the data clones found. We evaluated both the algorithm and the classification using the EUSES corpus and conclude that data clones in spreadsheets are as common as code clones in source code. We also show that we are able to detect these data clones with precision rates similar to those achieved by state-of-the-art code clone detection algorithms.

Subject: Spreadsheets, Clone detection
To reference this document use: http://resolver.tudelft.nl/uuid:4a506eea-4c7b-4e3c-93be-748127d8f9a3
Part of collection: Student theses
Document type: master thesis
Rights: (c) 2013 Sedee, B.M.W.
Files: thesis.pdf (PDF, 579.77 KB)