Detecting Problematic Lookup Functions in Spreadsheets

Abstract

Spreadsheets are used heavily in many business domains around the world. They are easy to use and as such enable end-user programmers to build and maintain all sorts of reports and analyses. In addition to modeling and calculation, spreadsheets are often used for creating reports and dashboards: combining data from different sources and creating overviews. For this, lookup functions can be used: they search for a value in a range and return a corresponding value from a row or column. Lookup functions are common: according to recent research, VLOOKUP is the fifth most common Excel function. In this paper we investigate the use of lookup functions in more detail. We analyze lookup functions within the newly released Enron spreadsheet corpus. The results show that 1) a minority of 43% of lookup formulas use the default setting, where an approximate match may be returned, 2) 77% of approximate matches are used unnecessarily, and 3) 23% of approximate lookups are problematic: they search over unsorted ranges, which the specification specifically advises against and which might lead to wrong results.
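
To illustrate why an approximate lookup over an unsorted range can go wrong, the following is a minimal Python sketch. It assumes a simplified model in which an approximate match is a binary search for the last key less than or equal to the lookup value; this is an illustration, not Excel's actual implementation. The function names (approximate_lookup, exact_lookup, is_sorted) and the sample data are hypothetical and do not come from the paper.

```python
from bisect import bisect_right

def approximate_lookup(key, keys, values):
    """Simplified model of an approximate-match lookup (an assumption,
    not Excel's real algorithm): binary search for the last key <= `key`."""
    i = bisect_right(keys, key)
    if i == 0:
        return None  # stands in for a "not found" error such as #N/A
    return values[i - 1]

def exact_lookup(key, keys, values):
    """Linear scan that returns the value for the first exactly matching key."""
    for k, v in zip(keys, values):
        if k == key:
            return v
    return None

def is_sorted(keys):
    """True if the keys are in ascending order, the precondition that
    the approximate lookup silently relies on."""
    return all(a <= b for a, b in zip(keys, keys[1:]))

# Hypothetical sample data: the same key/value pairs, sorted and unsorted.
sorted_keys, sorted_vals = [10, 20, 30, 40], ["A", "B", "C", "D"]
unsorted_keys, unsorted_vals = [30, 10, 40, 20], ["C", "A", "D", "B"]

print(approximate_lookup(20, sorted_keys, sorted_vals))      # B (correct)
print(approximate_lookup(20, unsorted_keys, unsorted_vals))  # A (wrong, no error raised)
print(exact_lookup(20, unsorted_keys, unsorted_vals))        # B (correct)
print(is_sorted(unsorted_keys))                              # False
```

In this model the lookup over the unsorted range returns a plausible-looking but incorrect value without signalling any error, which is the kind of silent failure the third finding refers to. A check along the lines of is_sorted is one way such lookups could be flagged.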