Parsing Excel formulas

A grammar and its application on 4 large datasets

Contribution To Periodical (2017)
Author(s)

Fenia Aivaloglou (TU Delft - Software Engineering)

David Hoepelman (Student TU Delft)

F. Hermans (TU Delft - Software Engineering)

Research Group
Software Engineering
DOI related publication
https://doi.org/10.1002/smr.1895
Publication Year
2017
Language
English
Issue number
12
Volume number
29
Pages (from-to)
1-19

Abstract

Spreadsheets are popular end-user programming tools, especially in industry, which makes them interesting research targets. However, no reliable grammar exists that is concise enough to facilitate formula parsing and analysis and to support research on spreadsheet codebases. This paper presents a grammar for spreadsheet formulas that successfully parses 99.99% of more than 8 million unique formulas extracted from 4 spreadsheet datasets. Our grammar is compatible with the spreadsheet formula language, recognizes the formula elements required for supporting spreadsheet research, and produces parse trees aimed at further manipulation and analysis. Additionally, we use the grammar to analyze the characteristics of the formulas in the 4 datasets along 3 dimensions: complexity, functionality, and data utilization. Our results show that (1) most Excel formulas are simple, although formulas with more than 50 functions or operations exist, (2) almost all formulas use data from other cells, and that data is often not local, and (3) a surprising number of referring mechanisms are used by fewer than 1% of the formulas.
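
To make the complexity dimension concrete, the following is a minimal sketch of counting functions, operators, and cell references in a single formula. It is not the grammar presented in the paper: it uses a handful of regular expressions rather than a parser, and the names `TOKEN_RE` and `formula_metrics` are hypothetical. It also assumes A1-style references only and ignores named ranges, structured references, and other referring mechanisms.

```python
import re
from collections import Counter

# Hypothetical, simplified token patterns (an assumption for illustration,
# not the grammar from the paper). Alternatives are tried in order.
TOKEN_RE = re.compile(r'''
    (?P<string>"(?:[^"]|"")*")                      # string literal
  | (?P<function>[A-Za-z_][A-Za-z0-9_.]*(?=\())     # name directly before "("
  | (?P<reference>(?<![A-Za-z0-9_.])                # A1-style cell reference,
        \$?[A-Za-z]{1,3}\$?[0-9]+                   #   optionally absolute,
        (?![A-Za-z0-9_(]))                          #   not part of a longer name
  | (?P<operator><=|>=|<>|[-+*/^&=<>])              # comparison and arithmetic
''', re.VERBOSE)


def formula_metrics(formula: str) -> Counter:
    """Count function calls, operators, and cell references in one formula."""
    # Drop the leading "=" so it is not counted as a comparison operator.
    body = formula[1:] if formula.startswith("=") else formula
    counts = Counter(functions=0, operators=0, references=0)
    for match in TOKEN_RE.finditer(body):
        kind = match.lastgroup
        if kind == "function":
            counts["functions"] += 1
        elif kind == "operator":
            counts["operators"] += 1
        elif kind == "reference":
            counts["references"] += 1
        # String literals are matched only so their contents are skipped.
    return counts


if __name__ == "__main__":
    print(formula_metrics("=IF(A1>0,SUM(B1:B3),0)"))
```

A regular-expression tokenizer like this cannot recover nesting or operator precedence, which is why analyses at the scale described in the abstract rely on a grammar that produces parse trees.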

Metadata only record. There are no files for this record.