Semi-automatic extraction of cross-table data from a set of spreadsheets

Conference Paper (2017)
Author(s)

Alaaeddin Swidan (TU Delft - Software Engineering)

Felienne Hermans (TU Delft - Software Engineering)

Research Group
Software Engineering
Copyright
© 2017 A.A.S. Swidan, F.F.J. Hermans
DOI related publication
https://doi.org/10.1007/978-3-319-58735-6_6
Publication Year
2017
Language
English
Volume number
10303 LNCS
Pages (from-to)
84-99
ISBN (print)
9783319587349
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Spreadsheets are widely used in companies. End-users often value the high degree of flexibility and freedom that spreadsheets provide. However, these features lead to a variety of data forms developing inside spreadsheets. One such form is the cross-table, defined as a rectangular form of data that expresses the relations between a set of objects and a set of attributes. Cross-tables are common in spreadsheets: our exploratory analysis found that more than 3.42% of spreadsheets in an industrial open dataset include at least one cross-table. However, current software tools provide no support for analyzing data in cross-tables. To address this, we present a semi-automatic approach to extract cross-table data from a set of spreadsheets and transform it into a relational table form. We evaluate our approach in a case study on a set of 333 spreadsheets with 2,801 worksheets. The results show that the approach successfully extracts over 92% of the data inside the targeted cross-tables. Further, we interviewed two users of the spreadsheets who work in the company; they confirmed that the approach is beneficial and provides correct results.
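
To illustrate what transforming a cross-table into relational form means, the following is a minimal sketch and not the authors' implementation. It assumes a cross-table has already been located and read into a list of rows, that the first row holds the attribute names, and that the first column holds the object names. The function name `cross_table_to_relational` and the sample data are hypothetical.

```python
from typing import Any, List, Tuple


def cross_table_to_relational(grid: List[List[Any]]) -> List[Tuple[Any, Any, Any]]:
    """Unpivot a cross-table into (object, attribute, value) rows.

    Assumptions (illustrative only): grid[0] is the header row whose
    attribute names start at column 1, and every later row starts with
    an object name. Empty cells (None or "") are skipped.
    """
    attributes = grid[0][1:]
    rows = []
    for line in grid[1:]:
        obj, cells = line[0], line[1:]
        for attribute, value in zip(attributes, cells):
            if value is None or value == "":
                continue  # no relation recorded for this object/attribute pair
            rows.append((obj, attribute, value))
    return rows


# Hypothetical cross-table: objects in rows, attributes in columns.
example = [
    ["", "Q1", "Q2"],
    ["Pump A", 10, 12],
    ["Pump B", None, 7],
]
relational = cross_table_to_relational(example)
```

Each resulting tuple corresponds to one row of a relational table with the columns object, attribute, and value, which is the shape that standard query and analysis tools expect.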
