Semi-automatic extraction of cross-table data from a set of spreadsheets

Conference Paper (2017)

Author(s)

Alaaeddin Swidan (TU Delft - Software Engineering)

Felienne Hermans (TU Delft - Software Engineering)

Research Group

Software Engineering

DOI related publication

https://doi.org/10.1007/978-3-319-58735-6_6

To reference this document use:

https://resolver.tudelft.nl/uuid:2d1fa1c7-de92-48fc-ad29-98c0f847159b

Publication Year

2017

Language

English

Volume number

10303 LNCS

Pages (from-to)

84-99

ISBN (print)

9783319587349

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Spreadsheets are widely used in companies. End-users often value the high degree of flexibility and freedom that spreadsheets provide. However, this flexibility leads to a variety of data forms inside spreadsheets. A cross-table is one of these forms: a rectangular arrangement of data that expresses the relations between a set of objects and a set of attributes. Cross-tables are common in spreadsheets: our exploratory analysis found that more than 3.42% of spreadsheets in an industrial open dataset include at least one cross-table. However, current software tools provide no support for analyzing data in cross-tables. To address this, we present a semi-automatic approach to extract cross-table data from a set of spreadsheets and transform it into relational table form. We evaluate our approach in a case study on a set of 333 spreadsheets with 2,801 worksheets. The results show that the approach successfully extracts over 92% of the data inside the targeted cross-tables. Further, we interviewed two users of the spreadsheets who work at the company; they confirmed that the approach is beneficial and provides correct results.
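To make the target of the transformation concrete, the following is a minimal sketch of what turning a cross-table into relational form can look like. It is not the authors' extraction approach, which is semi-automatic and operates on real spreadsheet files. The function name `cross_table_to_relational`, the list-of-rows input layout, and the sample data are all assumptions made for illustration only.

```python
def cross_table_to_relational(grid):
    """Unpivot a cross-table into (object, attribute, value) records.

    Assumed layout (hypothetical, for illustration):
      - grid[0] is the header row: a corner cell followed by attribute names.
      - every later row starts with an object name, followed by one cell
        per attribute.
    Empty cells (None or "") are treated as "no relation" and skipped.
    """
    attributes = grid[0][1:]
    records = []
    for row in grid[1:]:
        obj, cells = row[0], row[1:]
        for attribute, value in zip(attributes, cells):
            if value is None or value == "":
                continue
            records.append((obj, attribute, value))
    return records


# Hypothetical sample: objects are employees, attributes are skills.
grid = [
    ["Employee", "Excel", "SQL", "Python"],
    ["Alice", "x", "", "x"],
    ["Bob", "", "x", ""],
]

rows = cross_table_to_relational(grid)
for record in rows:
    print(record)
```

Each non-empty cell becomes one row of a relational table with columns for object, attribute, and value, which is a form that standard query and analysis tools can consume directly.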

Files

Semi_automatic_Extraction_of_C... (pdf)

(pdf | 0.575 Mb)

License info not available