Title: Evaluating Automatic Spreadsheet Metadata Extraction on a Large Set of Responses from MOOC Participants
Author: Roy, S. (TU Delft Software Engineering); Hermans, F.F.J. (TU Delft Software Engineering); Aivaloglou, E. (TU Delft Software Engineering); Winter, J.; van Deursen, A. (TU Delft Software Technology)
Contributor: Jiu, A. (editor)
Department: Software Technology
Date: 2016
Abstract: Spreadsheets are popular end-user computing applications, and one reason for their popularity is that they give users a large degree of freedom in how they structure their data. However, this flexibility also makes spreadsheets difficult to understand. Textual documentation can address this issue, but automatically generating such documentation first requires extracting the metadata inside spreadsheets. Distinguishing between data and metadata is challenging, however, because spreadsheets lack universally accepted structural patterns. Two existing approaches for automatic extraction of spreadsheet metadata have not been evaluated on large datasets consisting of user inputs. Hence, in this paper, we describe the collection of a large number of user responses regarding the identification of spreadsheet metadata from participants of a MOOC. We describe the use of this large dataset to understand how users identify metadata in spreadsheets, and to evaluate the two existing approaches to automatic metadata extraction from spreadsheets. The results, drawn from insights about user perception of metadata, give us directions for improving metadata extraction approaches. We also learn on which types of spreadsheet patterns the existing approaches perform well and on which they perform poorly, and thus which problem areas to focus on in order to improve them.
Subject: computer aided instruction; meta data; personal computing; spreadsheet programs; text analysis; MOOC participants; automatic spreadsheet metadata extraction; automatic textual documentation generation; data structure; end-user computing applications; metadata user perception; spreadsheet patterns; Computers; Conferences; Data mining; Documentation; Metadata; Reliability; Software; Empirical evaluation; MOOC; Meta-data extraction; Spreadsheet; User-study
To reference this document use: http://resolver.tudelft.nl/uuid:34045a34-02db-4441-8846-b9f094680d7d
DOI: https://doi.org/10.1109/SANER.2016.98
Publisher: IEEE Society, Los Alamitos, CA
ISBN: 978-1-5090-1855-0
Source: 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER), 2
Event: SANER 2016, 2016-03-14 → 2016-03-18, Osaka, Japan
Part of collection: Institutional Repository
Document type: conference paper
Rights: © 2016 S. Roy, F.F.J. Hermans, E. Aivaloglou, J. Winter, A. van Deursen
Files: PDF TUD_SERG_2016_002.pdf (765.63 KB)