Title: Which error detection tool to choose?
Author: Vermeulen, Martijn (TU Delft Electrical Engineering, Mathematics and Computer Science)
Contributor: Katsifodimos, A. (mentor); Koutras, C. (graduation committee); Houben, G.J.P.M. (graduation committee); Gousios, G. (graduation committee)
Degree granting institution: Delft University of Technology
Programme: Computer Science | Software Technology
Date: 2020-08-16
Abstract: The amount of data being collected is growing exponentially, both in academia and in business. Unfortunately, the quality of that data can be poor, leading to poor decisions and increasing costs. Data cleaning, the process of detecting and correcting errors in a dataset, could be a way to improve poor-quality data. This research focuses on detecting these errors. There are (semi-)automated error detection tools available, but it is unclear how well these tools perform under varying conditions and on different datasets. Following from this problem, the main research question was developed: How to choose a fitting error detection algorithm for a specific relational dataset? To answer this question, a comparative study of error detection tools on relational data was conducted. An interactive error detection tool, Raha, performed best among the selected state-of-the-art tools. Subsequently, an attempt was made to estimate the performance of error detection tools and particular configurations on unseen datasets, based on high-level profiles of these datasets. According to the qualitative and quantitative experiments in this research, the proposed estimators are effective. Moreover, the performance estimators were analyzed to make the behavior of the error detection tools on the datasets in this research more interpretable. Ultimately, these performance estimators were used to generate suggested rankings of error detection strategies.
The resulting system outperformed the chosen baseline and produced useful rankings. The proposed strategy ranking system could help computer scientists and data experts in practice choose a fitting error detection algorithm for a specific relational dataset.
Subject: error detection; relational data; comparison study; performance prediction
To reference this document use: http://resolver.tudelft.nl/uuid:7ec4362a-5c93-4b53-8bc0-ddc01958587a
Part of collection: Student theses
Document type: master thesis
Rights: © 2020 Martijn Vermeulen
Files: PDF, Master_Thesis.pdf (1.63 MB)