Characterization and Mitigation of High-Confidence Errors Through the Use of Human-In-The-Loop Methods

Domain Expert Driven Approach to Model Development


Abstract

Gaining the trust of the end-users of machine learning systems is often difficult. Many state-of-the-art systems operate as black boxes, and errors produced by such systems, without any explanation of why the decisions were made, erode that trust. This effect is especially strong when the erroneous decisions are made with high confidence. This thesis presents a data-driven, human-in-the-loop methodology to characterize and mitigate high-confidence errors. We propose an iterative, expert-session-based methodology: by engaging domain experts in a series of interaction sessions, we aim to reduce the disconnect and knowledge gap between data scientists and domain experts, and ultimately to increase trust in the model. A practical approach was taken in close collaboration with the data scientists of the ILT, helping them improve their model and providing a direct contribution to their practice. We study the problem in the context of road and transportation law violations by engaging inspectors (i.e., domain experts) in day-in-the-life and in-house interview sessions.
We perform a thorough analysis of the most important features for data instances that were misclassified with a high degree of confidence, and we present a method that helps characterize these errors by predicting where they occur.
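As an illustration of how such a characterization could be set up (a minimal sketch, not the thesis implementation; the 0.9 confidence threshold and the choice of a random-forest meta-model are assumptions made here), high-confidence errors of a fitted classifier can be flagged and used as labels for a secondary error-predicting model:

    from sklearn.ensemble import RandomForestClassifier

    def high_confidence_error_mask(model, X, y_true, threshold=0.9):
        """Boolean mask of instances the base model got wrong with high confidence."""
        proba = model.predict_proba(X)
        y_pred = model.classes_[proba.argmax(axis=1)]
        confidence = proba.max(axis=1)
        return (y_pred != y_true) & (confidence >= threshold)

    def fit_error_predictor(model, X, y_true, threshold=0.9):
        """Fit a meta-model that predicts where the base model makes high-confidence errors."""
        is_hc_error = high_confidence_error_mask(model, X, y_true, threshold)
        error_model = RandomForestClassifier(n_estimators=200, random_state=0)
        error_model.fit(X, is_hc_error)
        return error_model  # its feature_importances_ hint at which features drive the errors

Inspecting the feature importances of such an error predictor is one way to surface the features most associated with high-confidence errors.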
We show that by carefully removing biased data features, selecting data properly, and bridging the knowledge gap between domain experts and data scientists, we can improve the performance of the machine learning model. Model precision changes from 0.56800 against a baseline of 0.32968 to 0.52077 against a baseline of 0.23473; relative to the baseline, this is an increase of 28.9% in precision. We reduce biases present in the data by removing variables that merely predict inspector practice. The magnitude of high-confidence errors among the top 20% of errors goes from 0.70435 to 0.70465, which constitutes an improvement once the reduced baseline and the removal of overfitted variables are taken into account.
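For reference, the reported precision figures can be read as lift over their respective baselines; the sketch below only reproduces that arithmetic (interpreting the baseline as the precision of a trivial classifier, i.e. the positive-class rate, is an assumption on our part):

    def precision_lift(precision: float, baseline: float) -> float:
        """Precision expressed as a multiple of its baseline."""
        return precision / baseline

    lift_before = precision_lift(0.56800, 0.32968)  # ~1.72x the baseline
    lift_after = precision_lift(0.52077, 0.23473)   # ~2.22x the baseline
    gain = lift_after / lift_before - 1             # ~0.29, i.e. roughly the reported 28.9% increase
    print(f"lift before: {lift_before:.2f}, after: {lift_after:.2f}, relative gain: {gain:.3f}")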