Characterization and Mitigation of High-Confidence Errors Through the Use of Human-In-The-Loop Methods

Domain Expert Driven Approach to Model Development


Abstract

Gaining the trust of the end-users of machine learning systems is often difficult. Many state-of-the-art systems operate as black boxes, and errors produced by such systems, without any explanation of why the decisions were made, erode that trust. This effect is especially strong when the erroneous decisions are made with high confidence. This thesis presents a data-driven, human-in-the-loop methodology to characterize and mitigate high-confidence errors. We propose an iterative, expert-session-based methodology: by engaging domain experts in a series of interaction sessions, we aim to reduce the disconnect and knowledge gap between data scientists and domain experts, and ultimately to increase trust in the model. A practical approach was taken in close collaboration with the data scientists of the ILT, helping them improve their model and providing a direct contribution to their practice. We study the problem in the context of road and transportation law violations by engaging inspectors (i.e., domain experts) in day-in-the-life and in-house interview sessions.
We perform a thorough analysis of the most important features for data instances that were misclassified with a high degree of confidence, and we present a method that helps characterize these errors by predicting where they occur.
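As an illustration of how such a characterization could be set up (a minimal sketch, not the thesis implementation; the 0.9 confidence threshold and the choice of a random-forest meta-model are assumptions made here), high-confidence errors of a fitted classifier can be flagged and used as labels for a secondary error-predicting model:

    from sklearn.ensemble import RandomForestClassifier

    def high_confidence_error_mask(model, X, y_true, threshold=0.9):
        """Boolean mask of instances the base model got wrong with high confidence."""
        proba = model.predict_proba(X)
        y_pred = model.classes_[proba.argmax(axis=1)]
        confidence = proba.max(axis=1)
        return (y_pred != y_true) & (confidence >= threshold)

    def fit_error_predictor(model, X, y_true, threshold=0.9):
        """Fit a meta-model that predicts where the base model makes high-confidence errors."""
        is_hc_error = high_confidence_error_mask(model, X, y_true, threshold)
        error_model = RandomForestClassifier(n_estimators=200, random_state=0)
        error_model.fit(X, is_hc_error)
        return error_model  # its feature_importances_ hint at which features drive the errors

Inspecting the feature importances of such an error predictor is one way to surface the features most associated with high-confidence errors.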
We show that by carefully removing biased data features, selecting data properly, and bridging the knowledge gap between domain experts and data scientists, we can improve the performance of the machine learning model. Model precision changes from 0.56800 against a baseline of 0.32968 to 0.52077 against a baseline of 0.23473; relative to the baseline, this is an increase of 28.9% in precision. We reduce biases present in the data by removing variables that merely predict inspector practice. The magnitude of high-confidence errors among the top 20% of errors goes from 0.70435 to 0.70465, which constitutes an improvement once the reduced baseline and the removal of overfitted variables are taken into account.
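For reference, the reported precision figures can be read as lift over their respective baselines; the sketch below only reproduces that arithmetic (interpreting the baseline as the precision of a trivial classifier, i.e. the positive-class rate, is an assumption on our part):

    def precision_lift(precision: float, baseline: float) -> float:
        """Precision expressed as a multiple of its baseline."""
        return precision / baseline

    lift_before = precision_lift(0.56800, 0.32968)  # ~1.72x the baseline
    lift_after = precision_lift(0.52077, 0.23473)   # ~2.22x the baseline
    gain = lift_after / lift_before - 1             # ~0.29, i.e. roughly the reported 28.9% increase
    print(f"lift before: {lift_before:.2f}, after: {lift_after:.2f}, relative gain: {gain:.3f}")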