M.P.A. Haakman

As organizations start to adopt machine learning in critical business scenarios, their development processes change and the reliability of their applications becomes more important. To investigate these changes and improve the reliability of such applications, we conducted two studies in this thesis. The first study aims to understand how the processes by which machine learning applications are developed have evolved, and how well state-of-the-art lifecycle models fit the current needs of the fintech industry. To this end, we conducted a case study with seventeen machine learning practitioners at the fintech company ING. The results indicate that the existing lifecycle models CRISP-DM and TDSP largely reflect the current development processes of machine learning applications, but that crucial steps are missing, including a feasibility study, documentation, model evaluation, and model monitoring. Our second study aims to reduce bugs and improve the code quality of machine learning applications. We developed a static code analysis tool consisting of six checkers that find probable bugs and enforce best practices, specifically in Python code used for processing large amounts of data and for modeling in the machine learning lifecycle. An evaluation of the tool on 1,000 notebooks collected from Kaggle shows that static code analysis can detect, and thus help prevent, probable bugs in data science code. Our work shows that the real challenges of applying machine learning go well beyond sophisticated learning algorithms: more focus is needed on the entire lifecycle. ...
Bachelor thesis (2018) - Henk Grent, Mark Haakman, Frenk van Mil, Sander Waij, Mathijs de Weerdt, Otto Visser
Over the years, companies have been exploring the field of data science, and the Nederlandse Spoorwegen (NS) is no exception. Modern trains are equipped with sensors that measure a variety of conditions within the train, and these data are stored in the NS data warehouse. The data have proven useful for detecting and responding to problems more quickly, which serves two high-level goals of the NS: punctuality and reliability. However, despite the available data, location-specific problems are not yet visualized or detected. Location-specific problems are problems caused not by the train itself but by the infrastructure or by human error at a particular location. At present, most patterns in error codes rest on suspicion alone, because the error codes are not stored in an easily readable form. This makes it hard to find connections between multiple error codes. This document describes a system that supports the analysis of location-specific error code patterns. With this system, the NS will be able to work toward its two high-level goals and, ultimately, improve customer satisfaction.

As part of the system, a framework was built that allows the NS to develop new data analyses and extend existing ones. In addition, an extensive UI was created that lets users investigate the error code patterns found and trace problems back to their origin. With the system, the NS can verify existing hypotheses and form new ones about possibly problematic locations. This document elaborates on the problem, presents multiple candidate solutions, selects one of them and thoroughly motivates that choice, describes the chosen solution in detail, and finally gives some recommendations for future expansion. ...