Artificial Intelligence (AI) and Machine Learning (ML) are pervasive in the current computer science landscape. Yet the field still lacks Software Engineering (SE) experience and best practices. One such best practice, static code analysis, can be used to find code smells, i.e., (potential) defects in the source code, refactoring opportunities, and violations of common coding standards. This research first set out to measure the prevalence of code smells in ML application projects. The results of this first study additionally revealed deficiencies in the dependency management of these projects, which pose a major threat to their maintainability and reproducibility. Static code analysis practices were also found to be lacking. These issues inspired the novel concept of project smells introduced in this research, which consider the ML project as a whole: not just the code, but also the data, tools, and technologies surrounding it and its development. To help ML practitioners detect and mitigate these project smells, and to help educate them on SE principles, techniques, and tools, I developed an open-source static analysis tool, mllint, using input from experienced ML engineers at the global bank and data-driven organisation ING. In a second study, this tool was used to evaluate the concept of project smells and how well they fit the industrial context of ING. This second study also investigated obstructions to implementing the best practices recommended by mllint, perceptions of static analysis tools, and how ML practitioners perceive the difference in importance of mllint's linting rules (and, by extension, project smells) for proof-of-concept versus production-ready projects. The results indicate a need for context-aware static analysis tools that fit the needs of the project at its current stage of development while requiring minimal configuration effort from the user.