Detecting Fog using Machine Learning and Investigating the Possibilities of Generating Synthetic Data

Master Thesis (2023)

Author(s)

J.F.J. Blom (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Robbert Fokkink – Mentor (TU Delft - Applied Probability)

Martin Bastiaan van Gijzen – Mentor (TU Delft - Numerical Analysis)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Machine Learning, Deep Learning, Neural Network

To reference this document use:

https://resolver.tudelft.nl/uuid:053a52e4-065a-4d60-b600-8d5a721ad9c2

Publication Year

2023

Language

English

Graduation Date

17-05-2023

Awarding Institution

Delft University of Technology

Programme

Applied Mathematics

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Fog plays a major role in chain collisions, so the Dutch road authority needs reliable fog detection to anticipate foggy conditions. Dozens of stations in the Netherlands can measure fog, but fog can be a very local phenomenon, so more local measurements are needed. The roughly 5,000 traffic cameras in the Netherlands are a possible source of such measurements. Several studies have investigated fog detection from traffic camera images, and the most successful ones used machine learning classification models. The biggest challenge they face is a dataset that is extremely imbalanced, limited in diversity, and limited in accuracy. Obtaining adequate precision is one of the primary difficulties, because the extreme imbalance of the dataset significantly impacts precision. The main objective of this research is to improve the dataset and to investigate many machine learning configurations. A second objective is to examine the possibilities of generating synthetic data.

This thesis uses a (re)labeling method that significantly improves the dataset's quality. The dataset nevertheless retains limitations: a large portion of the false positives is caused by labeling errors. A comparison of several machine learning models shows that a 9-layer ResNet model is optimal, and adding more layers does not improve performance. Unexpectedly, initializing ResNet with pre-trained weights decreases performance. In addition, the effect of oversampling and of a weighted binary cross-entropy loss is investigated. Oversampling alone leads to overfitting, and a weighted binary cross-entropy loss alone is not ideal either. The best performance is achieved by combining the weighted binary cross-entropy loss with oversampling. Decision threshold optimization substantially improves the results further. Together, the experiments made it possible to select a configuration that substantially increased performance. For the best-performing configuration, the Matthews correlation coefficient indicates a strong correlation between predictions and labels.
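As an illustration of the ingredients named above, the following is a minimal NumPy sketch, not the thesis's implementation. It shows a weighted binary cross-entropy loss, oversampling by replicating fog samples, and a decision threshold chosen to maximize the Matthews correlation coefficient. The function names, the `pos_weight` parameter, the replication factor, and the threshold grid are all assumptions made for this example.

```python
import numpy as np

def weighted_bce(y_true, p_pred, pos_weight, eps=1e-7):
    """Binary cross-entropy in which positive (fog) samples count pos_weight times."""
    p = np.clip(p_pred, eps, 1.0 - eps)  # avoid log(0)
    loss = -(pos_weight * y_true * np.log(p) + (1 - y_true) * np.log(1.0 - p))
    return float(loss.mean())

def oversample_indices(y, factor, rng):
    """Indices in which every positive sample appears `factor` times (hypothetical helper)."""
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    idx = np.concatenate([neg, np.repeat(pos, factor)])
    rng.shuffle(idx)  # mix classes so batches are not ordered by label
    return idx

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels; 0.0 if undefined."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return 0.0 if denom == 0 else float((tp * tn - fp * fn) / denom)

def best_threshold(y_true, p_pred, candidates=None):
    """Return the (threshold, MCC) pair with the highest MCC on the given data."""
    if candidates is None:
        candidates = np.linspace(0.05, 0.95, 19)  # assumed grid, step 0.05
    scores = [mcc(y_true, (p_pred >= t).astype(int)) for t in candidates]
    i = int(np.argmax(scores))
    return float(candidates[i]), scores[i]
```

In practice the threshold would be selected on a validation set rather than on the test data, so that the reported score is not optimistically biased.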

Finally, the possibilities of generating synthetic data are investigated. ADASYN and SMOTE seem attractive at first sight, but a recent study indicates that they do not work better than random oversampling. One of the most promising ideas for generating synthetic data is to add fog to clear images. In this thesis, a conceptual algorithm is designed to add artificial fog to clear images. Most generated images look convincing, but there is much room for improvement.
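The abstract does not describe how the thesis's fog algorithm works. As a stand-in, the sketch below uses the standard atmospheric scattering (Koschmieder) model, in which a clear image J is blended with a uniform airlight A according to a transmission t = exp(-beta * d) that decays with scene depth d. The depth map, the `beta` and `airlight` values, and both function names are assumptions for illustration only.

```python
import numpy as np

def linear_depth(height, width, near=5.0, far=200.0):
    """Crude hypothetical depth map in metres: top row far away, bottom row close."""
    column = np.linspace(far, near, height)
    return np.tile(column[:, np.newaxis], (1, width))

def add_fog(image, depth, beta=0.05, airlight=0.9):
    """Apply I = J * t + A * (1 - t) with t = exp(-beta * depth).

    image: float array in [0, 1] with shape (H, W, 3)
    depth: float array with shape (H, W), distance to the camera
    """
    t = np.exp(-beta * depth)[..., np.newaxis]  # broadcast over colour channels
    return image * t + airlight * (1.0 - t)
```

A larger `beta` gives denser fog, and `beta=0` leaves the image unchanged. A real traffic camera scene would need a better depth estimate than the linear ramp used here.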

Files

Master_Thesis.pdf

(pdf | 5.92 MB)

License info not available