Z. Wang | TU Delft Repository

Go Deep or Go Home?

Bachelor thesis (2021) - M.C. den Heijer, T.J. Viering, Y. Kato, O.T. Turan, Z. Wang, M. Loog, D.M.J. Tax

Does a convolutional neural network (CNN) always have to be deep to learn a task? This is an important question as deeper networks are generally harder to train. We trained shallow and deep CNNs and evaluated their performance on simple regression tasks, such as computing the mean pixel value of an image. For these simple tasks we show that going deeper does not guarantee an improvement in performance. ...

Is the batch size affecting the performance of Regression CNNs?

Bachelor thesis (2021) - J.A.D. Lamon, O.T. Turan, M. Loog, D.M.J. Tax, T.J. Viering, Y. Kato, Z. Wang, K.A. Hildebrandt

With an expected 8.3 trillion photos stored in 2021 [1], convolutional neural networks (CNNs) are becoming preeminent in the field of image recognition. However, because this type of deep neural network (DNN) is still seen as a black box, it is hard to fully exploit its capabilities. Hyperparameters need to be tuned to obtain a robust CNN that performs its task more accurately. In this study, our main concern is the batch size, one of the most important hyperparameters. The batch size is the number of samples that are propagated through the network before the weights are updated. We show how the batch size affects the performance of regression CNNs on the following regression tasks: predicting the mean, median, standard deviation (std), and variance of the pixel intensities of a grey-scale MNIST [2] input image. We analyze how well regression CNNs converge given different batch sizes and a fixed learning rate, and we compare the final mean squared error (MSE) obtained with each batch size. We find that a larger batch size leads to a higher MSE and slower convergence. The best performance was obtained for batch sizes of 8 to 32, with slight differences between the four regression tasks. ...
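The four regression targets and the role of the batch size can be illustrated with a minimal sketch. It is not the thesis code: the helper names `pixel_targets` and `updates_per_epoch` are hypothetical, the use of the population (rather than sample) standard deviation and variance is an assumption, and the figure of 60,000 images assumes the full standard MNIST training split is used.

```python
import math
import statistics


def pixel_targets(image):
    """Compute the four regression targets for a grey-scale image.

    `image` is a list of rows of pixel intensities. Population statistics
    are an assumption; the abstract does not say which variant was used.
    """
    pixels = [p for row in image for p in row]
    return {
        "mean": statistics.fmean(pixels),
        "median": statistics.median(pixels),
        "std": statistics.pstdev(pixels),
        "variance": statistics.pvariance(pixels),
    }


def updates_per_epoch(num_samples, batch_size):
    """Number of weight updates in one pass over the data."""
    return math.ceil(num_samples / batch_size)


# A tiny 2x2 "image" stands in for a 28x28 MNIST digit.
print(pixel_targets([[0, 2], [4, 6]]))

# Smaller batches mean more weight updates per epoch.
print(updates_per_epoch(60000, 8))   # 7500
print(updates_per_epoch(60000, 32))  # 1875
```

With a fixed learning rate, a smaller batch size gives the network more (but noisier) updates per epoch, which is one possible reason convergence speed differs between batch sizes.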

Are CNNs that Learn to Predict Image Statistics Invariant to Domain Shifts?

Bachelor thesis (2021) - J.P. Biesheuvel, T.J. Viering, Z. Wang, D.M.J. Tax, M. Loog, K.A. Hildebrandt

Yes, convolutional neural networks are domain-invariant, albeit to a limited extent. We explored the performance impact of domain shift on convolutional neural networks. We did this by designing new synthetic tasks in which the network had to map images to the mean, median, standard deviation, and variance of their pixel intensities. We find that the performance drop due to domain shift is related to the shift in pixel values between the source and target domains. Colour space transformations seemed to notably impact the network’s performance, as opposed to geometric transformations. For the last domain shift we find that the network manages to beat a baseline, from which we conclude that the domain shift is not too severe. Additionally, the findings reveal a less dominant role for feature transferability in our synthetic regression tasks. ...
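The contrast between the two kinds of transformation can be seen on the targets themselves. The sketch below is only an illustration: the abstract does not name the transformations used, so the horizontal flip (geometric) and the intensity inversion (a stand-in for a colour space change) are assumptions, as are the helper names.

```python
import statistics


def flatten(image):
    """Return all pixel intensities of a row-major image as one list."""
    return [p for row in image for p in row]


def horizontal_flip(image):
    """Geometric transformation: mirror every row; pixel values are kept."""
    return [list(reversed(row)) for row in image]


def invert(image, max_value=255):
    """Intensity transformation: replace each pixel p with max_value - p."""
    return [[max_value - p for p in row] for row in image]


image = [[0, 50], [100, 250]]

for name, img in [("original", image),
                  ("flipped", horizontal_flip(image)),
                  ("inverted", invert(image))]:
    pixels = flatten(img)
    print(name, statistics.fmean(pixels), statistics.median(pixels))
```

A flip only rearranges pixels, so the mean, median, standard deviation, and variance are unchanged, whereas inversion moves the mean from 100 to 155. This is consistent with the reported link between the performance drop and the shift in pixel values.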

Assessment of Parkinson's Disease Severity from Videos using Deep Architectures

Master thesis (2020) - Z. Yin, J.C. van Gemert, Hamdi Dibeklioglu, Huijuan Wang, Ziqi Wang, Victor Geraedts

Parkinson's disease (PD) diagnosis is based on clinical criteria, i.e. bradykinesia, rest tremor, rigidity, etc. Assessment of the severity of PD symptoms, however, is subject to inter-rater variability. In this paper, we propose a deep-learning-based automatic PD diagnosis method using videos recorded during the assessment with the Movement Disorders Society - Unified PD rating scale (MDS-UPDRS) part III. Seven tasks from the MDS-UPDRS III are investigated, which show the symptoms of bradykinesia and postural tremors. We demonstrate the effectiveness of automatic classification of PD severity using a 3D convolutional neural network (CNN) and show that PD severity classification can benefit from non-medical datasets through transfer learning. We further design a temporal self-attention (TSA) model to focus on the subtle temporal visual changes in our PD video dataset. The temporal relative self-attention-based 3D CNN classifier gives promising classification results on task-level videos. We also propose a task-assembling method that predicts patient-level severity by stacking classifiers. We empirically show the effectiveness of TSA and the task-assembling method on our PD video dataset. ...
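The general idea of attending over time steps can be sketched with plain scaled dot-product self-attention. This is a generic illustration and not the TSA model of the thesis: it has no learned query/key/value projections and no relative position terms, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_self_attention(frames):
    """Scaled dot-product self-attention over a sequence of frame features.

    `frames` is a list of T feature vectors, one per time step. Queries,
    keys, and values are the features themselves (a simplifying assumption).
    """
    dim = len(frames[0])
    scale = math.sqrt(dim)
    outputs = []
    for query in frames:
        # Similarity of this time step to every time step in the clip.
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in frames]
        weights = softmax(scores)
        # Each output is a weighted average of all frame features.
        outputs.append([sum(w * value[d] for w, value in zip(weights, frames))
                        for d in range(dim)])
    return outputs


clip = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
for row in temporal_self_attention(clip):
    print([round(v, 3) for v in row])
```

Each output row mixes information from all time steps, with larger weights for frames whose features are similar to the query frame.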

Attention-Aware Age-Agnostic Visual Place Recognition

Master thesis (2019) - Jiahui Li, Jan van Gemert, Seyran Khademi, Ziqi Wang, Marcel Reinders, Liangliang Nan

A cross-domain visual place recognition (VPR) task is proposed in this work, i.e., matching images of the same architectures depicted in different domains. VPR is commonly treated as an image retrieval task, where a query image from an unknown location is matched with relevant instances from a geo-tagged gallery database. Unlike conventional VPR settings, where the query images and gallery images come from the same domain, we propose a more common but challenging setup in which the query images are collected under a new, unseen condition. The two domains involved in this work are contemporary street view images of Amsterdam from the Mapillary dataset (source domain) and historical images of the same city from the Beeldbank dataset (target domain). We tailored an age-invariant feature learning CNN that can focus on domain-invariant objects and learn to match images based on a weakly supervised ranking loss. We propose an attention aggregation module that is robust to domain discrepancy between the training and test data. Further, a multi-kernel maximum mean discrepancy (MK-MMD) domain adaptation loss is adopted to improve the cross-domain ranking performance. Both the attention and adaptation modules are unsupervised, while the ranking loss uses weak supervision. Visual inspection shows that the attention module focuses on built forms, while the dramatically changing environment is weighted less. Our proposed CNN achieves state-of-the-art results (99% accuracy) on the single-domain VPR task and 20% accuracy at its best on the cross-domain VPR task, revealing the difficulty of age-invariant VPR. ...
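The MK-MMD loss mentioned above can be sketched as follows. This is a minimal illustration, not the thesis implementation: it uses the biased estimator of the squared MMD, an equally weighted mixture of Gaussian kernels, and bandwidths chosen arbitrarily for the example; all function names are hypothetical.

```python
import math


def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def multi_kernel(x, y, bandwidths):
    """Equally weighted average of Gaussian kernels (weights are an assumption)."""
    return sum(gaussian_kernel(x, y, b) for b in bandwidths) / len(bandwidths)


def mk_mmd_squared(source, target, bandwidths=(0.5, 1.0, 2.0)):
    """Biased estimate of the squared multi-kernel MMD between two sample sets."""
    def mean_kernel(xs, ys):
        total = sum(multi_kernel(x, y, bandwidths) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))


source_features = [[0.0, 0.0], [1.0, 0.0]]
target_features = [[3.0, 3.0], [4.0, 3.0]]

print(mk_mmd_squared(source_features, source_features))  # zero for identical sets
print(mk_mmd_squared(source_features, target_features))  # positive for shifted sets
```

Used as a loss, this quantity is minimized so that features from the two domains become harder to tell apart. It needs no labels, which matches the description of the adaptation module as unsupervised.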

Black Magic in Deep Learning

Understanding the role of humans in hyperparameter optimization

Master thesis (2019) - Kanav Anand, Jan van Gemert, Marco Loog, Ziqi Wang

Deep learning is proving to be a useful tool for solving problems in various domains. Despite rich research activity leading to numerous interesting deep learning models, recent large-scale studies have shown that, with hyperparameter optimization, it is hard to distinguish these models based on their final performance. Hyperparameter optimization has been shown to improve state-of-the-art results on several occasions. These results cast doubt on the performance of these improved deep learning models and raise the question of whether the final performance of a deep learning model depends on the person performing the hyperparameter optimization. A user study was conducted to evaluate the impact of a person's prior experience in deep learning on the final performance of a deep learning model. Thirty-one people with different levels of experience in deep learning were invited to perform a hyperparameter optimization task. The collected data was analyzed to find the relationship between the participant and the final performance of the deep learning model used in the user study. We observed that the final performance of the model varies between participants and that there is a strong correlation between a participant's experience and the final performance achieved. Our data suggest that an experienced participant finds better results using fewer resources. ...