A.A. Gudi
Less Machine (=) More Vision
Approaches towards Practical and Efficient Machine Vision with Applications in Face Analysis
This dissertation tackles annotation effort by exploring how weakly-supervised object/person detectors can be improved. Findings show that prior knowledge about objects' bounds in images helps the detector learn the spatial extent of objects using only weak image-level labels. The proposed implementation enables single-shot detection, thus improving computational efficiency of this data-efficient method.
The thesis also demonstrates how prior knowledge about eye locations can be used to reduce the computational burden of gaze tracking: non-vital parts of the input image can be discarded without losing accuracy. Additionally, the thesis shows how geometrical relations that are known a priori can be exploited to project gaze onto a screen with little human annotation effort.
Findings of this dissertation further suggest that spatial structures in images can be exploited for improving efficiency of vision tasks. The proposed solution allows for learning detection of facial occlusions and anomalies from only a few examples. Results also indicate that this solution can be used as a loss function for unsupervised pre-training of neural networks when resources are constrained.
Lastly, this thesis shows how prior knowledge about blood-flow physiology in faces can be applied in a camera-based vital signs estimator. Even when data is available, this hand-crafted method performs better than deep learning methods in terms of both accuracy and efficiency. At the same time, the results reveal the pitfalls of the assumptions built into the prior knowledge when the method is exposed to more complex tasks, such as filtering video compression noise.
Through its common theme of incorporating prior knowledge, this dissertation brings attention to the costs incurred by machine vision systems to achieve high accuracy.
When detailed training annotations are scarce, the ability to perform object localization in real time with weak supervision is very valuable. However, generating and evaluating region proposals is computationally expensive. We adapt the concept of Class Activation Maps (CAM) [28] into the very first weakly-supervised ‘single-shot’ detector that does not require region proposals. To facilitate this, we propose a novel global pooling technique called Spatial Pyramid Averaged Max (SPAM) pooling for training this CAM-based network for object extent localization with only weak image-level supervision. We show that this global pooling layer has a near-ideal flow of gradients for extent localization and offers a good trade-off between the extremes of max and average pooling. Our approach requires only a single network pass and uses a fast back-projection technique, completely omitting any region proposal steps. To the best of our knowledge, this is the first approach to do so. As a result, we are able to perform inference in real time at 35 fps, which is an order of magnitude faster than all previous weakly supervised object localization frameworks.
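The abstract does not spell out how SPAM pooling is computed, so the following is only a rough sketch of one reading of the name: take the maximum inside each cell of a pyramid of grids over a class score map, average the maxima per level, then average across levels. The function name `spam_pool`, the `levels` parameter and its default grid sizes are all assumptions made for illustration, not the authors' implementation.

```python
def spam_pool(score_map, levels=(1, 2, 4)):
    """Pool a 2-D class score map (list of rows) into a single score.

    Assumed reading of "Spatial Pyramid Averaged Max": for each pyramid
    level n, split the map into an n x n grid, take the max of every cell,
    average those maxima, then average the per-level results.
    Assumes the map has at least max(levels) rows and columns.
    """
    h, w = len(score_map), len(score_map[0])
    level_means = []
    for n in levels:
        # Integer cell boundaries along each axis for an n x n grid.
        rows = [k * h // n for k in range(n + 1)]
        cols = [k * w // n for k in range(n + 1)]
        maxima = []
        for i in range(n):
            for j in range(n):
                cell = [score_map[r][c]
                        for r in range(rows[i], rows[i + 1])
                        for c in range(cols[j], cols[j + 1])]
                maxima.append(max(cell))
        level_means.append(sum(maxima) / len(maxima))
    return sum(level_means) / len(level_means)


# A 4x4 map with a single active pixel.
single_peak = [[0.0] * 4 for _ in range(4)]
single_peak[0][0] = 1.0
pooled = spam_pool(single_peak)
```

Under these assumptions, `levels=(1,)` reduces to global max pooling, and a grid as fine as the map itself reduces to global average pooling, so the pooled score for the single-pixel map above falls between its maximum (1.0) and its mean (0.0625).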