Audio-visual authentication for mobile devices


Abstract

Authentication is becoming an increasingly important application in the connected world, driven by the growing use of mobile and IoT devices running applications that exchange sensitive data. Security usually relies on passwords and/or two-factor authentication, which are too intrusive for daily use. Biometric solutions such as fingerprints, voice, iris, and retina are a good alternative that overcomes these problems. This project presents an audio-visual identity verification system in which the use of multiple modalities that can already be captured by most IoT devices (microphone and camera) makes authentication robust to adverse conditions. Front-end factor analysis (i-vectors) with cosine-distance scoring is implemented as the main classification algorithm, which takes into account within- and between-speaker variability. Mel-Frequency Cepstral Coefficients (MFCC) are used as audio features, while 2D-DCT coefficients of a single snapshot and Motion Vectors (MV) of the lips are extracted as visual features. Improvements from combining the modalities are shown on the VidTIMIT dataset, where the proposed algorithm achieves a Half Total Error Rate (HTER) of 0.7% on the test set, outperforming the single audio and visual modalities by 9.5% and 6.4%, respectively.
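As a rough illustration of the scoring stage described above, the sketch below computes a cosine-distance score between an enrollment i-vector and a test i-vector, then fuses per-modality scores with a weighted sum. This is a minimal sketch under stated assumptions: the i-vector dimensionality, the fusion weight, and the decision threshold are hypothetical placeholders, and the actual system would tune them on a development set.

```python
import numpy as np

def cosine_score(enroll_ivector: np.ndarray, test_ivector: np.ndarray) -> float:
    """Cosine-distance score between two i-vectors (higher = more similar)."""
    num = float(np.dot(enroll_ivector, test_ivector))
    den = float(np.linalg.norm(enroll_ivector) * np.linalg.norm(test_ivector))
    return num / den

def fuse_scores(audio_score: float, visual_score: float, w_audio: float = 0.5) -> float:
    """Weighted linear fusion of per-modality scores (weight is a placeholder)."""
    return w_audio * audio_score + (1.0 - w_audio) * visual_score

# Toy usage: 400-dimensional i-vectors drawn at random purely for illustration.
rng = np.random.default_rng(0)
enroll_audio, test_audio = rng.standard_normal(400), rng.standard_normal(400)
enroll_visual, test_visual = rng.standard_normal(400), rng.standard_normal(400)

s_audio = cosine_score(enroll_audio, test_audio)
s_visual = cosine_score(enroll_visual, test_visual)
accept = fuse_scores(s_audio, s_visual) > 0.0  # threshold would be tuned on a dev set
```

Linear score fusion is one simple way to combine modalities; it degrades gracefully when one channel (e.g., audio in a noisy environment) is unreliable, which is the robustness argument the abstract makes for using both microphone and camera.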