Audio-visual authentication for mobile devices


Abstract

Authentication is becoming an increasingly important application in the connected world, driven by the growing use of mobile and IoT devices running applications that exchange sensitive data. Security usually relies on passwords and/or two-factor authentication, which are too intrusive for daily use. Biometric solutions such as fingerprints, voice, iris, and retina are a good alternative that overcomes these problems. This project presents an audio-visual identity verification system in which the use of multiple modalities that can already be captured by most IoT devices (microphone and camera) makes authentication robust to adverse conditions. Front-end factor analysis (i-vectors) with cosine-distance scoring is implemented as the main classification algorithm, which takes into account within- and between-speaker variability. Mel-Frequency Cepstral Coefficients (MFCC) are used as audio features, while 2D-DCT coefficients of a single snapshot and Motion Vectors (MV) of the lips are extracted as visual features. Improvements from combining the modalities are shown on the VidTIMIT dataset, where the proposed algorithm achieves a Half Total Error Rate (HTER) of 0.7% on the test set, outperforming the single audio and visual modalities by 9.5% and 6.4%, respectively.
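As a rough illustration of the scoring stage described above, the sketch below computes a cosine-distance score between an enrollment i-vector and a test i-vector, then fuses per-modality scores with a weighted sum. This is a minimal sketch under stated assumptions: the i-vector dimensionality, the fusion weight, and the decision threshold are hypothetical placeholders, and the actual system would tune them on a development set.

```python
import numpy as np

def cosine_score(enroll_ivector: np.ndarray, test_ivector: np.ndarray) -> float:
    """Cosine-distance score between two i-vectors (higher = more similar)."""
    num = float(np.dot(enroll_ivector, test_ivector))
    den = float(np.linalg.norm(enroll_ivector) * np.linalg.norm(test_ivector))
    return num / den

def fuse_scores(audio_score: float, visual_score: float, w_audio: float = 0.5) -> float:
    """Weighted linear fusion of per-modality scores (weight is a placeholder)."""
    return w_audio * audio_score + (1.0 - w_audio) * visual_score

# Toy usage: 400-dimensional i-vectors drawn at random purely for illustration.
rng = np.random.default_rng(0)
enroll_audio, test_audio = rng.standard_normal(400), rng.standard_normal(400)
enroll_visual, test_visual = rng.standard_normal(400), rng.standard_normal(400)

s_audio = cosine_score(enroll_audio, test_audio)
s_visual = cosine_score(enroll_visual, test_visual)
accept = fuse_scores(s_audio, s_visual) > 0.0  # threshold would be tuned on a dev set
```

Linear score fusion is one simple way to combine modalities; it degrades gracefully when one channel (e.g., audio in a noisy environment) is unreliable, which is the robustness argument the abstract makes for using both microphone and camera.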