Automatic data collection for facial expression recognition


Abstract

Facial expression recognition is a widely researched topic with applications in human-computer interaction (HCI), surveillance, and other domains. The diversity of expressions makes data collection expensive in both time and money, so task-specific data collection is often impractical. This thesis investigates a method for collecting such data quickly and cheaply, targeting an HCI application. In particular, the focus is on Pepper, a robot designed to interact with humans through conversation. To collect the data, emotions must be elicited in participants while their faces are video-recorded. Emotions were elicited by showing videos selected to trigger specific emotions; participants watched them in pairs, so that mutual interaction better simulated the social environment in which the robot will operate. After watching each video, participants rated their feelings with the AffectButton, a tool for intuitively describing emotions in a dimensional way. Videos were selected based on a questionnaire in which respondents rated the emotions each video triggered in them. The recordings were then compared to a model of a neutral expression performed by the same participant, in order to select the frames in which expressions were shown and to discard neutral and transition frames. The resulting images formed a dataset used to train a linear regressor and a convolutional neural network (CNN). Both models were then tested on naturalistic data recorded during conversations to investigate whether the proposed collection method yields a useful dataset; the results indicate that the method is promising.
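The abstract does not specify the exact criterion used to compare frames against the neutral-expression model. The sketch below, in Python, illustrates one plausible reading of that step, assuming per-frame facial-landmark feature vectors and a deviation threshold; the feature representation, the Euclidean distance, and the threshold value are all illustrative assumptions, not details taken from the thesis.

import numpy as np

def select_expressive_frames(frames, neutral, threshold=1.5):
    """Keep frames whose feature vectors deviate from the neutral model.

    frames   : (n_frames, n_features) array of per-frame face features
               (e.g. stacked facial-landmark coordinates); hypothetical input
    neutral  : (n_features,) mean feature vector from the participant's
               neutral-expression recording
    threshold: deviation cut-off in units of the distances' spread
               (assumed criterion, not from the thesis)
    """
    # Per-frame Euclidean distance from the neutral baseline.
    distances = np.linalg.norm(frames - neutral, axis=1)
    # Normalise by the spread of the distances so the threshold is
    # scale-free; frames below it are treated as neutral/transition.
    scale = distances.std() or 1.0
    keep = distances / scale > threshold
    return frames[keep], keep

# Example with synthetic data: 100 frames of 68 two-dimensional landmarks.
rng = np.random.default_rng(0)
neutral = rng.normal(size=136)
frames = neutral + rng.normal(scale=0.1, size=(100, 136))
frames[40:60] += 1.0  # simulate an expressive episode
selected, mask = select_expressive_frames(frames, neutral)
print(f"kept {mask.sum()} of {len(frames)} frames")

In this synthetic run only the shifted frames exceed the threshold, mirroring the abstract's goal of keeping expressive frames while ignoring neutral and transition ones; a real pipeline would substitute actual landmark or embedding features extracted from the video.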
