E. Gedik | TU Delft Repository

Multimodal Self-Assessed Personality Estimation during Crowded Mingle Scenarios Using Wearables Devices and Cameras

Journal article (2022) - Laura Cabrera-Quiros, Ekin Gedik, Hayley Hung

This paper focuses on the automatic classification of self-assessed personality traits from the HEXACO inventory during crowded mingle scenarios. These scenarios provide rich study cases for social behavior analysis but are also challenging to analyze automatically, as people in them interact dynamically and freely in an in-the-wild, face-to-face setting. To address this, we leverage wearable sensors recording acceleration and proximity, together with video from overhead cameras. We use 3 different behavioral modality types (movement, speech and proximity) coming from 2 sensors (wearable and camera). Unlike other works, we extract an individual's speaking status from a single body-worn triaxial accelerometer instead of audio, which scales easily to large populations. Additionally, we study the effect of different combinations of modality types on personality estimation, and how this relates to the nature of each trait. We also include an analysis of feature complementarity and an evaluation of feature importance for the classification, showing that combining complementary modality types further improves classification performance. We estimate the self-assessed personality traits both as a binary classification task (the community's standard) and as a regression over the trait scores. Finally, we analyze the impact of the accuracy of speech detection on the overall performance of personality estimation. ...
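The binary setting mentioned above is commonly obtained by splitting each trait's scores at the sample median into "high" and "low" classes. The abstract does not say exactly how the authors binarize, so the sketch below assumes a median split with ties assigned to the low class; the function name median_split and the example scores are hypothetical.

```python
from statistics import median


def median_split(scores):
    """Binarize trait scores at the sample median: 1 = high, 0 = low.

    Assumption: scores equal to the median go to the low class. Other
    conventions (dropping ties, assigning them to high) are also used.
    """
    m = median(scores)
    return [1 if s > m else 0 for s in scores]


# Hypothetical trait scores (1-5 scale) for six participants.
trait_scores = [3.1, 4.2, 2.8, 3.9, 3.5, 2.2]
print(median_split(trait_scores))  # → [0, 1, 0, 1, 1, 0]
```

A median split yields roughly balanced classes, which keeps accuracy interpretable against a 50% chance level; the regression setting avoids discarding the within-class variation that the split throws away.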

Capturing Interaction Quality in Long Duration (Simulated) Space Missions with Wearables

Journal article (2022) - Ekin Gedik, Jeffrey Olenick, Chu-Hsiang Chang, Steve W.J. Kozlowski, Hayley Hung

Space exploration is evolving with the recent increase in interest and investment. For the success of planned long-duration crewed missions, good interpersonal interactions between crew members are crucial. In this study, we evaluate the use of wearables for detecting and estimating the quality of each social interaction that participants have throughout a long mission, rather than relying on aggregate measures of interactions. Our proposed method uses Temporal Convolutional Networks (TCNs) to extract individual representations from acceleration and audio streams, and learnable pooling layers (NetVLAD) to aggregate these representations into fixed-size representations. NetVLAD layers provide a learnable alternative to simple aggregation for handling variable-sized interactions and interactions with missing data. We evaluate our method on a 4-month simulated space mission in which 5 participants wore Sociometric Badges and reported on their interactions in terms of effectiveness, frustration, and satisfaction. Our method achieves an average ROC-AUC score of 0.64. Since we are not aware of any comparable baselines, we compare our method to hand-crafted features previously used for cohesion estimation in similar scenarios and show that it significantly outperforms them. We also present ablation studies in which we replace components of our approach with well-known alternatives, showing that our components perform better than their respective counterparts. ...
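To illustrate why NetVLAD pooling produces a fixed-size output regardless of interaction length, here is a minimal pure-Python sketch of the standard NetVLAD aggregation step: soft-assign each descriptor to K clusters, accumulate assignment-weighted residuals to the cluster centres, then normalize. This is not the authors' implementation; in a trained network the centres, weights and biases are learned jointly with the TCN, whereas here they are passed in, and all names and values are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def netvlad(descriptors, centers, weights, biases):
    """Aggregate N descriptors of dimension D into one vector of length K*D.

    descriptors: N vectors of length D (e.g. per-window TCN outputs)
    centers:     K cluster centres c_k, each of length D
    weights:     K soft-assignment weight vectors w_k, each of length D
    biases:      K soft-assignment biases b_k
    """
    k_clusters, dim = len(centers), len(centers[0])
    vlad = [[0.0] * dim for _ in range(k_clusters)]
    for x in descriptors:
        # Soft assignment of x to each cluster: softmax_k(w_k . x + b_k).
        logits = [sum(w_d * x_d for w_d, x_d in zip(weights[k], x)) + biases[k]
                  for k in range(k_clusters)]
        assign = softmax(logits)
        # Accumulate assignment-weighted residuals (x - c_k).
        for k in range(k_clusters):
            for d in range(dim):
                vlad[k][d] += assign[k] * (x[d] - centers[k][d])
    # Intra-normalize each cluster's residual, flatten, then L2-normalize globally.
    flat = [v for row in vlad for v in l2_normalize(row)]
    return l2_normalize(flat)


# Hypothetical parameters: K = 2 clusters over D = 3-dimensional descriptors.
centers = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
biases = [0.0, 0.0]
short_interaction = [[0.1, 0.2, 0.3], [0.5, 0.5, 0.5]]
long_interaction = [[0.1 * i, 0.2, 0.3] for i in range(10)]
print(len(netvlad(short_interaction, centers, weights, biases)),
      len(netvlad(long_interaction, centers, weights, biases)))  # → 6 6
```

Because the output length depends only on K and D, an interaction with missing windows can simply contribute fewer descriptors instead of requiring padding or imputation.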

Multimodal data collection for social interaction analysis in-the-wild

Conference paper (2019) - Hayley Hung, Chirag Raman, Ekin Gedik, Stephanie Tan, Jose Vargas Quiros

The benefits of exploiting multimodality in the analysis of human-human social behaviour have been demonstrated widely in the community. An important aspect of this problem is the collection of datasets that provide a rich and realistic representation of how people actually socialize with each other in real life. The subtle coordination patterns of such socializing are influenced by individual beliefs, goals, and desires related to what an individual stands to lose or gain in the activities they perform in their everyday life. These conditions cannot be easily replicated in a lab setting and require a radical rethinking of both how and what to collect. This tutorial provides a guide on how to create such multimodal, multi-sensor datasets while holistically considering the entire experimental design and data collection process. ...

Complex conversational scene analysis using wearable sensors

Book chapter (2019) - Hayley Hung, Ekin Gedik, Laura Cabrera Quiros

When aspiring to achieve 'in the wild' behavior analysis, we come across a number of conceptual and practical issues. In this chapter, we focus primarily on describing the data collection process for the automated analysis of human social behavior. Specifically, we address the task of analyzing social interaction during conversations. Most research in this area has focused on seated scenarios, such as a small group having a meeting. In this chapter, we address the challenges faced when analyzing complex conversational scenes: crowded social settings where mingling occurs, such as networking events, cocktail parties or conferences. We discuss and provide definitions of what 'in the wild' means in the context of wearable sensors. We provide a case study detailing different concerns that can emerge as a result of 'in the wild' social behavior analysis. More concretely, we address this in terms of how the concept of ecological validity from experimental psychology links with the concept of 'in the wild', practical and conceptual issues related to data collection, and finally how these influence social behavior analysis. Importantly, in the presentation of the behavior analysis, we address key questions that arise when an entire dataset is recorded from continuous natural behavior 'in the wild': When do we have enough data? Do we need a different machine learning approach for different amounts of data? Are social behaviors (e.g. speaking) more difficult to characterize than activities (e.g. walking/stepping) when the setting is so uncontrolled? We try to answer these questions by considering the extent to which the problem becomes more personalized or person-independent as the size of the dataset increases. ...

Detecting F-formations & Roles in Crowded Social Scenes with Wearables

Combining Proxemics & Dynamics using LSTMs

Conference paper (2019) - Alessio Rosatelli, Ekin Gedik, Hayley Hung

In this paper, we investigate the use of proxemics and dynamics for automatically identifying conversing groups, or so-called F-formations. More formally, we aim to automatically identify whether wearable sensor data from 2 people indicates F-formation membership. We also explore the problem of jointly detecting membership and more descriptive information about the pair relating to the roles they take in the conversation (i.e. speaker or listener). We jointly model the concepts of proxemics and dynamics using binary proximity and acceleration obtained through a single wearable sensor per person. We test our approaches on the publicly available MatchNMingle dataset, which was collected during real-life mingling events. We find that fusing these two modalities performs significantly better than using either one independently, providing an AUC of 0.975 when data from 30-second windows are used. Furthermore, our investigation into role detection shows that each role pair requires a different time resolution for accurate detection. ...
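The AUC figures reported in these abstracts (0.975 here, 0.64 for the space-mission study) have a concrete reading: ROC-AUC equals the probability that a randomly chosen positive example (e.g. a pair in the same F-formation) receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that pairwise definition, counting ties as one half; the function name and the labels and scores are hypothetical, and the quadratic loop is for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """ROC-AUC as P(score of random positive > score of random negative).

    labels: 1 for positive, 0 for negative; ties in score count as 1/2.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical pair-membership labels and classifier scores.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.4, 0.2]
print(round(roc_auc(labels, scores), 3))  # → 0.889
```

Under this reading, an AUC of 0.5 corresponds to chance ranking and 1.0 to a perfect separation of positives from negatives.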

Capturing human behaviour through wearables by computational analysis of social dynamics

Doctoral thesis (2018) - Ekin Gedik

Understanding human behaviour has fascinated people for centuries. One intriguing aspect of human behaviour is the social part: how humans react to each other and to their environment. Scientific study of such behaviour has been hampered by the need for manual annotation, which limited social scientists to observing only short time intervals in restricted settings. With the growing processing power of computers and increasing possibilities for robust, continuous, and mobile sensing, collecting and analysing large amounts of real-life behaviour data has become possible. Moreover, computational methods make it possible to go beyond traditional approaches to social understanding, since they can detect patterns that are not easily distinguishable by humans. However, even with powerful computational models, investigating human behaviour is quite challenging, as behaviour is personal and contextual, resulting in large variations. This thesis proposes novel computational solutions for analysing human social behaviour. It focusses on data collected from people wearing accelerometers at crowded events where they freely mingle with each other. It provides solutions to robustly detect actions and interactions, and shows how to use the detected information to derive higher-level social understanding. The thesis starts by introducing novel ways of detecting social actions and interactions. To deal with intra-personal variations, we show how general action predictors can be adapted into personalized models using transfer learning. Further, we show that conversing groups can be detected from interaction dynamics, instead of the commonly preferred modality of proximity. Large variations in interaction patterns that might arise in unrestricted scenarios are addressed by a novel method that considers group sizes in both the training and detection phases.
The thesis continues with a proof-of-concept study that shows how detected action and interaction patterns can be used to infer an individual's psychological construct. We show that it is possible to detect personality in a real-life event by extracting two behavioural cues (speaking and movement) from one digital modality (acceleration). Additionally, we describe a detailed investigation of how social context moderates an individual's evaluation of a live performance. Through a novel approach, we infer audience members' evaluations from informative parts of the event, identified by the linkage of their body accelerations. Taken together, this thesis shows that with increased sensing and computing power, the understanding of human social behaviour in more dynamic social situations is within reach. ...

Towards Analyzing and Predicting the Experience of Live Performances with Wearable Sensing

Journal article (2018) - Ekin Gedik, Laura Cabrera-Quiros, Claudio Martella, Gwenn Englebienne, Hayley Hung

We present an approach to interpreting the response of audiences to live performances by processing mobile sensor data. We apply our method to three different datasets obtained from three live performances, where each audience member wore a single tri-axial accelerometer and proximity sensor embedded in a smart sensor pack. Using these sensor data, we develop a novel approach to predict audience members' self-reported experience of the performances in terms of enjoyment, immersion, willingness to recommend the event to others and change in mood. The proposed approach uses an unsupervised method to identify informative intervals of the event, using the linkage of the audience members' bodily movements, and uses data from these intervals only to estimate the audience members' experience. We also analyze how the relative location of audience members can affect their experience and present an automatic way of recovering neighborhood information based on proximity sensors. We further show that the linkage of the audience members' bodily movements is informative of memorable moments that were later reported by the audience. ...
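One way to make "linkage of bodily movements" concrete is to measure, in each time window, how strongly audience members' movement signals co-vary. The abstract does not specify the linkage measure, so the sketch below assumes mean pairwise Pearson correlation of acceleration magnitude over non-overlapping windows, and treats windows above a threshold as informative. The function names, window length, threshold and signals are all hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    ma, mb = mean(a), mean(b)
    sa, sb = pstdev(a), pstdev(b)
    if sa == 0 or sb == 0:
        return 0.0
    cov = mean((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / (sa * sb)


def window_linkage(signals, win):
    """Mean pairwise correlation of members' signals in each non-overlapping window.

    signals: dict mapping member id -> list of acceleration magnitudes (equal length)
    win:     window length in samples
    """
    members = list(signals)
    length = len(signals[members[0]])
    scores = []
    for start in range(0, length - win + 1, win):
        pair_r = [pearson(signals[a][start:start + win], signals[b][start:start + win])
                  for a, b in combinations(members, 2)]
        scores.append(mean(pair_r))
    return scores


def informative_windows(scores, threshold):
    """Indices of windows whose linkage score reaches the (hypothetical) threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


# Hypothetical magnitudes for three audience members, two windows of 4 samples.
signals = {
    "a": [1, 2, 3, 4, 1, 0, 1, 0],
    "b": [2, 4, 6, 8, 0, 1, 0, 1],
    "c": [1, 2, 3, 4, 1, 0, 1, 0],
}
scores = window_linkage(signals, win=4)
print([round(s, 3) for s in scores])      # → [1.0, -0.333]
print(informative_windows(scores, 0.5))   # → [0]
```

The method in the abstract is unsupervised, and this sketch is too: no self-reports are needed to pick the windows, only the movement signals themselves.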

No-Audio Multimodal Speech Detection in Crowded Social Settings task at MediaEval 2018

Conference paper (2018) - Laura Cabrera-Quiros, Ekin Gedik, Hayley Hung

This overview paper describes the automatic Human Behaviour Analysis (HBA) task at MediaEval 2018. In its first edition, the HBA task focuses on analyzing one of the most basic elements of social behavior: the estimation of speaking status. Task participants are provided with cropped videos of individuals interacting freely during a crowded mingle event, captured by an overhead camera. Each individual also wears a badge-like device hung around the neck that records tri-axial acceleration. The goal of this task is to automatically estimate whether a person is speaking using these two alternative modalities. In contrast to conventional speech detection approaches, no audio is used for this task. Instead, the automatic estimation system must exploit the natural human movements that accompany speech. The task seeks to achieve estimation performance competitive with audio-based systems by exploiting the multimodal aspects of the problem. ...

The MatchNMingle dataset

A novel multi-sensor resource for the analysis of social interactions and group dynamics in-the-wild during free-standing conversations and speed dates

Journal article (2018) - Laura Cabrera-Quiros, Andrew Demetriou, Ekin Gedik, Leander van der Meij, Hayley Hung

We present MatchNMingle, a novel multimodal/multisensor dataset for the analysis of free-standing conversational groups and speed-dates in-the-wild. MatchNMingle leverages wearable devices and overhead cameras to record social interactions of 92 people during real-life speed-dates, followed by a cocktail party. To our knowledge, MatchNMingle has the largest number of participants, longest recording time and largest set of manual annotations for social actions available in this context in a real-life scenario. It consists of 2 hours of data from wearable acceleration, binary proximity, video, audio, personality surveys, frontal pictures and speed-date responses. Participants' positions and group formations were manually annotated, as were social actions (e.g. speaking, hand gestures) for 30 minutes at 20 fps, making it the first dataset to incorporate the annotation of such cues in this context. We present an empirical analysis of the performance of crowdsourcing workers against trained annotators in simple and complex annotation tasks, finding that although crowdsourcing workers are efficient for simple tasks, using them for more complex tasks like social action annotation led to additional overhead and poor inter-annotator agreement compared to trained annotators (differences of up to 0.4 in Fleiss' Kappa coefficients). We also provide example experiments showing how MatchNMingle can be used. ...
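Fleiss' Kappa, used above to compare crowdsourcing workers with trained annotators, measures agreement among a fixed number of raters beyond what chance would produce: kappa = (P_bar - P_e) / (1 - P_e), where P_bar is the mean per-item agreement and P_e is the agreement expected from the overall category proportions. The sketch below implements that standard formula; the function name and the toy annotation counts (4 frames, 3 annotators, speaking versus not speaking) are hypothetical.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for N items, each rated by the same number n of raters.

    counts[i][j] = number of raters who assigned item i to category j.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must be rated by the same number of raters")
    n_cats = len(counts[0])
    # p_j: overall proportion of all assignments that went to category j.
    p = [sum(row[j] for row in counts) / (n_items * n_raters) for j in range(n_cats)]
    # P_i: proportion of agreeing rater pairs on item i.
    per_item = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
                for row in counts]
    p_bar = sum(per_item) / n_items   # observed agreement
    p_e = sum(pj * pj for pj in p)    # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical: 4 frames, 3 annotators, columns = [speaking, not speaking].
counts = [[3, 0], [0, 3], [2, 1], [1, 2]]
print(round(fleiss_kappa(counts), 4))  # → 0.3333
```

Here two frames have unanimous labels and two are split 2 to 1, so observed agreement (2/3) is only moderately above the chance level (1/2), giving kappa = 1/3.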

Personalised models for speech detection from body movements using transductive parameter transfer

Journal article (2017) - Ekin Gedik, Hayley Hung

We investigate the task of detecting speakers in crowded environments using a single body-worn triaxial accelerometer. Such behaviour is very challenging to model, as people's body movements during speech vary greatly. Similar to previous studies, by assuming that body movements are indicative of speech, we show experimentally, on a 3 h real-world dataset including 18 people, that transductive parameter transfer learning (Zen et al. in Proceedings of the 16th international conference on multimodal interaction. ACM, 2014) can better model individual differences in speaking behaviour, significantly improving on state-of-the-art performance. We also discuss the challenges introduced by the in-the-wild nature of our dataset and experimentally show how they affect detection performance. We reinforce the case for an adaptive approach by comparing the speech detection problem to a more traditional activity (i.e. walking). We provide an analysis of the transfer using different source sets, which offers a deeper investigation of the nature of both speech and body movements in the context of transfer learning. ...
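Detecting speech from a single triaxial accelerometer starts by turning the raw (x, y, z) stream into per-window features that a classifier can use. The abstract does not list the features, so the sketch below shows a generic, assumed pipeline: compute the orientation-independent magnitude of each sample (useful because a badge hung around the neck can rotate), slide a window over it, and summarize each window by its mean, standard deviation and mean absolute first difference. The function names, window length and step are hypothetical.

```python
import math
from statistics import mean, pstdev


def magnitude(samples):
    """Orientation-independent magnitude of each (x, y, z) acceleration sample."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]


def window_features(samples, win, step):
    """Per-window [mean, std, mean absolute difference] of acceleration magnitude.

    win and step are in samples; e.g. a 3 s window with a 1 s step at 20 Hz
    would be win=60, step=20 (hypothetical values, not taken from the paper).
    """
    if win < 2:
        raise ValueError("window must contain at least 2 samples")
    mag = magnitude(samples)
    feats = []
    for start in range(0, len(mag) - win + 1, step):
        w = mag[start:start + win]
        diffs = [abs(b - a) for a, b in zip(w, w[1:])]
        feats.append([mean(w), pstdev(w), mean(diffs)])
    return feats


# Hypothetical stream: 4 samples at rest, then 4 samples with a constant offset.
samples = [(0.0, 0.0, 1.0)] * 4 + [(0.0, 3.0, 4.0)] * 4
print(window_features(samples, win=4, step=4))  # → [[1.0, 0.0, 0.0], [5.0, 0.0, 0.0]]
```

Each window's feature vector would then be labelled speaking or not speaking from annotations; the personalization question in the abstract concerns how a classifier trained on such vectors transfers across people.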

Estimating self-assessed personality from body movements and proximity in crowded mingling scenarios

Conference paper (2016) - Laura Cabrera Quiros, Ekin Gedik, Hayley Hung

This paper focuses on the automatic classification of self-assessed personality traits from the HEXACO inventory during crowded mingle scenarios. We exploit acceleration and proximity data from a wearable device hung around the neck. Unlike most state-of-the-art studies, we address personality estimation during mingle scenarios, a challenging social context in which people interact dynamically and freely in a face-to-face setting. While many former studies use audio to extract speech-related features, we present a novel method of extracting an individual's speaking status from a single body-worn triaxial accelerometer, which scales easily to large populations. Moreover, by fusing both speech and movement-energy-related cues from acceleration alone, our experimental results show improvements in the estimation of Humility over features extracted from a single behavioral modality. We validated our method on 71 participants, obtaining an accuracy of 69% for Honesty, Conscientiousness and Openness to Experience. To our knowledge, this is the largest validation of personality estimation carried out in such a social context with simple wearable sensors. ...

Speaking Status Detection from Body Movements Using Transductive Parameter Transfer

Conference paper (2016) - Ekin Gedik, Hayley Hung

We investigate the task of detecting speakers in crowded environments using a single triaxial accelerometer worn around the neck. Similar to previous studies, by assuming that body movements are indicative of speech, we show experimentally that transductive transfer learning can better model individual differences in speaking behaviour than a traditional person-independent setup. Such behaviour is very challenging to model, as people's body movements during speech vary greatly. To our knowledge, this is the first time that a transfer learning approach has been considered in the context of speaking status detection using a single body-worn accelerometer. We show that by transferring knowledge across subjects, performance competitive with person-dependent training can be obtained.
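The contrast between a person-independent setup and person-dependent training comes down to how samples are split between training and testing. A common person-independent protocol, assumed here rather than taken from the paper, is leave-one-subject-out: each participant in turn is held out for testing while the others form the training (or, for transfer learning, source) set. In a transductive setting, the held-out subject's samples would additionally be available to the learner, but without labels. The function name and participant IDs are hypothetical.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out, train_indices, test_indices) for person-independent evaluation.

    subject_ids[i] is the participant who produced sample i.
    """
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


# Hypothetical: five windows recorded from three participants.
ids = ["p1", "p1", "p2", "p3", "p2"]
for held_out, train, test in leave_one_subject_out(ids):
    print(held_out, train, test)
# → p1 [2, 3, 4] [0, 1]
# → p2 [0, 1, 3] [2, 4]
# → p3 [0, 1, 2, 4] [3]
```

A person-dependent setup would instead split each participant's own samples into training and test portions, so the classifier sees labelled data from the same person it is evaluated on; that is the reference point the transfer approach is compared against.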

...

Are You (Not) Entertained?

Estimating the State of a Crowd in an Event Using Wearable Sensors

Conference paper (2016) - Ekin Gedik

...