Neural Network Based Generation of Reactionary Dance Improvisations

"M’AI have this dance?" or "How to train your DRA-GAN"


Abstract

This MSc thesis presents extensive research on the design of neural networks
capable of generating reactionary dance motions.
The research was conducted in collaboration with Stichting Another kind of Blue
(AKOB) and CompactCopters UG (CC). It is set within the industry context of the
’AI-man’ project, developed by AKOB and CC: a novel dance show intended to
create a truly interactive duet between man and machine. It is the follow-up to
the successful ’Airman’ project: a dance show utilizing a swarm of 12 drones to
perform with a human dancer on stage. Airman enabled the drone swarm to represent
a human figure and copy the improvised dance motions of the human dancer,
recorded via a motion capture (MoCap) system in real-time. The AI-man project
aims to take this to the next level by placing an AI motion controller in
between the live MoCap recording and the desired humanoid pose passed to the
drone swarm. By doing so, the system should be responsive to the human dancer’s
motion and generate reactionary dance motions to create a shared improvised
duet. Long-term motion generation is a very challenging problem, currently
under active research. For reference, many researchers
consider the prediction or generation of motion longer than one second, or even
just 500 ms, ’long-term’ [Ghosh et al., 2018; Gui et al., 2018a; Tang et al.,
2017]. In contrast to this, the interactive segment in the Airman show was
about one minute long. During the initial literature study, no comparable
research was found attempting to generate motion in reaction to full-body human
input. To facilitate the research, a custom MoCap dataset was acquired
containing over 9 hours of improvised dance motions between two dancers. In
comparison to currently available public datasets, the acquired dataset appears
to be the second-longest MoCap dataset to date. To achieve the desired
objective, two models were created: 1) an initial regression model, which
successfully mimicked the human motion in the training dataset but proved very
unstable in a real-time environment; and 2) a generative model, intended to
generate a variety of reactionary motions in real time, in reaction to an
arbitrary motion input. For
the generative model, a novel neural network architecture for the generation of
long-term reactionary motions is proposed: the ’Differential Recurrent
Attention GAN’ (DRA-GAN). It combines the training methodology of ’Generative
Adversarial Networks’ (GANs) [Goodfellow et al., 2014] with design elements
from ’Recurrent Neural Networks’ (RNNs; recurrent units: LSTM [Hochreiter and
Schmidhuber, 1997] and GRU [Cho et al., 2014]), attention mechanisms [Bahdanau
et al., 2015], differential equations, and ’Principal Component Analysis’ (PCA)
[Jolliffe, 2002]. The proposed model shows promising training progression, but
has so far failed to generate true long-term output, because training halts in
a common failure mode for GANs: ’mode collapse’ [Liu and Tuzel, 2016]. While
the model has so far not succeeded
in generating the desired results, it cannot be concluded that it is unable to
do so, because its full potential has not yet been explored, e.g. by means of
’Bayesian Hyperparameter Optimization’ (BHPO). The framework to perform BHPO
was developed, but the extremely long training times (≈ 2 weeks) on the limited
hardware available have prevented the evaluation of a sufficient number of
model variations. To
facilitate further research, an extensive list was compiled of methodologies
that could potentially resolve the current problems of the model.