GAN Driven Audio Synthesis

On using adversarial training for data driven audio generation

Bachelor thesis (2021)

Authors

V.R. Bockstael (Electrical Engineering, Mathematics and Computer Science)

Contributors

R. van der Toorn (Mathematical Physics), mentor

C. Kraaikamp (Applied Probability), graduation committee member

Faculty

Electrical Engineering, Mathematics and Computer Science

To reference this document use:

http://resolver.tudelft.nl/uuid:6495fb47-9be8-4096-a0ff-ef08c05cbf2b

Published Date

25-08-2021

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

In this study, we investigate the use of generative adversarial networks (GANs) for modelling a collection of sounds. The proposed method suggests an interpretation of musical sound synthesis based on audio collections rather than on synthesizer component controls. This promises the generation of arbitrarily complex sounds without the restrictions of traditional synthesizer components. Furthermore, the method promises to introduce non-linear interpolations within arbitrarily varied collections of sounds. Together, these two elements motivate a new approach to creating musical instruments. Here, we introduce a proof-of-principle method, together with qualitative and quantitative assessments of the results. First, we cover the image-like audio signal representation and the neural network architectures that together form a trainable system capable of producing audio signals. Despite some artifacts, the trained system produces spectral information that is structurally similar to that of the training data set. In addition, we introduce a metric to quantitatively compare signal characteristics between two sets of signals. The difference between these characteristics appears to decrease over the course of training.

Files

060921_VincentBockstaelAudioGA... (.pdf)

(.pdf | 21.1 MB)