GAN-Driven Audio Synthesis

On using adversarial training for data-driven audio generation


Abstract

In this study, we investigate the use of generative adversarial networks (GANs) for modelling a collection of sounds. The proposed method suggests an interpretation of musical sound synthesis grounded in audio collections rather than in synthesizer component controls. This promises the generation of arbitrarily complex sounds without the restrictions of traditional synthesizer components. Furthermore, the method promises to introduce non-linear interpolations within arbitrarily varied collections of sounds. These two elements motivate a new approach to creating musical instruments. Here, we introduce a proof-of-principle method together with qualitative and quantitative assessments of the results. First, we cover the image-like audio signal representation and the neural network architectures that compose a trainable system capable of producing audio signals. Despite some artifacts, the trained system produces spectral information that is structurally similar to the training data set. Furthermore, we introduce a metric to quantitatively compare signal characteristics between two sets of signals. The difference between characteristics appears to decline over the course of training.
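The abstract does not specify the exact image-like representation used; a common choice for GAN-based audio modelling is the log-magnitude short-time Fourier transform (STFT), which turns a 1-D signal into a 2-D array that convolutional networks can treat like an image. The sketch below illustrates this idea with assumed frame and hop sizes (512 and 256 samples), not the parameters of the actual study:

```python
import numpy as np

def stft_magnitude(signal, frame_size=512, hop=256):
    """Log-magnitude STFT: maps a 1-D audio signal to a 2-D, image-like array."""
    window = np.hanning(frame_size)
    n_frames = 1 + (len(signal) - frame_size) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_size] * window
        for i in range(n_frames)
    ])
    spectrum = np.fft.rfft(frames, axis=1)   # one complex spectrum per frame
    return np.log1p(np.abs(spectrum)).T      # shape: (freq bins, time frames)

# Example: one second of a 440 Hz sine at 16 kHz becomes a 257 x 61 "image",
# with its energy concentrated near bin 440 / (16000 / 512) ≈ 14.
sr = 16000
t = np.arange(sr) / sr
spec = stft_magnitude(np.sin(2 * np.pi * 440 * t))
```

A generator network would then be trained to produce arrays of this shape, and a phase-reconstruction step (e.g. Griffin-Lim) would be needed to recover a time-domain signal from the generated magnitudes.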