Synthesizing Comics via Conditional Generative Adversarial Networks

Bachelor Thesis (2021)
Author(s)

D.B. Morris (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Y. Chen – Mentor (TU Delft - Data-Intensive Systems)

Zilong Zhao – Mentor (TU Delft - Data-Intensive Systems)

Arie Van Deursen – Graduation committee member (TU Delft - Software Technology)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2021 Darwin Morris
Publication Year
2021
Language
English
Graduation Date
01-07-2021
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The creation of comic illustrations is a complex artistic process resulting in a wide variety of styles, each unique to the artist. Conditional image synthesis refers to the generation of de novo images based on certain preconditions. Applying machine learning to conditionally generate novel comics is an intriguing yet difficult task. This paper aims to answer whether Generative Adversarial Networks (GANs) can be used for conditional comic synthesis. Recent advancements in GANs have pushed image synthesis to hyper-realistic levels; despite this, the performance of GAN models is almost always assessed on photo-realistic images. To extend experimental knowledge of unconditional GAN performance into the domain of comics, an empirical analysis was performed on the unconditioned generative performance of three cutting-edge GAN architectures: the Deep Convolutional GAN (DCGAN), the Wasserstein GAN (WGAN), and the Stability GAN (SGAN). This paper shows that the SGAN implementation far outperforms both the DCGAN and WGAN architectures on a dataset of Dilbert comics, achieving an FID score of 89.1. Due to their relative simplicity, comics are an intriguing candidate for conditional generation: a comic panel can likely be described using a few specific labels (e.g. background and characters). Two conditional networks were created using the SGAN architecture as a baseline. The Multi-Class SGAN (MC-SGAN) uses a traditional multi-class conditional approach, while the Multi-Label SGAN (ML-SGAN) uses a multi-label auxiliary-classification approach. Multiple experiments were performed between these two networks, amounting to hundreds of hours of training. While the networks performed similarly on simple conditional tasks, MC-SGAN outperformed ML-SGAN on more complex tasks. MC-SGAN was able to conditionally generate comics based on character and color, with the desired conditions distinguishable in almost all outputs.
Issues with traditional methods of auxiliary-classifier training encountered in the MC-SGAN implementation are also identified and discussed.
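As background for two terms in the abstract, the sketch below illustrates (a) how a generator's input can be conditioned, contrasting a multi-class one-hot condition with a multi-label condition, and (b) the standard Fréchet Inception Distance (FID) formula used to score generative models. The label names are hypothetical examples, not the thesis's actual label set, and concatenation is only one common conditioning scheme, not necessarily the one used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(100)  # latent noise vector fed to the generator

# --- Conditioning the generator input -----------------------------------
# Multi-class: exactly one label is active, encoded one-hot.
# (Label names here are illustrative, not the thesis's label set.)
classes = ["dilbert", "dogbert", "boss"]
one_hot = np.eye(len(classes))[0]             # "dilbert" -> [1., 0., 0.]

# Multi-label: any subset of labels may be active at once.
labels = ["dilbert", "dogbert", "office_background"]
multi_label = np.array([1.0, 0.0, 1.0])       # Dilbert in an office

# One common scheme: concatenate the condition vector with the noise
# vector before the generator's first layer.
g_input_mc = np.concatenate([z, one_hot])     # shape (103,)
g_input_ml = np.concatenate([z, multi_label]) # shape (103,)

# --- Frechet Inception Distance (FID) -----------------------------------
def _sqrtm_psd(m):
    """Matrix square root of a symmetric positive semi-definite matrix."""
    w, v = np.linalg.eigh(m)
    w = np.clip(w, 0.0, None)       # guard against tiny negative eigenvalues
    return (v * np.sqrt(w)) @ v.T

def fid(mu1, cov1, mu2, cov2):
    """FID between Gaussians fitted to Inception features of real and
    generated images: ||mu1 - mu2||^2 + Tr(C1 + C2 - 2*(C1 C2)^(1/2))."""
    s1 = _sqrtm_psd(cov1)
    # sqrtm(C1 C2) has the same trace as sqrtm(C1^(1/2) C2 C1^(1/2)),
    # which is symmetric PSD and thus safe for _sqrtm_psd.
    covmean = _sqrtm_psd(s1 @ cov2 @ s1)
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2)
                 - 2.0 * np.trace(covmean))

# Identical distributions score 0; lower is better.
mu, cov = np.zeros(4), np.eye(4)
print(fid(mu, cov, mu, cov))  # -> 0.0
```

Lower FID indicates generated-feature statistics closer to the real data's; the 89.1 reported above was computed with the standard Inception-feature pipeline, whereas this sketch only shows the closed-form distance between two fitted Gaussians.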

Files

Darwin_Morris_Thesis.pdf
(PDF, 3.95 MB)
License info not available