Title: Augment it Maybe?: Improving Deep Vision Models with Adversarial Scene Text Augmentation
Author: Sharma, Anirvin (TU Delft Electrical Engineering, Mathematics and Computer Science)
Contributors: van Gemert, J.C. (mentor); Cavalcante Siebert, L. (graduation committee)
Degree granting institution: Delft University of Technology
Programme: Computer Science
Date: 2023-08-31

Abstract:
Image data augmentation is regarded as a reliable and effective way to increase the data available for training. With the advent and rise of generative AI, generative data augmentation has been shown to realize even greater performance gains on downstream tasks. However, these gains are often the result of "extra information" seeping into the generated examples via pre-trained model weights, heuristic inclusions, and so on. In this paper, we showcase the impact of text-in-image augmentation on the performance of an underlying downstream task (classification or recognition). This study looks specifically at the difference in performance when training a classifier under three settings (no augmentation, transform-based augmentation, and generative augmentation) and investigates whether and where such augmentation can be successfully employed to realize performance gains without letting any "extra information" seep in. We observe this difference in performance under varying amounts of training samples, and for samples with varying similarity to the original training data. We also present a new GAN structure, the conditional Classification Deep Convolutional GAN (CcGAN), as an improved baseline over the conditional Deep Convolutional GAN (cDCGAN) for our experiments; it gave a 4% performance gain over unaugmented data with no "extra information".
We find that in certain settings and examples, there is a performance advantage to training vision models in text-in-image settings using real and generated data. We also confirm that the number of original training samples available affects the test accuracy achieved by generative augmentation: a sharp fall-off can be seen in extremely low- and high-data regimes, while performance is maximized at a "sweet spot" where the robustness and variability added by the generated samples help realize performance gains. We also observed that the 1x and 5x augmentation configurations performed better than the others. Lastly, we find that the similarity of the generated samples to the original data does not vary consistently with model performance in most settings.

Subjects: Generative Artificial Intelligence; Data Augmentation; Computer Vision; Generative Adversarial Networks (GAN); Deep Learning; Character Recognition; Image Classification; Convolutional Neural Networks (CNNs); MNIST; SVHN; Scene Text Recognition

To reference this document use: http://resolver.tudelft.nl/uuid:bcfd0a70-cb58-4252-9ce5-06ce2d9a7e0f
Bibliographical note: Link to code for this project: http://tinyurl.com/AugmentItMaybe-codebase
Part of collection: Student theses
Document type: master thesis
Rights: © 2023 Anirvin Sharma
Files: ThesisReport_ASharma.pdf (8.98 MB)
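The "1x" and "5x" augmentation configurations mentioned in the abstract can be sketched as follows. This is a minimal illustration of mixing real samples with GAN-generated ones at a fixed ratio, not the thesis codebase (see the bibliographical note for that); `augment_dataset` and `generate_fn` are hypothetical names, and `generate_fn` stands in for sampling from a trained conditional GAN such as the cDCGAN or CcGAN:

```python
import random

def augment_dataset(real_samples, generate_fn, ratio):
    """Return real_samples plus ratio * len(real_samples) generated samples.

    ratio=1 corresponds to the "1x" configuration (equal parts real and
    generated data); ratio=5 to "5x". generate_fn() is a hypothetical
    stand-in for drawing one sample from a trained conditional GAN.
    """
    n_generated = ratio * len(real_samples)
    generated = [generate_fn() for _ in range(n_generated)]
    return real_samples + generated

# Toy usage: "samples" are labeled placeholders, not real images.
real = [("real", i % 10) for i in range(100)]
fake = lambda: ("generated", random.randint(0, 9))

aug_1x = augment_dataset(real, fake, ratio=1)  # 100 real + 100 generated
aug_5x = augment_dataset(real, fake, ratio=5)  # 100 real + 500 generated
```

The classifier would then be trained on `aug_1x` or `aug_5x` in place of `real`, keeping the test set untouched so that accuracy differences reflect the augmentation alone.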