Pipeline construction for automated text retrieval, editing, and deletion in comic illustrations

Abstract

With the increasing demand for high-quality data in machine learning and AI, the availability of such data has become a major bottleneck for further advancement. This paper proposes a novel approach to extracting valuable data from comic illustrations, aiming to address the scarcity of labeled datasets. Leveraging popular comic series such as Dilbert, which contain thousands of comic strips with multiple panels, text boxes, characters, and settings, we construct a pipeline for data labeling and manipulation. This pipeline is intended to enable experiments in areas such as generative comics, humor detection, and translation. The paper focuses on two key research questions: 1) How accurately can current OCR models extract text from comics? and 2) How can existing text boxes be edited and deleted? Accurately segmenting the panels and text boxes within each comic is expected to improve OCR performance by reducing noise and accommodating the unusual text formats found in comics. Object detection models are used to zoom into text boxes, further improving OCR text extraction accuracy. The effectiveness of these techniques is measured with Latent Dirichlet Allocation (LDA), Character Error Rate (CER), and Word Error Rate (WER). Using a dataset of 500 labeled comic panels, we achieve accuracies of 94.07% for CER (up 9.86 percentage points from an 84.21% baseline), 88.35% for WER (up 8.35 percentage points from an 80.0% baseline), and 98.0% for LDA (up 3.77 percentage points from a 94.23% baseline). Editing and deleting text boxes inside the comic panels likewise succeeds in the vast majority of instances. We believe these results are more than adequate for selected use cases.
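The CER and WER figures above compare OCR output against a ground-truth transcription using edit distance. The following is a minimal Python sketch of the two metrics, not the authors' evaluation code: the function names and the sample strings are hypothetical, and reading "accuracy" as 1 minus the error rate is an assumption, since the abstract does not say how accuracy is derived from CER or WER.

```python
"""Minimal CER/WER sketch based on Levenshtein edit distance.

Assumption: the abstract's "accuracy" is taken to mean 1 - error rate;
the source does not define this conversion.
"""
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    # prev[j] holds the distance between the previous ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h into the reference
                prev[j - 1] + (r != h),  # substitute (free when equal)
            )
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edits / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / max(len(reference), 1)


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)


if __name__ == "__main__":
    ref = "I LOVE MEETINGS"  # hypothetical ground-truth text box
    hyp = "I L0VE MEETING"   # OCR misread "O" as "0" and dropped the final "S"
    print(round(cer(ref, hyp), 4))  # 0.1333 (2 edits / 15 characters)
    print(round(wer(ref, hyp), 4))  # 0.6667 (2 of 3 words wrong)
```

Because a single misread character makes its whole word count as an error, WER is usually higher than CER on the same output, which is consistent with the WER-based accuracy in the abstract being lower than the CER-based one.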