Small end-to-end OCR model

Master Thesis (2023)

Author(s)

J. Dun (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Justin Dauwels – Mentor (TU Delft - Signal Processing Systems)

DMJ Tax – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

OCR, Mobile device, End-to-end

To reference this document use:

https://resolver.tudelft.nl/uuid:34986f5b-3087-41cd-9068-b05084627030

Publication Year

2023

Language

English

Graduation Date

27-09-2023

Awarding Institution

Delft University of Technology

Programme

Electrical Engineering | Circuits and Systems

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Optical Character Recognition (OCR) extracts text from images and is widely used in applications such as document digitization and medical records management. Machine learning has made OCR models fast and accurate. OCR broadly comprises two components: detecting the bounding boxes around text instances and recognizing the characters within them. Most current OCR models are complex two-stage systems that must run in real time on remote servers. End-to-end models, by contrast, use training data more efficiently. Offline models are also indispensable in some scenarios, such as environments with restricted internet access or settings with stringent data privacy and security requirements.

This project surveys several end-to-end models and uses the PaddleOCR end-to-end model as a reference for designing a compact OCR model for edge devices. By optimizing the backbone architecture and introducing different Feature Pyramid Network (FPN) structures in the stem network, we reduced the model size to 19 MB, one-tenth of the size of the original PaddleOCR end-to-end model.

Trained on an extensive database and fine-tuned through a series of experiments targeting end-to-end OCR on curved text images, the model achieves a precision of 47.3% and an F-score of 45.3%. This result shows that the customized loss function is effective relative to the original model despite the reduced model size, and the performance is comparable to that of some end-to-end models with larger backbones. An Android demo was also developed to demonstrate the model on mobile devices; it achieves an average processing time of 433 milliseconds per image.
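
The reported figures can be related to each other with a little arithmetic. The sketch below assumes that the F-score is the standard F1 (the harmonic mean of precision and recall), which the abstract does not state explicitly. Under that assumption it derives the recall implied by the reported precision and F-score, and converts the reported average latency into an approximate throughput. The function name is hypothetical and used for illustration only; the recall value is derived, not a figure reported in the thesis.

```python
def recall_from_precision_and_f1(precision: float, f1: float) -> float:
    """Solve F1 = 2*P*R / (P + R) for R, assuming the F-score is F1."""
    denominator = 2.0 * precision - f1
    if denominator <= 0.0:
        raise ValueError("no valid recall exists for these values")
    return f1 * precision / denominator


# Figures taken from the abstract.
precision = 0.473   # 47.3 %
f_score = 0.453     # 45.3 %, assumed here to be F1
latency_ms = 433.0  # average processing time per image on Android

# Derived quantities (not reported in the abstract).
implied_recall = recall_from_precision_and_f1(precision, f_score)
images_per_second = 1000.0 / latency_ms

print(f"implied recall: {implied_recall:.3f}")
print(f"approximate throughput: {images_per_second:.2f} images/s")
```

If the thesis uses a differently weighted F-score, the implied recall would differ, so the derived value should be read only as a rough consistency check.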

Files

Master_thesis_Jingwen_Dun_5544... (pdf)

(pdf | 14.1 Mb)

License info not available