What is the best model architecture in TensorFlow for extracting text from any image (OCR)?

vani_anandan · June 25, 2025, 10:30am

Hi TensorFlow community,

I’m working on a deep learning project where I want to extract all visible text from any kind of image, including medicine cartons, product labels, and scanned documents.

Objective:
Build a TensorFlow/Keras model that outputs all the text present in an input image.

What I’ve tried:

- Implemented a CRNN model (CNN + BiLSTM + CTC loss) using TensorFlow and Keras.
- Preprocessed images by resizing them to a fixed height and padding the width to preserve the aspect ratio.
- Used a custom character set for CTC decoding.
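
The preprocessing and decoding steps above can be sketched roughly as follows. This is a minimal plain-Python illustration, not the code from the project: the names `CHARSET`, `BLANK_INDEX`, `scaled_width`, and `ctc_greedy_decode` are hypothetical, the target height of 32 and maximum width of 256 are assumed values, and the CTC blank is assumed to be the last class index.

```python
# Hypothetical character set; the index just past it is reserved for the CTC blank.
CHARSET = "abcdefghijklmnopqrstuvwxyz0123456789 "
BLANK_INDEX = len(CHARSET)


def scaled_width(height, width, target_height=32, max_width=256):
    """Width after resizing to target_height with the aspect ratio kept.

    The result is capped at max_width; anything narrower would be padded
    on the right up to max_width before batching.
    """
    new_width = max(1, round(width * target_height / height))
    return min(new_width, max_width)


def ctc_greedy_decode(frame_scores):
    """Best-path CTC decoding for one sequence.

    frame_scores is a list of per-timestep score lists, each of length
    len(CHARSET) + 1. Takes the argmax per frame, merges consecutive
    repeats, then drops blanks.
    """
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    chars = []
    previous = None
    for idx in best:
        if idx != previous and idx != BLANK_INDEX:
            chars.append(CHARSET[idx])
        previous = idx
    return "".join(chars)
```

Whichever position is chosen for the blank, it has to match the convention of the CTC loss used during training, otherwise the decoded characters will be shifted or dropped.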

Are there any better model architectures (like ViT or encoder-decoder transformers) that work well in TensorFlow?

Tips on:
- Preprocessing input images
- Handling different font styles and small text
- Preparing the training dataset
- Open-source Keras model references (if available)

Thanks in advance for your guidance!
