Hi TensorFlow community,
I’m working on a deep learning project to extract all visible text from arbitrary images, including medicine cartons, product labels, and scanned documents.
Objective:
Build a TensorFlow/Keras model that outputs all the text present in an input image.
I’ve tried:
- Implemented a CRNN model (CNN + BiLSTM + CTC loss) using TensorFlow and Keras.
- Preprocessed images by resizing them to a fixed height and padding the width, so the aspect ratio is preserved.
- Used a custom character set for CTC decoding.
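For context, here is a minimal sketch of the preprocessing arithmetic and the CTC decoding step described above. It is illustrative only: the target height, maximum width, character set, and function names (`resized_shape`, `greedy_ctc_decode`) are assumptions made for this example, and it uses simple best-path (greedy) decoding rather than beam search.

```python
# Illustrative sketch only: sizes, character set, and function names are
# assumptions for this example, not the exact code from the project.

TARGET_HEIGHT = 32    # assumed fixed input height
MAX_WIDTH = 128       # assumed width every image is padded to
CHARSET = "abcdefghijklmnopqrstuvwxyz0123456789 "  # assumed custom character set
BLANK = len(CHARSET)  # the CTC blank is given the last class index


def resized_shape(height, width):
    """Return (new_height, new_width, right_padding) for a fixed-height resize.

    The width is scaled by the same factor as the height, so the aspect ratio
    is kept. Images that would exceed MAX_WIDTH are clamped, which squeezes
    them horizontally.
    """
    scale = TARGET_HEIGHT / height
    new_width = min(MAX_WIDTH, max(1, round(width * scale)))
    return TARGET_HEIGHT, new_width, MAX_WIDTH - new_width


def greedy_ctc_decode(class_ids):
    """Best-path CTC decoding: collapse repeated ids, then drop blanks."""
    chars = []
    previous = None
    for idx in class_ids:
        if idx != previous and idx != BLANK:
            chars.append(CHARSET[idx])
        previous = idx
    return "".join(chars)
```

In a TensorFlow pipeline, the shape from `resized_shape` could be applied with `tf.image.resize` followed by `tf.image.pad_to_bounding_box`, and `greedy_ctc_decode` would take the per-timestep argmax of the model output.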
Questions:
- Are there better model architectures (such as ViT or encoder-decoder transformers) that work well in TensorFlow?
- Any tips on:
  - Preprocessing input images
  - Handling different font styles and small text
  - Preparing the training dataset
- Are there any open-source Keras reference models I could start from?
Thanks in advance for your guidance!