Why can't I get Gemini to recognize "strikethrough" text in an image

OK, so the final, final solution to this was actually to use gpt-4o. What I had to do was create a script that does the following (rough sketches below):

1. Convert the local PDF to JPG pages

2. Upload the JPG images to an AWS S3 bucket

3. Submit the JPG images with a prompt to the OpenAI model in batches

4. Continue the request if the response is cut off at the max-token limit

5. Write the output to a local txt file
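In case it helps anyone, here is a minimal sketch of steps 1-2 in Python. It assumes the pdf2image package (which needs poppler installed) and boto3; the bucket name and region are placeholders, and the returned URLs only work if the objects are publicly readable (otherwise swap in presigned URLs).

```python
import boto3
from pdf2image import convert_from_path

BUCKET = "my-doc-pages"   # hypothetical bucket name
REGION = "us-east-1"      # hypothetical region

s3 = boto3.client("s3", region_name=REGION)

def pdf_to_jpg_urls(pdf_path: str) -> list[str]:
    """Convert each PDF page to a JPG, upload it to S3, and return the URLs."""
    urls = []
    for i, page in enumerate(convert_from_path(pdf_path, dpi=200), start=1):
        local = f"/tmp/page_{i:04d}.jpg"
        key = f"pages/page_{i:04d}.jpg"
        page.save(local, "JPEG")
        s3.upload_file(local, BUCKET, key,
                       ExtraArgs={"ContentType": "image/jpeg"})
        urls.append(f"https://{BUCKET}.s3.{REGION}.amazonaws.com/{key}")
    return urls
```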

Since I could not use Claude through AnthropicVertex due to the file MB limit (which, actually, my new methodology works around), I tried using Claude through the AWS Bedrock SDK. The problem there was that the Bedrock version of Claude does not reliably recognize strikethrough text the way the AnthropicVertex version does. Go figure.

So gpt-4o becomes my default go-to model for handling strikethrough text.

Finally, by submitting the images in small batches (as few as one image per call), I may extend the time it takes to process a large document, but the token difference is negligible while the efficiency of image processing increases dramatically (the loop is sketched below).
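Roughly, steps 3-5 end up looking like this. The prompt text, the max_tokens value, and the plain "Continue." follow-up are just illustrative; the key part is checking finish_reason and re-requesting whenever the output gets truncated at the token limit.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def transcribe_pages(urls: list[str], prompt: str, out_path: str) -> None:
    """One image per call; keep asking the model to continue if it is cut off."""
    with open(out_path, "w", encoding="utf-8") as out:
        for url in urls:
            messages = [{
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": url}},
                ],
            }]
            while True:
                resp = client.chat.completions.create(
                    model="gpt-4o", messages=messages, max_tokens=1024)
                choice = resp.choices[0]
                out.write(choice.message.content or "")
                if choice.finish_reason != "length":
                    break  # finished normally; move on to the next page
                # Response hit the token cap, so ask the model to pick up where it left off
                messages.append(
                    {"role": "assistant", "content": choice.message.content})
                messages.append({"role": "user", "content": "Continue."})
            out.write("\n\n")
```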
