Why can't I get Gemini to recognize "strikethrough" text in an image

OK, so the final, final solution to this was actually to use gpt-4o. What I had to do was create a script that does the following (rough sketches below):

1. Convert the local PDF to JPG pages

2. Upload the JPG images to an AWS S3 bucket

3. Submit the JPG images with a prompt to the OpenAI model in batches

4. Continue the request if the response is cut off at the max-token limit

5. Write the output to a local txt file
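In case it helps anyone, here is a minimal sketch of steps 1-2 in Python. It assumes the pdf2image package (which needs poppler installed) and boto3; the bucket name and region are placeholders, and the returned URLs only work if the objects are publicly readable (otherwise swap in presigned URLs).

```python
import boto3
from pdf2image import convert_from_path

BUCKET = "my-doc-pages"   # hypothetical bucket name
REGION = "us-east-1"      # hypothetical region

s3 = boto3.client("s3", region_name=REGION)

def pdf_to_jpg_urls(pdf_path: str) -> list[str]:
    """Convert each PDF page to a JPG, upload it to S3, and return the URLs."""
    urls = []
    for i, page in enumerate(convert_from_path(pdf_path, dpi=200), start=1):
        local = f"/tmp/page_{i:04d}.jpg"
        key = f"pages/page_{i:04d}.jpg"
        page.save(local, "JPEG")
        s3.upload_file(local, BUCKET, key,
                       ExtraArgs={"ContentType": "image/jpeg"})
        urls.append(f"https://{BUCKET}.s3.{REGION}.amazonaws.com/{key}")
    return urls
```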

Since I could not use Claude through AnthropicVertex due to the file MB limit (which, actually, my new methodology works around), I tried using Claude through the AWS Bedrock SDK. The problem there was that the Bedrock version of Claude does not reliably recognize strikethrough text the way the AnthropicVertex version does. Go figure.

So gpt-4o becomes my default go-to model for handling strikethrough text.

Finally, by submitting the images in small batches (as few as one image per call), I may extend the time it takes to process a large document, but the token difference is negligible while the efficiency of image processing increases dramatically (the loop is sketched below).
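Roughly, steps 3-5 end up looking like this. The prompt text, the max_tokens value, and the plain "Continue." follow-up are just illustrative; the key part is checking finish_reason and re-requesting whenever the output gets truncated at the token limit.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def transcribe_pages(urls: list[str], prompt: str, out_path: str) -> None:
    """One image per call; keep asking the model to continue if it is cut off."""
    with open(out_path, "w", encoding="utf-8") as out:
        for url in urls:
            messages = [{
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": url}},
                ],
            }]
            while True:
                resp = client.chat.completions.create(
                    model="gpt-4o", messages=messages, max_tokens=1024)
                choice = resp.choices[0]
                out.write(choice.message.content or "")
                if choice.finish_reason != "length":
                    break  # finished normally; move on to the next page
                # Response hit the token cap, so ask the model to pick up where it left off
                messages.append(
                    {"role": "assistant", "content": choice.message.content})
                messages.append({"role": "user", "content": "Continue."})
            out.write("\n\n")
```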
