Why can't I get Gemini to recognize "strikethrough" text in an image

This is the image:

This is my prompt:

    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")

    # Upload the page image and create the model (file and model names are placeholders)
    files = [genai.upload_file("page.png")]
    model = genai.GenerativeModel("gemini-1.5-pro")

    # Start the chat session with the image as the first user turn
    chat_session = model.start_chat(
        history=[
            {
                "role": "user",
                "parts": [
                    files[0],
                ],
            },
        ]
    )

    # Send the extraction instructions and print the response
    response = chat_session.send_message(
        "Extract the text from this file. Exclude strikethrough words. "
        "A strikethrough is a horizontal line drawn through text, used to indicate "
        "the deletion of an error or the removal of text. Do not extract any word "
        "formatted with a horizontal line through its center."
    )
    print(response.text)

Response:

10. Sick Leave

Modify Article 9 of the Local #161 Motion Picture Theatrical and TV Series Production Agreement (and make conforming changes to Article 41 of the Local #161 Supplemental Digital Agreement) as follows:

"ARTICLE 9. WAIVER OF NEW YORK CITY EARNED SICK TIME ACT AND SIMILAR LAWS SICK LEAVE

"(a) Paid Sick Leave in the State of New York: The following is applicable only to employees working under this Agreement in the State of New York:

"(1) Commencing [insert the date that is the first Sunday after 30 days following the AMPTP’s receipt of notice of ratification], employees shall accrue one (1)

No matter what I try, with either the Flash or the Pro model, the result is the same. If it’s multi-modal and looking at this image visually, why can’t it recognize and act on the strikethrough text?

To be clear, this:

"ARTICLE 9. WAIVER OF NEW YORK CITY EARNED SICK TIME ACT AND SIMILAR LAWS SICK LEAVE

Should be returned as this:

"ARTICLE 9. SICK LEAVE

How do I fix this?

As far as I can tell, you cannot fix it. Google has to improve its model.


If possible, you could change the color of the lines to distinguish them from the text. That might be helpful.


Thanks for the suggestion. That’s not really possible, as we are not the source for these PDFs. If we have to go through and manually edit them, it sort of defeats the purpose of an automated solution.

I’ve said this before: PDFs are the lifeblood of modern day-to-day business activity. What is the business use case of recognizing drawings of cats and dogs and ducks when these multi-modal models can’t even describe what’s clearly present in a business document?

Really frustrating.

:rofl::rofl: 100% true fact.

But I wanted to know: did you upload a PDF or an image file?
As far as I know, there is a separate engine to extract text from PDFs, and that extraction may not use AI. I have also seen that if I send it an image, it will analyze the image first, and even if I then instruct it otherwise and tell it to re-analyze the image, it won’t; it stands by its first result.

Yes. I have been testing with this image:

And the API absolutely refuses to recognize the strikethrough text. The AI Studio does a better job, but I need this to work in an embedding pipeline, so I’ve got to get the API working.

It looks like strikethrough in OCR is still a non-trivial issue in 2024.

Anthropic’s models are in the Google Model Garden, if that’s a solution: Google Cloud console (you need to request access, though).

It’s not perfect, but Sonnet 3.5 gave me this:

[
  {
    "style": "bold",
    "content": "10. Sick Leave"
  },
  {
    "style": "normal",
    "content": "\n\nModify Article 9 of the Local #161 Motion Picture Theatrical and TV Series Production\nAgreement (and make conforming changes to Article 41 of the Local #161 Supplemental\nDigital Agreement) as follows:\n\n"
  },
  {
    "style": "bold",
    "content": "\"ARTICLE 9. "
  },
  {
    "style": "bold-strikethrough",
    "content": "WAIVER OF NEW YORK CITY EARNED SICK TIME ACT\nAND SIMILAR LAWS"
  },
  {
    "style": "bold-underlined",
    "content": " SICK LEAVE"
  },
  {
    "style": "bold",
    "content": "\""
  },
  {
    "style": "normal",
    "content": "\n\n\"(a) Paid Sick Leave in the State of New York: The following is applicable\nonly to employees working under this Agreement in the State of New York:\n\n\"(1) Commencing "
  },
  {
    "style": "italic",
    "content": "[insert the date that is the first Sunday after 30 days\nfollowing the AMPTP's receipt of notice of ratification]"
  },
  {
    "style": "normal",
    "content": ", employees shall accrue one (1)"
  }
]
proomt
Please transcribe the text above, with text blocks filling the following schema:

{
    style: "normal" | "bold" | "underlined" | "strikethrough" | "italic", // you can combine styles with a dash ("style_a-style_b")
    content: string // the actual content
}[]

Only reply with valid JSON. Begin your response with [
// post-process the parsed reply: drop strikethrough blocks, join the rest
.filter(e => !e.style.match("strikethrough"))
.map(e => e.content).join("")
10. Sick Leave

Modify Article 9 of the Local #161 Motion Picture Theatrical and TV Series Production
Agreement (and make conforming changes to Article 41 of the Local #161 Supplemental
Digital Agreement) as follows:

“ARTICLE 9. SICK LEAVE”

"(a) Paid Sick Leave in the State of New York: The following is applicable
only to employees working under this Agreement in the State of New York:

"(1) Commencing [insert the date that is the first Sunday after 30 days
following the AMPTP’s receipt of notice of ratification], employees shall accrue one (1)


Sounds like we’re getting closer. How did you generate the above? Did you manually input it, or was it generated by some code? This is the crucial point for me because I’ll be dealing with hundreds of documents which will have strikethrough text, so I really need to be able to automate the process.

Thank you for the response! I have enabled Claude Sonnet 3.5 (which I didn’t know you could do in Vertex!).

The proomt is in the proomt box, just expand it :slight_smile:

I see the prompt. But what I’m not following (my bad, I don’t know the language) is:

a. What is the input text (text above) that the prompt is using? Is this the PDF or image file?
b. How is the following determined?

In other words, I don’t see how the script determines what is and what isn’t “bold-strikethrough”. I see that the script will match against “strikethrough” and thus eliminate it from the output, which is what I want.

Unless you are saying that this prompt alone:

Will return this output:

In which case, I do understand the concept.

Yeah, you got it. That was JavaScript. You’d send each page to the model with that prompt, and then post-process the data.

e.g., in Python:
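
A minimal sketch (assuming the raw model reply is already in a `response_text` string and parses as valid JSON):

    import json

    # Parse the model's reply into a list of {style, content} blocks
    blocks = json.loads(response_text)  # response_text: the model's raw JSON reply

    # Drop every block whose style includes "strikethrough", then join the rest
    clean_text = "".join(
        b["content"] for b in blocks if "strikethrough" not in b["style"]
    )
    print(clean_text)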


Beautiful. I haven’t used JavaScript in a long minute, but I get the gist of it now. I actually have access to Sonnet 3 through AWS, but it’s not offering 3.5. I’ll enable it through Vertex and give it a try. Will let you know.

A glimmer of hope. Many, many thanks.


It isn’t really “requesting”. You need to fill out some information, and you get access once you submit it. (This is mostly so Anthropic has an idea of how many people are using it.)


Ah. I’ve been reluctant to fill it out because Google has rejected me in the past; my website wasn’t “up to snuff” or something.

No problem when uploading it as an image.

Yes, but you are doing it in the playground/chatbox. I need to do it in the API. Totally different responses.

No, it works via the API. Check it here: Google AI Gemini API UZB, or via the Telegram bot: Google AI Gemini API UZB.

Thanks. Hopefully this will provide the solution: Why can't I get Gemini to recognize "strikethrough" text in an image - #12 by Diet

Thanks @Diet !!

I finally got something working. Yes, using Claude through Anthropic[Vertex] appears to recognize the strikethrough.

I am using code similar to this:

import base64
import httpx
from anthropic import AnthropicVertex

LOCATION = "europe-west1"  # or "us-east5"

client = AnthropicVertex(region=LOCATION, project_id="PROJECT_ID")

# Fetch a sample image and base64-encode it for the API
image1_url = "https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg"
image1_media_type = "image/jpeg"
image1_data = base64.b64encode(httpx.get(image1_url).content).decode("utf-8")

# Send the image and a text instruction in a single user message
message = client.messages.create(
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": image1_media_type,
                        "data": image1_data,
                    },
                },
                {
                    "type": "text",
                    "text": "Describe this image."
                }
            ],
        }
    ],
    model="claude-3-5-sonnet@20240620",
)
print(message.model_dump_json(indent=2))

You can find this in the Google Cloud documentation: Google Cloud console

Last question: I need to extract text from multi-page PDFs. So, task 1 is to convert each page to an image. However, I need to know how to send multiple images to Claude in one API call.
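
For what it’s worth, I believe the Messages API accepts multiple image blocks in a single user message, so I’m picturing something like this (untested sketch; pdf2image, the "document.pdf" path, and the prompt text are just my placeholders):

    import base64
    import io
    from anthropic import AnthropicVertex
    from pdf2image import convert_from_path  # assumes poppler is installed

    client = AnthropicVertex(region="europe-west1", project_id="PROJECT_ID")

    # Task 1: rasterize each PDF page to a PNG ("document.pdf" is a placeholder)
    pages = convert_from_path("document.pdf")

    # Build one image block per page, followed by a single text instruction
    content = []
    for page in pages:
        buf = io.BytesIO()
        page.save(buf, format="PNG")
        content.append({
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": "image/png",
                "data": base64.b64encode(buf.getvalue()).decode("utf-8"),
            },
        })
    content.append({
        "type": "text",
        "text": "Transcribe the text on each page, excluding strikethrough words.",
    })

    message = client.messages.create(
        max_tokens=4096,
        messages=[{"role": "user", "content": content}],
        model="claude-3-5-sonnet@20240620",
    )
    print(message.content[0].text)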

Suggestions?
