Why can't I get Gemini to recognize "strikethrough" text in an image?

Thanks @Diet !!

I finally got something working. Yes, Claude accessed through AnthropicVertex (from the anthropic[vertex] package) does appear to recognize the strikethrough text.

I am using code similar to this:

import base64
import httpx
from anthropic import AnthropicVertex

LOCATION = "europe-west1"  # or "us-east5"

# Replace "PROJECT_ID" with your own Google Cloud project ID.
client = AnthropicVertex(region=LOCATION, project_id="PROJECT_ID")

image1_url = "https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg"
image1_media_type = "image/jpeg"
image1_data = base64.b64encode(httpx.get(image1_url).content).decode("utf-8")

message = client.messages.create(
  max_tokens=1024,
  messages=[
    {
      "role": "user",
      "content": [
        {
          "type": "image",
          "source": {
            "type": "base64",
            "media_type": image1_media_type,
            "data": image1_data,
          },
        },
        {
          "type": "text",
          "text": "Describe this image."
        }
      ],
    }
  ],
  model="claude-3-5-sonnet@20240620",
)
print(message.model_dump_json(indent=2))

The original can be found in this Google Cloud documentation: Google Cloud console

Last question: I need to extract text from multi-page PDFs, so step 1 is to convert each page to an image. What I still need to know is how to send multiple images to Claude in one API call.
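
My untested guess is that the "content" list in the call above can simply hold several image blocks, one per page, followed by the text prompt. Here is a minimal sketch of that idea. The helper name build_multi_image_content, the "Page N:" labels, and the image/png default are my own assumptions, and the PDF-to-image step is not shown:

```python
import base64


def build_multi_image_content(pages, prompt, media_type="image/png"):
    """Build one user-message content list with several images and a prompt.

    pages is a list of raw image bytes, one entry per PDF page
    (hypothetical input; converting the PDF to images happens elsewhere).
    """
    content = []
    for number, page_bytes in enumerate(pages, start=1):
        # Label each page so the reply can refer to pages by number.
        content.append({"type": "text", "text": f"Page {number}:"})
        content.append(
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": media_type,
                    "data": base64.b64encode(page_bytes).decode("utf-8"),
                },
            }
        )
    # The instruction goes last, after all of the page images.
    content.append({"type": "text", "text": prompt})
    return content
```

The result would then be passed as "content" in the single user message of client.messages.create(...), in place of the one-image list shown earlier. I assume there are limits on image count and request size, so a long PDF may need to be split across several calls.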

Suggestions?
