How to have a multi-turn conversation with Gemini that includes images

Hello. I’ve gone through the Gemini API documentation and learned a bit about the image processing support. However, the documentation only explains single-turn usage, where the user provides a picture and a question and the AI answers it. How do I use it in a multi-turn conversation?
For example:
User: text
Gemini: text
User: text + image
Gemini: text

How do I send a base64 image through the send_message() function?

OK, I figured it out myself.
Pass the image and the prompt together as a list to send_message(), like this:

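import base64

import httpx

# Despite the name, image_path holds a URL, not a local file path.
image_path = "https://upload.wikimedia.org/wikipedia/commons/thumb/8/87/Palace_of_Westminster_from_the_dome_on_Methodist_Central_Hall.jpg/2560px-Palace_of_Westminster_from_the_dome_on_Methodist_Central_Hall.jpg"

# Download the image; image.content holds the raw bytes.
image = httpx.get(image_path)

prompt = "Caption this image."
# models[0] is a Gemini model object created earlier (not shown in this snippet).
chat = models[0].start_chat(history=[])
# A message can be a list of parts: an inline image dict followed by the text prompt.
response = chat.send_message([
    {'mime_type': 'image/jpeg',
     'data': base64.b64encode(image.content).decode('utf-8')},
    prompt
])

print(response.text)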
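
For the multi-turn case from the question (text, then text + image), here is a rough sketch of how the same call could be reused turn by turn. The helper names image_part, build_message and run_conversation are made up for illustration and are not part of the Gemini SDK; the only assumption about the chat object is that it has the send_message() method used above and that the response has a .text attribute.

```python
import base64
from typing import Optional


def image_part(image_bytes: bytes, mime_type: str = "image/jpeg") -> dict:
    """Wrap raw image bytes in the same inline dict used in the snippet above."""
    return {
        "mime_type": mime_type,
        "data": base64.b64encode(image_bytes).decode("utf-8"),
    }


def build_message(text: str, image_bytes: Optional[bytes] = None) -> list:
    """Build the parts list for one user turn: optional image first, then text."""
    parts = []
    if image_bytes is not None:
        parts.append(image_part(image_bytes))
    parts.append(text)
    return parts


def run_conversation(chat, turns):
    """Send each (text, image_bytes_or_None) turn through one chat session.

    `chat` is assumed to be the object returned by start_chat(); every turn
    goes through the same object, so text-only and image turns can be mixed.
    """
    replies = []
    for text, image_bytes in turns:
        response = chat.send_message(build_message(text, image_bytes))
        replies.append(response.text)
    return replies
```

With the variables from the snippet above, a call such as run_conversation(chat, [("Hello!", None), ("Caption this image.", image.content)]) would send a text-only turn followed by a text + image turn on the same chat.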