Hello. I’ve gone through the Gemini API documentation and learned a bit about its image processing support. However, the documentation only covers single-turn usage, where the user provides a picture and a question and the model answers. How do I use it in a multi-turn conversation?
For example:
User: text
Gemini: text
User: text + image
Gemini: text
…
How do I send a base64 image through the send_message() function?
OK, I figured it out myself.
Pass the image and the prompt together as a list to send_message(), like this:
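import base64
import httpx

# models[0] is assumed to be an already-configured Gemini model object that supports start_chat().
image_path = "https://upload.wikimedia.org/wikipedia/commons/thumb/8/87/Palace_of_Westminster_from_the_dome_on_Methodist_Central_Hall.jpg/2560px-Palace_of_Westminster_from_the_dome_on_Methodist_Central_Hall.jpg"
image = httpx.get(image_path)  # download the raw image bytes
prompt = "Caption this image."
chat = models[0].start_chat(history=[])
# Send the base64-encoded image and the text prompt together as one user turn.
response = chat.send_message([
    {'mime_type': 'image/jpeg',
     'data': base64.b64encode(image.content).decode('utf-8')},
    prompt
])
print(response.text)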
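For the multi-turn case from the original question (text, then text + image), here is a minimal sketch that builds each user turn in the same shape as the snippet above. The helper names image_part and build_turn are hypothetical, not part of the Gemini SDK. The sketch assumes send_message() accepts a list containing only a string as well as a list mixing an image dict and a string.

```python
import base64
from typing import Optional


def image_part(image_bytes: bytes, mime_type: str = "image/jpeg") -> dict:
    """Wrap raw image bytes in the dict shape used in the snippet above.

    Hypothetical helper, not an SDK function.
    """
    return {
        "mime_type": mime_type,
        "data": base64.b64encode(image_bytes).decode("utf-8"),
    }


def build_turn(text: str,
               image_bytes: Optional[bytes] = None,
               mime_type: str = "image/jpeg") -> list:
    """Return the list to pass to chat.send_message() for one user turn.

    Hypothetical helper: the image part (if any) goes first, then the text.
    """
    parts = []
    if image_bytes is not None:
        parts.append(image_part(image_bytes, mime_type))
    parts.append(text)
    return parts


# Intended use with an existing chat object (assumed, not executed here):
#   chat = models[0].start_chat(history=[])
#   chat.send_message(build_turn("Hi, I have a question about a building."))
#   chat.send_message(build_turn("Which building is this?", image.content))
```

Because the same chat object is reused, each send_message() call should be added to the conversation history, so the image turn can follow plain text turns.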