Using gemini-1.5-pro-preview-0514 for Visualizing Files (PDFs/PNGs)
Hi all, I’m using gemini-1.5-pro-preview-0514 to visualize files such as PDFs and PNGs in text completions. Here’s how I’m currently doing it:
I’m using Vertex AI and Base64-encoding local files (PDFs and PNGs) before sending them. Below is the helper I use for this:

    import base64

    def local_image_to_data_url(self, image_path: str) -> str:
        """
        Base64-encodes a local file (image or PDF).

        Despite the name, the result is the bare Base64 payload,
        not a full "data:" URL.

        Args:
            image_path (str): The path to the local file.

        Returns:
            str: The Base64-encoded file contents.
        """
        with open(image_path, "rb") as image_file:
            # b64encode returns bytes; decode to get a plain str
            encoded_string: str = base64.b64encode(image_file.read()).decode("utf-8")
        return encoded_string

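As far as I can tell from the SDK, `Part.from_data` is annotated as taking `data: bytes`, so it may be worth comparing the Base64 string against the raw file bytes before deciding what to pass. Below is a minimal, standard-library-only sketch of that comparison. `read_file_bytes` and `encode_file_base64` are hypothetical helper names, and the temporary file is only a stand-in for a real PNG or PDF.

```python
import base64
import os
import tempfile


def read_file_bytes(path: str) -> bytes:
    """Hypothetical helper: return a file's raw, unencoded bytes."""
    with open(path, "rb") as f:
        return f.read()


def encode_file_base64(path: str) -> str:
    """Same encoding step as the helper above, written as a plain function."""
    return base64.b64encode(read_file_bytes(path)).decode("utf-8")


# Demo with a temporary file standing in for a real PNG or PDF.
with tempfile.NamedTemporaryFile(delete=False, suffix=".png") as tmp:
    tmp.write(b"\x89PNG\r\n\x1a\n fake image bytes")
    tmp_path = tmp.name

try:
    raw = read_file_bytes(tmp_path)
    encoded = encode_file_base64(tmp_path)
    # Decoding the Base64 text recovers the raw bytes exactly.
    round_trip_ok = base64.b64decode(encoded) == raw
finally:
    os.remove(tmp_path)
```

The round trip shows that the Base64 string and the raw bytes carry the same information, so the choice between them comes down to what the receiving call expects.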
For each image file, I check the file type and convert it accordingly:

    for image_file in self.image_files:
        if image_file.get_filetype() == Filetype.PNG:
            mime_type = "image/png"
        elif image_file.get_filetype() == Filetype.JPG:
            mime_type = "image/jpeg"
        elif image_file.get_filetype() == Filetype.TruePDF or image_file.get_filetype() == Filetype.ScanPDF:
            mime_type = "application/pdf"
        else:
            raise ValueError("Unsupported file type")
        image_data_url = self.local_image_to_data_url(image_file.get_filepath())
        print(f"Image data URL created for {image_file.get_filepath()}")
        parts.append(Part.from_data(data=image_data_url, mime_type=mime_type))

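The file-type check above can also be written as a dictionary lookup, which keeps the supported types in one place. This is only a sketch: the `Filetype` enum here is a hypothetical stand-in that mirrors the member names used in the loop (plus an extra `DOCX` member to show the error path), and `MIME_TYPES` and `mime_type_for` are illustrative names.

```python
from enum import Enum, auto


class Filetype(Enum):
    """Hypothetical stand-in for the Filetype enum used in the loop."""
    PNG = auto()
    JPG = auto()
    TruePDF = auto()
    ScanPDF = auto()
    DOCX = auto()  # example of a type with no MIME mapping


# One entry per supported type; both PDF variants share a MIME type.
MIME_TYPES = {
    Filetype.PNG: "image/png",
    Filetype.JPG: "image/jpeg",
    Filetype.TruePDF: "application/pdf",
    Filetype.ScanPDF: "application/pdf",
}


def mime_type_for(filetype: Filetype) -> str:
    """Return the MIME type for a file type, or raise ValueError if unsupported."""
    try:
        return MIME_TYPES[filetype]
    except KeyError:
        raise ValueError(f"Unsupported file type: {filetype.name}") from None
```

Inside the loop this would reduce the branch to `mime_type = mime_type_for(image_file.get_filetype())`, with the same `ValueError` for anything unmapped.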
Once all parts are created, I generate the content as follows:

    print("Parts created for prompt")
    parts.append(self.get_prompt(json_data))
    responses = model.generate_content(
        parts,
        generation_config={
            "max_output_tokens": 8192,
            "temperature": 0.2,
            "top_p": 0.95
        },
        safety_settings={
            generative_models.HarmCategory.HARM_CATEGORY_HATE_SPEECH: generative_models.HarmBlockThreshold.BLOCK_ONLY_HIGH,
            generative_models.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: generative_models.HarmBlockThreshold.BLOCK_ONLY_HIGH,
            generative_models.HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: generative_models.HarmBlockThreshold.BLOCK_ONLY_HIGH,
            generative_models.HarmCategory.HARM_CATEGORY_HARASSMENT: generative_models.HarmBlockThreshold.BLOCK_ONLY_HIGH,
        },
        stream=False
    )

My Concern
I’m currently sending the files as Base64, but I’m concerned about potentially incurring additional costs.
I believe this is how it’s supposed to be done, as I followed the documentation, but could anyone confirm if I’m using the correct approach?
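To put a number on the Base64 part of the concern: Base64 emits 4 output characters for every 3 input bytes, so the encoded text is roughly a third larger than the raw file. Here is a minimal sketch that measures this, using an in-memory payload as a stand-in for a real PDF or PNG (`base64_overhead` is an illustrative name). Whether request size has any effect on billing is a separate question that this snippet does not answer.

```python
import base64


def base64_overhead(raw: bytes) -> float:
    """Return the ratio of Base64-encoded size to raw size."""
    if not raw:
        raise ValueError("raw must be non-empty")
    encoded = base64.b64encode(raw)
    return len(encoded) / len(raw)


# Stand-in payload; in practice this would be the bytes read from a file.
payload = bytes(range(256)) * 12  # 3072 bytes
ratio = base64_overhead(payload)
print(f"raw={len(payload)} bytes, encoded/raw ratio={ratio:.3f}")
```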