Hey,
I tried to migrate from VertexAI to GoogleGenAI and noticed the following:
In my pipeline I am processing PDF documents. The setups for Vertex AI and GoogleGenAI are identical, apart from slightly different boilerplate.
I am using the flash-002 model, inlineData for the PDF, and the same prompt and response schema.
The weird part is the token count:
GoogleGenAI with 1 Page PDF: { promptTokenCount: 1271, candidatesTokenCount: 72, totalTokenCount: 1343 }
Google GenAI without the PDF: { promptTokenCount: 91, candidatesTokenCount: 31, totalTokenCount: 122 }
Vertex AI with 1 Page PDF: { promptTokenCount: 398, candidatesTokenCount: 72, totalTokenCount: 470 }
Vertex AI without the PDF: { promptTokenCount: 140, candidatesTokenCount: 43, totalTokenCount: 183 }
Takeaway: while Vertex AI seems to respect the documented 258 tokens per PDF page, GoogleGenAI charges ~1,200 tokens for a single-page PDF. I'm looking for an explanation for this.
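To make the gap concrete: subtracting the no-PDF prompt count from the with-PDF prompt count isolates what each API charged for the single-page PDF itself. A quick sanity check using only the numbers above:

```python
# promptTokenCount values reported above.
genai_with_pdf, genai_no_pdf = 1271, 91
vertex_with_pdf, vertex_no_pdf = 398, 140

# Tokens attributable to the PDF = with-PDF prompt minus no-PDF prompt.
genai_pdf_cost = genai_with_pdf - genai_no_pdf      # 1180 tokens
vertex_pdf_cost = vertex_with_pdf - vertex_no_pdf   # 258 tokens

# Vertex AI lands exactly on the documented 258 tokens per page;
# GoogleGenAI charges roughly 4.6x that for the same page.
print(genai_pdf_cost, vertex_pdf_cost)
```

Notably, the Vertex delta is exactly 258, while the GenAI delta (~1180) is much more than one image's worth, which fits the image-plus-text theory below.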
I'm trying to find a definitive answer for this, but my first guess is that the AI Studio Gemini API turns a PDF into both an image and the extracted text, while the Vertex AI Gemini API turns it into an image only.
Interesting. I agree the documentation and methods used are inconsistent.
The claimed 258 tokens per page is very attractive at face value, but elsewhere I read that a PDF is split and converted into one image per page, which accounts for quite a lot of tokens in practice: e.g. for a 50-page PDF, 258 × 50 per the docs vs. ~50K tokens or so observed.
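Spelling out that 50-page projection, where the ~1180-token figure is the per-page PDF overhead implied by the first post's counts (1271 − 91):

```python
pages = 50
documented_rate = 258        # tokens per page, per the docs
observed_rate = 1271 - 91    # ~1180 tokens for one page via GoogleGenAI

documented_total = documented_rate * pages   # 12,900 tokens
observed_total = observed_rate * pages       # 59,000 tokens, i.e. "50K or so"
print(documented_total, observed_total)
```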
Google is, for now, the only one with multimodal models supporting PDF "natively", which also reduces latency.
Anyhow, in addition to passing the PDF inline, it would be interesting to see the token counts for:
- Gemini GenAI: upload the PDF first and refer to it in the content by its file_id
- Vertex AI: have the PDF in GCS and refer to it in the content by its file_uri (gs://…)
I wonder what the figures would be there. A 10-page PDF might be more telling.
As a side note on your idea to "downgrade" to GenAI: don't. The Vertex API is far more capable, especially when using JSON mode with response_schema.
GenAI only supports basic properties (type, description, default), whereas Vertex AI's Schema proto has a lot more (pattern, example, …). The more you can express in the schema definition, the more accurate the structured response in a single call. Just my 2 cents, as you may have other reasons to switch.
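For illustration, a hedged sketch of the kind of response_schema difference in question. The invoice example is made up, field names follow the OpenAPI subset used by the Schema proto, and which fields each API actually honors should be verified against current docs:

```python
# Hypothetical schema for extracting an ID from a PDF.
# "pattern" and "example" are among the extra fields the Vertex AI
# Schema proto accepts; the GenAI API (per this thread) only takes
# the basics like "type" and "description".
response_schema = {
    "type": "OBJECT",
    "properties": {
        "invoice_id": {
            "type": "STRING",
            "description": "Identifier printed on the document",
            "pattern": "^INV-[0-9]{6}$",  # Vertex-only extra (assumed)
            "example": "INV-004217",       # Vertex-only extra (assumed)
        },
    },
    "required": ["invoice_id"],
}
print(sorted(response_schema["properties"]["invoice_id"].keys()))
```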
It's a PDF with a single page; I did not make that 100% obvious in all parts of my initial post. 258 tokens per PDF page is the expected value, clearly stated in the docs.
I think it's a big enough issue to warrant an official answer.