Hello
Should prompt caching impact output generation (as opposed to latency and cost)? If so, how?
I’ve been testing:
- gemini-1.5-pro-001 (with caching)
- gemini-1.5-pro (without caching)
on the same inputs. Is a large difference in outputs expected between the two, or is the large delta I’m seeing more likely due to the way I’ve implemented it?
More context:
- The difference in outputs appears when switching between gemini-1.5-pro and gemini-1.5-pro-001
- Prompt caching implementation: (1) I have many PDFs already parsed into JSONs, i.e. metadata plus the full text contents; (2) instead of passing this through the ‘contents’ param, I pass it all through the system_instruction param; (3) the input is one large prompt rather than several smaller ones, i.e. a single prompt containing the task, an explanation of the context given, the context itself, the desired output structure, and further instructions
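For reference, here’s a minimal sketch of the two packaging strategies described above. The helper names (`build_single_prompt`, `build_split_contents`) and the document values are made up for illustration; no SDK is called. The split version just mirrors the list-of-parts shape the Gemini `contents` param expects, while the single version concatenates everything into one string as you’d pass to `system_instruction`:

```python
import json

# Hypothetical parsed-PDF JSON (metadata + full text), stand-in for real data.
doc = {
    "metadata": {"title": "Example PDF", "pages": 3},
    "text": "Full parsed text of the PDF...",
}

TASK = "Summarise the document."
CONTEXT_NOTE = "The context below is a parsed PDF (metadata + full text)."
OUTPUT_SPEC = "Return a JSON object with keys 'summary' and 'citations'."

def build_single_prompt(doc: dict) -> str:
    """Everything joined into one big string (the system_instruction approach)."""
    return "\n\n".join([TASK, CONTEXT_NOTE, json.dumps(doc), OUTPUT_SPEC])

def build_split_contents(doc: dict) -> list:
    """Instructions and context kept as separate parts (the contents approach)."""
    return [
        {"role": "user", "parts": [TASK, CONTEXT_NOTE]},
        {"role": "user", "parts": [json.dumps(doc)]},
        {"role": "user", "parts": [OUTPUT_SPEC]},
    ]

single = build_single_prompt(doc)
split = build_split_contents(doc)
```

Either way the model sees the same text; the difference is whether the structure (roles, part boundaries) is preserved for the API or flattened into one blob.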
More questions:
Q1: What’s the difference (if any) between passing the context through the ‘contents’ param vs the system_instruction param?
Q2: What’s the difference (if any) between splitting the prompt (i.e. instructions) and context (i.e. the different files) into separate parts vs passing it all together in one variable?
Thank you!