Hello all. I’ve been using Gemini 2.5 Pro in Google AI Studio to help with both editing and brainstorming for my creative writing. I’ve found 2.5 Pro more suitable for this than 3.0 Pro Preview: it is less expensive, and in my experience it is much better at retaining coherence and memory at larger contexts.
In the past, I would send over the information needed to understand the setting I’m writing in, along with the chapters I’ve written. 2.5 Pro would then help me edit those chapters and work with me to brainstorm potential plot points from there. For a long time this was quite effective, and I was satisfied with the results. However, in the last few weeks (starting a few days after Gemini 3.0 Pro Preview was released), that has changed completely. Whenever I go above roughly 125k tokens of context (which is naturally going to happen when I’m sending over longform creative writing), the model returns a 503 ‘model is overloaded’ error 100% of the time, whether or not I try at ‘peak’ hours. There is no time of day at which this does not occur. On top of that, it stops thinking/reasoning consistently at context sizes below that (75k+, I’d estimate).
A month and a half ago I could reach 300k context without thinking/reasoning breaking down (which is more than enough, as I’ve almost never reached that), while remaining satisfied with the coherence and memory retention of 2.5 Pro. Now the model does not work at all for my use case. It is ‘overloaded’ regardless of time or circumstance, yet it works immediately (apparently no longer ‘overloaded’) within 30 seconds of telling me it was, as long as I lower the context sent to the model, which I cannot do and still achieve my use case.
So, I must ask, what gives? Why is the model nonfunctional at less than 15% of its advertised maximum context size?