Latest update broke output limits, confirmed by downgrade

I know this is the fourth or fifth post about output token limits but this one is version-specific and I can reproduce it cleanly.

On the latest IDE update, the model hits the output token limit on every real coding task: multi-file implementations, refactors, anything where it needs to produce substantial output. Simple questions and short edits are fine. I downgraded to the previous version, and the same tasks complete without hitting the limit. Updated again to confirm: the problem came back. Downgraded again: gone. This is not a model issue. Same model, same prompts, different IDE version, different result.

The retry behavior makes it worse. When the model gets cut off, it detects the truncation and starts over, trying to compress everything into the smaller budget. That second attempt burns through the full context again, plus the tokens already spent on the truncated first attempt. If this was a deliberate change to reduce token costs, it’s backfiring: every truncated response that triggers a retry costs significantly more in total than letting the original response finish would have.
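To make the cost argument concrete, here is a minimal back-of-the-envelope sketch. All the numbers are assumptions (the actual prompt sizes, response lengths, and the cap the update imposes are unknown); the point is only that resending the context on a retry dominates whatever the cap saves.

```python
# Hypothetical figures to illustrate the retry-cost argument.
# None of these values come from the IDE; they are illustrative only.
PROMPT_TOKENS = 8_000   # context sent with each request (assumed)
FULL_RESPONSE = 4_000   # tokens an uncapped response would use (assumed)
CAPPED_LIMIT = 2_500    # output cap the broken update appears to impose (assumed)

# Without the cap: one request, one complete response.
cost_uncapped = PROMPT_TOKENS + FULL_RESPONSE

# With the cap: the first attempt is truncated at the limit, then the retry
# resends the full context plus the truncated output and generates again.
first_attempt = PROMPT_TOKENS + CAPPED_LIMIT
retry = PROMPT_TOKENS + CAPPED_LIMIT + CAPPED_LIMIT  # context + truncated output + second attempt
cost_capped = first_attempt + retry

print(cost_uncapped)  # 12000
print(cost_capped)    # 23500
```

Under these assumed numbers, a single truncation-plus-retry roughly doubles the token spend of just letting the response complete, and every further retry adds another full context resend on top.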

I’m currently stuck on the older version because the latest update is unusable for the work I actually need to do.