Very frustrating experience with Gemini 2.5 function calling performance

The function calling behavior of Gemini models has become completely unreliable and unpredictable:

  • Function calling used to work somewhat reliably, but recently it has stopped working almost entirely.
  • Instead of returning a function call (verified by inspecting the API response), the model merely follows the schema and instructions in a plain text response, without ever invoking the function (see the sketch after this list).
  • Occasionally, with the same query, the model will randomly use function calling again, but the output is often worse: it ignores the instructions and schema even more than when it generates a text-only response (!)
  • gemini-2.5-flash-preview-04-17 is noticeably better than the production gemini-2.5-flash at following instructions and schemas, when it works at all. That a preview model outperforms the production release raises concerns about stability and release practices.
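
For context, here is a minimal sketch of the kind of call where we see this, assuming the google-genai Python SDK. The `get_weather` declaration and the prompt are illustrative placeholders, not our actual code:

```python
# Minimal sketch, assuming the google-genai Python SDK. The get_weather
# declaration and the prompt are illustrative placeholders only.
from google import genai
from google.genai import types

client = genai.Client()  # picks up GEMINI_API_KEY from the environment

# Plain-dict function declaration, as accepted by the SDK.
get_weather = {
    "name": "get_weather",
    "description": "Return the current weather for a given city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What is the weather in Zurich right now?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(function_declarations=[get_weather])],
    ),
)

# The complaint above: this frequently lands in the else branch even though
# the prompt clearly calls for the declared function.
part = response.candidates[0].content.parts[0]
if part.function_call:
    print("function_call:", part.function_call.name, dict(part.function_call.args))
else:
    print("text-only response:", response.text)
```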

This erratic behavior makes it impossible to build or trust production systems on top of these APIs.

Ongoing Bug:
The long-standing bug where the model wraps its response in a ```json code block, even when explicitly instructed not to, remains unresolved after several months and forces us to rely on fragile workarounds (one is sketched below).
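
Purely as an illustration of what we are forced to do, here is one such workaround; `parse_model_json` is our own helper name, not part of any Google SDK:

```python
# Fragile workaround sketch: strip an optional ```json fence before parsing.
import json
import re

def parse_model_json(text: str):
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", text.strip())
    return json.loads(cleaned)

print(parse_model_json('```json\n{"city": "Zurich"}\n```'))  # -> {'city': 'Zurich'}
```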

What is the reason for this regression/change of behavior? Will it be fixed?
Will Google ensure that production models match or exceed the reliability and quality of preview versions?

Gemini models are exceptional, in my opinion, but we can't build any serious application if this is the level of performance and reliability we get.

Hello,

Could you please share your code so that we can reproduce the issue and report it to the engineering team?