Disable thinking for Gemma 4

letanloc · April 9, 2026, 12:40pm

Hi everyone,

I use Gemma 4 via API, I want to disable thinking (because it’s very slow, it spends more than 60s for a simple question) but I cannot find any document to do it.

Does anyone know how to disable thinking?
Please help me, many thanks.

Pannaga_J · April 14, 2026, 11:57am

Hi @letanloc
Could you confirm which API you’re using? Typically, ‘thinking’ is activated by adding the <|think|> token to the beginning of the system prompt. To turn it off, simply remove that token. Alternatively, if you are running Gemma 4 on Cloud Run, you can disable this feature by setting "enable_thinking": False in your configuration.

Thanks

Greg_Obleshchuk · April 20, 2026, 4:15am

Hi Pannaga,
I know the doco say that but I’m not 100% sure it is. I’m having the same issue

schlober · April 21, 2026, 11:08am

Try adding “include_thoughts”: false to your config. For example:

{
“contents”: [
{
“role”: “user”,
“parts”: [
{
“text”: “Who are you?”
}
]
}
],
“generationConfig”: {
“maxOutputTokens”: 4048,
“include_thoughts”: false
}
}

Greg_Obleshchuk · April 22, 2026, 1:36pm

didn’t work
[

{

    "error": {

        "code": 400,

        "message": "Invalid JSON payload received. Unknown name \\"include_thoughts\\" at 'generation_config': Cannot find field.",

        "status": "INVALID_ARGUMENT",

        "details": \[

            {

                "@type": "type.googleapis.com/google.rpc.BadRequest",

                "fieldViolations": \[

                    {

                        "field": "generation_config",

                        "description": "Invalid JSON payload received. Unknown name \\"include_thoughts\\" at 'generation_config': Cannot find field."

                    }

                \]

            }

        \]

    }

}

]

Pannaga_J · April 27, 2026, 6:41am

Hi @Greg_Obleshchuk
Apologies for late response .
Could you try setting “thinkingLevel”: “MINIMAL” in your request? I used this setting in my test request, for example, and noticed that “thoughtsTokenCount” was not utilized.
Request : curl “https://generativelanguage.googleapis.com/v1beta/models/gemma-4-26b-a4b-it:generateContent?key= your key here” -H “Content-Type: application/json” -X POST -d ‘{“contents”: [{“parts”:[{“text”: “What is the water formula?”}]}],“generationConfig”: {“thinkingConfig”: {“thinkingLevel”: “MINIMAL”}}}’

I got this response back
Response :
{

“candidates”: [

{

  "content": {

    "parts": \[

      {

        "text": "",

        "thought": true

      },

      {

        "text": "The chemical formula for water is \*\*$\\\\text{H}\_2\\\\text{O}$\*\*.\\n\\nHere is a breakdown of what that means:\\n\\n\*   \*\*$\\\\text{H}\_2$\*\*: This indicates there are \*\*two atoms of Hydrogen\*\*.\\n\*   \*\*$\\\\text{O}$\*\*: This indicates there is \*\*one atom of Oxygen\*\*.\\n\\nIn a single molecule of water, these three atoms are held together by \*\*covalent bonds\*\*, where the oxygen atom shares electrons with the two hydrogen atoms."

      }

    \],

    "role": "model"

  },

  "finishReason": "STOP",

  "index": 0

}

],
“usageMetadata”: {

"promptTokenCount": 6,

"candidatesTokenCount": 103,

"totalTokenCount": 109,

"promptTokensDetails": \[

  {

    "modality": "TEXT",

    "tokenCount": 6

  }

\]

},

“modelVersion”: “gemma-4-26b-a4b-it”,

“responseId”: “_APvaeT9OduKjuMPmtyt6As”

}

Please try this and let me know if it works.
Thanks

Greg_Obleshchuk · April 28, 2026, 2:54am

Hi ,

Thanks for the reply.
I did try “thinkingConfig”: {“thinkingLevel”: “MINIMAL”}, and I agree ther is no thoughtsTokenCount fileds being returned.

I just didn’t know if this meant it wasn’t using thinking mode or not. THe documentation isn’t really clear at all. I would have imaged that the values woudl have been “thinkingConfig”: {“thinkingLevel”: “NONE”} . That would have been clearer.

I will continue to use “thinkingConfig”: {“thinkingLevel”: “MINIMAL”} .

once again thanks

Greg

Clintin_Brummer1 · May 4, 2026, 4:34pm

Issue: “Gemma is returning its thinking” / “Disable thinking for Gemma 4”

The Problem: Users are seeing the model’s internal reasoning or “thought process” output directly into the final text generation.

The Fix: Reasoning models often output their logic block before the final answer, usually enclosed in specific tags (like and ).

Application-Side Parser: The most robust fix is to implement a regex or string-parsing function in the client code to automatically strip out any text between these tags before rendering the output to the end user.
API Flags: Check the specific endpoint documentation (Vertex AI or AI Studio) for generation configuration parameters. Some endpoints allow passing a flag such as include_thinking=false in the JSON payload to suppress the reasoning tokens at the server level.

2deep4u · May 8, 2026, 7:06am

its part of its normal generation.
But you can offer it an alternative. To poop it in a {“quick_thinking”:" "} structure.

Kushal_Roy · May 10, 2026, 9:29pm

use ThinkingLevel.MINIMAL. For gemma 4 it turns off thinking.

Topic		Replies	Views
Can't turn off thinking mode using Gemma4 Gemini API thinking	0	193	April 20, 2026
Guys i need help, gemma is returning its thinking Gemma models	4	116	May 11, 2026
Gemini-2.5-flash-preview-04-17 not honoring thinking_budget=0 Gemini API help_request	5	1727	April 22, 2025
How to Reduce Thought Reasoning in Gemini 2.5 Pro Gemini API api , models	7	2807	June 9, 2025
Gemini 2.5 Flash problems while trying to deactivate thinking Gemini API models , thinking	3	677	August 18, 2025

Disable thinking for Gemma 4

Issue: “Gemma is returning its thinking” / “Disable thinking for Gemma 4”

Related topics