I’m very curious what others are experiencing. I’m running a project that initiates around 80 deep research calls per day with a pretty simple prompt – nothing too wild with my input tokens. Guess what the cost is? Around $200 a day.
So you’re telling me that each run with search enabled on the deep-research-pro-preview-12-2025 agent is almost $3? Sure, it’s doing lots of searches, sourcing data, etc., but jesus christ, that’s insanely expensive. I’ve gone through my code and I’m not seeing any loops or weird errors that could be running up the bill in the background.
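For what it’s worth, the per-run math on my end (rounding up to account for days that bill a bit higher):

```python
# Back-of-the-envelope per-run cost from my own billing numbers
daily_cost_usd = 200   # what I'm seeing charged per day
runs_per_day = 80      # deep research calls my project initiates daily

cost_per_run = daily_cost_usd / runs_per_day
print(f"${cost_per_run:.2f} per run")  # prints $2.50 per run
```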
Anyone else have a similar experience?
It would help users immensely if we could see real-time token usage or SOMETHING to see where the issues are. It’s borderline shady that they don’t show you anything but a total charged for the day.
Thanks folks for the feedback and sorry about the delay due to holidays.
As for costs, @Dribgib if you can send me your project ID I can double check and see if what you’re seeing is normal.
> It would help users immensely if we could see realtime token usage or SOMETHING to see where the issues are. It’s borderline shady how they don’t show you anything but a total charged for the day.
Thanks for the feedback. We could add something to streaming mode where we stream usage events to you – would that be useful?
I’ve also tried to use the countTokens API, but that doesn’t seem to apply here? How is there not a way to track token usage?
Correct, the countTokens API doesn’t work here, as this is an agent rather than a model, and usage is highly dependent on inputs and the specifics of a given run, so it’s quite hard to estimate up front.
The usage metadata in the final Interaction object should tell you your final token consumption.
I have been having the same problem – calls to the deep-research-agent never contain usage metadata.
Can you confirm whether you’re referring to the usage field in the Interaction object or the interaction.complete event during streaming? Both should work. I will investigate regardless, but it would be good to know whether you’re running into issues with both or just one of them.
I’m the Product Manager for this feature. I will monitor this thread, but please also feel free to reach out to me directly.
To answer your question: I am referring to the final Interaction object retrieved via the REST API (a standard GET request to v1beta/interactions/{id}), not streaming.
I am running the deep-research-pro-preview-12-2025 agent. I have a script that recursively crawls the entire final JSON response looking for usageMetadata (or any key containing “usage” or “token”) at any depth.
It returns null every time.
The final object structure I receive contains only:
name
state (“SUCCEEDED”)
createTime / updateTime
outputs (which contains the content text, but no metadata attached to the candidate)
The usageMetadata block is definitely missing from the payload in the REST response.
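Here’s roughly what my crawl script does, as a simplified sketch – the sample payload below just mirrors the shape I described above, not the full response:

```python
import json

def find_usage_keys(node, path=""):
    """Recursively walk a parsed JSON response and collect every key
    whose name contains 'usage' or 'token' (case-insensitive), at any depth."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            child_path = f"{path}.{key}" if path else key
            if "usage" in key.lower() or "token" in key.lower():
                hits.append((child_path, value))
            hits.extend(find_usage_keys(value, child_path))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            hits.extend(find_usage_keys(item, f"{path}[{i}]"))
    return hits

# Sample payload shaped like the responses I'm actually getting back
response = json.loads("""
{
  "name": "interactions/abc123",
  "state": "SUCCEEDED",
  "outputs": [{"content": "report text here"}]
}
""")
print(find_usage_keys(response))  # prints [] -- no usage fields anywhere
```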
I will follow up via email with my Project ID as requested so you can check the logs.
Could definitely be user error here, but it’s certainly costing me a pretty penny in the process.
Thanks @Ali_Cevik, I am now able to track token usage – extremely helpful!
@Giorgia_Chen I’m unable to find a reliable way to limit token usage right now. My runs consume anywhere from 120K to 500K tokens per run – wildly inconsistent, even with the exact same prompt in back-to-back tests.
I’ve tried the following in my prompting to gain more control over the agent; none of it has worked consistently:
Limit raw number of search queries
Limit number of sources used in queries
Prevent duplicate queries of the same source
Search “text only” to avoid bloat from media, ads, etc. Hard to say if this works or not.
Explicitly allow or ban specific search sources
Limit token usage in general (with a recommended amount or hard cap)
Require only one search step, one analysis step, etc., to try to limit the round trips it makes
That said, the agent is very powerful and does a great job of formatting output in a consistent manner. The potential is definitely there; it just needs some guardrails to reduce inflated or unexpected costs. My theory is that when it finds inconsistent information, it goes on a spending spree trying to resolve it – which is great in theory, but just too expensive for normal use.

I’ve spent around $2K just testing the agent’s capabilities… which to me is too much to justify further use until there’s an update adding some sort of control parameters.
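In the meantime, the only guardrail I can enforce is client-side: read the usage field after each run and stop launching new runs once a daily budget is hit. A rough sketch – the per-token price here is a made-up placeholder, not real pricing, so substitute your own numbers:

```python
# Hypothetical client-side daily budget guard. The blended per-token
# price is an ASSUMPTION for illustration -- plug in your actual rate.
PRICE_PER_1K_TOKENS = 0.02   # assumed blended price, USD per 1K tokens
DAILY_BUDGET_USD = 50.0

class BudgetGuard:
    def __init__(self, daily_budget=DAILY_BUDGET_USD):
        self.daily_budget = daily_budget
        self.spent = 0.0

    def record_run(self, total_tokens):
        """Add one run's estimated cost; return True if still under budget."""
        self.spent += (total_tokens / 1000) * PRICE_PER_1K_TOKENS
        return self.spent < self.daily_budget

guard = BudgetGuard()
# e.g. a 500K-token run (my worst case) costs $10 at the assumed rate
still_ok = guard.record_run(500_000)
print(f"spent so far: ${guard.spent:.2f}, continue: {still_ok}")
```

The idea is just to stop the bleeding between runs, not to control what the agent does inside a single run – that part still needs real control parameters from the API side.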