Good question — answering both parts actually clarifies how this class of attack works.
On embedding the key client-side: yes, the key was in client-side code, because it was a Maps key. For over a decade, Google's own documentation explicitly stated that Maps keys (the API keys beginning with "AIza") are not secrets and are meant to be embedded directly in client-side HTML and JavaScript. So the key being public wasn't a slip on my part; it was the documented, intended usage for that key type. The problem is that Google later made that same public-by-design credential able to authenticate to a billable, server-side AI endpoint once the Generative Language API was enabled on the project, silently and with no warning.
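To make "public by design" concrete: anyone who loads the page can read the key, and scrapers find these keys with a simple pattern match. Here's a minimal Python sketch of that check, which is also handy for auditing your own served pages. The regex is the heuristic common secret scanners use ("AIza" plus 35 URL-safe characters), which I'm treating as an assumption rather than a documented format; the key in the example is a placeholder.

```python
import re

# Heuristic used by common secret scanners (an assumption, not an official spec):
# "AIza" followed by 35 characters from [0-9A-Za-z_-], 39 characters in total.
GOOGLE_API_KEY_RE = re.compile(r"AIza[0-9A-Za-z_\-]{35}")


def find_google_api_keys(page_source: str) -> list[str]:
    """Return each distinct AIza-style key found in HTML/JS, in first-seen order."""
    found: list[str] = []
    for match in GOOGLE_API_KEY_RE.finditer(page_source):
        key = match.group(0)
        if key not in found:
            found.append(key)
    return found


if __name__ == "__main__":
    # Placeholder key, not a real credential, in a typical Maps loader tag.
    placeholder = "AIza" + "X" * 35
    html = (
        '<script async src="https://maps.googleapis.com/maps/api/js'
        f'?key={placeholder}&callback=initMap"></script>'
    )
    print(find_google_api_keys(html))
```

Nothing in the page hides the key; it's the same string the browser needs to load the map.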
On rate limiting in my application: this is the central misconception, and it’s worth understanding. App-level rate limiting wouldn’t have helped at all, because the abuse never passed through my application. Once the key was scraped, the attacker called the Gemini endpoint directly (server-to-server, curl-style), bypassing my front end entirely. My app’s rate limits only govern traffic flowing through my app — they’re invisible to someone hitting generativelanguage.googleapis.com directly with a valid key.
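Here's why my front end never enters the picture. This sketch only builds the request an attacker would send (it never sends it): the one credential is the key, and the only host involved is Google's. The v1beta generateContent path follows Google's public REST docs as I understand them; the model name and key are placeholders.

```python
import json
from urllib.parse import urlencode, urlsplit
from urllib.request import Request

GEMINI_HOST = "generativelanguage.googleapis.com"


def build_direct_request(api_key: str, model: str, prompt: str) -> Request:
    """Build, but do not send, a generateContent call authenticated only by a key.

    Path shape assumed from Google's public REST docs (v1beta); `model` is a placeholder.
    """
    url = (
        f"https://{GEMINI_HOST}/v1beta/models/{model}:generateContent?"
        + urlencode({"key": api_key})
    )
    body = json.dumps({"contents": [{"parts": [{"text": prompt}]}]}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_direct_request("AIza-placeholder", "MODEL_NAME", "hello")
    # The only host is Google's; there is no hop through my app to rate-limit.
    print(req.get_method(), urlsplit(req.full_url).hostname)
```

Any rate limiter I put in my own backend sits on a path this request simply never takes.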
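For the same reason, HTTP referrer restrictions offer little protection here: they constrain where a key can be used from, not which APIs it can reach, and a Referer header is trivially spoofable from a server. IP restrictions don't fit this case either, because a browser-embedded key is sent from every visitor's IP address.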
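A toy model of the referrer point, in Python. It's a simplification I wrote for illustration, not how Google actually evaluates restrictions, and the allowed site is hypothetical. Note that the check only ever looks at the header the caller supplied, never at which API the request targets.

```python
from fnmatch import fnmatchcase
from urllib.request import Request

# Toy model, NOT Google's enforcement code: a referrer restriction treated as
# an allowlist of wildcard patterns matched against the caller's Referer header.
ALLOWED_REFERRERS = ["https://www.my-shop.example/*"]  # hypothetical site


def referrer_allowed(req: Request, allowed: list[str]) -> bool:
    """Check only where the request claims to come from; the target API is ignored."""
    referer = req.get_header("Referer") or ""
    return any(fnmatchcase(referer, pattern) for pattern in allowed)


def forged_request(url: str, claimed_referer: str) -> Request:
    """A browser sets Referer itself; a server-side script just types one in."""
    return Request(url, headers={"Referer": claimed_referer})


if __name__ == "__main__":
    req = forged_request(
        "https://generativelanguage.googleapis.com/v1beta/models",
        "https://www.my-shop.example/checkout",
    )
    print(referrer_allowed(req, ALLOWED_REFERRERS))
```

A non-browser client passes the allowlist just by sending the expected string, and nothing in the check ties the key to the Maps APIs.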
The only controls that would actually have stopped this are on Google's side: (1) API restrictions scoping the key so it can't call the Generative Language API at all, and (2) a hard spending cap. API restrictions aren't applied by default, and the second control doesn't exist natively: budgets send alerts, they don't cap spend. That combination (a public-by-design key, retroactively granted AI access, and no hard cap) is what turned a scraped Maps key into a five-figure bill.
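For control (1), here's a rough sketch of how I'd flag keys that aren't scoped away from Gemini. It assumes the JSON shape of the API Keys API v2 Key resource (restrictions.apiTargets[].service), and the sample key names are hypothetical. It only reads restrictions, so a flagged key can actually call the Generative Language API only if that API is also enabled on the project.

```python
import json

GEMINI_SERVICE = "generativelanguage.googleapis.com"


def can_reach_service(key: dict, service: str = GEMINI_SERVICE) -> bool:
    """True if the key's API restrictions would NOT block calls to `service`.

    Only checks restrictions; the API must also be enabled on the project.
    A key with no apiTargets is unrestricted, so it covers any API enabled
    on the project later, including the Generative Language API.
    """
    targets = (key.get("restrictions") or {}).get("apiTargets") or []
    if not targets:
        return True
    return any(target.get("service") == service for target in targets)


def flag_risky_keys(keys: list[dict]) -> list[str]:
    """Names of keys whose restrictions would let them call the Generative Language API."""
    return [k.get("displayName") or k.get("name", "?") for k in keys if can_reach_service(k)]


if __name__ == "__main__":
    # Shape assumed to match the API Keys API v2 Key resource (e.g. as emitted by
    # `gcloud services api-keys list --format=json`); verify against your output.
    sample = json.loads("""[
      {"name": "projects/p/locations/global/keys/a", "displayName": "maps-web",
       "restrictions": {"apiTargets": [{"service": "maps-backend.googleapis.com"}]}},
      {"name": "projects/p/locations/global/keys/b", "displayName": "legacy-key"}
    ]""")
    print(flag_risky_keys(sample))
```

Note that a key carrying only browserKeyRestrictions (referrers) still gets flagged: the referrer list says nothing about which APIs the key can reach.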
Happy to share the audit steps I’m using to lock things down, if useful.