I’m running into a series of AI-related issues while building a feature for my Texas Roadhouse menu website that uses Google’s Generative AI APIs. The site relies on semantic search and recommendation features powered by embeddings generated from Gemini models. The first problem is that embedding similarity results have become noticeably inconsistent over the last two weeks. Queries that previously returned highly relevant menu items—like “juicy steak” or “light salad”—are now returning unrelated results or lower-ranked matches. I’m unsure whether the embedding model behavior has changed, or if I’m accidentally processing the text incorrectly before sending it to the API.
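To rule out preprocessing on my side, here's roughly what my pipeline does before any text reaches the embeddings API (a minimal sketch; the function names and the idea of keying the cache on the model name are mine, not from any SDK):

```python
import hashlib
import math
import unicodedata

def normalize(text: str) -> str:
    """Canonicalize text before embedding so preprocessing can't drift:
    Unicode NFKC, lowercase, collapsed whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return " ".join(text.lower().split())

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity for ranking; direction matters, magnitude doesn't."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def cache_key(text: str, model: str) -> str:
    """Key cached vectors on normalized text AND model name, so a model
    change never silently reuses stale embeddings."""
    return hashlib.sha256(f"{model}:{normalize(text)}".encode()).hexdigest()

print(normalize("  Juicy  STEAK\u00a0dinner "))  # → "juicy steak dinner"
```

One thing I learned while debugging: if stored vectors came from an older model version and fresh query vectors come from a newer one, similarity rankings degrade exactly like this, which is why the cache key includes the model name.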
Another issue I’m seeing is what feels like “model drift” in the responses. I use Gemini for generating short menu descriptions and categorizing items based on nutrition and ingredients. The same prompt is suddenly producing differently structured responses, sometimes missing fields that my backend expects. For example, the model occasionally skips the “category” or “featured tags” section even though the prompt template hasn’t changed. I’m not sure whether this is due to new model versions rolling out, or whether I should be pinning a specific model version explicitly to avoid the variability.
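In the meantime I've added a validation layer so malformed responses fail loudly instead of silently breaking the backend. This is a sketch: `call_model` is a stand-in for whatever SDK call you make (where I now pass an explicit, dated model version string rather than a floating alias), and the field names are from my own schema:

```python
import json

# Fields my backend expects in every menu-item response (my schema).
REQUIRED_FIELDS = {"name", "description", "category", "featured_tags"}

def validate(raw: str) -> dict:
    """Parse the response and raise with the missing fields listed."""
    data = json.loads(raw)
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"response missing fields: {sorted(missing)}")
    return data

def describe_item(call_model, prompt: str, retries: int = 1) -> dict:
    """Call the injected model function, re-asking once on bad structure.
    json.JSONDecodeError is a ValueError subclass, so one except covers both."""
    for attempt in range(retries + 1):
        try:
            return validate(call_model(prompt))
        except ValueError:
            if attempt == retries:
                raise
            prompt += "\nReturn ONLY valid JSON containing all required fields."
```

Even with a pinned model, I assume some structural variance is possible, so validate-and-retry seems safer than trusting the shape of every response.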
I’m also experiencing random spikes in API latency. Most requests return in under a second, but some take 6–10 seconds, which breaks the smooth browsing experience on my Texas Roadhouse menu pages. The AI-powered parts of the site load recommendations and contextual insights dynamically, so slow API responses cause the page layout to jump or stall. I’m caching aggressively on my end, but the inconsistency makes it hard to maintain predictable performance. I’ve checked my network logs, and nothing indicates packet loss or client-side slowdowns.
Another strange problem is happening with the embeddings endpoint. When I send nearly identical text strings—like two menu descriptions with only minor word differences—the API sometimes generates dramatically different vector magnitudes or embeddings that don’t cluster near each other. This breaks my similarity search logic, which relies on embeddings being stable and consistent. I’ve verified that the request body is identical except for the modified words, so I’m unsure whether this is expected behavior, quantization variance, or something else.
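One mitigation I've tried for the magnitude issue: L2-normalize every vector before storing it, then compare with a plain dot product, so only direction matters. Sketch (the example vectors are made up to show the idea):

```python
import math

def l2_normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length so magnitude differences vanish."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def similarity(a: list[float], b: list[float]) -> float:
    """Dot product of unit vectors, i.e. cosine similarity."""
    ua, ub = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(ua, ub))

# Two vectors pointing the same way with wildly different magnitudes:
a = [0.1, 0.2, 0.2]
b = [10.0, 20.0, 20.0]
print(round(similarity(a, b), 6))  # → 1.0
```

If the vectors still don't cluster after normalization, then it really is the direction of the embeddings that changed between calls, which would point at the model side rather than my math.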
I’m also seeing unexpected token usage patterns. For some menu-generation prompts, the token count reported by the API is higher than expected even though the prompt is static and relatively short. This results in occasional quota overruns during peak traffic when many users view menu pages at the same time. What confuses me is that the exact same request sometimes produces different token counts across calls. I don’t know if this is caused by hidden metadata, internal formatting, or a model update changing tokenization rules.
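To pin down whether tokenization actually changed, I started recording the reported token count per (prompt, model) pair and flagging drift. Sketch: the storage is an in-memory dict for illustration, and the tolerance is a number I picked, not anything documented:

```python
import hashlib

_baseline: dict[str, int] = {}  # first observed count per (model, prompt)

def check_tokens(prompt: str, model: str, reported: int, tolerance: int = 5):
    """Record the first reported token count as a baseline; flag later calls
    whose count drifts beyond the tolerance. Returns (drifted, baseline)."""
    key = hashlib.sha256(f"{model}:{prompt}".encode()).hexdigest()
    base = _baseline.setdefault(key, reported)
    drifted = abs(reported - base) > tolerance
    return drifted, base
```

If a truly static prompt shows drifting counts under a pinned model version, that would suggest something outside my request body (hidden metadata or a tokenizer change) is responsible, which is exactly what I'm trying to confirm.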
Overall, I’m trying to determine whether these issues are related to API changes, model versioning, improper request formatting, or something in my embedding storage or caching architecture. The Texas Roadhouse menu website depends heavily on these AI-driven features, so I need stable behavior to keep recommendations and menu search accurate. If anyone has experience with embedding consistency, API latency spikes, or unexpected variation in Gemini responses, I’d appreciate any guidance, debugging steps, or best practices for stabilizing output across model updates. Sorry for the long post!