Hi, I’m building a project on top of Gemini’s generateContent API. I pass a webpage’s content (text, images, and audio) to the model, and the page contains instructions or tasks that need to be solved.
I want the model to also be able to generate charts, interpret the audio, and solve problems based on that content. Currently I’m using tool-calling (function calling), but my hand-written tools only cover a few narrow actions.
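For context, here is roughly how I build the request today. This is a simplified sketch, not my real code: `build_request` and the `extract_answer` tool are stand-ins, and I'm assuming the REST-style generateContent request body (camelCase keys, inline base64 media).

```python
import base64
import json

def build_request(prompt: str, audio_bytes: bytes) -> dict:
    """Sketch of a generateContent request body: text + inline audio,
    plus one hand-written function declaration (my current tool-calling)."""
    return {
        "contents": [{
            "role": "user",
            "parts": [
                {"text": prompt},
                # Media is passed inline as base64 with its MIME type.
                {"inlineData": {
                    "mimeType": "audio/mp3",
                    "data": base64.b64encode(audio_bytes).decode("ascii"),
                }},
            ],
        }],
        # My current approach: a single narrow tool the model can call.
        "tools": [{
            "functionDeclarations": [{
                "name": "extract_answer",  # hypothetical tool name
                "description": "Return the answer found on the page.",
                "parameters": {
                    "type": "object",
                    "properties": {"answer": {"type": "string"}},
                },
            }],
        }],
    }

req = build_request("Solve the task described on this page.", b"\x00\x01")
print(json.dumps(req)[:80])
```

The limitation is visible here: everything the model can *do* is whatever I enumerate under `functionDeclarations`.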
What API, built-in tool, or overall approach should I use to enable richer capabilities like chart generation and audio understanding?