Extending LLM Functionality for Charts, Audio Interpretation, and Webpage Analysis

Hi, I’m building a project using Gemini’s generateContent API. The API receives a webpage’s content, including text, images, and audio. The webpage contains instructions describing tasks that the model needs to solve.

I’d like my LLM to also generate charts, interpret audio, and solve problems based on the provided content. Currently, I’m using tool calling, but it only supports a very limited set of actions.
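
For reference, the general tool-calling pattern can be sketched roughly as below. This is a simplified illustration, not my exact code: it assumes the model's function call arrives as a dict with a "name" and an "args" mapping, and the names `render_bar_chart`, `TOOLS`, and `dispatch` are hypothetical and not part of any Gemini SDK.

```python
from typing import Any, Callable, Dict, List


def render_bar_chart(labels: List[str], values: List[float], width: int = 20) -> str:
    """Hypothetical tool: render a plain-text bar chart.

    Bars are scaled so that the largest value fills `width` characters.
    """
    if len(labels) != len(values):
        raise ValueError("labels and values must have the same length")
    peak = max(values, default=0) or 1  # fall back to 1 to avoid dividing by zero
    lines = []
    for label, value in zip(labels, values):
        bar = "#" * round(width * value / peak)
        lines.append(f"{label:<8} {bar} {value}")
    return "\n".join(lines)


# Registry of the tools the model is allowed to call (assumed shape).
TOOLS: Dict[str, Callable[..., Any]] = {
    "render_bar_chart": render_bar_chart,
}


def dispatch(call: Dict[str, Any]) -> Any:
    """Route a model-issued function call to the matching local handler."""
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("args", {}))


if __name__ == "__main__":
    # Example of a call the model might emit (assumed shape).
    print(dispatch({
        "name": "render_bar_chart",
        "args": {"labels": ["Q1", "Q2"], "values": [3, 6], "width": 10},
    }))
```

Every new capability in this pattern needs its own handler and registry entry, which is why the set of supported actions stays small.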

What API or approach should I use to enable richer capabilities like chart generation and audio understanding?