Why is the Multimodal Live API so hard to use?

Yes, I have seen the demo code and whatnot, but all I want is something in AI Studio where I can just copy the code: a simple frontend and a simple backend as a Python file. Instead I have to wade through 30 different errors just to get something going. It's just too hard. I'd rather use OpenAI's Realtime API; it just seems a lot easier.

I just want an HTML file (with a script tag) and a Python file. That's all it should take to get something functional.
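
Honestly, something like the sketch below is about all the backend side should need. This is just my guess at the minimal version, assuming the `google-genai` Python SDK and its live module; the model name `gemini-2.0-flash-exp`, the config keys, and the `send`/`receive` calls are from the early docs and may have changed since:

```python
# minimal_live.py -- minimal text-only sketch, NOT official sample code.
# Assumes: pip install google-genai, and that the live API still exposes
# client.aio.live.connect / session.send / session.receive as in early docs.
import asyncio
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

async def main():
    # Text-only keeps the example copy-pasteable; audio needs extra setup.
    config = {"response_modalities": ["TEXT"]}
    async with client.aio.live.connect(
        model="gemini-2.0-flash-exp",  # assumed model name, check current docs
        config=config,
    ) as session:
        # Send one user turn and mark it as complete.
        await session.send(input="Hello, are you there?", end_of_turn=True)
        # Print the streamed reply chunks as they arrive.
        async for response in session.receive():
            if response.text:
                print(response.text, end="")

if __name__ == "__main__":
    asyncio.run(main())
```

Even then, the HTML side still needs a WebSocket relay to this session to get audio in and out of the browser, and that relay is exactly the part that never shows up as a copy-paste snippet.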