Is prompting for code generation not a case of “the more detailed, the better”?

I’m using a popular AI coding tool to generate Python code, and I’m running into a strange problem.

My setup:

I ask the AI to read several existing Python files (about 1,500 lines total)

It needs to understand and reuse existing functions in that codebase

I provide a very detailed, pseudo-code-like prompt.
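To give a sense of what I mean by “pseudo-code-like”, here is a made-up excerpt in the style of my prompts (the function and file names are invented for illustration, not from my real project):

```
# Step 3: add a retry wrapper around the existing loader
def load_with_retry(path, retries=3):
    # MUST call the existing parse_config() from utils.py -- do not reimplement it
    # on failure: sleep 1 second, then retry
    # after `retries` failures: re-raise the last exception
    ...
```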

I tried my best to leave as little room for vagueness as possible. However, the output quality gets worse, not better. The AI almost never implements exactly what I asked, even after several iterations. It almost never implements everything I asked for; something is missed here and there. Sometimes it makes really stupid mistakes, like using an undefined variable or calling a non-existent function.

Its own explanation is: You provided five highly detailed, interconnected Python files alongside a very strict pseudo-code design. While I can hold all of that in my context window, my “attention mechanism” failed to properly weigh your strict instructions against my generalized programming training.

Though this explanation cannot be treated as the exact cause, I am still surprised. I keep reading that engineers at big tech companies use AI tools to develop and maintain real-world systems, so it seems AI should handle my 1,500-line lab project easily.

My questions are as follows. I definitely don’t want to provide vague prompts, but if prompting is not a case of “the more detailed, the better”, what is the state-of-the-art balance point? And for tasks that require reading an existing codebase and reusing existing functions, what is the best way to structure the workflow?

This is my first post here. If this is not the right place for this question, let me know where to find the answer. Many, many thanks!