Extracting Structured Text from Multi-Page Scanned Documents

Hello Community,

I have been assigned a task to extract structured text from scanned documents. The challenge I’m facing is that some sections span multiple pages rather than being confined to a single page.

I attempted to extract the text by processing each scanned page as a separate image and passing it to the gemini-2.0-flash model. However, I couldn't get continuous text across multiple pages: sections that run over a page break come back fragmented.

How can I extract complete text from sections that span multiple pages while maintaining their structure? Any guidance or suggestions would be greatly appreciated!

Thanks in advance!

My code:

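from google import genai
from pydantic import BaseModel

# Assumption: the API key is supplied via the environment (e.g. GEMINI_API_KEY).
client = genai.Client()


class Randbeschreibung(BaseModel):
    """One "Gerade von X bis Y" connection and the Pos it belongs to."""
    von: int   # start node
    nach: int  # end node
    pos: str   # position number, or "NA" if unknown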


class RandbeschreibungResponse(BaseModel):
    randbeschreibung: list[Randbeschreibung]

def get_randbescheibung_from_images(img):
    """Send one scanned page image to the model and return its JSON response text."""
    prompt = (
    'This is a scanned technical document written in German. It contains sections titled "Randbeschreibung", '
    'which have various subsections such as "Rand der Position x: Pos x" and special cases like "Aussparung".\n\n'
    'Within these sections, the document includes data in the format:\n'
    '- "Gerade von 123 bis 456", which translates to "connected from node 123 to node 456".\n\n'
    '### Task:\n'
    'Extract all occurrences of this "Gerade von X bis Y" pattern from every subsection under "Randbeschreibung", '
    'including "Rand der Position x: Pos x" and "Aussparung", while strictly ignoring any data found under the section '
    '"Fixpunkte, -sterne, -geraden".\n\n'
    '### Expected Output:\n'
    '- Identify the "von" node (start node).\n'
    '- Identify the "bis" node (end node).\n'
    '- Identify the "Pos " position number (it acts as a subheading that groups the nodes; connecting the nodes under one Pos creates a closed shape).\n'
    '- For an unknown Pos you can enter NA as the position number.\n'
    '- Return a structured list of pairs where each pair represents a connection between two nodes and the Pos number.\n\n'
    '#### Example:\n'
    '**Input:** "Gerade von 123 bis 456"\n\n'
    'Make sure to extract all valid node connections from the relevant sections while strictly adhering to the given constraints.')
    
    response = client.models.generate_content(
        model='gemini-2.0-flash',
        contents=[prompt,img],
        config={
            'response_mime_type': 'application/json',
            'response_schema': RandbeschreibungResponse,
        },
    )
    return response.text
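
To make the fragmentation concrete, here is a minimal sketch of one possible way to merge the per-page results. It assumes that an entry with Pos "NA" at the start of a page continues the last Pos seen on the previous page; that assumption has not been verified against the real documents. The helper name merge_pages and the two sample page strings are hypothetical and made up purely for illustration.

```python
import json


def merge_pages(page_json_texts):
    """Merge per-page JSON results into one list.

    Assumption: an entry whose pos is "NA" belongs to the most recent
    known Pos, even if that Pos was announced on an earlier page.
    """
    merged = []
    last_pos = "NA"  # stays "NA" until the first real Pos is seen
    for text in page_json_texts:
        for item in json.loads(text)["randbeschreibung"]:
            if item["pos"] == "NA":
                # Carry the last known Pos across the page break.
                item = {**item, "pos": last_pos}
            else:
                last_pos = item["pos"]
            merged.append(item)
    return merged


# Made-up sample responses for two consecutive pages (illustration only).
page1 = '{"randbeschreibung": [{"von": 1, "nach": 2, "pos": "1"}]}'
page2 = '{"randbeschreibung": [{"von": 2, "nach": 3, "pos": "NA"}]}'
merged = merge_pages([page1, page2])
```

This only patches the Pos grouping after the fact; it does not recover text that the model missed because a section was cut by the page break.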

Here are my two images (note: they are two separate images; I joined them using Paint for this question):