Does PDF fine-tuning focus solely on text extraction, or does it also perform visual inference?

Hello, I am planning to fine-tune Gemini 1.5 Flash on PDFs. My question is: does it only make inferences from the OCR text it extracts, or can it also identify the structure of the PDF and the images and logos within it?