Hello Google AI Community,
I am visually impaired, and I face major challenges in accessing complex documents such as PDFs with tables, math-heavy content, and numbered lists. Standard OCR tools fail to read such documents properly and introduce errors in tables, headings, numbering, and mathematical notation. This makes reading and learning extremely difficult for me.
To solve this problem, I started an accessibility project. Each PDF page is converted to an image and sent to an AI model for structured extraction. The model returns JSON describing the page's headings, tables, image descriptions, and other elements. Finally, the extracted data from all pages is compiled into a properly formatted Word file that I and other visually impaired users can read. This project is not just for me; it could help many others who struggle with inaccessible PDFs.
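To make the JSON step concrete, here is a minimal sketch of the kind of per-page structure I have in mind and how it could be flattened into reading order. The field names (`type`, `level`, `rows`, `description`) and the function `page_to_lines` are my own placeholders, not a fixed schema, and the sketch stops at plain text lines rather than writing the Word file.

```python
import json

# Hypothetical JSON for one page; the field names are assumptions, not a fixed schema.
SAMPLE_PAGE = """
{
  "page": 1,
  "elements": [
    {"type": "heading", "level": 1, "text": "Chapter 1"},
    {"type": "paragraph", "text": "Intro text."},
    {"type": "table", "rows": [["Name", "Value"], ["x", "1"]]},
    {"type": "image", "description": "A bar chart."}
  ]
}
"""


def page_to_lines(page: dict) -> list[str]:
    """Flatten one page's elements into text lines, keeping reading order."""
    lines = []
    for el in page.get("elements", []):
        kind = el.get("type")
        if kind == "heading":
            lines.append(f"Heading level {el.get('level', 1)}: {el['text']}")
        elif kind == "paragraph":
            lines.append(el["text"])
        elif kind == "table":
            # Treat the first row as the header and pair each cell with its column name,
            # so a screen reader announces "column: value" instead of bare cells.
            header, body = el["rows"][0], el["rows"][1:]
            lines.append(f"Table: {len(body)} data rows, {len(header)} columns")
            for row in body:
                lines.append("; ".join(f"{h}: {c}" for h, c in zip(header, row)))
        elif kind == "image":
            lines.append(f"Image: {el['description']}")
        else:
            # Keep unknown elements visible instead of silently dropping them.
            lines.append(f"Unrecognized element: {json.dumps(el)}")
    return lines


if __name__ == "__main__":
    for line in page_to_lines(json.loads(SAMPLE_PAGE)):
        print(line)
```

The same loop could feed a Word writer instead of a list of strings; I kept it to plain text here so the sketch runs with the standard library alone.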
For this, I have been using the Google Gemini 2.5 Flash model through AI Studio. However, I quickly hit the rate limit: I can send only 10 requests per minute, and my quota has already run out. Even though I send very few requests, these limits prevent me from continuing. I cannot afford paid API access, and this is a real barrier to completing the project.
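For the per-minute limit, one thing I could do on my side is space out the requests. Below is a minimal sketch of a client-side throttle, assuming the limit is 10 requests in any rolling 60-second window. The class name `MinuteThrottle` is hypothetical, and the sketch does not call any real API; it only decides when the next request may be sent. It would not help with the overall quota running out.

```python
import time


class MinuteThrottle:
    """Allow at most `limit` calls in any rolling `window` seconds."""

    def __init__(self, limit=10, window=60.0, clock=time.monotonic, sleep=time.sleep):
        self.limit = limit
        self.window = window
        self.clock = clock    # injectable so the logic can be tested without waiting
        self.sleep = sleep
        self.stamps = []      # times at which recent calls were allowed

    def _prune(self, now):
        # Forget calls that have already left the rolling window.
        self.stamps = [t for t in self.stamps if now - t < self.window]

    def wait(self):
        """Block until one more call fits in the window, then record it."""
        now = self.clock()
        self._prune(now)
        if len(self.stamps) >= self.limit:
            # Sleep until the oldest recorded call drops out of the window.
            self.sleep(self.window - (now - self.stamps[0]))
            now = self.clock()
            self._prune(now)
        self.stamps.append(now)
```

Usage would be to create one `MinuteThrottle()` for the whole run and call `throttle.wait()` immediately before sending each page image, so a long PDF is processed slowly instead of failing partway through.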
I am not asking anyone to build this for me. I am asking for advice or guidance on:
- How to work within or increase the API rate limits
- Alternative models or methods suitable for turning images of PDF pages into a structured, accessible format
- Any free resources, tips, or approaches for accessible document processing using AI
This project is extremely important to me and to other visually impaired users. Your guidance could help make it viable and truly impactful. I am a student with limited resources, so any help is greatly appreciated.
Thank you very much for your understanding and support!