Gemini has been touted as a powerful multimodal model, yet in real-world use cases, it frequently fails at executing structured, rule-based tasks. A prime example is its inability to consistently process bank statements following strict formatting and validation rules.
Test Scenario: Extracting and Structuring Bank Transactions
The task given to Gemini involved extracting transaction data from bank statements (PDFs or images) and formatting them into JSON according to a rigid set of rules. Every transaction needed dual JSON outputs (Version 1 and Version 2) with precise date formatting, amount processing, and transaction categorization.
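To make the expected output concrete, here is a minimal sketch of what "dual JSON outputs" per transaction might look like. The actual Version 1 / Version 2 field layouts were defined in the original prompt and are not reproduced here, so the field names below are assumptions for illustration only.

```python
import json

def to_versions(txn: dict) -> tuple[str, str]:
    # Hypothetical layouts: Version 1 is a minimal record, Version 2
    # adds the description. The real rule set may differ.
    v1 = {"date": txn["date"], "amount": txn["amount"], "type": txn["type"]}
    v2 = {**v1, "description": txn.get("description", "")}
    return json.dumps(v1), json.dumps(v2)
```

The point of the dual output is that every transaction must yield both strings; skipping one is exactly the failure described below.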
Key Failures Observed
Forgetting Instructions Midway
Despite explicit, detailed instructions, Gemini often ignored key steps, leading to incomplete or incorrect outputs.
It failed to consistently generate both required JSON versions, sometimes omitting one entirely.
Inconsistent Data Processing
Certain transactions were misclassified (e.g., a debit was marked as a credit).
It occasionally misinterpreted dates, failing to apply the correct YYYY-MM-DD format.
Amounts ending in “.000” (e.g., “40.000”) were sometimes left unchanged, despite clear rules to remove the suffix.
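Both of these rules are mechanically simple, which is what makes the failures frustrating. A sketch of the date and amount normalization described above (the accepted input date formats are assumptions, since the statements' exact formats were not given):

```python
from datetime import datetime

def normalize_date(raw: str) -> str:
    # Try a few common statement date formats; the actual formats
    # on the source statements are an assumption here.
    for fmt in ("%d/%m/%Y", "%d-%m-%Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(raw, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date: {raw!r}")

def normalize_amount(raw: str) -> str:
    # Per the rule above: drop a trailing ".000" suffix, leave
    # everything else untouched.
    return raw[:-4] if raw.endswith(".000") else raw
```

For example, "15/03/2024" should become "2024-03-15", and "40.000" should become "40", every time.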
Logical Errors in Transaction Handling
For specific amounts requiring TVA (VAT) splitting (e.g., 0.595), it sometimes created incorrect JSON structures.
The validation rules were ignored in some cases, leading to missing or misplaced fields in the final JSON.
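Checks like these are easy to express in code, which is why their omission stands out. Below is a hedged sketch of a post-hoc validator; the required field set and the debit/credit vocabulary are assumptions, since the original rule set was not published with the post.

```python
import re

# Assumed schema: the real prompt's required fields may differ.
REQUIRED_FIELDS = {"date", "amount", "type", "description"}

def validate_transaction(txn: dict) -> list:
    """Return a list of validation errors (empty means valid)."""
    errors = []
    missing = REQUIRED_FIELDS - txn.keys()
    if missing:
        errors.append(f"missing fields: {sorted(missing)}")
    if txn.get("type") not in ("debit", "credit"):
        errors.append(f"invalid type: {txn.get('type')!r}")
    if "date" in txn and not re.fullmatch(r"\d{4}-\d{2}-\d{2}", txn["date"]):
        errors.append(f"bad date format: {txn['date']!r}")
    return errors
```

Running a validator like this over Gemini's output is one workaround: instead of trusting the model to follow every rule, reject and retry any transaction that fails the checks.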
Inability to Correct Its Own Mistakes
Even after these errors were pointed out, Gemini often failed to fix them in subsequent attempts.
Has anyone else experienced these kinds of issues with Gemini, or is it just me?