I used Gemini 2.0 Pro to build a PDF generator that converts JSON data into formatted PDF tables. What made this remarkable was:
The task involved handling 14 different table types and structures and required working through over 2,000 lines of existing code. With just a single prompt - “MAKE ME A TOPDF COMPONENT” - Gemini generated the complete solution.
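For anyone wondering what the core of such a converter looks like, here is a minimal sketch in Python using reportlab. This is my own illustration, not the component Gemini generated (the real solution covered all 14 table types), and the function name and sample data are made up:

```python
# Minimal JSON -> PDF table sketch (illustrative only, not the generated code).
from reportlab.lib import colors
from reportlab.lib.pagesizes import A4
from reportlab.platypus import SimpleDocTemplate, Table, TableStyle

def json_to_pdf(records, out_path):
    """Render a list of flat JSON objects as one formatted PDF table."""
    headers = list(records[0].keys())
    rows = [headers] + [[str(r.get(h, "")) for h in headers] for r in records]

    table = Table(rows, repeatRows=1)  # repeat the header row on every page
    table.setStyle(TableStyle([
        ("BACKGROUND", (0, 0), (-1, 0), colors.lightgrey),
        ("GRID", (0, 0), (-1, -1), 0.5, colors.grey),
        ("FONTSIZE", (0, 0), (-1, -1), 8),
    ]))

    SimpleDocTemplate(out_path, pagesize=A4).build([table])

json_to_pdf([{"id": 1, "name": "foo"}, {"id": 2, "name": "bar"}], "out.pdf")
```

A real component would layer per-type styling and layout rules on top of a skeleton like this, which is what makes handling 14 table variants non-trivial.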
Due to the model’s output limit of 900 lines, I needed to prompt it to continue, but it maintained consistency and completed the task successfully.
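If you drive this through the API instead of the chat UI, the “continue” step can be scripted. A rough sketch, assuming the google.generativeai Python client; the model id and the truncation check via the MAX_TOKENS finish reason are assumptions on my part:

```python
# Sketch of a "continue" loop for long generations; model id is assumed.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-2.0-pro-exp")  # assumed model id
chat = model.start_chat()

chunks = []
response = chat.send_message("MAKE ME A TOPDF COMPONENT")
chunks.append(response.text)

# Keep asking for more while the previous reply was cut off at the output cap.
while response.candidates[0].finish_reason.name == "MAX_TOKENS":
    response = chat.send_message("Continue exactly where you left off.")
    chunks.append(response.text)

full_output = "".join(chunks)
print(full_output)
```

In the chat UI the equivalent is just typing “continue” by hand, as described above.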
This demonstrates how effectively Gemini 2.0 Pro can handle complex code generation tasks with minimal prompting.
I agree that Gemini 2.0 Pro is great, but it just isn’t beating the DeepSeek or ChatGPT models in reasoning, especially when it comes to coding. Trust me, I wish it did because I love Gemini, especially the cost, but it’s just not there yet. The benchmark sites out there say the same: Gemini’s thinking model is around 5th on a good day. I think that will change in time, since Google ultimately has a big advantage in infrastructure and training data, but it’s definitely not the new GOAT (yet).
I agree with you about the reasoning gap between models, BUT what really amazes me is how Gemini 2.0 Pro handles long outputs without hallucinating the way other AI models do.
Like, it can literally output over 3,000 lines without making stuff up… that’s just mind-blowing.
Some things that blow my mind:
Makes mistakes sometimes but not the annoying ones
Doesn’t randomly do stuff I didn’t ask for
Doesn’t ignore warnings or advice
When fixing issues in massive conversations (60k+ tokens), it actually thinks about the whole picture
Even after 60,000+ tokens it stays focused and understands what you’re saying - no weird confusion or losing track of the conversation
What I love about it isn’t even the raw smarts - it’s how it handles these huge tasks with almost no hallucinations. And don’t even get me started on that gigantic context window…
I’m with you. Nobody can compare to Gemini’s context window; it’s 10x what competitors offer. And yes, I also love Gemini. I have thousands of hours logged with Gemini, OpenAI, and Llama 3.x, and I’m just starting to play with DeepSeek; overall, Gemini is my go-to for almost everything. It just needs the reasoning to get a little better. For some reason, the reasoning in Google’s direct chat interface is superior to what the API gives, especially with vision. Not sure why. If they could close that gap, there would be no question.
An additional calibration point: there is a huge difference between Gemini and DeepSeek when it comes to physics-based coding. For example, Gemini understood the theory of amplitude-versus-offset (AVO) analysis (used to de-risk a prospect or drilling location) and gave a meaningful, useful response, while DeepSeek and the like returned code that ran but carried no physics and was devoid of AVO knowledge. For detail, see:
“Modified by moderator”
Curious as to the root cause of such a discrepancy among LLMs - content granularity, training data, subject matter, etc. Any thoughts?
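For context on what “understood the theory” means here: AVO intercept/gradient work is typically built on the Shuey approximation to the Zoeppritz equations. Below is a minimal sketch of that relation in Python (my own illustration with made-up rock properties, not the code either model returned):

```python
# Two-term Shuey (1985) approximation: R(theta) ~ R0 + G * sin^2(theta).
# Interface properties below are illustrative placeholders, not real well data.
import numpy as np

def shuey_intercept_gradient(vp1, vs1, rho1, vp2, vs2, rho2):
    """Return (intercept R0, gradient G) for a single elastic interface."""
    vp, vs, rho = (vp1 + vp2) / 2, (vs1 + vs2) / 2, (rho1 + rho2) / 2
    dvp, dvs, drho = vp2 - vp1, vs2 - vs1, rho2 - rho1
    r0 = 0.5 * (dvp / vp + drho / rho)
    g = 0.5 * dvp / vp - 2 * (vs / vp) ** 2 * (drho / rho + 2 * dvs / vs)
    return r0, g

# Shale over a softer sand (illustrative values): Vp [m/s], Vs [m/s], rho [g/cc]
r0, g = shuey_intercept_gradient(3000, 1500, 2.40, 2800, 1650, 2.15)
theta = np.radians(np.arange(0, 31, 5))
reflectivity = r0 + g * np.sin(theta) ** 2  # reflectivity over 0-30 degrees
print(f"intercept = {r0:.3f}, gradient = {g:.3f}")
```

Cross-plotting intercept and gradient from interfaces like this is what actually helps de-risk a prospect; code that runs but skips this physics is exactly the failure mode described above.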
Gemini’s long context is definitely one of, if not the, best, but when it comes to debugging hard problems I often find myself reaching for Claude, DeepSeek, or o3-mini instead - maybe because I use Gemini most of the time, so when I don’t get the desired result I jump to another model quickly.