Gemini models are very good, but in my experience they tend to overdo things. When I give them context and something specific to rework, Gemini often reworks far more than the problem at hand.
For software that makes it barely useful, because you want small commits for specific fixes, not a whole refactor or rewrite. I've tried many prompts, but it's hard. Even when I give it the function signatures of the APIs the code I want fixed uses, Gemini rewrites the API functions themselves.
If anybody knows a prompt hack to avoid this, I'm all ears. Meanwhile I'm staying with Claude Pro.
Yes, it will add INSANE amounts of "robust error handling" to quick scripts where I can be confident about my assumptions. It turns my clean 40 lines of Python, where I KNOW the JSONL I am parsing is valid, into 200+ lines with ten new try/except blocks. Even when I tell it not to do this, it loves to "find and help" in other ways. Quite annoying. But overall it is pretty dang good: it even spotted a bug I'd missed the other day in a big, complex 400+ line data-processing file.
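To make the contrast concrete, here is a toy sketch of the kind of direct parsing meant here, assuming (as the comment does) that the input is known-valid JSONL. The sample data is made up for illustration:

```python
import json

# Known-valid JSONL input (inline here for illustration).
sample = '{"id": 1}\n{"id": 2}\n{"id": 3}\n'

# No defensive handling: the input is trusted, so just parse each
# non-empty line directly. A bad line would raise, and that's fine.
records = [json.loads(line) for line in sample.splitlines() if line.strip()]
print(len(records), "records loaded")
```

If the data were untrusted, the extra try/except would be justified; the complaint is about adding it when the precondition is already known to hold.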
I didn't realize this was a broader trend. I asked it to write a simple testing script that POSTed a string to a local HTTP server as JSON, and it wrote a 40-line script handling every possible error. I just wanted two lines.
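For reference, the POST itself really is only a couple of lines with the stdlib. The sketch below wraps it in a throwaway in-process server purely so it runs self-contained; the endpoint, port, and payload are made up:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Throwaway stand-in for the local server under test: echoes the POST body.
class Echo(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        self.send_response(200)
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), Echo)  # port 0 = pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# The actual test script: build the request, send it. No error handling.
req = urllib.request.Request(
    f"http://127.0.0.1:{server.server_port}/",
    data=json.dumps({"text": "hello"}).encode(),
    headers={"Content-Type": "application/json"},
)
reply = urllib.request.urlopen(req).read()
print(reply)  # b'{"text": "hello"}'

server.shutdown()
```

With the third-party `requests` package it collapses further, to roughly `requests.post(url, json={"text": "hello"})`.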
Exactly! Wrapping code in try/except just to print the error is, in 99% of cases, the EXACT same as the default traceback Python already prints to stderr. It just assumes we need that or something will get hurt.
I wonder how much of that sort of thing is driven by the models having been trained on their own internal codebases? If that's the case, careful and defensive being the default would be unsurprising.
Here's what I've found works (not 100%, but it gives much better and more consistent results).
Basically, I ask it to repeat some rules at the start of each message:
"From now on, you must repeat and comply with the following rules at the top of all your messages onwards:
- I will never rewrite API functions. Even if I think it's a good idea, it is a bad idea. I will keep the API function as it is and it is perfect like that.
- I will never add extra input validation. Even if I think it's a good idea, it is a bad idea. I will keep the function without validation and it is perfect like that.
- ...
- If I violate any of those rules, I did a bad job.
"
Forcing it to repeat things makes the model's output more aligned and focused, in my experience.
The model is good at solving problems, but it's very difficult to control the unnecessary changes it makes to the rest of the code. It also adds a lot of unnecessary comments, even when I explicitly tell it not to.
For now, DeepSeek R1 and V3 are working better for me, producing more predictable results and capturing my intentions better (I haven't tried Claude yet).