I've found that I need to point it to the right bit of logs or test output and n...

I've found that I need to point it to the right bit of logs or test output and narrow its attention by selectively adding to it's context. Claude 3.7 at least works well this way. If you don't it'll fumble around. Gemini hasn't worked as well for me though.

I partly wonder if different peoples prompt styles will lead to better results with different models.