
In my own tests I have found Opus to be very good at writing plans but terrible at executing them. It typically ignores half of the constraints. https://x.com/xundecidability/status/2019794391338987906?s=2... https://x.com/xundecidability/status/2024210197959627048?s=2...



1. Don't implement too much at a time

2. Have the agent review whether it followed the plan and the relevant skills accurately.


the first link was from a simple request with fewer than 1000 tokens total in the context window, just a short shell script.

here is another one, which had about 200 tokens, where opus decided to change the model name I requested.

https://x.com/xundecidability/status/2005647216741105962?s=2...

opus is bad at instruction following now.



