I use the Windsurf IDE, which comes with integrated LLM chat and edit functionality. I switched to it two months ago, and for the three months before that I was using Cursor (a similar editor); in both, I have consistently had better results with Claude.

With the apparently more advanced reasoning models, I expected that to change. In Windsurf I have both DeepSeek R1 and o3-mini available, and I thought they would improve the results I get from my prompts. They did not, far from it. Even though they consistently pull ahead of Claude 3.5 Sonnet in benchmarks, in practice, with the way I prompt, Claude almost always comes up with the better solution. So much so that I can't remember a single time when Claude couldn't figure something out and switching to another model fixed it for me.

Because of this discrepancy between the benchmarks and my own experience, I'm wondering if my prompting is off. It may be that my prompting has become Claude-specific after using it for a while. Is there a trick to prompting the reasoning models "properly"?