What I usually test with is getting the model to build a full, scalable SaaS application from scratch... Using Antigravity, its early code organization seemed very impressive, but then at some point it suddenly started getting stuck, constantly stopped producing output, and I had to trigger continue or babysit it. I don't know if I could've been doing something better, but that was just my experience. Impressive at first, but at least compared to Antigravity, Codex and Claude Code scale more reliably.
Just an early anecdote from trying to build that one SaaS application, though.
It sounds like an API issue more than anything. I was working with it through Cursor on a side project, and it did better than all previous models at following instructions and refactoring, and UI-wise it has some crazy skills.
What really impressed me was when I told it I wanted a particular component's UI cleaned up but didn't know exactly how; I just asked it to use its deep design expertise to figure it out. It came up with a UX I never would have thought of, and that was amazing.
Another important point: the error rate in my session yesterday was significantly lower than with any other model I've used.
Today I'll see how it does at work, where we have a massive codebase with particular coding conventions. Curious how it handles that.