
Glad someone brought this up.

I'm personally fine with o3 being tuned on the train set as a way to teach models "the rules of the game". What annoys me is that this wasn't also done with the o1 models or r1. It's a misleading comparison: it suggests that o3 is a huge improvement over o1, when in reality much of that improvement may simply come from one model knowing which game it was playing while the others didn't.


