I'm personally fine with o3 being tuned on the train set as a way to teach models "the rules of the game". What annoys me is that this wasn't also done with the o1 models or r1. It's a misleading comparison that suggests o3 is a huge improvement over o1, when in reality much of that improvement may simply be that one model knew which game it was playing and the others didn't.