They are testing with a different dataset. The authors saying that they have not... | Hacker News


7thpower 10 months ago | on: An analysis of DeepSeek's R1-Zero and R1

They are testing with a different dataset. The authors are saying that they have not tested on the version of o3 that has not seen the training set.

pertymcpert 10 months ago

Yeah... the whole point is that you're testing the model on something it hasn't already seen. If the problems were in the training set, then by definition the model has seen them before.
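
To make that concrete, here is a minimal sketch in Python of what "hasn't seen already" means in practice: split an eval set into problems that also appear in the training data and problems that don't. The function name and the toy problems are hypothetical, and exact matching after whitespace/case normalization is only an assumption for illustration; it says nothing about how any lab or benchmark actually checks for contamination.

```python
def split_by_contamination(eval_problems, training_problems):
    """Split eval problems into (held_out, contaminated).

    A problem counts as contaminated if its normalized text also
    appears in the training problems. Exact matching is a simplifying
    assumption; paraphrased duplicates would slip through.
    """
    def normalize(text):
        # Lowercase and collapse runs of whitespace.
        return " ".join(text.lower().split())

    seen = {normalize(p) for p in training_problems}
    held_out = [p for p in eval_problems if normalize(p) not in seen]
    contaminated = [p for p in eval_problems if normalize(p) in seen]
    return held_out, contaminated


# Toy data, invented for this example.
training = ["What is 2 + 2?", "Name the capital of France."]
evaluation = ["what is  2 + 2?", "Name the capital of Peru."]

held_out, contaminated = split_by_contamination(evaluation, training)
print("held out:", held_out)
print("contaminated:", contaminated)
```

Only a score on the held-out part tells you anything about problems the model hasn't seen; a score on the contaminated part can partly reflect memorization.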
