
From what I read elsewhere (a random Reddit comment), the visible reasoning is just "for show" and isn't the process DeepSeek actually used to arrive at the result. But if the reasoning has value, I guess it doesn't matter even if it's fake.


Can you provide a link to the comment?

R1's technical report (https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSee...) says the prompt template used for training is "<think> reasoning process here </think> <answer> answer here </answer>. User: prompt. Assistant:". That format strongly suggests the text between the <think> tags becomes the "reasoning" and the text between the <answer> tags becomes the "answer" in the web app and API (https://api-docs.deepseek.com/guides/reasoning_model). I see no reason why DeepSeek would do it any other way, aside perhaps from some post-generation filtering.
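
If that's right, the split is just string processing on the raw completion. A minimal sketch of what it might look like (the helper and regexes are mine, not DeepSeek's actual serving code; I'm also assuming the served model doesn't always emit the <answer> tags, hence the fallback):

  import re

  # Hypothetical helper: split one raw completion into the "reasoning"
  # and "answer" shown in the web app / API.
  def split_reasoning(raw: str) -> tuple[str, str]:
      think = re.search(r"<think>(.*?)</think>", raw, re.DOTALL)
      reasoning = think.group(1).strip() if think else ""
      # The training template wraps the answer in <answer> tags, but
      # (assumption) the served model doesn't always emit them, so fall
      # back to whatever follows </think>.
      ans = re.search(r"<answer>(.*?)</answer>", raw, re.DOTALL)
      if ans:
          answer = ans.group(1).strip()
      elif think:
          answer = raw[think.end():].strip()
      else:
          answer = raw.strip()
      return reasoning, answer

  raw = "<think> user wants X, try A... no, B is simpler </think> Use B."
  print(split_reasoning(raw))
  # ('user wants X, try A... no, B is simpler', 'Use B.')

Notably, the linked API docs expose exactly this split: deepseek-reasoner returns the two pieces as separate reasoning_content and content fields on the message.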

Plus, table 3 of the R1 technical report contains an example of R1's chain of thought, and its style (going back and re-evaluating the problem) resembles the CoT I actually get in the web app.


Bad Reddit comment, then; try pair programming with it. The reasoning usually comments on your request, extends it, figures out which solution is best and actually usable, backtracks if it finds issues while implementing it, proposes a new solution, and verifies that it makes sense.

The final answer can look different from the reasoning for ordinary questions (i.e. summarised the way a ChatGPT answer would be), but it is usually very consistent with the reasoning, especially for code: if, for example, it has to choose between two libraries, it will of course use the one it settled on in the reasoning.


That doesn't really make sense with how LLMs work. I think this is exactly why it's risky to use words like "thinking" and "reasoning".

If by "the visible reasoning is just for show" they meant that these models don't actually think and reason, then yes, that is correct.

But if they meant that the visible reasoning is not literally part of the inference process... that's entirely incorrect.

R1 is open source, so we don't have to guess about how it works.
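
A minimal sketch with Hugging Face transformers makes this concrete (using one of the openly released R1 distills; the model name and prompt here are just for illustration, and the full R1 behaves the same way):

  from transformers import AutoModelForCausalLM, AutoTokenizer

  # One of the published R1 distills, small enough to run casually.
  name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
  tok = AutoTokenizer.from_pretrained(name)
  model = AutoModelForCausalLM.from_pretrained(name)

  messages = [{"role": "user", "content": "What is 17 * 24?"}]
  inputs = tok.apply_chat_template(messages, add_generation_prompt=True,
                                   return_tensors="pt")
  out = model.generate(inputs, max_new_tokens=512)

  # The decode is one autoregressive stream: "<think> ... </think>"
  # followed by the answer. Every answer token was conditioned on the
  # reasoning tokens generated before it.
  print(tok.decode(out[0][inputs.shape[1]:], skip_special_tokens=True))

There is no separate "reasoning generator" bolted on for display; the visible reasoning and the answer come out of the same forward passes.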


There is no way this is true. It is just an example of why Reddit is a fucking joke that you should never read.

I have seen it infer incredibly obscure things in the chain of thought that I was impressed it could piece together.

It is an incredible tool. I would trust it 1000% more than a random person on Reddit.



