
This is not even remotely close and very silly. A ChainOfThought in a loop.

TreeOfThoughts is a more sophisticated method; see https://arxiv.org/pdf/2305.10601

The clue we've all had with OpenAI for a long time was that this is a search through a tree: they hired Noam Brown, and his past work all hinted towards that. Q* is obviously a search over a tree, like A*. So take something like CoT, build out a tree, and search across it for the best solution. The search is the "system-2 reasoning".
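Roughly, that recipe would look something like the toy Python sketch below: a best-first search over partial chains of thought, where the hypothetical propose_thoughts and score_state callbacks stand in for LLM calls that expand and evaluate branches. This is only an illustration of the general idea, not a claim about what OpenAI actually runs.

    import heapq

    def solve(question, propose_thoughts, score_state, beam_width=5, max_depth=4):
        """Toy best-first search over partial chains of thought.

        propose_thoughts(question, chain) -> candidate next steps (an LLM call)
        score_state(question, chain)      -> float, how promising the chain looks (an LLM call)
        """
        frontier = [(0.0, [])]          # (negated score, chain of thought so far)
        best = (float("-inf"), [])

        for _ in range(max_depth):
            next_frontier = []
            for _neg_score, chain in frontier:
                for step in propose_thoughts(question, chain):
                    new_chain = chain + [step]
                    s = score_state(question, new_chain)
                    heapq.heappush(next_frontier, (-s, new_chain))
                    if s > best[0]:
                        best = (s, new_chain)
            # Prune: keep only the most promising branches of the tree.
            frontier = heapq.nsmallest(beam_width, next_frontier)

        return best[1]  # the single chain you'd actually show the user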



Came here hoping to find this.

You will not unlock "o1-like" reasoning just by making a model think step by step. That's an old trick people were already using on GPT-3 in 2020. If it were that simple, it wouldn't have taken OpenAI so long to release it.

Additionally, some of the prompt seems counterproductive:

>Be aware of your limitations as an llm and what you can and cannot do.

The LLM doesn't have a good idea of its limitations (any more than humans do). I expect this will create false refusals, as the model becomes overcautious.


>The LLM doesn't have a good idea of its limitations (any more than humans do). I expect this will create false refusals, as the model becomes overcautious.

Can it not be trained to do so? From my anecdotal observations, the knowledge cutoff is one limitation that LLMs are really well trained to know about and handle. Why can't a model be trained to know that it is quite frequently bad at math, that it sometimes produces inaccurate code, etc.?

It's similar for humans: some people know certain things just aren't their cup of tea. Sure, people sometimes have half-baked knowledge, but one can usually tell what they're good at and what they're not.


It's a chicken-and-egg situation. You don't know a model's capabilities until it is trained, and when you then change the training based on what you've learned, the capabilities change again.


Apart from anything else, there will be a lot of text about the nature of LLMs and their inherent limitations in the training set. It might only be necessary to make salient the fact that it is one in order to produce the desired effect.


You're wrong, and you're stating things confidently without the evidence to back it up.

Alignment is a tough problem, and aligning long reasoning sequences to correct answers is also a tough problem. Collecting high-quality CoT from experts is another tough problem. They started this project in October; it's more than plausible that it could take this much time.


Being overcautious when trimming branches of the tree seems like a feature, not a bug.


You actually don't know that.

An LLM has ingested a huge amount of data. It can create character profiles, audiences, personas, etc.

Why wouldn't it have potentially even learned to 'understand' what 'being aware of your limitations' means?

Right now, the 'change of reasoning' feels to me a bit like querying the existing meta space through the reasoning process to adjust the weights. Basically priming the model.

I also wouldn't just call it a 'trick'. It looks simple, weird, or whatnot, but I do believe this is part of research into the AI thinking process.

It's a good question, though: what did they train? A new architecture? More parameters? Is this training a mix of experiments they did? Some auto-optimization mechanism?


It might understand the concept of having limitations, but AFAIK it can't reliably recognize when it does or doesn't know something, or when it has encountered a limitation.


It's the same as with humans, that's right. It doesn't do logical reasoning, but even the best humans stop at some level.

But if you have read all of humanity's knowledge, where does your reasoning start? Probably at a very high level.

If you look at human brains, we conduct experiments, right? As software developers, we write tests. ChatGPT can already run Python code, and it can write unit tests.

We don't use formal proofs when we develop. An AI actually could. But in the end it's more a question of who does it better, faster, and cheaper, eh?


There is an important difference between humans and LLMs in this context.

Humans do in most cases have some knowledge about why they know the things they know. They can recall the topics they learned at school, and can deduce that they probably heard a given story from a friend who likes to discuss similar topics, etc.

LLMs have no access to the information they were trained on. They could know that everything they know was learned during the training, but they have no way of determining what they learned about and what they didn't.


If you think about it, those criticisms extend to human thinking too. We aren't infallible in all situations either.

It's only when we can interact with the environment to test our hypothesis that we then refine what we know and update our priors appropriately.

If we let LLMs do that as well, by allowing them to run code, interact with documentation and the internet, and double-check things they're not sure of, it's not out of the question that they'll eventually be able to understand their limitations more reliably.


As they are currently constructed, I would say that it is out of the question.

Humans usually know (at least roughly) the source of anything they know, as there will be a memory or a known event associated with that knowledge.

LLMs have no analogous way to determine the source of their knowledge. They might know that all their knowledge comes from their training, but they have no way of knowing what was included in the training and what wasn't.

This could maybe be achieved with fancier RAG systems, or with online training abilities. I think an essential piece is the ability to know the source of a piece of information. When LLMs can reliably do that, and apply that knowledge, they'll be much more useful. Hopefully somebody can achieve this.
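For illustration, the kind of RAG setup I mean, where every retrieved chunk keeps its source attached, might look like this rough Python sketch (retrieve and llm are hypothetical callbacks, not any particular library's API):

    def answer_with_sources(question, retrieve, llm):
        """Toy sketch: ground the answer in retrieved, attributed snippets.

        retrieve(question) -> list of dicts like {"text": ..., "source": ...}  (hypothetical)
        llm(prompt)        -> string                                           (hypothetical)
        """
        docs = retrieve(question)
        context = "\n".join(
            "[{}] ({}) {}".format(i, d["source"], d["text"]) for i, d in enumerate(docs)
        )
        prompt = (
            "Answer using only the numbered snippets below and cite them as [i]. "
            "If the snippets don't contain the answer, say you don't know.\n\n"
            + context + "\n\nQuestion: " + question
        )
        return llm(prompt)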


It's interesting that DeepMind still publishes this stuff. OpenAI doesn't publish anything of that sort anymore. DeepMind is more research/publication focused, but this is a disadvantage in a competitive landscape where OpenAI and Anthropic can just apply the results of your paper without giving anything back to the research community.


> but this is a disadvantage in a competitive landscape

Or it's a unique advantage, because this stuff doesn't happen without good researchers, who may want:

1) Their names on scientific papers

2) Openness in AI, which they might actually care about


So far it seems to be a disadvantage as DeepMind has fallen behind OpenAI, despite their size, and to some extent even behind Anthropic.


They fell behind because they didn't have the smart guy with a new idea a few years back, and HE decided to work at a place which started out as open.

Playing catch-up and trying to attract talent from the hot-new-thing OpenAI requires incentives beyond lots of money. I contend that actually being open helps.

I'm sure that's one reason Facebook has an open-source model; scientists can care about ethics and could be attracted to openness.


> They fell behind because they didn't have the smart guy with a new idea a few years back, and HE decided to work at a place which started out as open.

The "Attention Is All You Need" guys all worked at Google. Google is where they are despite having the smart guys with a new idea a few years back.

Of course, IMHO it wouldn't have helped Google if they'd kept the transformer architecture secret. They'd have fumbled it because they didn't realise what they had.


Didn't Google have the LaMDA model pretty early, which was even described as "sentient" at some point? That doesn't look "fumbled" to me.


What Google did was sit on their ass, not deigning to release anything. In the meantime, OpenAI became a $150 billion company. And Anthropic came out with Claude, and Facebook with Llama, and Mistral with their models.

Only then did Google realise there might be something to this LLM stuff - so they responded with Bard, a product so poorly received they later had to completely rebrand it. Looks like they didn't have a "sentient" model up their sleeve after all. Then the updated, rebranded model had a bunch of image generation embarrassments of its own.

Admittedly, they have recovered somewhat since then; they're second on some performance leaderboards, which is respectable.

But there was a real tortoise-and-hare situation where they thought they were so far ahead they had time for a nap, until they got overtaken. Any lead they had from inventing transformers and being the only people with TPUs has been squandered.


I have the impression they regarded generative AI as too dangerous. They never considered making PaLM, LaMDA, Chinchilla, or Imagen publicly available until they found themselves at a competitive disadvantage after the success of ChatGPT.


Anthropic publishes quite a lot too though.


On safety, but no longer on capabilities.


Where in their blog post (which seemingly had complete examples of the model’s chain of thought) did they suggest they were using search or tree of thoughts?


Just a guess:

The chain of thought would be the final path through the tree. Interactively showing the thought tokens would give the game away, which is why they don’t show that.


They mention reinforcement learning, so I guess they used some sort of Monte Carlo tree search (the same algorithm used for AlphaGo).

In this case, the model would explore several chains of thought during training, but only output a single chain during inference (as the sibling comment suggests).
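To make that concrete, an MCTS-style loop over reasoning steps could look roughly like the Python sketch below. Here expand, rollout, and reward are hypothetical stand-ins for model calls and a grader; this is pure speculation about the shape of the idea, not OpenAI's training code.

    import math, random

    class Node:
        def __init__(self, chain, parent=None):
            self.chain, self.parent = chain, parent          # chain = list of reasoning steps
            self.children, self.visits, self.value = [], 0, 0.0

    def uct(node, c=1.4):
        # Standard UCT: exploit high-value children, but keep exploring rarely-visited ones.
        return node.value / node.visits + c * math.sqrt(math.log(node.parent.visits) / node.visits)

    def mcts(question, expand, rollout, reward, iterations=100):
        root = Node(chain=[])
        for _ in range(iterations):
            # 1. Selection: walk down via UCT until we reach a node that isn't fully explored.
            node = root
            while node.children and all(ch.visits > 0 for ch in node.children):
                node = max(node.children, key=uct)
            # 2. Expansion: ask the model for candidate next reasoning steps.
            if not node.children:
                node.children = [Node(node.chain + [s], parent=node)
                                 for s in expand(question, node.chain)]
            unvisited = [ch for ch in node.children if ch.visits == 0]
            node = random.choice(unvisited or node.children) if node.children else node
            # 3. Simulation: let the model finish the chain, then grade the final answer.
            r = reward(question, rollout(question, node.chain))
            # 4. Backpropagation: update statistics along the path back to the root.
            while node:
                node.visits += 1
                node.value += r
                node = node.parent
        # At inference time you'd emit only the single most-visited chain.
        return max(root.children, key=lambda ch: ch.visits).chain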


As someone who works in this field: this comment is obviously uninformed, even about old public research trends.


Care to elaborate? Your comment would be a lot more useful if it explained a little of the why. Otherwise it's just teasing readers and, at the same time, smearing the author without anything to back it up.


Reinforcement learning with PPO doesn't involve MCTS, and it has been the bread and butter of aligning LLMs since 2020. Nothing about saying they use RL implies MCTS.
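For reference, the PPO objective in question is the standard clipped surrogate. Roughly, per token, in Python (a sketch of the textbook formula, not any particular RLHF codebase):

    import math

    def ppo_clip_loss(logp_new, logp_old, advantage, eps=0.2):
        """Minimal sketch of PPO's clipped surrogate objective for one token/action.

        logp_new:  log-prob of the action under the current policy
        logp_old:  log-prob under the policy that generated the data
        advantage: estimated advantage (in RLHF, derived from a reward model minus a baseline)
        """
        ratio = math.exp(logp_new - logp_old)
        clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)
        # Pessimistic minimum of the clipped and unclipped objectives, negated as a loss.
        return -min(ratio * advantage, clipped * advantage)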


> Nothing about saying they use RL implies MCTS

We can say the same thing about RL implying PPO; however, there are pretty big hints, namely Noam Brown being involved. Many of the things he has worked on involve RL in tree-search contexts.

He has also been consistently advocating the use of additional test-time compute to solve search problems. This is also consistent with the messaging regarding the reasoning tokens. There is likely some learned tree search algorithm, such as a learned policy/value function as in AlphaGo.

It’s all speculation until we have an actual paper. So we can’t categorically say MCTS/learned tree search isn’t involved.


nowhere lol


OAI revealed on Twitter that there is no "system" at inference time; this is just a model.

Did they maybe expand to a tree during training to learn more robust reasoning? Maybe. But it still comes down to a regular transformer model at inference time.


Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

https://arxiv.org/pdf/2403.09629

> In the Self-Taught Reasoner (STaR, Zelikman et al. 2022), useful thinking is learned by inferring rationales from few-shot examples in question-answering and learning from those that lead to a correct answer. This is a highly constrained setting – ideally, a language model could instead learn to infer unstated rationales in arbitrary text. We present Quiet-STaR, a generalization of STaR in which LMs learn to generate rationales at each token to explain future text, improving their predictions.

>[...]

>We generate thoughts, in parallel, following all tokens in the text (think). The model produces a mixture of its next-token predictions with and without a thought (talk). We apply REINFORCE, as in STaR, to increase the likelihood of thoughts that help the model predict future text while discarding thoughts that make the future text less likely (learn).
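The REINFORCE step they describe boils down to something like the Python sketch below. This is my simplified reading of the abstract, with placeholder inputs, not the authors' actual implementation.

    def quiet_star_reinforce_loss(logp_future_with_thought,
                                  logp_future_without_thought,
                                  logp_thought):
        """Simplified sketch of the learning signal described in the abstract above.

        logp_future_with_thought:    log-likelihood of the following tokens given the thought
        logp_future_without_thought: the same log-likelihood without the thought (a baseline)
        logp_thought:                log-probability the model assigned to generating the thought
        """
        # Reward = how much the thought improved prediction of the future text.
        reward = logp_future_with_thought - logp_future_without_thought
        # REINFORCE: increase the probability of thoughts with positive reward and
        # decrease it for thoughts that made the future text less likely.
        # (In a real implementation the reward would be detached from the gradient.)
        return -reward * logp_thought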


I don't think you can claim you know what's happening internally when OpenAI processes a request. They are a competitive company and will lie for competitive reasons. Most people think Q-Star is doing multiple inferences to accomplish a single task, and that's what all the evidence suggests. Whatever Sam Altman says means absolutely nothing, but I don't think he's claimed they use only a single inference either.


what is “all the evidence”? please share


I recommend getting on Twitter and closely following the leading individuals in the field of AI, and also watching the leading YouTube channels dedicated to AI research.


Can you link to one speculating about multiple inferences for their CoT? I am curious.

Edit: answer to my own question: https://x.com/_xjdr/status/1835352391648158189


So far it's been unanimous: everyone I've heard talk about it believes Strawberry is mainly just CoT. I'm not saying they didn't fine-tune a model too; I'm just saying I agree with most people that clever CoT is where most of the leap in capability seems to have come from.


> believes Strawberry is mainly just CoT. I'm not saying they didn't fine-tune a model too

You don't see the scaling with respect to token length with non-FT'd CoT like this, in my opinion.


I haven't even added Strawberry support to my app yet, and so haven't checked what its context length is, but you're right that additional context length is a scaling factor that's totally independent of whether CoT is used or not.

I'm just saying that whatever they did in their [new] model, I think they also added CoT on top of it, as the outer layer of the onion, so to speak.


Source?


> I wouldn't call o1 a "system". It's a model, but unlike previous models, it's trained to generate a very long chain of thought before returning a final answer

https://x.com/polynoamial/status/1834641202215297487


That answer seems to conflict with "in the future we'd like to give users more control over the thinking time".

I've gotten mini to think harder by asking it to, but it didn't produce a better answer. Though now I've run out of usage limits for both of them, so I can't try any more…


I'm not convinced there isn't more going on behind the scenes, but influencing test-time compute via the prompt is a pretty universal capability.


Not in a way that is used effectively: in real life, all of the papers using CoT compare against a weak baseline, and the benefits level off extremely quickly.

Nobody except some recent DeepMind research has shown test-time scaling like o1's.


I've been telling Claude not to give me the obvious answer. That pushed the thinking time up, and the quality of the answers is better. Hope it helps.


Reminder: you need to escape the * otherwise you end up with emphasis (italics here).


Another serious advantage of a tree search is parallelism.



