With a long enough context, an AGI is not useless without vast built-in knowledge. You could always put a bootstrap sequence at the start of the context (think Arecibo Message), followed by your prompt. A sufficiently general reasoner with enough compute should be able to establish the necessary context and reason about your prompt.
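A toy version of the idea, with everything below invented purely for illustration:

    # Hypothetical sketch: prepend a language-free bootstrap prefix to
    # the actual task, so a general reasoner can infer the notation
    # from scratch instead of relying on pretrained knowledge.
    def build_context(prompt: str) -> str:
        bootstrap = [
            "0 1 10 11 100 101 110 111",  # establish binary counting
            "1 + 1 = 10",                 # introduce '+' and '='
            "10 + 1 = 11",
        ]
        return "\n".join(bootstrap) + "\n" + prompt

    print(build_context("101 + 11 ="))

A real bootstrap would need to build far more than arithmetic, of course.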


Yes, but that effectively just recreates pretraining. You're going to have to explain everything down to what an atom is, and essentially all of human knowledge, if you want any ability to consider abstract solutions that draw on lessons from foreign domains.

There's a reason people of comparable intelligence operate at very different levels of effectiveness, and it largely comes down to how knowledgeable they are.


Would that make in-context learning a superset or a subset of pretraining?

This paper [0] claims transformers learn a gradient-descent mesa-optimizer as part of in-context learning, guided by the pretraining objective. And, as the parent mentioned, a sufficiently general reasoner can bootstrap a world model from first principles.

[0] https://arxiv.org/pdf/2212.07677
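The paper's core construction is that one layer of linear (softmax-free) self-attention can implement a step of gradient descent on an in-context regression problem. A minimal numpy sketch of that equivalence, my own toy version rather than the paper's code:

    import numpy as np

    # Toy check: with keys/values built from the in-context (x_i, y_i)
    # pairs, linear attention applied to the query input equals the
    # prediction after one gradient-descent step on least squares,
    # starting from w = 0.
    rng = np.random.default_rng(0)
    d, n, lr = 4, 32, 0.1

    X = rng.normal(size=(n, d))    # in-context inputs x_i
    y = X @ rng.normal(size=d)     # in-context targets y_i
    x_q = rng.normal(size=d)       # query input

    # One GD step on L(w) = 0.5 * sum_i (y_i - w.x_i)^2 from w = 0:
    w_step = lr * (y @ X)          # residuals are just y when w = 0
    gd_pred = w_step @ x_q

    # Linear attention: query x_q, keys x_i, values y_i, no softmax:
    attn_pred = lr * (x_q @ X.T) @ y

    print(np.allclose(gd_pred, attn_pred))  # True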


> Would that make in-context learning a superset or a subset of pretraining?

I'd guess a superset. But it doesn't really matter either way: ultimately there's no useful distinction between pretraining and in-context learning. The split is just an artifact of the current technology.


Isn't knowledge of language necessary to decode prompts?


0 1 00 01 10 11 000 001 010 011 100 101 110 111 0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110

And no, I don't think knowledge of language is necessary. To give a concrete example, the tokens of the TinyStories dataset (~1 GB in total) are known to be sufficient to bootstrap basic language.
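For anyone squinting at the pattern above: it's just every bit string in shortlex order, which a few lines of Python reproduce (the full enumeration continues with 1111):

    from itertools import product

    # All bit strings of length 1..4, shortest first (shortlex order).
    strings = ("".join(bits)
               for n in range(1, 5)
               for bits in product("01", repeat=n))
    print(" ".join(strings))
    # -> 0 1 00 01 10 11 000 001 ... 1101 1110 1111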



