My favorite argument against SP (the stochastic-parrot view) is zero-shot translation. The model learns Japanese-English and Swahili-English and can then translate Japanese to Swahili directly. That shows something more than simple pattern matching is happening inside.
Besides all the arguments based on model capabilities, there is also an argument from usage: LLMs are more like pianos than parrots. People play the LLM on the keyboard, making it 'sing'. Pianos don't make music, but musicians with pianos do. Bender and Gebru talk about LLMs as if they work alone, with no human direction. Pianos are also dumb on their own.
The translation happens because of token embeddings. We have spent a lot of effort developing rich embeddings that capture contextual semantics. Once those are learned, translation is “simply” embedding in one language and disembedding in another.
This doesn’t demonstrate complex thinking behavior. There are probably better examples of that; translation just isn’t one of them.
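The embedding/disembedding picture above can be sketched as a toy nearest-neighbor lookup. Everything here is made up for illustration (the vocabulary, the 3-dimensional vectors, the idea that both languages were aligned to English during training); a real model learns high-dimensional embeddings from data, but the mechanism is the same: if two languages share one semantic space, translating between them never requires seeing a direct Japanese-Swahili pair.

```python
# Toy sketch: zero-shot translation via a shared embedding space.
# All vectors and words below are hypothetical, for illustration only;
# a real model would learn them from Japanese-English and Swahili-English data.
import math

# Both vocabularies map into the same "concept" space because each
# was (conceptually) aligned to English, never to each other.
ja_embed = {"mizu": (0.90, 0.10, 0.00),   # water
            "inu":  (0.00, 0.80, 0.20)}   # dog
sw_embed = {"maji": (0.88, 0.12, 0.01),   # water
            "mbwa": (0.02, 0.79, 0.21)}   # dog

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def translate(word, src_embed, tgt_embed):
    """Embed in the source language, 'disembed' as the nearest target word."""
    vec = src_embed[word]
    return max(tgt_embed, key=lambda w: cosine(vec, tgt_embed[w]))

print(translate("mizu", ja_embed, sw_embed))  # → maji
print(translate("inu", ja_embed, sw_embed))   # → mbwa
```

The lookup succeeds even though no Japanese-Swahili pair appears anywhere, which is the whole point of the argument: the shared space does the work, not memorized pairings.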
> The model learns Japanese-English and Swahili-English and then can translate Japanese-Swahili directly. That shows something more than simple pattern matching happens inside.
The "water story" is a pivotal moment in Helen Keller's life, marking the start of her communication journey. It was during this time that she learned the word "water" by having her hand placed under a running pump while her teacher, Anne Sullivan, finger-spelled the word "w-a-t-e-r" into her other hand. This experience helped Keller realize that words had meaning and could represent objects and concepts.
As the above human experience shows, aligning tokens from different modalities is the first step in doing anything useful.