The idea is that an LLM trained on a programming language could use the functions as its queries, learn to detect the keys (the arguments to those functions), and the value matrix could be related to a memorized version of the function as computed with those arguments. So the sequence of learning in transformers would be something like:
1) query = is this a function? which function?
2) keys = where are its arguments, and what are they?
3) values = embed the memorized result of computing the function with the given arguments.
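Here is a minimal numeric sketch of that three-step story, assuming made-up four-dimensional embeddings for the phrase "concat cat dog". Every vector below is a hypothetical stand-in for what a trained model would learn, not anything extracted from a real network:

```python
import numpy as np

# Toy token positions for the phrase "concat cat dog".
d = 4
tokens = ["concat", "cat", "dog"]

# Keys: learned markers that say "I am an argument slot" (invented values).
K = np.array([
    [0.0, 0.0, 0.0, 1.0],   # "concat" -> the function token, not an argument
    [1.0, 0.0, 0.0, 0.0],   # "cat"    -> argument slot 1
    [0.0, 1.0, 0.0, 0.0],   # "dog"    -> argument slot 2
])

# Values: embeddings that, in this story, encode the memorized
# contribution of each argument to the computed result (also invented).
V = np.array([
    [0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.0],   # contribution of "cat"
    [0.0, 0.5, 0.0, 0.0],   # contribution of "dog"
])

# Query: "is this a function, and which one?" Here it asks for
# both argument slots of concat at once.
q = np.array([1.0, 1.0, 0.0, 0.0])

scores = K @ q / np.sqrt(d)                      # dot-product attention scores
weights = np.exp(scores) / np.exp(scores).sum()  # softmax over positions
result = weights @ V                             # weighted mix of values

print(dict(zip(tokens, weights.round(2))))  # attention lands on the two arguments
print(result.round(2))                      # embedded "result" of concat(cat, dog)
```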
I wonder if, for example, a function call is something a transformer effectively computes. So the phrase "argument one is cat and argument two is dog and operation is join" would be operated on by the transformer as the function concat(cat, dog), producing the word catdog. Here the query is the function, the keys are the arguments to the function, and the value is a function from words to words.
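As a sketch of the "value is a memorized result" part, assuming a toy lookup table whose entries are all invented for illustration, retrieval could stand in for computation:

```python
# Hypothetical: the "value" as a memorized word-to-word function.
# Instead of computing concat at run time, the model retrieves a stored
# mapping keyed by (function, arguments).
memorized = {
    ("concat", "cat", "dog"): "catdog",
    ("concat", "dog", "cat"): "dogcat",
}

def apply_memorized(fn: str, *args: str) -> str:
    # Exact-match retrieval stands in for the value matrix; a real model
    # would interpolate from training data rather than look up a table.
    return memorized[(fn, *args)]

print(apply_memorized("concat", "cat", "dog"))  # -> "catdog"
```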
Transformers can intelligently parse unstructured input into a structured internal form, apply a transform, and then format the result back into unstructured text. Even the transform itself can be passed as an argument.
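A sketch of that parse / transform / format loop, with the transform itself passed as an argument; the regex grammar and the TRANSFORMS table are assumptions invented for this example:

```python
import re
from typing import Callable

def parse(text: str) -> tuple[str, list[str]]:
    """Pull a structured (operation, arguments) form out of loose prose."""
    op = re.search(r"operation is (\w+)", text).group(1)
    args = re.findall(r"argument \w+ is (\w+)", text)
    return op, args

# The transform is itself an argument: any word-to-word function can be
# registered here.
TRANSFORMS: dict[str, Callable[[list[str]], str]] = {
    "join": lambda args: "".join(args),
}

def run(text: str) -> str:
    op, args = parse(text)               # unstructured -> structured
    result = TRANSFORMS[op](args)        # apply the transform
    return f"the result is the word {result}"  # structured -> unstructured

print(run("argument one is cat and argument two is dog and operation is join"))
# -> "the result is the word catdog"
```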