I read the code. Guidance seems designed to work well with OpenAI's chat completion API. When you ask Guidance to choose from a set of options, it breaks the list into a tree of tokens and then walks this tree, passing the set of possible next tokens in the logit_bias parameter with each token's bias set to +100.
For example, suppose that you specify this as your Guidance "program" and suppose (for the sake of simplicity) that the token for "lea" is 1300, the token for "ther" is 1500, and the token for "ves" is 5300:
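The tree-walking step can be sketched in plain Python. This is a toy reconstruction, not Guidance's actual code; the token IDs and the two options ("leather" / "leaves") are the hypothetical values from this example:

```python
# Toy sketch: how a choice between "leather" and "leaves" becomes a token
# tree, and what logit_bias map would be sent at each step.
# Token IDs are the made-up values from the example, not real tokenizer output.

TOKENIZATIONS = {
    "leather": [1300, 1500],  # "lea" + "ther"
    "leaves":  [1300, 5300],  # "lea" + "ves"
}

def allowed_next_tokens(prefix):
    """Token IDs that can legally follow `prefix` in at least one option."""
    allowed = set()
    for tokens in TOKENIZATIONS.values():
        if tokens[:len(prefix)] == list(prefix) and len(tokens) > len(prefix):
            allowed.add(tokens[len(prefix)])
    return allowed

def logit_bias_for(prefix, bias=100):
    """The logit_bias map to send with the next chat completion call.
    The API takes token IDs as string keys and biases as numbers."""
    return {str(tok): bias for tok in allowed_next_tokens(prefix)}
```

With no tokens chosen yet, `logit_bias_for([])` yields `{"1300": 100}`; after "lea" is forced, `logit_bias_for([1300])` yields biases for both "ther" and "ves".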
Guidance will send OpenAI a chat completion starting with
"armor": "
... providing the logit_bias map {"1300": 100} (the API takes token IDs as string keys and biases as numbers). This bias effectively forces the model to choose "lea" as the next token. Following this call, we have the prefix
"armor": "lea
... and now Guidance calls chat completion again, setting the logit_bias map to {"1500": 100, "5300": 100}. This doesn't make "ther" and "ves" equally probable; it adds the same large offset to both logits, so in practice they are the only tokens the model can select between, with the model's own relative preferences deciding which one wins. OpenAI now replies with token 1500 (let's say) and Guidance completes the string as follows:
"armor": "leather
... because "ther" is represented by token number 1500. Guidance then tacks on the closing quote and other stuff specified by the user:
"armor": "leather",
... and it sets the value of "armor" to "leather" so that you can use that value later in your code if you wish to. Guidance is pretty powerful, but I find the grammar hard to work with. I think the idea of being able to upload a bit of code or a context-free grammar to guide the model is super smart.
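Putting the steps above together, here's a minimal sketch of the whole select loop. The `fake_model` function is invented for illustration; a real implementation would send the prompt and the logit_bias map to the chat completion API and read back one token:

```python
# Minimal sketch of the select-from-options loop described above.
# `fake_model` is a stand-in for the chat completion call; token IDs and
# the option list are the hypothetical values from the example.

TOKEN_TEXT = {1300: "lea", 1500: "ther", 5300: "ves"}
OPTIONS = {
    "leather": [1300, 1500],
    "leaves":  [1300, 5300],
}

def fake_model(prompt, logit_bias):
    # Stand-in for the API: always pick the lowest-numbered allowed token.
    return min(int(tok) for tok in logit_bias)

def select(options, model):
    """Walk the token tree, calling `model` once per token; return the
    completed string (the value Guidance would record for the variable)."""
    chosen = []
    while True:
        allowed = {
            toks[len(chosen)]
            for toks in options.values()
            if toks[:len(chosen)] == chosen and len(toks) > len(chosen)
        }
        if not allowed:  # reached a leaf: one option is fully emitted
            break
        bias = {str(t): 100 for t in allowed}
        prompt = '"armor": "' + "".join(TOKEN_TEXT[t] for t in chosen)
        chosen.append(model(prompt, bias))
    return "".join(TOKEN_TEXT[t] for t in chosen)

value = select(OPTIONS, fake_model)  # "leather" with this stand-in model
```

One call per forced token is also why this approach is slow and comparatively expensive against a hosted API: a two-token choice costs two round trips.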
Just to +1 mmoskal: while Guidance does work with the OpenAI APIs to some extent, AFAIK it was first designed around having direct access to the logits, so it's superior when used with local models.
https://github.com/microsoft/guidance