This sort of measure is a decent match for BPB (bits per byte), though. BPB = -log2(document_probability) / document_length_bytes, and perplexity = 2^(BPB * document_length_bytes / document_length_tokens), i.e. the same total log-probability normalized per token instead of per byte. We already train models by minimizing perplexity (equivalently, cross-entropy), and sampled model outputs already tend to be high-probability ones. Though, like with EBMs, finding outputs with even higher probability would require an expensive search step.
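To make the BPB/perplexity relationship concrete, here is a minimal sketch in Python. It assumes you already have per-token natural-log probabilities for a document from some model; the function names `bits_per_byte` and `perplexity_from_bpb` are made up for illustration and don't come from any particular library.

```python
import math

def bits_per_byte(token_logprobs, text):
    """BPB = -log2(document_probability) / document_length_bytes.

    token_logprobs: natural-log probability of each token (assumed input
    format; many frameworks report log-probs in nats).
    """
    total_nats = -sum(token_logprobs)       # -ln(document_probability)
    total_bits = total_nats / math.log(2)   # convert nats to bits
    n_bytes = len(text.encode("utf-8"))     # document length in bytes
    return total_bits / n_bytes

def perplexity_from_bpb(bpb, n_bytes, n_tokens):
    """Per-token perplexity = 2^(BPB * bytes / tokens)."""
    return 2 ** (bpb * n_bytes / n_tokens)

# Toy document: 4 bytes, split into 2 tokens, each assigned probability 0.25.
text = "abcd"
logprobs = [math.log(0.25), math.log(0.25)]

bpb = bits_per_byte(logprobs, text)
ppl = perplexity_from_bpb(bpb, len(text.encode("utf-8")), len(logprobs))
print(bpb, ppl)
```

In the toy case the document probability is 1/16, so that's 4 bits over 4 bytes (1 bit per byte), and the per-token perplexity comes out to 4, the same as exp of the mean per-token negative log-likelihood. The useful property of BPB is that it doesn't depend on the tokenizer, while perplexity does.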