There is a paper?

zone411 · on Jan 18, 2023

The grandparent is probably talking about the InstructGPT paper? But I don't remember seeing a preference for longer responses in that paper.

astrange · on Jan 18, 2023

I meant the blog post.

https://openai.com/blog/chatgpt/

> The model is often excessively verbose and overuses certain phrases, such as restating that it’s a language model trained by OpenAI. These issues arise from biases in the training data (trainers prefer longer answers that look more comprehensive) and well-known over-optimization issues. [1][2]

> [1] Stiennon, Nisan, et al. “Learning to summarize with human feedback.” Advances in Neural Information Processing Systems 33 (2020): 3008-3021.

> [2] Gao, Leo, John Schulman, and Jacob Hilton. “Scaling Laws for Reward Model Overoptimization.” arXiv preprint arXiv:2210.10760 (2022).