Hacker News

There is a paper?


The grandparent is probably talking about the InstructGPT paper? But I don't remember seeing a preference for longer responses in that paper.


I meant the blog post.

https://openai.com/blog/chatgpt/

> The model is often excessively verbose and overuses certain phrases, such as restating that it’s a language model trained by OpenAI. These issues arise from biases in the training data (trainers prefer longer answers that look more comprehensive) and well-known over-optimization issues.[1][2]

> [1] Stiennon, Nisan, et al. “Learning to summarize with human feedback.” Advances in Neural Information Processing Systems 33 (2020): 3008–3021.

> [2] Gao, Leo, John Schulman, and Jacob Hilton. “Scaling Laws for Reward Model Overoptimization.” arXiv preprint arXiv:2210.10760 (2022).
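The dynamic that excerpt describes can be sketched in a few lines. This is a toy illustration under my own assumptions (a simulated labeler with a 90% length preference and a length-proportional reward), not OpenAI's actual setup:

```python
# Toy sketch (my assumption, not code from the thread): if labelers
# systematically prefer longer answers, a reward model fit to those
# preferences rewards verbosity, and optimizing against it selects
# the most verbose output.
import random

random.seed(0)

def label(a: str, b: str) -> str:
    """Simulated labeler: prefers the longer answer 90% of the time."""
    longer, shorter = (a, b) if len(a) >= len(b) else (b, a)
    return longer if random.random() < 0.9 else shorter

def reward(answer: str) -> float:
    """Toy 'reward model' fit to such labels: reward grows with length."""
    return float(len(answer))

candidates = [
    "Yes.",
    "Yes, that is correct.",
    "As a language model trained by OpenAI, I can confirm that yes, "
    "that is correct, and here is some additional context...",
]

# The length bias shows up directly in the simulated preference data...
longer_wins = sum(
    label(candidates[0], candidates[-1]) == candidates[-1]
    for _ in range(1000)
)

# ...so best-of-n sampling against the toy reward model picks the most
# verbose candidate -- the over-optimization issue the footnotes discuss.
best = max(candidates, key=reward)
```

Selecting by `reward` here always returns the longest candidate, regardless of quality, which is the "excessively verbose" behavior the post attributes to biased training data.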



