Davidzheng, 16 days ago, on: The inefficiency of RL, and implications for RLVR ...
A large number of breakthroughs in AI are based on turning unsupervised learning into supervised learning (AlphaZero-style MCTS used as a policy improver is also like this: the search output becomes the training target for the policy). So the confusion is kind of intrinsic.
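To make the "unsupervised into supervised" point concrete, here is a minimal sketch, assuming the simplest case I can think of: next-token prediction, where the labels are carved out of the unlabeled data itself. The function name `make_supervised_pairs`, the `context_size` parameter, and the toy corpus are all made up for illustration; this is not AlphaZero or any particular system.

```python
def make_supervised_pairs(tokens, context_size):
    """Turn an unlabeled token sequence into (context, target) pairs.

    Hypothetical helper: nobody labeled the data, the "label" for each
    example is simply the token that follows the context window.
    """
    pairs = []
    for i in range(context_size, len(tokens)):
        context = tuple(tokens[i - context_size:i])  # the model's input
        target = tokens[i]                           # the derived label
        pairs.append((context, target))
    return pairs


# Toy unlabeled corpus (an assumption for the example).
corpus = "the cat sat on the mat".split()
pairs = make_supervised_pairs(corpus, context_size=2)

for context, target in pairs:
    print(context, "->", target)
```

Once the pairs exist, training is ordinary supervised learning on them. In the AlphaZero-style case the derived targets would come from search rather than from the next token, but the shape of the trick is the same.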