
The paper is quite interesting, but efficiency on OCR tasks does not mean the approach could be plugged into a general LLM directly without performance loss. If you trained a text tokenizer only on OCR-style text, you might already get better compression, as in the rough sketch below.
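
For what it's worth, here is a minimal sketch of that baseline using Hugging Face's tokenizers library: train a BPE vocab on a domain corpus, then compare tokens per character against a general-purpose tokenizer. The corpus file name and vocab size are placeholders, not anything from the paper.

    # Sketch: domain-specific BPE vs. a general tokenizer on the same text.
    # "ocr_corpus.txt" and vocab_size=32000 are illustrative placeholders.
    from tokenizers import Tokenizer
    from tokenizers.models import BPE
    from tokenizers.trainers import BpeTrainer
    from tokenizers.pre_tokenizers import Whitespace
    from transformers import AutoTokenizer

    # Train a BPE tokenizer only on the domain corpus.
    domain = Tokenizer(BPE(unk_token="[UNK]"))
    domain.pre_tokenizer = Whitespace()
    domain.train(files=["ocr_corpus.txt"],
                 trainer=BpeTrainer(vocab_size=32000, special_tokens=["[UNK]"]))

    # A generic tokenizer as the comparison point (GPT-2 here, arbitrarily).
    general = AutoTokenizer.from_pretrained("gpt2")

    sample = open("ocr_corpus.txt").read()[:10000]
    # Fewer tokens per character = better compression on this domain.
    print("domain :", len(domain.encode(sample).ids) / len(sample))
    print("general:", len(general.encode(sample)) / len(sample))

A fair comparison would of course measure on held-out text rather than the training corpus itself, but the point stands: a tokenizer fit to a narrow distribution compresses that distribution better than a general one.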

