
Bellard has trained various models, so it may not be the specific 169M parameter LLM, but his Transformer-based `nncp` is indeed #1 on the "Large Text Compression Benchmark" [1], which correctly accounts for both the size of the compressed enwik9 and the zipped decompressor.
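
For anyone skimming the benchmark page: the ranking metric is just compressed output plus the zipped decompressor. A rough sketch of that accounting in Python (file names here are illustrative placeholders, not the benchmark's actual artifacts):

    # Sketch of the LTCB accounting:
    # total = size of compressed enwik9 + size of zipped decompressor.
    import os
    import zipfile

    def ltcb_total(compressed_path, decompressor_files, tmp_zip="decomp.zip"):
        # Zip the decompressor binaries/libraries, then add their zipped
        # size to the size of the compressed output.
        with zipfile.ZipFile(tmp_zip, "w", zipfile.ZIP_DEFLATED) as z:
            for f in decompressor_files:
                z.write(f, arcname=os.path.basename(f))
        return os.path.getsize(compressed_path) + os.path.getsize(tmp_zip)

    # e.g. ltcb_total("enwik9.compressed", ["decompressor", "libnc.so"])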

There is no unfair advantage here. This was achieved back in the 2019-2021 period, too; it seems safe to say Bellard could have pushed the frontier much further with modern compute and techniques.

[1] https://www.mattmahoney.net/dc/text.html


Okay, that's a much better claim. nncp has sizes of 15.5 MB and 107 MB including the decompressor. The one that's linked, ts_zip, has sizes of 13.8 MB and 135 MB excluding the decompressor. And it's from 2023-2024.


They appear to have Python bindings, which seems reasonable from an API / usability perspective? https://github.com/deepseek-ai/smallpond
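
For anyone curious what the Python side looks like, here's a rough sketch based on my reading of the smallpond README. The exact names (init, read_parquet, repartition, partial_sql, write_parquet) are from memory and are assumptions, so double-check against the repo before relying on them:

    # Assumed smallpond usage -- a DataFrame-style API; names may differ
    # from the current release.
    import smallpond

    sp = smallpond.init()
    df = sp.read_parquet("prices.parquet")
    df = df.repartition(3, hash_by="ticker")
    df = sp.partial_sql(
        "SELECT ticker, min(price), max(price) FROM {0} GROUP BY ticker", df
    )
    df.write_parquet("output/")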

In terms of fast FUSE - also my first question - it appears to be `io_uring` + FUSE :)

https://github.com/deepseek-ai/3FS/blob/main/src/lib/api/Usr...


Impressive numbers. Does anyone have data or anecdotes on how much a small/mid/large company loses to low software quality and bad practices (or, conversely, gains from the opposite)?

Seems like a hard thing to measure, but I've always been curious what the numbers look like.

