leiserfg's comments | Hacker News

It is, the link is in the readme.


If binary, consider just using SQLite.


Did you read the article?

That wouldn’t support partial parsing.


On the contrary, loading from a database is the limiting case of "partial parsing": queries read only a few pages of a few tables and indices.

From the point of view of the article, a SQLite file is similar to a chunked file format: the compact directory of what tables etc. it contains is more heavyweight than listing chunk names and lengths/offsets, but at least as fast, and loading only needed portions of the file is automatically managed.
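
As a concrete sketch of that (using Python's stdlib sqlite3 and a hypothetical `chunks` table; names are made up), a reader only ever touches the pages backing the rows it asks for:

    import sqlite3

    # Read-only open; nothing is parsed up front.
    con = sqlite3.connect("file:doc.db?mode=ro", uri=True)
    # Only the B-tree pages needed for this one lookup are read from
    # disk; the rest of the file is never touched.
    row = con.execute(
        "SELECT payload FROM chunks WHERE name = ?", ("thumbnail",)
    ).fetchone()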


Using SQLite as a container format is only beneficial when the file format itself is a composite, like word processor files which include both the textual data and any attachments. SQLite is just a hindrance otherwise, e.g. for image file formats or archival/compressed file formats [1].

[1] SQLite's own sqlar format is a bad idea for this reason.


From my own experience SQLite works just fine as the container for an archive format.

It ends up having some overhead compared to established ones, but the ability to query over the attributes of 10000s of files is pretty nice, and definitely faster than the worst case of tar.

My archiver could even keep up with 7z in some cases (for size and access speed).

Implementing it is also not particularly tricky, and SQLite even allows streaming the blobs.

Making readers for such a format seems more accessible to me.
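
For illustration, a minimal sketch of the idea (hypothetical schema, not my actual format):

    import sqlite3

    con = sqlite3.connect("archive.db")
    con.execute("""CREATE TABLE IF NOT EXISTS files (
        name  TEXT PRIMARY KEY,
        mtime INTEGER,
        size  INTEGER,
        data  BLOB
    )""")
    # Queries over file attributes are plain SQL (indexable if needed);
    # no linear scan through the whole archive as with tar.
    biggest = con.execute(
        "SELECT name, size FROM files WHERE size > ? ORDER BY size DESC",
        (1 << 20,)).fetchall()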


The SQLite format itself is not very simple, because it is a database file format at its heart. By using SQLite you are unknowingly constraining your use case; for example, you can indeed stream BLOBs, but you can't randomly access them, because the SQLite format puts a large BLOB into pages in a linked list, at least when I last checked. And BLOBs are limited in size anyway (4GB AFAIK), so streaming itself might not be that useful. Using SQLite also means that you have to bring SQLite into your code base, and SQLite is not very small if you are just using it as a container.

> My archiver could even keep up with 7z in some cases (for size and access speed).

7z might feel slow because it enables solid compression by default, which trades decompression speed for compression ratio. I can't imagine 7z only matching your compression ratio with the correct options, though; was your input incompressible?


Yes, the limits are important to keep in mind; I should have given that context up front.

For my case it happened to work out because it was a CDC (content-defined chunking) based deduplicating format that compressed batches of chunks, which gave plenty of flexibility for working within the limits.

The primary goal here was also making the reader as simple as possible whilst still having decent performance.

I think my workload is very unfair towards (typical) compressing archivers: small incremental additions, a need for random access, and indeed frequently incompressible files, at least when seen in isolation.

I really brought up 7z because it is good at what it does; it is just (ironically) too flexible for what was needed. There is probably some way of getting it to perform way better here.

zpack is probably a better comparison in terms of functionality, but I didn't want to assume familiarity with that one. (Also, I can't really keep up with it; my solution is not tweaked to that level, even ignoring the SQLite overhead.)


BLOBs support random access - the handles aren't stateful. https://www.sqlite.org/c3ref/blob_read.html

You're right that their size is limited, though, and it's actually worse than you even thought (1 GB).
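
For what it's worth, Python 3.11+ exposes the same API as seekable handles; a sketch assuming a `files` table with a `data` blob column:

    import sqlite3

    con = sqlite3.connect("archive.db")
    # blobopen() wraps sqlite3_blob_open(); the handle is seekable.
    with con.blobopen("files", "data", 1, readonly=True) as blob:  # rowid 1
        blob.seek(4096)
        chunk = blob.read(512)  # 512 bytes starting at offset 4096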


My statement wasn't precise enough; you are correct that a random-access API is provided. But it is ultimately backed by the `accessPayload` function in btree.c, whose comment mentions that:

    ** The content being read or written might appear on the main page
    ** or be scattered out on multiple overflow pages.
In other words, the API can read from multiple scattered pages without the caller knowing. That said, I see this can be considered enough to count as random access, as the underlying file system would use similarly structured indices behind the scenes anyway... (But modern file systems do try to allocate pages consecutively for performance.)


One gotcha to be aware of is that SQLite blobs can't exceed 1* GB. Don't use SQLite archives for large monolithic data.

*: A few bytes less, actually; the 1 GB limit is on the total size of a row, including its ID and any other columns you've included.
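
If you do need bigger payloads, the usual workaround is to split them across rows; a sketch with a hypothetical `chunks` table:

    import sqlite3

    CHUNK = 256 * 1024 * 1024  # stay well below the ~1 GB row limit

    def store(con, name, stream):
        con.execute("CREATE TABLE IF NOT EXISTS chunks"
                    "(name TEXT, seq INTEGER, data BLOB,"
                    " PRIMARY KEY (name, seq))")
        seq = 0
        while data := stream.read(CHUNK):
            con.execute("INSERT INTO chunks VALUES (?, ?, ?)",
                        (name, seq, data))
            seq += 1
        con.commit()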


The Mac image editor Acorn uses SQLite as its file format. It's described here:

https://shapeof.com/archives/2025/4/acorn_file_format.html

The author notes that an advantage is that other programs can easily read the file format and extract information from it.


It is clearly a composite file format [1]:

> Acorn’s native file format is used to losslessly store layer data, editable text, layer filters, an optional composite of the image, and various metadata. Its advantage over other common formats such as PNG or JPEG is that it preserves all this native information without flattening the layer data or vector graphics.

As I've mentioned, this is a good use case for SQLite as a container. But ZIP would work equally well here.

[1] https://flyingmeat.com/acorn/docs/technotes/ACTN002.html


I think it's fine as an image format. I've used the MBTiles format, which is basically just a table filled with map tiles. SQLite makes it super easy to deal with, e.g. to dump individual blobs and save them as image files.
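
For instance (made-up tile coordinates; MBTiles keeps tiles in a `tiles` table keyed by zoom/column/row):

    import sqlite3

    con = sqlite3.connect("map.mbtiles")
    row = con.execute(
        "SELECT tile_data FROM tiles"
        " WHERE zoom_level=? AND tile_column=? AND tile_row=?",
        (12, 2200, 1343)).fetchone()
    with open("tile.png", "wb") as f:
        f.write(row[0])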

It just may not always be the most performant option. For example, for map tiles there is alternatively the PMTiles binary format, which is optimized for HTTP range requests.


Except image formats and archival formats are composites (data+metadata). We have Exif for images, and you might be surprised by how much metadata the USTar format has.


With that reasoning almost every format is a composite, which doesn't sound like a useful distinction. Such metadata should be fine as long as the metadata itself is isolated and can be updated without the parent format.


I agree that almost every format is a composite; you seem not to, which makes me think you mean something different by "composite" than I do.

Your reply suggests that if all the metadata is auxiliary, it can be segregated from the data and the format doesn't count as a composite.

However, that doesn't exclude archives (in many use cases the file metadata is as important as the data itself; consider e.g. hardlinks in TAR files).

Nor does it exclude certain vital metadata for images: resolution, color-space, and bit-depth come to mind.


My reasoning for Exif was that it is not only auxiliary but also post-hoc. Exif was defined independently from image formats and only got adopted later because those formats provided extension points (JPEG APP# markers, PNG chunks).

You've got a good point that there are multiple types of metadata and some metadata might be crucial for interpreting the data. I would say such "structural" metadata should be considered part of the data. I'm not saying it is not metadata; it is metadata inside some data, so it doesn't count for our purpose of defining a composite.

I also don't think tar hardlinks are metadata for our purpose: a hardlink entry technically consists of the linked path instead of the file contents, plus the information that the file is a hardlink. The former is clearly data, and the latter is metadata used to reconstruct the original file system, so it should be considered part of a larger piece of data (in this case, the logical notion of a "file").

I believe these examples should be enough to derive my definition of "composite"; please let me know if not.


I think I understand now. Thanks for the clarification.


sqlar proved to be a great solution for me in the past. Where does it fall short in your experience?


Unless you are using the container file as a database too, sqlar is strictly inferior to ZIP in pretty much every respect [1]. I'm actually more interested in the context in which sqlar proved useful for you.

[1] https://news.ycombinator.com/item?id=28670418


I remember seeing the comment you linked a few years back; comments were already locked then, so I couldn't reply, and this time I sadly don't have the time to get deeper into this. However, I recommend researching more about sqlar/using a SQLite db as a _file format_ in general, or at minimum looking at the SQLite Encryption Extension (SEE) (https://www.sqlite.org/see/doc/trunk/www/readme.wiki). You can get a lot out of the box with very little investment. IMHO sqlar is not competing with ZIP (can ZIP do metadata and transactions?)


> [...] at minimum looking at SQLite Encryption Extension (SEE) (https://www.sqlite.org/see/doc/trunk/www/readme.wiki).

SEE is a proprietary extension, however generous its license is, so it is not very meaningful when sqlar is compared against ZIP. Not that I necessarily see encryption as a fundamental feature for compressed archive formats, though; I'm advocating for age [1] integration instead.

[1] https://github.com/FiloSottile/age

> IMHO sqlar is not competing with ZIP (can zip do metadata and transactions?)

In my understanding, SQLite's support for sqlar and for ZIP arrived at the same time, so I believe that sqlar was created to demonstrate an alternative to ZIP (and that the demonstration wasn't good enough). I'm aware that this is just circumstantial evidence, so let me know if you have something concrete.

ZIP can of course do metadata in the form of per-file and archive comments. For more structured metadata you can make use of extra fields if you really, really want, but at that point SQLite would indeed be a better choice. I doubt, however, that it's a typical use case.
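
e.g. with Python's zipfile, both levels take a couple of lines:

    import zipfile

    with zipfile.ZipFile("demo.zip", "w") as z:
        info = zipfile.ZipInfo("notes.txt")
        info.comment = b"per-file comment"    # stored in the central directory
        z.writestr(info, "hello")
        z.comment = b"archive-level comment"  # written out on close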

ZIP can be partially updated in place but can't do transactions. It should be noted, though, that SQLite implements transactions via additional files (`-journal` or `-wal`). So both sqlar and ZIP would write to an additional file during an update, though SQLite would write much less data than ZIP. Any remaining differences are invisible to end users, unless in-place updates are common enough, in which case the use of SQLite is justified.
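
To make that concrete: an in-place member update in a SQLite container is a single transaction, and the journal only has to cover the touched pages. A sketch against the stock sqlar schema (file name is made up):

    import sqlite3, time

    con = sqlite3.connect("demo.sqlar")
    new = b"updated contents"  # sz == len(data) means stored uncompressed in sqlar
    with con:  # one transaction; rolls back automatically on error
        con.execute("UPDATE sqlar SET data=?, sz=?, mtime=? WHERE name=?",
                    (new, len(new), int(time.time()), "readme.txt"))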


The point is that SEE exists, and so do free alternatives.

> In my understanding SQLite's support for sqlar and ZIP occurred at the same time

I believe so too.

I agree with you that sqlar is a poor general-purpose archive or compression format compared to ZIP; what I'm arguing is that it's a very good file format for certain applications, offering structured, modifiable, and searchable file storage. We had great success using it as the db/file format for a PLM solution shipped as both a desktop and a web app. The same database can then be used to power the web UI (single-tenant SaaS deployments) and the desktop app (the web export is simply a working file for the desktop app). The file being just a simple SQLite db lets users play with the data, do their own imports, migrations, etc., while having all files & docs in one place.


Félix is like macOS: it's good until it's not, and then you can't extend it.


I "hate" Makefiles, for building it's faster to use meson, zig build, cmake+ninja. For this use case, https://github.com/adriancooney/Taskfile is way more flexible and you don't need to mess with PHONY rules.


It's not April 1st any more; please stop doing this!


It's weird to me that no one doing this kind of project compares it with nginx-unit.


Here, uvicorn+gunicorn+httptools outperforms nginx-unit with FastAPI, and granian is pretty close to uvicorn now:

https://www.techempower.com/benchmarks/#section=test&runid=d...


Firefox user here: I changed my user agent to the Chrome one and the app works fine (just a bit slow).


I managed to be so annoying that I got kovid and be5invis to make changes in Iosevka and Kitty until they worked fine together (before that, ligatures didn't work). I've been a user of both for several years.


Thank you for that. Much appreciated.


lmao legend, thanks a lot


Isn't BMP the default file format of Paint nowadays?


It uses PNG by default. It can also do BMP, JPEG, GIF, HEIC, and TIFF on Windows 10.


This can't go unnoticed: since the last releases of Kitty and Iosevka, one can get ligatures working correctly.

