Apt patterns allow matching on criteria other than strings, such as searching for all packages with broken dependencies. They also allow combining different search criteria in ways that would be difficult or impossible with just a regex search. If you only want to match a regex against package names, that is still possible using the `?name` apt pattern.
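Roughly, per apt-patterns(7) (the queries below are just examples):

```
# packages with broken dependencies
apt list '?broken'

# installed packages whose names start with "clang"
apt list '?and(?installed, ?name(^clang))'
```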
I would suggest reading the apt-patterns(7) man page referenced above before passing judgement.
> difficult or impossible using just a regex search
I am sorry, but this isn't true, and I can prove it.
Regexes are used to lex the source of APT itself; this is called lexical analysis [1]. GNU Flex and Bison [2], for example, are lexer and parser generators behind a lot of the code we compile today. So whatever pattern-matching code was written in C as part of the APT 2.0 package, that code is itself read and compiled with the help of regexes. It therefore provably cannot be impossible to express a match with this new feature that cannot also be expressed with plain old regex. Difficulty is subjective and depends on who the target demographic is.
All of this is actually moot: it doesn't cost you anything to keep the regex option for searching package names, beyond perhaps a bit of additional code to maintain. Other means of pattern matching can be added alongside the regex search. Why not both?
You're missing the point, and being patronizing while doing it. A regex search can only match package names. APT's patterns let it match on other attributes as well, which a simple regex against the name cannot reach.
If you try to install libc++-dev and that package doesn't exist for whatever reason, you probably want an error message, not a regex match to libc-dev. Interpreting the argument as a regex violates the principle of least astonishment.
I'm always confused when trying to install clang++ spews out a hundred messages about how it matches a huge number of packages. Fortunately they conflict, so it doesn't then actually try to install them.
Regular expressions are completely unsafe. See the clang++ example, which as a regex matches every package whose name contains 'clan' followed by one or more 'g', unless of course somebody actually uploads a clang++ or clang+ package...
Glob-style wildcards are more common in that context anyway. It would be safe to re-enable those, but I'm not certain it would provide a sensible improvement to the user experience.
I wonder when apt will start supporting specifying packages per repository. It's not safe that any package can come from any repository by default, unless you spend annoying effort manually pinning packages.
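For anyone curious, the manual pinning in question looks roughly like this: a sketch of an /etc/apt/preferences.d/ snippet, where "o=ExampleOrigin" stands in for the third-party repo's Origin field and "somepackage" for the one package you want from it:

```
Explanation: block everything from the third-party repo by default
Package: *
Pin: release o=ExampleOrigin
Pin-Priority: -1

Explanation: allow only the single package you actually want from it
Package: somepackage
Pin: release o=ExampleOrigin
Pin-Priority: 500
```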
Yes, but they may later add other packages that break my machine by mistake. If I know I want only one thing from them, I limit their capacity for later damage.
I can think of at least one vector. Suppose a particular library announces one generic repository as its approved distribution channel, and suppose the repo does no vetting of what gets uploaded. In that circumstance, you'd only want to pull that particular package from the repo, not any arbitrary package.
So I think the trust model bears scoping. It's not all or nothing.
Is there any apt repo software that permits multiple parties to upload to a repo but also enforces that people can't upload new versions of other people's packages?
As it happens, Debian itself kinda does this (Debian "maintainers" can only upload to specific package names), but I don't think anyone else runs that software. The usual third-party repo tools like reprepro and aptly don't support it, as far as I know. And sites like Launchpad or OBS just set up a separate apt repo per account (or even multiple apt repos per account), because doing that is very easy.
In other words - yes, the trust model you propose is coherent, but I don't think anyone actually does that, because there's a more straightforward option already.
The apt trust model is overly simplistic, but it is so hard to come up with a safe model that I don't see how it matters. And if a repository added for a specific package might be compromised, then a malicious version of that package might already be in the repo. Once you have to verify versions and hashes by hand, you may as well download the debs directly and use dpkg.
What I do like is that dnf (I think) does not switch your packages between repositories, even if you added a third-party repo with a higher version number.
But this is hard to do correctly for deb repositories. We first need to come up with a way to declare repository groups so you can say that e.g. the security repo can provide updates to the main repo.
Does it still redownload 30 MB and spin for 10 s when you (or an installer script) run “apt update” a few seconds after the last invocation, instead of comparing a root hash or using some other sane method?
I’m using archive.ubuntu.com and whatever the default apt settings in Ubuntu LTS are. I feel like a ton of debian-style packaging is simply “this is the way we’ve always done it”. debmirror, for example, is hot garbage.
I remember checking their packaging howto every few years to maybe really/properly/truly understand it, but the sheer uselessness of the text signaled each time that it's just not worth it. Just google whatever you want, copy from Stack Overflow, and be done with it; don't try to understand it. (E.g. when I wanted a virtual package that Provides some package, so I could fake it and it wouldn't get pulled in as a useless dependency of other packages.)
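(For what it's worth, that virtual-package case can be handled with equivs; a rough sketch, with fake-foo and foo as placeholder names:)

```
# generate a template control file, then trim it down to something like:
#   Package: fake-foo
#   Provides: foo
#   Description: dummy package pretending to provide foo
equivs-control fake-foo.ctl

# build the dummy .deb and install it
equivs-build fake-foo.ctl
sudo dpkg -i ./fake-foo_*_all.deb
```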
I would absolutely love to maintain a metapackage and some tools on a personal apt mirror that I can add to machines and update periodically. The burden/overhead of learning the ridiculously tradition-based and overcomplicated system in use has kept me from doing it for years.
It's simple to brute-force a .deb (after all, it's just an ar archive containing two tarballs, one with the control files and one with the data; that simple), but the process, the myriad debhelpers, obscure traditions, mandatory steps (changelog updates), and whatnot are not simple at all.
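To illustrate the "just an ar" point (the package name here is a placeholder and the compression suffixes vary):

```
$ ar t hello_1.0_amd64.deb
debian-binary
control.tar.xz
data.tar.xz

# debian-binary is the format version ("2.0"), control.tar.* holds the control
# file and maintainer scripts, data.tar.* holds the files that get installed.
# Building one from a directory containing DEBIAN/control plus the payload:
$ dpkg-deb --build pkgdir hello_1.0_amd64.deb
```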
I would also point out that actually getting software into Debian is no fun for newcomers either. The whole mentorship process (where you use some arcane command to upload a package you built yourself to a special website, then ask someone to look at it) is wacky. The project could really improve its onboarding process, or at least make it easier for fly-by contributions.
I run a bionic and focal full mirror, and even just keeping a mirror working right with debmirror is a huge complicated mess. I don’t understand why there isn’t an overhaul of the packaging tools and mirror structure to make it a lot simpler.
It fetches the InRelease files; all of them seem to have an ETag header, but apparently APT doesn't feel it can rely on those for some reason.
But using `apt -o Debug::Acquire::http=on update` we see that it depends on the "If-Modified-Since" request headers. And many of the repos just return 304.
There doesn't seem to be a config setting for increasing the number of outgoing connections either.
So even if it gets just 304 responses, it still does the "Reading package lists... Building dependency tree" part, which is slooow.
APT does in fact check whether the root of the repo changed; if it hasn't, it does not download anything else, and it will then notice that the input files have not changed when it checks whether the cache needs rebuilding.
What you're seeing here is some hook that's being run and updating its cache.
What output? It downloads a file, then sits around printing nothing for a few seconds (hooks are running), and then it updates its cache if necessary. Well, maybe it always rebuilds the cache, I'm not sure.
After working with Debian/Ubuntu packaging and then switching to Arch and pacman, I find it so much easier to create and maintain packages on Arch and Alpine than on the alternatives.
Perhaps, but as a counter example I'd compare it to RPM.
RPM is a solid, mature, heavily used package format. And the .spec file I've always found to be so much easier to work with than Debian packaging. I've made many debs, and my current work uses debs for deploying all our software, but I still find it so much harder to make them. I have a workflow for my current packages, but if I want to take a new piece of software and turn it into a .deb, it is always pretty painful. I usually have to ask a debian packager for help. They are always super nice and helpful, but I just wish I could package things myself.
Seems like if you package .debs all the time, it becomes easier. But as an infrequent packager, I find RPM's .spec to be much easier to manage than Debian's system.
I think spec files are asinine: throwing it all into one specially formatted file and using weird macros to define the build steps. Deb is terrible too, but mostly just because they split everything up too much. The actual file formats are nice to work with; e.g. the rules file is just a makefile, and the install files are lists of files to install and their destinations.
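For the curious, this is roughly what those look like (the package and file names are made up). A minimal debhelper-style debian/rules:

```
#!/usr/bin/make -f
%:
	dh $@
```

and a debian/mytool.install, which is just "source destination-directory" pairs:

```
build/mytool        usr/bin
config/mytool.conf  etc/mytool
```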
Ports systems like alpine/arch/void are nice, but they also handle less (except for void maybe, that ports system has built in support for a lot of different build systems, so packaging a cmake or meson project is a breeze).
I agree; I do .deb packaging just infrequently enough that every time I go to do it, I have to re-learn it nearly from scratch. And it's not simple at all.
I'm fond of FPM for this kind of thing. It provides a usable interface over a bunch of package formats and explicitly aims to make the whole thing painless, and in my experience it succeeds.
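A rough fpm invocation from memory (the name, version, and paths here are made up):

```
# build a .deb straight from a file-to-destination mapping
fpm -s dir -t deb -n mytool -v 1.0.0 \
    --description "Internal helper tool" \
    ./build/mytool=/usr/local/bin/mytool
```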
And I've been slowly porting many of the RPM ecosystem's common macros to run on debbuild, so that spec files can be reused while complying with Debian packaging policies as much as possible: https://github.com/debbuild/debbuild-macros
(Disclosure: I'm the current maintainer of debbuild)
After using Ubuntu then Debian, and briefly trying Arch, it is so much easier to manage packages on Gentoo than the alternatives. USE flags, package.mask and the slots system are indispensable. And you get freedom from systemd.
Argh. I was just upgrading a Debian machine and wishing (again) for parallel downloads. apt-fast is a severely limited hack (I couldn't figure out what it actually supports). I assumed any significant update to apt would include performance/download improvements. No?
Non-parallel downloads from the same server are by design; turning that restriction off would be an easy thing. Server resources are limited, and people should not be cheating their way around the bandwidth limits the server has. Put multiple mirrors in a mirror list, and then use `mirror+file://path to mirror list` instead of the http source in sources.list, so that apt downloads from all these mirrors in parallel.
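Concretely, something along these lines (per apt-transport-mirror; the hosts, suite, and list path are only examples):

```
# /etc/apt/mirrors.txt: one mirror URI per line
http://deb.debian.org/debian
http://ftp.de.debian.org/debian

# sources.list entry pointing at the list instead of a single host
deb mirror+file:/etc/apt/mirrors.txt bullseye main
```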
Be aware that high parallelisation of downloads may reduce your throughput when there is only a small number of things to fetch.
There should also be no latency concern: APT keeps the number of outstanding requests to the server at 10, so there should be no latency overhead. Yes, I know this does not work for Google, because their latency is crazy high, but their bandwidth is super high too.
Does anybody know if there are plans to fix the .deb size limit? It's a bummer that .debs can't be more than 10 GB. That number seems big, but I've hit it before when packaging custom toolchains for internal use at my company.
Definitely possible in theory. Kind of a pain in practice though.
I was working on trying to package Xilinx Vivado for internal use on my company's build servers. It's around a 20-25GB installation, mostly of smallish files. I could have definitely manually split the package up. It's the kind of thing that's hard to do algorithmically though. After a day or so of trying to solve the packaging problem, I eventually gave up on the idea altogether.
Large software installs are still fairly uncommon on Linux, but I see more and more of them in the wild. Especially when I look at some games and stuff - I have a bunch of software that's larger than 10 GB. It's kind of nuts that the .deb package format is still so constrained, especially considering its importance to so many distros.
Missed opportunity to switch to zstd, IMO. Much faster than gzip, especially during decompression, and essentially the same compression ratio. Those apt upgrades take too long.
The last evaluation showed that the size increase of zstd compared to xz warrants introducing delta debs first, so that people don't have to download more.
But then, with deltas, mirror sizes grow even more, which is a hard sell.
Arch Linux gets away with using zstd -21, but that's not practical for general-purpose Linux distributions, as it has significant memory requirements compared to xz.
Also, the speedup in practice is negligible: while zstd is much faster than xz, most of the time installing packages is actually spent in fsync(). So you only see huge speedups when running under eatmydata.
You can see that the performance gain from zstd under realistic scenarios is about 10%.
You can see that switching from xz to zstd only improved the Firefox install time from 37 s to 33 s; running under eatmydata to avoid dpkg's fsync calls improved it to 12.5 s (8.5 s with zstd).
I don't know what Arch Linux does, but they likely do not correctly fsync() the data after it has been written to a temporary file and then fsync() the directory after the files have been renamed to their final names, as is required to get consistent results across a crash.
I know they are kind of just wrapping aptitude, but it's frustrating that I can't just wildcard for things. It's a package manager, not a development framework; why would regex and wildcards not be enough for anyone?
We could reinstate wildcards (but not regexes; they are unsafe because their magic characters overlap with valid package names, so g++ can mean the package g++, the package g+, or any package matching the regex g+).
I feel like once people get used to patterns, that will be sufficient, and the error reporting gets nicer too.
Seems like a pretty big breaking change.