
garblegarble · 2026-02-02T10:08:34 1770026914

>does this limit the agent's ability to run standard Linux tooling? Or are you relying on the AI to just figure out the BSD/macOS equivalents of standard commands?

Slightly counterintuitively, Apple Containers spawns Linux VMs.

There doesn't appear to be any way to spawn a native macOS container, which is a pity. It'd be nice to have ultra-low-overhead containers on macOS (but I suspect all the interesting macOS stuff relies on a bunch of services/GUI access that would make it not-lightweight anyway).

FYI: it's easy enough to install GNU tools with Homebrew. Technically there's a risk of problems if applications spawn command-line tools and expect the BSD arguments/output, but I've not run into any issues in the several years I've been doing it.

garblegarble · 2026-01-23T23:32:57 1769211177

For my inputs, Whisper distil-large-v3.5 is the best. I tried Parakeet 0.6 v3 last night, but it has higher error rates than I'd like (though it is fast...).
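
For anyone wanting to compare models on their own inputs, the usual metric behind "error rate" in speech-to-text is word error rate (WER): substitutions + deletions + insertions, divided by the number of words in the reference transcript. Below is a minimal sketch of that metric as a word-level Levenshtein distance. It's an illustration, not necessarily how the numbers above were measured, and it assumes plain whitespace tokenisation with no case or punctuation normalisation (real evaluation tooling usually normalises first).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference,
    computed as a word-level Levenshtein (edit) distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[-1] / max(len(ref), 1)


print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))  # one substitution in six words
```

Running each model over the same set of reference transcripts and averaging (or pooling the edit counts) gives a like-for-like comparison.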

Johnny_Bonk · 2026-01-23T23:35:16 1769211316

Nice, I'll try it. As of now, for my personal STT workflow I use the ElevenLabs API, which is pretty generous, but I'm curious to play around with other options.

garblegarble · 2026-01-23T23:46:55 1769212015

I assume that will be better than Whisper. I haven't benchmarked it against cloud models, since the project I'm working on cannot send data out to them.

BiraIgnacio · 2026-01-23T23:52:38 1769212358

Oh, I've been looking into Whisper and Vosk in the last few days. I'll probably go with Whisper (via whisper.cpp), but has anyone compared it to Vosk models?

garblegarble · 2026-01-20T18:01:12 1768932072

>In June 2025, 56% of people in Great Britain thought it was the wrong decision

It's not so clear-cut when you consider that 48.1% of the original referendum voters wanted to stay in the EU. I'm honestly very surprised by this poll: a shift of about 8 percentage points is pretty minimal considering the turmoil the country has gone through since 2016.

How much of this can be explained by older voters dying in the intervening 10 years? I recall that demographic skewed much more heavily towards Leave in 2016.

lostlogin · 2026-01-20T19:02:32 1768935752

Half the issue is the definition of ‘voter’. Turnout is abysmal and polling has been crap in major ways. Calling someone eligible to vote a ‘voter’ is probably only right 50-60% of the time.

https://commonslibrary.parliament.uk/general-election-2024-t...

garblegarble · 2026-01-06T23:32:24 1767742344

>And what do you even mean by "prepare"?

Not the person you're responding to, but... if you think it's a horse -> car change (and, to stretch the metaphor, if you think you're in the business of building stables), then preparation means training in another profession.

If you think it's a hand tools -> power tools change, learn how to use the new tools so you don't get left behind.

My opinion is that it's a hand -> power tools change: LLMs give me the power to solve more problems for clients, and to do it faster and more predictably than a client trying to achieve the same with an LLM. I hope I'm right :-)

simonw · 2026-01-06T23:37:08 1767742628

That's a good analogy. I'm on team hand tools to power tools too.

SoftTalker · 2026-01-07T01:56:41 1767751001

Why do you suppose these tools will conveniently stop improving at a point where they increase your productivity but are still too much for your clients to use themselves?

simonw · 2026-01-07T06:26:59 1767767219

Because I've seen how difficult it is to get a client to explain to me what they need their software to do.

SoftTalker · 2026-01-07T18:15:33 1767809733

And so the AI will develop the skills to interview the client and determine what they really need. There are textbooks written on how to do this, it's not going to be hard to incorporate into the training.

garblegarble · 2026-01-06T23:20:39 1767741639

If they're using Opus, then it'll be the $100/month Claude Max 5x plan (or possibly the more expensive 20x plan, depending on how intensive their use is). It does consume a lot of tokens, but I've been using the $100/mo plan and get a lot done without hitting limits. It helps to be mindful of context: regularly amending/pruning your CLAUDE.md instructions, clearing context between tasks, and sizing your tasks to stay within the Opus context window. Claude Code plans have token limits that work in 5-hour blocks, which start when you send your first prompt, so it's often useful to prime it as early in the morning as possible.
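
To see why priming early helps, here's a toy model of the 5-hour blocks. It assumes (a simplification, not Anthropic's documented algorithm) that a new block starts at the first prompt sent after the previous block has expired, and it ignores the actual token allowance per block; `block_starts` and the prompt schedule are purely illustrative.

```python
from datetime import datetime, timedelta

BLOCK = timedelta(hours=5)  # assumed block length, per the comment above


def block_starts(prompt_times):
    """Return the start of each usage block, assuming a new block begins at
    the first prompt sent after the previous block has expired."""
    starts = []
    for t in sorted(prompt_times):
        if not starts or t >= starts[-1] + BLOCK:
            starts.append(t)
    return starts


day = datetime(2026, 1, 6)
work = [day.replace(hour=h) for h in range(9, 18)]  # one prompt per hour, 09:00-17:00
primed = [day.replace(hour=6)] + work               # plus one throwaway prompt at 06:00

print([s.strftime("%H:%M") for s in block_starts(work)])    # ['09:00', '14:00']
print([s.strftime("%H:%M") for s in block_starts(primed)])  # ['06:00', '11:00', '16:00']
```

Under this simplified model, one throwaway prompt at 06:00 turns a 09:00-17:00 working day from two blocks into three, with the first reset landing at 11:00 instead of 14:00.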

Claude Code will spawn sub-agents (which often use the cheaper Haiku model) for exploration and planning tasks, with only the results imported into the main context.
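
A toy illustration of why that pattern keeps the main context lean: the bulky reading happens in a throwaway context, and only a short summary crosses back. Everything here (`Context`, `explore_in_subagent`, counting words as a stand-in for tokens) is hypothetical and is not how Claude Code is actually implemented; it just shows the shape of the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """Toy stand-in for a model context window; 'tokens' are approximated by words."""
    messages: list = field(default_factory=list)

    def add(self, text: str) -> None:
        self.messages.append(text)

    def size(self) -> int:
        return sum(len(m.split()) for m in self.messages)


def explore_in_subagent(task: str, files: dict) -> str:
    """Hypothetical sub-agent: reads everything in its own throwaway context
    and hands back only a short summary."""
    sub = Context()
    sub.add(task)
    for body in files.values():
        sub.add(body)  # the expensive reading lands here, not in the main context
    hits = [name for name, body in files.items() if "TODO" in body]
    return f"{task}: {len(files)} files read, TODOs in {', '.join(hits) or 'none'}"


# Ten hypothetical source files, a few of which contain a TODO marker.
files = {
    f"mod{i}.py": "def f():\n    pass\n" * 200 + ("# TODO\n" if i % 3 == 0 else "")
    for i in range(10)
}

main = Context()
main.add("Find outstanding TODOs")
main.add(explore_in_subagent("find TODOs", files))
print(main.size(), main.messages[-1])
```

The main context ends up holding a couple of short lines, while the thousands of "tokens" read from the files never enter it.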

I've had the best results from a more interactive collaboration with Claude Code. As long as you describe the problem clearly, it does a good job on small/moderate tasks. I generally give two instances of Claude Code separate tasks and run them concurrently. (Unlike setting a task for a colleague, the interaction with Claude Code distracts me too much to do my own independent coding simultaneously, but I do work on architecture/planning tasks.)

The one matter of taste I have had to compromise on is the sheer amount of code: it likes to write a lot of code. I have a better experience if I sweat the low-level code less and just periodically have it clean up areas where I think it has written too much or overly repetitive code.

As you give it more freedom, it's more prone to failure (and can often get itself stuck in a fruitless spiral); however, as you use it more you get a sense of what it can do independently and what it's likely to choke on. A codebase with good human-designed unit and Playwright tests helps a lot.

Crucially, you get the best results when your tasks are complex but on the menial side of the spectrum: it can pay attention to a lot of details, but on the whole don't expect it to do well on senior-level tasks.

To give you an idea, in a little over a month "npx ccusage" shows that via my Claude Code 5x sub I've used 5M input tokens, 1.5M output, 121M cache-create and 1.7B cache-read tokens. The estimated pay-as-you-go API cost equivalent is $1500. (N.B. for the tail end of December they doubled everybody's usage limits, so I was using a lot more tokens on more experimental, on-the-fly tool construction work.)
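
A rough back-of-the-envelope check of where that cost equivalent comes from, assuming every token was billed at Opus 4.5 list prices ($5/M input, $25/M output, $6.25/M for 5-minute cache writes, $0.50/M for cache reads; treat these as assumptions and check Anthropic's current pricing page). Real usage mixes in cheaper models (e.g. Haiku sub-agents), so this overshoots ccusage's figure somewhat, but it shows that cache traffic, not input/output, dominates.

```python
# Assumed list prices in USD per million tokens (Opus 4.5 rates; may have changed).
PRICE_PER_MTOK = {
    "input": 5.00,
    "output": 25.00,
    "cache_create": 6.25,  # 5-minute cache write, 1.25x the input price
    "cache_read": 0.50,    # cache hit, 0.1x the input price
}

# Token counts from the ccusage summary above, in millions of tokens.
usage_mtok = {
    "input": 5,
    "output": 1.5,
    "cache_create": 121,
    "cache_read": 1700,
}

cost = {k: usage_mtok[k] * PRICE_PER_MTOK[k] for k in usage_mtok}
total = sum(cost.values())

for k, v in sorted(cost.items(), key=lambda kv: -kv[1]):
    print(f"{k:>12}: ${v:,.2f}")
print(f"{'total':>12}: ${total:,.2f}")
```

With these assumed prices the total comes to about $1,669, of which cache reads and cache writes make up roughly 96%.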

NiloCK · 2026-01-06T23:38:53 1767742733

FYI, Opus is available and pretty usable in Claude Code on the $20/mo plan if you are at all judicious.

I exclusively use Opus for architecture/speccing, and then mostly Sonnet and occasionally Haiku to write the code. If my usage has been light and the code isn't too straightforward, I'll have Opus write code as well.

covibes · 2026-01-10T12:52:30 1768049550

The problem with current approaches is the lack of feedback loops with independent validators that never lose track of the acceptance criteria. That's the next level that will truly allow no-babysitting implementations that are feature-complete and production-grade. Check out this repo that offers that: https://github.com/covibes/zeroshot/

garblegarble · 2026-01-06T23:42:02 1767742922

That's helpful to know, thanks! I gave Max 5x a go and didn't look back. My suspicion is that Opus 4.5 is subsidised, so good to know there's flexibility if prices go up.

baq · 2026-01-07T07:08:53 1767769733

The $20 plan for CC is good enough for 10-20 minutes of Opus every 5h, and you'll be out of your weekly limit after 4-5 days if you sleep during the night. I wouldn't be surprised if Anthropic actually makes a profit here. (Yeah, probably not, but they aren't burning cash.)

garblegarble · 2026-01-04T23:24:54 1767569094

>the best way to install these tools is to build it yourself, i.e. make install, etc.

And you're fully auditing the source code before you run make, right? I don't know anyone who does, and without that you're handing over just as much control as with curl|bash from the developer's site, or brew install; you're just adding more steps...

colesantiago · 2026-01-05T00:03:13 1767571393

> And you're fully auditing the source code before you run make.

I mean you can?

But that's the whole point: when the source is available, it's easier to audit than a binary.

Even with brew, the brew maintainers have already audited the code, and the source used to install (even when installing with --HEAD) is hosted on brew's CDN.

garblegarble · 2026-01-06T23:39:58 1767742798

>Even with brew, the brew maintainers have already audited the code

Realistically, how much are they auditing? I absolutely agree with your sentiment that it's better than a binary, but I think the whole security model we have is far too trusting, because historically the overwhelming majority of actors in our field, in both industry and hobbyist circles, have acted in good faith.

garblegarble · 2026-01-03T18:11:46 1767463906

> If both are present but different the unprefixed version should be favoured. That seems uncontroversial & not complex to implement.

Oops, you just enabled smuggling wherever there's a mismatch between what a proxy/firewall/etc. supports and what an internal service supports.

    X-Do-Evil: true
    Do-Evil: false
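
A minimal sketch of that mismatch, assuming a front proxy that implements the proposed "unprefixed wins" rule and a legacy internal service that predates standardisation and only reads the X- form. `Do-Evil` is the made-up header from the example; both resolver functions are hypothetical, and for brevity the sketch uses exact-case dict lookups even though real HTTP field names are case-insensitive.

```python
from typing import Dict, Optional


def firewall_resolve(headers: Dict[str, str]) -> Optional[str]:
    """Front proxy implementing the proposed rule: if both forms are present,
    the unprefixed header wins."""
    if "Do-Evil" in headers:
        return headers["Do-Evil"]
    return headers.get("X-Do-Evil")


def legacy_backend_resolve(headers: Dict[str, str]) -> Optional[str]:
    """Internal service written before standardisation: only knows the X- form."""
    return headers.get("X-Do-Evil")


request = {"X-Do-Evil": "true", "Do-Evil": "false"}

allowed = firewall_resolve(request) != "true"         # proxy sees "false" and lets it through
executed = legacy_backend_resolve(request) == "true"  # backend sees "true" and acts on it
print(allowed, executed)  # True True
```

Each component follows its own rule correctly; the problem is that they disagree about which header is authoritative.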

lucideer · 2026-01-03T18:40:48 1767465648

Smuggling is a general concern whenever two headers have interacting functionality. It's not specific to prefix masking, and given how implementation-dependent it is, it's not even likely to apply to any arbitrary prefix mask.

That's not a reason not to consider it a threat vector when implementing, but no more than when implementing any header (that interacts with another).

MrJohz · 2026-01-03T21:20:29 1767475229

But isn't the problem with X- headers that if they ever get standardised, they necessarily create this smuggling issue? Whereas if you start with an unprefixed header and standardise it under the same name, you avoid this issue.

You could also solve the problem by standardising the header with the X- prefix, but this is more confusing to users and violates the idea that X- always means "not standardised", at which point the prefix is useless anyway.

Bratmon · 2026-01-04T01:20:35 1767489635

> That's not a reason not to consider it a threat vector when implementing, but no more than when implementing any header (that interacts with another)

But the header wouldn't have interacted with another header if we hadn't decided to do this X-prefix nonsense!

lucideer · 2026-01-04T10:03:21 1767521001

It might not have but it's a lot more likely that it would.

garblegarble · 2025-10-24T00:42:41 1761266561

"Treating the symptoms, not the cause" would be the English equivalent.

(for others: the Dutch expression is "Dweilen met de kraan open", "Mopping with the tap open")

garblegarble · 2025-10-21T15:25:31 1761060331

I went looking for a source on this; it looks to be: https://www.wsj.com/business/retail/walmart-employee-treatme... / https://archive.is/fPAlB

garblegarble · 2025-10-20T21:09:42 1760994582

China currently can't make the high-performance, efficient, long-life jet engines that the US and Europe make. The commercial market is heavily cost-sensitive, so they currently can't compete there.

This doesn't matter so much for military purposes: they can easily eat the cost of a higher maintenance and replacement schedule on a smaller number of military jets with fewer hours on them.

This gives them more iteration cycles, speeding up how quickly they build experience. They're catching up. Industrial espionage will help them along too, but not as much as the experience from engineering their own designs.