This is where the desire to NOT anthropomorphize LLMs actually gets in the way.
We have mechanisms for ensuring the quality of output from humans, and those are nothing like the ones for ensuring the output of a compiler. We have checks on people; we have whole industries of people whose entire careers are managing people, who manage other people, who in turn manage other people.
With regard to predictability, LLMs essentially behave like people. The same kind of checks that we use for people are needed for them, not the same kind of checks we use for software.
> The same kind of checks that we use for people are needed for them
Those checks work for people because humans and most living beings respond well to reward/punishment mechanisms. It's the whole basis of society.
> not the same kind of checks we use for software.
We do have systems that are non-deterministic (computer vision, various forecasting models…). We judge those by their accuracy and by the likelihood of false positives or false negatives (when it's a classifier). Why not use those metrics?
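To make that concrete: here is a minimal sketch in Python, with made-up labels, of the kind of evaluation I mean (accuracy plus false-positive and false-negative rates computed over repeated runs, rather than expecting bit-identical output):

    def classifier_metrics(y_true, y_pred):
        # Confusion-matrix counts for a binary classifier.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
        tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
        return {
            "accuracy": (tp + tn) / len(y_true),
            "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
            "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
        }

    # Made-up ground truth vs. predictions collected from repeated runs of the system.
    print(classifier_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))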
I'm looking at it right now as a tool I can hollow out and stuff into my own MCP server that also has personas, skills, an agentic loop, memory, all those pieces. I may even go simpler than that: take a look at its gateway and channels, drag those over, slap them onto the MCP server I have, and turn it into an independent application.
It looks far too risky to use, even if I have it sequestered in its own VM. I'm not comfortable with its present state.
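For anyone curious, the agentic-loop-plus-memory piece I mentioned is roughly this shape. A minimal sketch in Python; call_model and run_tool are hypothetical stubs standing in for whatever the gateway/channels and MCP tools actually expose:

    # Rough sketch of an agentic loop with a naive shared memory.
    memory = []  # running conversation acts as the memory store

    def call_model(history):
        # Stub: a real version would call the LLM through the gateway/channels.
        return {"content": "done", "tool_call": None}

    def run_tool(tool_call):
        # Stub: a real version would dispatch to an MCP tool and return its output.
        return "tool output"

    def agent_loop(goal, max_steps=10):
        memory.append({"role": "user", "content": goal})
        for _ in range(max_steps):
            reply = call_model(memory)                 # model answers or asks for a tool
            memory.append({"role": "assistant", "content": reply["content"]})
            if reply.get("tool_call") is None:         # no tool requested -> finished
                return reply["content"]
            result = run_tool(reply["tool_call"])      # run the requested MCP tool
            memory.append({"role": "tool", "content": result})
        return "step budget exhausted"

    print(agent_loop("summarize the repo"))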
Where I think agents become fascinating is when we give cc an interface to something like clawdebot, plus any logging/observability, and tell it to recreate the code base.
Had humans not been doing this already, I would have walked into Samsung with the demo application that was working an hour before my meeting, rather than the Android app that could only show me the opening logo.
There are a lot of really bad human developers out there, too.
An embedded page at landr-atlas.com says:
> Attention!
> MacOS Security Center has identified that your system is under threat.
> Please scan your MacOS as soon as possible to avoid more damage.
> Don't leave this page until you have undertaken all the suggested steps
> by authorised Antivirus.
> [OK]
Thank you for the note. It's not a site I used all that often.
Whether you had anything to do with it or not, I have no idea. And, since you didn't follow best practices and tell me directly rather than trying to score points here, there's really no way of knowing whether you're the one who caused the problem in the first place.
I built a new site without Wordpress. That took less than a day.
I don't imagine you will alter your behavior to align with general best security practices anytime soon.
> Whether you had anything to do with it or not, I have no idea. And, since you didn't follow best practices and tell me directly rather than trying to score points here, there's really no way of knowing whether you're the one who caused the problem in the first place.
Are you actually accusing me (slyly couched in weasel words, but still explicitly) of hacking your wordpress blog, then pointing it out on Hacker News to score points?
Yeah, you have a point /s: there's really no way to tell if I hacked your blog or not, nor any way of knowing whether any statement is true or not if you're nihilistic enough, but you're going to have to take my word that I didn't, and clean up your own mess without shifting the blame to me, or demanding I should have helped you. You're the one who chose to use wordpress, not me. FYI, "general best security practices" include DON'T USE WORDPRESS.
What possible evidence or delusional reasons do you have to imply that I hacked your wordpress blog? Is your security really that lax and password that easy to guess? And even if I did, then why would I post about it publicly or notify you privately? You sound pathologically paranoid and antisocially aggressive to make such baseless accusations out of the blue, to try to shift the blame to me for your own mistakes. That makes me glad I didn't try to contact you directly. Funny thing for you to complain about when you don't even openly publish your contact email address on your blog or hn profile like I do, though.
I think Claude Cowork should come with a requirement, or a very heavily structured wizard process, to ensure the machine has something like a Time Machine backup or other backups that run regularly, before folks use it.
The failure modes are just too rough for most people to think about until it's too late.
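Even a bare-bones preflight check would go a long way. A minimal sketch, assuming macOS with Time Machine; tmutil latestbackup prints the newest backup path and exits non-zero when there isn't one (newer macOS versions may require Full Disk Access for it to see anything):

    import subprocess

    def has_time_machine_backup() -> bool:
        # `tmutil latestbackup` prints the path of the most recent completed
        # backup; a non-zero exit code means no backup was found.
        result = subprocess.run(["tmutil", "latestbackup"],
                                capture_output=True, text=True)
        return result.returncode == 0 and result.stdout.strip() != ""

    if not has_time_machine_backup():
        raise SystemExit("No Time Machine backup found; refusing to continue.")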
It certainly doesn't seem to have any trouble creating MIT licenses, that's for sure. I've had it insert an MIT license instead of the AGPL license, against my express direction.
That's a fascinating path forward, and most frustrating of all, if I understand this right, it's something that could have been discovered 20 years ago.
Yep. If someone had spent the money to do the research, we probably could have. Of course you can't really rely on a government that gets replaced every few years to think long-term.
I can tell you that a good number of the design drawings for the higher floors of the Venetian resort in Las Vegas were assembled with AutoLisp scripts. The scripts I created grabbed components from drawings that had already been made, in order to assemble a first-pass set of drawings for floors that hadn't been fully designed yet, since the floors all shared components with one another.
The upper floors were still being designed while the lower floors had already been finished, and construction was moving up the building.
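The originals were AutoLisp running inside AutoCAD; purely as an illustration of the idea, here is a rough Python analogue using the ezdxf library, with made-up file and block names:

    import ezdxf
    from ezdxf.addons import Importer

    # Pull already-drawn components (blocks) out of a finished floor's drawing...
    source = ezdxf.readfile("floor_12_finished.dxf")   # hypothetical finished floor
    target = ezdxf.new()                               # first-pass drawing for a new floor

    importer = Importer(source, target)
    for name in ("ELEVATOR_CORE", "STAIRWELL_A", "RESTROOM_POD"):  # hypothetical block names
        importer.import_block(name)
    importer.finalize()

    # ...and place them at (made-up) coordinates on the new floor.
    msp = target.modelspace()
    msp.add_blockref("ELEVATOR_CORE", insert=(0, 0))
    msp.add_blockref("STAIRWELL_A", insert=(150, 0))
    target.saveas("floor_18_first_pass.dxf")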
I used to work in the industry. I know the guys responsible for real-time data capture from various platforms like Roku and Vizio.
I 100% agree, and I own very nice LG TVs. They are not connected to the internet. They each have an Apple TV; that is the only way they get video, and they can't send data out.