More

esafak · 2025-12-12T05:32:21 1765517541

I too am curious how to the first commit came about: https://github.com/stoolap/stoolap/commit/768eb836de0ff072b8...

Note to owner: CI is broken.

esafak · 2025-12-11T21:25:37 1765488337

If only they had a good designer. They have the ugliest frames.

fidotron · 2025-12-11T21:28:45 1765488525

They also like to indulge in massive post purchase UX changes so if you like it now there is a good chance you won't in a year.

esafak · 2025-12-11T19:53:13 1765482793

Is that intoned like Ricardo Montalbán's "Khaaaan!" ?

donalhunt · 2025-12-12T01:05:01 1765501501

I think more "Fenton!"...

https://youtu.be/3GRSbr0EYYU

esafak · 2025-12-11T19:50:11 1765482611

I would not be so sure. You can always prep to the test.

HDThoreaun · 2025-12-11T20:18:33 1765484313

How do you prep for arc agi? If the answer is just "get really good at pattern recognition" I do not see that as a negative at all.

ben_w · 2025-12-11T22:25:17 1765491917

It can be not-negative without being sufficient.

Imagine that pattern recognition is 10% of the problem, and we just don't know what the other 90% is yet.

Streetlight effect for "what is intelligence" leads to all the things that LLMs are now demonstrably good at… and yet, the LLMs are somehow missing a lot of stuff and we have to keep inventing new street lights to search underneath: https://en.wikipedia.org/wiki/Streetlight_effect

HDThoreaun · 2025-12-11T23:10:15 1765494615

I dont think many people are saying 100% arc-agi 2 is equivalent to AGI(names are dumb as usual). Its just the best metric I have found, not the final answer. Spatial reasoning is an important part of intelligence even if it doesnt encompass all of it.

esafak · 2025-12-10T20:06:18 1765397178

They do. Where did you get this? All the providers have clauses like this:

"4.1. Generally. Customer and Customer’s End Users may provide Input and receive Output. As between Customer and OpenAI, to the extent permitted by applicable law, Customer: (a) retains all ownership rights in Input; and (b) owns all Output. OpenAI hereby assigns to Customer all OpenAI’s right, title, and interest, if any, in and to Output."

https://openai.com/policies/services-agreement/

shakna · 2025-12-10T20:24:34 1765398274

The outputs of AI are most likely in the public domain. As automated process output are public domain, and the companies claim fair use when scraping, making the input unencumbered, too.

It wouldn't be OpenAI holding copyright - it would be no one holding copyright.

bcrosby95 · 2025-12-10T21:11:54 1765401114

Courts have already leaned this way too, but who knows what'll happen when companies with large legal funds enter the arena.

macrolime · 2025-12-10T21:43:41 1765403021

So you're saying machine code is public domain if it's compiled from C? If not, why would AI generated code be any different?

fhd2 · 2025-12-10T22:51:58 1765407118

That would be considered a derivative work of the C code, therefore copyright protected, I believe.

Can you replay all of your prompts exactly the way you wrote them and get the same behaviour out of the LLM generated code? In that case, the situation might be similar. If you're prodding an LLM to give you a variety of resu

But significantly editing LLM generated code _should_ make it your copyright again, I believe. Hard to say when this hasn't really been tested in the courts yet, to my knowledge.

The most interesting question, to me, is who cares? If we reach a point where highly valuable software is largely vibe coded, what do I get out of a lack of copyright protection? I could likely write down the behaviour of the system and generate a fairly similar one. And how would I even be able to tell, without insider knowledge, what percentage of a code base is generated?

There are some interesting abuses of copyright law that would become more vulnerable. I was once involved in a case where the court decided that hiding a website's "disable your ad blocker or leave" popup was actually a case of "circumventing effective copyright protection". In this day and age, they might have had to produce proof that it was, indeed, copyright protected.

macrolime · 2025-12-10T23:14:30 1765408470

"Can you replay all of your prompts exactly the way you wrote them and get the same behaviour out of the LLM generated code? In that case, the situation might be similar. If that's not the case, probably not." Yes and no. It's possible in theory, but in practice it requires control over the seed, which you typically don't have in the AI coding tools. At least if you're using local models, you can control the seed and have it be deterministic.

That said, you don't necessarily always have 100% deterministic build when compiling code either.

fhd2 · 2025-12-11T08:51:55 1765443115

That would be interesting. I don't believe getting 100% the same bytes every time a derivative work is created in the same way is legally relevant. Take filters applied to copyright protected photos - might not be the exact same bytes every time you run it, but it looks the same, it's clearly a derivative work.

So in my understanding (not as a lawyer, but someone who's had to deal with legal issues around software a lot), if you _save_ all the inputs that will lead to the LLM creating pretty much the same system with the same behaviour, you could probably argue that it's a derivative work of your input (which is creative work done by a human), and therefore copyright protected.

If you don't keep your input, it's harder to argue because you can't prove your authorship.

It probably comes down to the details. Is your prompt "make me some kind of blog", that's probably too trivial and unspecific to benefit from copyright protection. If you specify requirements to the degree where they resemble code in natural language (minus boilerplate), different story, I think.

(I meant to include more concrete logic in my post above, but it appears I'm not too good with the edit function, I garbled it :P)

shakna · 2025-12-11T01:31:53 1765416713

Derivatives inherit.

Public domain in, public domain out.

Copyright'd in, copyright out. Your compiled code is subject to your copyright.

You need "significant" changes to PD to make it yours again. Because LLMs are predicated on massive public data use, they require the output to PD. Otherwise you'd be violating the copyright of the learning data - hundreds of thousands of individuals.

tapoxi · 2025-12-10T22:38:35 1765406315

Monkey Selfie case, setting the stage for an automated process is not enough to declare copyright over a work.

immibis · 2025-12-11T15:33:58 1765467238

No, and your comment is ridiculously bad faith. Courts ruled that outputs of LLMs are not copyrightable. They did not rule that outputs of compilers are not copyrightable.

ranger_danger · 2025-12-11T21:34:50 1765488890

I think that lawsuit was BS because it went on the assumption that the LLM was acting 100% autonomously with zero human input, which is not how the vast majority of them work. Same for compilers... a human has to give it instructions on what to generate, and I think that should be considered a derivative work that is copyrightable.

shakna · 2025-12-11T22:30:25 1765492225

If that is the case - then it becomes likely that LLMs are violating the implicit copyright of their sources.

If the prompt makes the output a derivative, then the rest is also derivative.

immibis · 2025-12-12T09:54:12 1765533252

The sensible options were that either LLM outputs are derivative of all their training data, or they're new works produced by the machine, which is not a human, and therefore not copyrightable.

Courts have decided they're new works which are not copyrightable.

ranger_danger · 2025-12-11T22:40:16 1765492816

I would say all art is derivative, basically a sum of our influences, whether human or machine. And it's complicated, but derivative works can be copyrighted, at least in part, without inherently violating any laws related to the original work, depending on how much has changed/how obvious it is, and depending on each individual judge's subjective opinion.

https://www.legalzoom.com/articles/what-are-derivative-works...

shakna · 2025-12-11T23:49:46 1765496986

If all art is derivative, then the argument also applies to the LLM output.

If the input has copyright, so does the output.

If the input does not, then neither does the output.

A prompt is not enough to somehow claim artistry, because the weights have a greater influence. You cannot separate the sum of the parts.

robocat · 2025-12-11T01:34:32 1765416872

What about patents - if you didn't use cleanroom then you have no defence?

Patent trolls will extort you: the trolls will be using AI models to find "infringing" software, and then they'll strike.

¡There's no way AI can be cleanroom!

esafak · 2025-12-10T19:03:49 1765393429

This is the Engineering- not Literature department. People expect it to be useful.

esafak · 2025-12-10T18:19:26 1765390766

The player has the power to change the game.

nish__ · 2025-12-10T20:12:19 1765397539

That is just not true. The referees have the power to change the game. The fans have the power to change the game. The owners and the commissioners have the power to change the game. The players have no power at all.

spencerflem · 2025-12-10T20:56:22 1765400182

I don’t see how that follows

esafak · 2025-12-10T17:06:07 1765386367

> Sounds completely normal but I can immediately tell it is ai.

Maybe that's a good thing?

esafak · 2025-12-10T16:56:29 1765385789

This is just trivia. I would not use it to test computers -- or humans.

littlestymaar · 2025-12-10T18:58:24 1765393104

It's good way to assess the model with respect to hallucinations though.

I don't think a model should know the answer, but it must be able to know that it doesn't know if you want to use it reliably.

esafak · 2025-12-10T19:07:22 1765393642

No model is good at this yet. I'd expect the flagships to solve the first.

parineum · 2025-12-10T17:09:14 1765386554

Everything is just trivia until you have a use for the answer.

OP provided a we link with the answer, aren't these models supposed to be trained on all of that data?

esafak · 2025-12-10T17:16:31 1765386991

There is nothing useful you can do with this information. You might as well memorize the phone book.

The model has a certain capacity -- quite limited in this case -- so there is an opportunity cost in learning one thing over another. That's why it is important to train on quality data; things you can build on top of.

parineum · 2025-12-11T02:14:34 1765419274

What if you are trying to fix one of these things and needed a list of replacement parts?

esafak · 2025-12-11T02:28:08 1765420088

Not the right problem for this model. Any RAG-backed SLM would do; the important part is being backed by a search engine, like https://google.com/ai

DennisP · 2025-12-10T18:03:40 1765389820

Just because it's in the training data doesn't mean the model can remember it. The parameters total 60 gigabytes, there's only so much trivia that can fit in there so it has to do lossy compression.

esafak · 2025-12-09T21:42:22 1765316542

Or a stepping stone to starting a lighting company. The nature of programming changes when implementation can be automated. That still leaves higher level concerns like design and architecture.