Sure, but the ways of acquiring those outputs legally have vampiric licensing that binds you to those ToS, since the re-licenser is bound by the original ToS.
It's like distributing GPL code in a nonfree application. Even if you didn't "consent to [the original author's] ToS," you are still going to be bound to it via the redistributor's license.
There’s no license. OpenAI is not an author of their models’ outputs, and they know it.
OpenAI can’t just start suing random people on the street without any legal basis. That’s how lawyers become not-lawyers.
There’s only a (somewhat dubiously enforceable) ToS contract between OpenAI and the user of OpenAI’s website. This is probably bullshit too (what legitimate interest does OpenAI have in a model output that doesn’t even belong to them?), but it’s less obviously bullshit.
> Even if you didn't "consent to [the original author's] ToS," you are still going to be bound to it via the redistributor's license.
In the context of the GPL, are there real examples of judgements which bind defendants to a license they never saw or knew anything about, because of the errant actions of an intermediary?
It gives OpenAI a legal basis to launch a lawsuit if they want to.
Would it succeed? Is it right? Do they care? Eh.
…but if I, as some random Reddit user, say I might sue you for training an LLM on data that may or may not have my posts in it, you can probably safely ignore me.
If you go and build a massive LLM using high-quality data that couldn’t possibly come from anywhere other than OpenAI, and they have a log of all the content that API key XXX generated, they both know about it and have a legal basis for litigation.
There’s a difference, even if you’re a third party (not the owner of the API key) or don’t care.
(And I’m not saying they would, or even that they would win; but it’s sufficient cause for them to be able to make a case if they want to.)
Just being able to make a case doesn't mean they will consider the legal fees and resulting judgment to be valuable enough to their business, nor that the suit will even make it into the courts resulting in a final judgment.
A lot of behavior that rides this line is rationalized via a careful cost-benefit analysis.
Sure, I'm just saying that in that cost-benefit analysis the 'risk of the case failing and getting nothing from it' is significantly lower; it's your call as a consumer to do your due diligence and decide:
"I don't think they'll be bothered chasing after me"
vs.
"If it came to it I think the court will rule that they don't have a case after we play the pay-the-lawyers-all-the-money game"
vs.
"how screwed am I if they do, I lose and I have clearly, blatantly and provably violated their terms of service"
^
...because, and this is the point I'm making, there is no question: it is very, very obvious if you do this, and it's not very difficult for them to prove it.
All they need is to slap you with a discovery order, look at your training data, and compare it to their output logs.