There was a wild article a while back that badly overestimated the context window of the code interpreter fine-tune. It's still 8k like the base model, not the wild 100k the article claimed.
I implemented a version of it with pyodide while I was waiting for OpenAI to roll it out to me, and it worked well. Got it to re-encode videos right in the browser with a bit of tweaking of the langchain plugin's prompt. When I did get access, it was a real breath of fresh air for productivity, though their interface could be better for many things beyond that. Their fine-tune was definitely better than the API plus system prompting, though.
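For anyone curious, the core of a DIY code interpreter is just an eval loop that runs the model's generated code and captures stdout so the text can be fed back as a tool result. A minimal sketch in plain Python (the in-browser version swaps this for Pyodide's `runPython`; the function name here is made up):

```python
import io
import contextlib

def run_snippet(code, env=None):
    """Execute model-generated Python and capture its stdout,
    so the output can be returned to the LLM as a tool result."""
    if env is None:
        env = {}
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        # NOTE: no sandboxing here; Pyodide's WASM runtime is what
        # gives you isolation when running in the browser.
        exec(code, env)
    return buf.getvalue()

# The kind of snippet an LLM might emit:
print(run_snippet("print(sum(range(10)))"), end="")  # prints "45"
```

The langchain tool wrapper is essentially this plus a prompt telling the model what the tool can do, which is the part I tweaked to get video re-encoding working.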
With Meta supposedly releasing their new model soon, I think anyone who cares about control over their language calculator, and about not being predictably and incessantly lectured by some moralizing RLHF, should contribute to the community and build out the infrastructure, fine-tunes, and tools, instead of just relying on an org that wanted its moat so badly it decided to fearmonger and confuse the public.
Can you point to any substantial reproductions of copyrighted novels that are output by LLMs? If it’s just compression, you should be able to pull a substantial reproduction of the works out of them.
It doesn't matter whether it's a person or a computer program; that discussion is moot. Is there a substantial reproduction of the works in the output? If not, there's no copyright infringement here.
I don't believe it's copyright infringement, but I don't believe it should be allowed either.
It used to be that when people bought a piece of land they owned that land from the center of the Earth to the end of the Universe. When planes were invented the laws on who owned the skies had to change or planes wouldn't work.
It's the same here, but in reverse. If LLMs aren't prohibited from learning without permission then people will be forced to hide their works.
Did the trainers illegally access and obtain copyrighted works which are ordinarily protected?
I see multiple questions raised by this suit. Were copyrighted works being illegally stored and distributed by certain sources? Of course they were. Were these illegal sources accessed by the trainers and used to obtain copyrighted works that are not otherwise available free of charge on the public Internet? Are substantial and reproducible copies of these copyrighted works retained within the bowels of the LLM neural nets? Is the LLM able to answer prompts in a way it never could have, had the copyrighted material not been ingested?
The lawsuit addresses several of these questions at once, so even a resolution of this suit will leave questions unanswered, needing to be kicked upstairs to higher courts.
I'm not singling out the author or his company, but it's really hard to tell if this article is taking itself seriously, much in the same way that I've seen a group of 19 year old "AI scientists" talk about how everything will be automated in five years.
Let's just take it at face value. OpenAI, maker of the most "powerful" foundation models, ships a rolling release every 4-5 months. I've witnessed loss of "reasoning" and outright ignoring of instructions in some cases with the latest models unless you push some things through the functions parameter, which will probably break a lot of prompts, especially the agentic ones. Imagine that: there's no free lunch with a fine-tune. You're pretty much at the mercy of the company you're basing your business on not breaking your shit every 5 months. And given how they handled deprecating davinci, it'll keep happening.
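For context, "pushing things through the functions parameter" means encoding the behavior you want in a structured tool schema instead of free-text prompt instructions, which tends to survive model updates better. A rough sketch of what that schema looks like (the function name and fields are made up; the shape follows OpenAI's published function-calling format):

```python
# Instead of a system prompt saying "always answer in strict JSON",
# declare a function whose parameters pin down the output structure.
tools = [{
    "type": "function",
    "function": {
        "name": "format_answer",  # hypothetical name for illustration
        "description": "Return the final answer as structured data.",
        "parameters": {
            "type": "object",
            "properties": {
                "answer": {"type": "string"},
                "confidence": {"type": "number"},
            },
            "required": ["answer"],
        },
    },
}]
```

You'd pass this as the `tools` argument to the chat completions call; the model then emits arguments matching the schema rather than freeform prose, so a model update is less likely to silently change your output format.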
In other news, a lot of the VC cash flying around right now is a bit deluded, chasing some vague concept of "AGI" that's coming soon(tm). Given typical Bay Area bullshitting, who knows whether they're actually true believers(tm)(c) or just throwing around cash looking for your typical unicorn. Strikes me as delusional either way. I'm sure you'll have multimodal models any day now that can do everything from picking strawberries to writing and maintaining some bottom-of-the-barrel web app.
So: more delusion and more bullshit lately, in general. It's not a personal thing against the author or his business at all; it's just much in line with the frenzy I see elsewhere.
You don't need acquired capital for your business because you can rent a VPS and generate social media posts cheaply? Here's your sign.