Why are your models so big? (2023) (pawa.lt)
23 points by jxmorris12 6 hours ago | hide | past | favorite | 15 comments




Still relevant today. Many problems people throw onto LLMs can be done more efficiently with text completion than begging a model 20x the size (and probably more than 20x the cost) to produce the right structured output. https://www.reddit.com/r/LocalLLaMA/comments/1859qry/is_anyo...

I used to work very heavily with local models and swore by text completion despite many people thinking it was insane that I would choose not to use a chat interface.

LLMs are designed for text completion; the chat interface is basically a fine-tuning hack layered on top of text completion to give the average user a more "intuitive" way to prompt (I don't even want to think about how many AI "enthusiasts" don't really understand this).

But with open/local models in particular: each instruct/chat interface is slightly different. There are tools that help mitigate this, but the more closely you're working with the model, the more likely you are to make a stupid mistake because you didn't understand some detail of how the instruct interface was fine-tuned.
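To make the "each instruct interface is slightly different" point concrete, here is a sketch of how the same system/user exchange serializes under two common instruct formats. The exact special tokens vary by model and version, so treat these as illustrative; the real template lives in the model's tokenizer config.

```python
# Illustrative only: two common instruct serializations of the same exchange.
# Exact tokens differ per model; always check the model card / tokenizer config.

def chatml(system: str, user: str) -> str:
    # ChatML-style format (used by several open models)
    return (f"<|im_start|>system\n{system}<|im_end|>\n"
            f"<|im_start|>user\n{user}<|im_end|>\n"
            f"<|im_start|>assistant\n")

def llama2_chat(system: str, user: str) -> str:
    # Llama-2-chat-style format
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

a = chatml("You are terse.", "Capital of France?")
b = llama2_chat("You are terse.", "Capital of France?")
print(a)
print(b)
```

Feed the ChatML string to a Llama-2-tuned model (or vice versa) and it will still complete something, just noticeably worse, which is exactly the kind of silent mistake described above.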

Once you accept that LLMs are "auto-complete on steroids," you can get much better results by programming the way they were naturally designed to work. It also helps a lot with prompt engineering, because you can more easily understand the model's natural tendency and work with it to generally get better results.
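As a minimal sketch of that "work with the completion tendency" approach: a few-shot prefix shaped so the label you want is the most natural continuation. No chat wrapper at all; the task and endpoint here are hypothetical, but the pattern works with any base-model completion API.

```python
# Sketch: sentiment labeling as plain text completion, no chat template.
# The few-shot prefix makes the desired label the most likely continuation.

def make_prompt(review: str) -> str:
    examples = [
        ("Great battery life, totally worth it.", "positive"),
        ("Broke after two days.", "negative"),
    ]
    shots = "\n".join(f"Review: {r}\nSentiment: {s}" for r, s in examples)
    # End mid-pattern so the model's completion *is* the answer.
    return f"{shots}\nReview: {review}\nSentiment:"

prompt = make_prompt("Arrived late and scratched.")
print(prompt)
```

In practice you would send this with a `"\n"` stop sequence and low temperature, so the model emits a single label and halts.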

It's funny, because a good chunk of my comments on HN these days are combating AI hype, but man, LLMs really are fascinating to work with if you approach them with a clearer-headed perspective.


Why would you do that when you could spend months building metadata and failing to tune prompts for a >100B parameter LLM? /s

> I think the future will be full of much smaller models trained to do specific tasks.

This was the very recent past! Up until we got LLM-crazy in 2021, this was the primary thing that deep learning papers produced: New models meant to solve very specific tasks.


Yeah, it is insane how many people think that tuning models is nearly impossible, or that it requires a multibillion dollar data center.

It is one of the weirdest variations of people buying into too much hype.


The incumbents are trying to fully control the market, but they don't have a justification for that. A company like Google, which already has a monopoly over search, needs to convince the market that this will let it expand past search. If the narrative is that anyone can run a specialized model on their own machine for different tasks, that doesn't justify AI companies selling themselves on the assumption of a total market monopoly and a stranglehold over the economy.

They cannot sell themselves without concealing reality. This is not new. There were a lot of suppressed projects in the blockchain industry: people denied their existence, most never heard of them, and everyone talked as if the best coin in existence could do a measly 4 transactions per second, as if that were state of the art. Solutions like the Lightning Network don't actually work, yet they're pitched as revolutionary. I bet there are more people shilling Bitcoin's Lightning Network than there are people actually using it. That is the power of centralized financial incentives: everyone ends up operating on top of a shared deception, "the official truth," which may not be true at all.


May I add GLiNER to this? The original Python version and the Rust version. Fantastic (non-LLM) models for entity extraction. There are many others.

I really think using small models for a lot of small tasks is the best way forward, but it's not easy to orchestrate.


I think I have almost the opposite intuition. The fact that attention models are capable of making sophisticated logical constructions within a recursive grammar, even for a simple DSL like SQL, is kind of surprising. I think it’s likely that this property does depend on training on a very large and more general corpus, and hence demands the full parameter space that we need for conversational writing.

My threshold for “does not need to be smaller” is “can this run on a Raspberry Pi”. This is a helpful benchmark for maximum likely useful optimization.

A Pi has 4 cores and 16GB of memory these days, so running Qwen3 4B on a Pi is pretty comfortable: https://leebutterman.com/2025/11/01/prompt-optimization-on-a...
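A quick back-of-envelope check that this fits (the ~4.5 bits/weight figure is a typical 4-bit quantization overhead including scales, not an exact number for any specific build):

```python
# Does a 4B-parameter model fit in a 16 GB Pi? Rough estimate only.
params = 4e9
bytes_per_param = 4.5 / 8          # ~4.5 bits/weight for a typical 4-bit quant
weights_gb = params * bytes_per_param / 1e9
print(round(weights_gb, 2))        # roughly 2.25 GB of weights
```

That leaves well over 10 GB of headroom for the KV cache, runtime, and OS, which is why it runs comfortably rather than just barely.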


I don’t understand why today’s laptops are so large. Some of the smallest "ultrabooks" getting coverage sit at 13 inches, but even this seems pretty big to me.

If you need raw compute, I totally get it. Things like compiling the Linux kernel or training local models require a high level of thermal headroom, and the chassis has to dissipate heat in a manner that prevents throttling. In cases where you want the machine to act like a portable workstation, it makes sense that the form factor would need to be a little juiced up.

That said, computing is a whole lot more than just heavy development work. There are some domains that have a tightly-scoped set of inputs and require the user to interact in a very simple way. Something like responding to an email is a good example — typing "LGTM" requires a very small screen area, and it requires no physical keyboard or active cooling. Checking the weather is similar: you don't need 16 inches of screen real estate to go from wondering if it's raining to seeing a cloud icon.

I say all this because portability is expensive. Not only is it expensive in terms of back pain — maintaining the ecosystem required to run these machines gets pretty complicated. You either end up shelling out money for specialized backpacks or fighting for outlet space at a coffee shop just to keep the thing running. In either case, you’re paying big money (and calorie) costs every time a user types remind me to eat a sandwich.

I think the future will be full of much smaller devices. Some hardware to build these already exists, and you can even fit them in your pocket. This mode of deployment is inspiring to me, and I’m optimistic about a future where 6.1 inches is all you need.


I dunno. It kinda works, and points for converting the whole article. But something is lost in the switch-up here. The size of a laptop is more or less the size of the display (unless we’re going to get weird and have a projector built in), so it is basically a figure-of-merit.

Nobody actually wants more weights in their LLMs, right? They want the things to be “smarter” in some sense.


With a comfortable spread, my hands are 9.5 inches from pinky to thumb; a thirteen-inch laptop is so painfully small I can barely use it.

A typical use case for large laptops is when you want to store it away after work or when you only carry it occasionally. I have a PC for coding at home, but use a thinkpad with the largest screen I could get for coding in my camper van (storing it away when not using it, because of lack of space) or when staying at my mother's home for longer (setting it up once at the start of my visit). I also have another very small, light and inexpensive subnotebook that I can carry around easily, but I rarely use it these days and not for coding at all.

2000: My spoon is too big

2023: My model is too big


The net $5.5T the Fed printed had to go somewhere. The AI arms race was the answer. And when the models got good, we needed agentic workloads to create unbounded demand for inference, just as there had been unbounded demand for training.

https://fred.stlouisfed.org/series/WALCL



