Interesting, I have the opposite impression. I want to like it because it's the biggest model I can run at home, but its punchy style and insistence on heavily structured output scream "tryhard AI." I was really hoping that this model would deviate from what I was seeing in their previous release.
What do you mean by "heavily structured output"? I find it generates the most natural-sounding output of any of the LLMs: it cuts straight to the answer with natural-sounding prose (except when it sometimes decides to use ChatGPT-style output with emoji headings for no reason). I've only used it on kimi.com though, so I'm wondering what you're seeing.
Yeah, by "structured" I mean how it wants to do ChatGPT-style output with headings and emoji and lists and stuff. And the punchy style of K2 0905, as shown in the fiction example in the linked article, is what I really dislike. K2 Thinking's output in that example seems a lot more natural.
I'd be totally on board if it cut straight to the answer with natural-sounding prose, as you described, but for whatever reason that has not been my experience.
Interesting. As others have noted, it has a cut-straight-to-the-point, non-sycophantic style that I find exceptionally rich in detail and impressive. But it sounds like you're saying an earlier version was even better.
> I find it generates the most natural-sounding output of any of the LLMs
Curious, does it sound as good/natural as Claude 3.5/3.6 Sonnet? That was IMO the most "human" an AI has ever sounded. (Gemini 2.5 Pro is a distant second, and ChatGPT is way behind, IMO.)
If you want to do it at home, ik_llama.cpp has some performance optimizations that make it semi-practical to run a model of this size on a server with lots of memory bandwidth and a GPU or two for offload. You can get 6-10 tok/s on modest workstation hardware. Thinking chews up a lot of tokens, though, so it will be a slog.
Hi Simon. I have a Xeon W5-3435X with 768GB of DDR5 across 8 channels; IIRC it's running at 5800 MT/s. It also has 7x A4000s, water-cooled to pack them into a desktop case. Very much a compromise build, and I wouldn't recommend Xeon Sapphire Rapids, because the memory bandwidth you get in practice is less than half of what you'd calculate from the specs. If I did it again, I'd build an EPYC machine with 12 channels of DDR5 and put in a single RTX 6000 Pro Blackwell. That'd be a lot easier and probably a lot faster.
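To put rough numbers on that, here's a back-of-the-envelope sketch. The ~32B active parameters per token and the ~4-bit quant density are my assumptions, so treat it as an estimate, not a benchmark:

    # Rough token-rate estimate from memory bandwidth for a MoE model.
    # All figures are assumptions for illustration, not measurements.
    channels, mt_s = 8, 5800                        # the Xeon setup above
    theoretical_gbs = channels * mt_s * 8 / 1000    # 8 bytes/channel/transfer
    effective_gbs = theoretical_gbs * 0.45          # "less than half" in practice

    active_params = 32e9        # Kimi K2 activates ~32B params per token
    bytes_per_param = 0.55      # ~4.4 bits/weight for a 4-bit-ish quant
    gb_per_token = active_params * bytes_per_param / 1e9

    print(f"theoretical: {theoretical_gbs:.0f} GB/s")          # ~371 GB/s
    print(f"effective:   {effective_gbs:.0f} GB/s")            # ~167 GB/s
    print(f"~{effective_gbs / gb_per_token:.1f} tok/s ceiling") # ~9.5 tok/s

That lands right in the 6-10 tok/s range mentioned above, and it shows why 12 channels of faster DDR5 would move the needle.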
There's a really good thread on level1techs about running DeepSeek at home, and everything there more or less applies to Kimi K2.
My employer was running Growthpower (ERP software) on an HP 3000 system up until 2018 or so. We replaced it with a "modern" .NET/MSSQL ERP solution that does a lot more, but it's slow and terrible to navigate compared to the old console menu system, and its database is hundreds of tables without a single foreign key. The frontend application makes a long series of sequential queries to build each view... if you're willing to wade through the muck, you can write a server-side query that does in milliseconds what it does in minutes.
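For anyone who hasn't hit this pattern, here's a toy sqlite3 sketch of the difference; the table and column names are invented for illustration, not the real schema:

    import sqlite3

    # Hypothetical schema loosely modeled on the ERP above: no foreign keys,
    # so every relationship has to be joined by hand.
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
        CREATE TABLE order_lines (order_id INTEGER, part_no TEXT, qty INTEGER);
        INSERT INTO orders VALUES (1, 10), (2, 11);
        INSERT INTO order_lines VALUES (1, 'A-100', 5), (1, 'B-200', 2), (2, 'A-100', 1);
    """)

    # What the frontend does: one query per order (the N+1 pattern),
    # each one a separate round trip to the server.
    for (order_id,) in con.execute("SELECT order_id FROM orders").fetchall():
        lines = con.execute(
            "SELECT part_no, qty FROM order_lines WHERE order_id = ?", (order_id,)
        ).fetchall()
        print(order_id, lines)

    # The server-side equivalent: one joined query, one round trip.
    rows = con.execute("""
        SELECT o.order_id, l.part_no, l.qty
        FROM orders o JOIN order_lines l ON l.order_id = o.order_id
    """).fetchall()
    print(rows)

With two orders the difference is invisible; with thousands of rows per view, plus network latency on every round trip, it's minutes versus milliseconds.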
I don’t own a laptop. I run DeepSeek-V3 IQ4_XS on a Xeon workstation with lots of RAM and a few RTX A4000s.
It’s not very fast, and I built it up slowly without knowing quite where I was headed. If I could do it over again, I’d go with a recent EPYC with 12 channels of DDR5 and pair it with a single RTX 6000 Pro Blackwell.
I used Omron's K3GN panel meters in a project at work, and I had to draw the seven-segment alphabet in the configuration drawing because the lettering is so unintuitive. It's not a whole lot worse than the one shown in the article, but still... it's pretty rough. I think I prefer numbered parameters like you typically see on VFDs: it's a lot easier to just scroll to P148 or whatever, press enter to view/modify, scroll the value, press enter to set. Menu trees on seven-segment interfaces are a mistake.
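To show why a legend is needed, here's a toy renderer; the letter encodings below are my guesses at common seven-segment conventions, not Omron's actual font:

    # Toy seven-segment renderer. Segments follow the usual naming: a=top,
    # b=top-right, c=bottom-right, d=bottom, e=bottom-left, f=top-left,
    # g=middle. Letter shapes are assumed conventions, not Omron's font.
    SEGS = {
        "S": "acdfg", "E": "adefg", "t": "defg", "r": "eg", "n": "ceg",
    }

    def render(word):
        rows = ["", "", ""]
        for ch in word:
            s = SEGS.get(ch, "")
            rows[0] += (" _ " if "a" in s else "   ") + " "
            rows[1] += ("|" if "f" in s else " ") + ("_" if "g" in s else " ") \
                     + ("|" if "b" in s else " ") + " "
            rows[2] += ("|" if "e" in s else " ") + ("_" if "d" in s else " ") \
                     + ("|" if "c" in s else " ") + " "
        print("\n".join(rows))

    render("SEt")   # is that "SEt", "5Et", or "set"? Exactly the problem.

Once every letter is an ambiguous approximation like that, a flat list of numbered parameters starts looking pretty good.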