There are many lists, but I find all of them outdated, containing wrong information, or missing the actual benchmarks I'm looking for.
I was thinking that maybe it's better to make my own benchmarks with the questions/things I'm interested in, and whenever a new model comes out, run those tests against it via OpenRouter.
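Something like this is all it really takes (a minimal sketch, assuming an OpenRouter API key and its OpenAI-compatible chat completions endpoint; the questions, model id, and grading step are placeholders):

    # benchmark_runner.py - minimal sketch of a personal benchmark harness.
    # Assumes an OpenRouter key in OPENROUTER_API_KEY; questions and model id
    # below are placeholders, and grading/judging happens separately.
    import json
    import os
    import requests

    QUESTIONS = [
        {"id": "date-math", "prompt": "What date is 90 days after 2024-03-01?"},
        {"id": "regex", "prompt": "Write a regex that matches ISO 8601 dates."},
    ]

    def ask(model: str, prompt: str) -> str:
        resp = requests.post(
            "https://openrouter.ai/api/v1/chat/completions",
            headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
            json={"model": model, "messages": [{"role": "user", "content": prompt}]},
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]

    def run(model: str) -> None:
        # Store raw answers per model; grade them later (manually or LLM-judged).
        results = {q["id"]: ask(model, q["prompt"]) for q in QUESTIONS}
        with open(f"results_{model.replace('/', '_')}.json", "w") as f:
            json.dump(results, f, indent=2)

    if __name__ == "__main__":
        run("some-provider/new-model")  # rerun with each new model id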
But in the end, isn't this the same idea as MoE?
Where we have more specialized "jobs" that the model is actually trained for.
I think the main difference with an agent swarm is the ability to run them in parallel. I don't see how this adds much compared to simply sending multiple API calls in parallel with your desired tasks. I guess the only difference is that you let the AI decide how to split those requests and what each task should be.
Nope. MoE is strictly about model parameter sparsity. Agents are about running multiple small-scale tasks in parallel and aggregating the results for further processing - it saves a lot of context length compared to having it all in a single session, and context length has quadratic compute overhead so this matters. You can have both.
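Rough numbers on the quadratic part, just to make it concrete (this ignores KV caching, sliding-window attention, and the linear terms, so treat it as an upper-bound sketch):

    # Toy illustration of why splitting context helps when attention cost ~ n^2.
    # Upper-bound sketch only; real stacks have caching and attention variants.
    def attention_cost(tokens: int) -> int:
        return tokens ** 2  # relative units

    single = attention_cost(100_000)    # one agent holding the whole context
    split = 4 * attention_cost(25_000)  # four subagents with a quarter each
    print(single / split)               # -> 4.0, i.e. ~4x less attention compute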
One positive side effect of this is that if subagent tasks can be dispatched to cheaper, more efficient edge-inference hardware that can be deployed at scale (think Nvidia Jetsons, or even Apple Macs or AMD APUs), even though a single node might be highly limited in what it can fit, then complex coding tasks ultimately become a lot cheaper per token than generic chat.
My point was that this is just a different way of creating specialised task solvers, the same as with MoE.
And, as you said, with MoE it's about the model itself, and it's done at the training level, so that's not something we can easily do ourselves.
But with an agent swarm, isn't it simply splitting a task into multiple sub-tasks and sending each one in a different API call? So this could be done with any of the previous models too, only that the user has to manually define those tasks/contexts for each query.
Or is this at a much more granular level than this, which would not be feasible to be done by hand?
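For reference, the manual version I'm describing is basically this (a sketch assuming any OpenAI-compatible endpoint; the base URL, model id, and sub-prompts are placeholders):

    # Manual "swarm": split a task into sub-prompts, fan out in parallel, aggregate.
    # Endpoint, key, and model are placeholders for any OpenAI-compatible API.
    import os
    from concurrent.futures import ThreadPoolExecutor
    import requests

    BASE_URL = "https://openrouter.ai/api/v1/chat/completions"
    MODEL = "some-provider/some-model"

    def call_llm(prompt: str) -> str:
        resp = requests.post(
            BASE_URL,
            headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
            json={"model": MODEL, "messages": [{"role": "user", "content": prompt}]},
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]

    subtasks = [
        "Summarize module A of this repo: ...",
        "Summarize module B of this repo: ...",
        "List the external dependencies: ...",
    ]

    with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
        partials = list(pool.map(call_llm, subtasks))

    # The aggregation call only ever sees the short partial results,
    # never the full context of each subtask.
    report = call_llm("Combine these notes into one report:\n\n" + "\n\n".join(partials))
    print(report)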
I was already doing this in n8n, creating different agents with different system prompts for different tasks. I'm not sure automating this (with a swarm) would work well in most of my cases; I don't see how it fully complements Tools or Skills.
MoE has nothing whatsoever to do with specialized task solvers. It always operates per token within a single task; you can think of it, perhaps, as a kind of learned "attention" over model parameters as opposed to context data.
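To make "per token" concrete, here's a toy sketch of top-k expert routing (shapes and numbers are made up; a real MoE layer does this inside every MoE block of the transformer, not as a choice between specialized models):

    # Toy top-k MoE routing: each *token* picks its own experts via a learned gate.
    # Made-up shapes; the point is that routing is per token, inside a layer,
    # not a per-task selection of specialized models.
    import numpy as np

    rng = np.random.default_rng(0)
    d_model, n_experts, top_k = 8, 4, 2
    tokens = rng.normal(size=(5, d_model))          # 5 token embeddings
    W_gate = rng.normal(size=(d_model, n_experts))  # learned router weights

    logits = tokens @ W_gate                          # (5, n_experts) router scores
    chosen = np.argsort(logits, axis=-1)[:, -top_k:]  # top-2 experts per token
    print(chosen)  # each row: the indices of that token's two chosen experts

    # Each token's output is a weighted sum of only its chosen experts' FFNs;
    # the other experts' parameters are never touched for that token (sparsity).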
An LLM only outputs tokens, so this could be seen as an extension of tool calling, where it has been trained on the knowledge and use cases for "tool-calling" itself as a sub-agent.
Sort of. It’s not necessarily a single call. In the general case it would be spinning up a long-running agent with various kinds of configuration — prompts, but also coding environment and which tools are available to it — like subagents in Claude Code.
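Conceptually, the configuration looks something like this (a made-up spec for illustration only, not Claude Code's actual subagent format):

    # Hypothetical subagent spec, just to show the kind of configuration involved.
    # NOT Claude Code's real format; field names are invented for illustration.
    from dataclasses import dataclass, field

    @dataclass
    class SubagentSpec:
        name: str
        system_prompt: str                                       # specialized instructions
        allowed_tools: list[str] = field(default_factory=list)   # e.g. shell, browser
        working_dir: str = "."                                    # sandboxed coding environment
        max_turns: int = 20                                       # long-running, not one call

    tester = SubagentSpec(
        name="test-runner",
        system_prompt="Run the test suite, summarize failures, propose minimal fixes.",
        allowed_tools=["bash", "read_file", "write_file"],
        working_dir="./repo",
    )
    # The parent agent dispatches this spec, lets it run many steps, and reads back
    # only its final summary, which keeps the parent's own context small.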
It's basically a CLI for controlling a browser. The idea is that an agent like Claude Code would use it to validate something it just did, like a change it made to the UI.
What browser? My question comes from a security angle: adding that skill just provides a line of bash, with no further info. I checked the .md file, but it's just a list of commands using agent-browser.
agent-browser is more lightweight than Playwright MCP. Claude Chrome requires some manual setup, and works better in cases that require your actual browser rather than a headless one.
Every time I've tried to actually use gpt-oss 20b, it's just gotten stuck in weird feedback loops reminiscent of the time when HAL got shut down back in the year 2001. And these are very simple tests, e.g. I try to get it to check today's date from the time tool so it can fetch more recent search results from the arxiv tool.
It actually seems worse. gpt-oss-20b is only 11 GB because it is prequantized in mxfp4. GLM-4.7-Flash is 62 GB; in that sense GLM is closer to, and actually slightly larger than, gpt-oss-120b, which is 59 GB.
Also, according to the gpt-oss model card, 20b scores 60.7 on SWE-Bench Verified (GLM claims they got 34 for that model) and 120b scores 62.7, versus the 59.7 GLM reports for its own model.
This happened a few months ago, but I just found out about it, so I thought it was worth sharing. I was surprised to see this, as I was under the impression the project was doing well and growing (I did assume their revenue was lacking, though, since most people who knew about the platform were self-hosters).
Yes, there was much depression amongst InvokeAI users when that shift happened back in January. Luckily, in the past two months a rather genius contributor has added the superb Z-Image turbo model, so the open source app is now on a tear.
https://www.reddit.com/r/StableDiffusion/comments/1q3ruuo/re...
"Aristotle integrates three main components: a Lean proof search system, an informal reasoning system that generates and formalizes lemmas, and a dedicated geometry solver"
It is far more than an LLM, and math != "language".
> Aristotle integrates three main components (...)
The second one being backed by a model.
> It is far more than an LLM
It's an LLM with a bunch of tools around it, and a slightly different runtime than ChatGPT. It's "only" that, but people - even here, of all places - keep underestimating just how much power there is in that.
Transformer != LLM. See my edited top-level post. Just because Aristotle uses a transformer doesn't mean it is an LLM, just as Vision Transformers and AlphaFold use transformers but are not LLMs.
LLM = Large Language Model. "Large" refers to the number of parameters (and in practice, depth) of the model, and implicitly to the amount of data used for training, and "language" means human (i.e. written, spoken) language. A Vision Transformer is not an LLM because it is trained on images, and AlphaFold is not an LLM because it is trained on molecular configurations.
Aristotle works heavily with formalized Lean statements and expressions. While you can certainly argue this is a language of sorts, it is not at all the same "language" as the "language" in LLMs. Calling Aristotle an "LLM" just because it has a transformer is more misleading than truthful, because every single other aspect of it is far more clever and involved.
Sigh. If I start with a pre-trained LLM architecture, and then do extensive further training / fine-tuning with different data and loss functions, custom similarity metrics for specialized search, specialized training procedures, and feedback from other automated systems, we end up with far, far more than an LLM. That's the point. Calling something like this an LLM is as deeply misleading as calling AlphaFold an LLM. These tools go far beyond simple LLMs. The special losses and metrics are really that important here, and are why these tools can be so game-changing.
In this context, we're not even talking about "math" (as a broad, abstract concept). We're strictly talking about converting English to Lean. Both are just languages. Lean isn't just something that can be a language. It's a language.
There is no reason or framing where you can say Aristotle isn't a language model.
That's true, and a good fundamental point. But here it's much simpler than that: math is a language the same way code is, and if there's one thing LLMs excel at, it's reading and writing code and translating back and forth between code and natural language.
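As a trivial, concrete example of that translation (nothing like the hard statements Aristotle actually targets), the English sentence "adding zero to any natural number gives the same number back" becomes, in Lean 4:

    -- "For every natural number n, n + 0 = n", stated and proved in Lean 4.
    -- (Trivial by the definition of Nat addition; real targets are vastly harder.)
    theorem add_zero_example (n : Nat) : n + 0 = n := rfl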