It is really important that such posts exist. There is the risk that we only hear about the wild successes and never the failures. But from the failures we learn much more.
One difference between this story and the various success stories is that the latter all had comprehensive test suites as part of the source material that agents could use to gain feedback without human intervention. This doesn’t seem to exist in this case, which may simply be the deal breaker.
>> This doesn’t seem to exist in this case, which may simply be the deal breaker.
Perhaps, but perhaps not. The reason tests are valuable in these scenarios is they are actually a kind of system spec. LLMs can look at them to figure out how a system should (and should not) behave, and use that to guide the implementation.
I don’t see why regular specs (e.g. markdown files) could not serve the same purpose. Of course, most GitHub projects don’t include such files, but maybe that will change as time goes on.
There is one feature in Claude Code which is often overlooked, and I haven't seen it in any of the other agentic tools: a tool called "sub-agent", which creates a fresh context window in which the model can independently work on a clearly defined sub-task. This effectively turns Claude Code from a single-agent model into a hierarchical multi-agent model (I am not sure whether the hierarchy goes to depths >2).
I wonder if it is a conscious decision not to include this (I imagine it opens a lot of possibilities for things to go crazy, but it also seems to be the source of a great amount of Claude Code's power). I would very much like to play with this if it appears in gemini-cli.
The next step would be the possibility to define custom prompts, toolsets, and contexts for specific recurring tasks, and to have these appear as tools to the main agent. An example of such a task: create_new_page. The prompt could describe the steps needed to create the page. The main agent could then simply delegate this as a well-defined task, without cluttering its own context with the operational details.
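To make it concrete, here is a purely hypothetical sketch of what such a task definition could look like (the file format, field names, and tool names are all invented; I'm not describing an existing feature of any of these tools):

```
# hypothetical task definition -- format, fields, and tool names are invented
name: create_new_page
description: Create a new page in the web app and wire it into routing and navigation
tools: [read_file, write_file, run_shell]
context:
  - docs/how-to-add-a-page.md   # the step-by-step instructions mentioned above
  - src/pages/
prompt: |
  Follow docs/how-to-add-a-page.md: create the page component, register the
  route, add a navigation entry, then run the test suite and report the result.
```

The main agent would then only see create_new_page as a single tool, and never the operational details behind it.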
Possibly. One could think about hooking this in as a tool or a simple shell command. But then there is nothing managing the situation when multiple tools modify the codebase simultaneously.
But it is still worth a try and may be possible with some prompting and duct tape.
One thing I'd really like to see in coding agents is this: As an architect, I want to formally define module boundaries in my software, in order to have AI agents adhere to and profit from my modular architecture.
Even with 1M context, it makes sense to define boundaries for large projects. These will typically be present in some form, but they are not available to the coding agent in a precise form. Imagine there was a simple YAML format in which I could specify the modules, where they can be found in the source tree, and the APIs of the other modules each of them interacts with. Then it would be trivial to turn this into a context that would very often fit into 1M tokens. When an agent decides something needs to be done in the context of a specific module, it could create a new context window containing exactly that module, effectively turning a large codebase into a small codebase, for which Gemini is extraordinarily effective.
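A rough sketch of what I have in mind (the module names, paths, and file name are of course made up):

```
# modules.yaml -- hypothetical module map for the coding agent
modules:
  billing:
    path: src/billing/
    api: src/billing/api.py      # the public surface other modules may call
    uses: [accounts, invoicing]  # APIs this module interacts with
  accounts:
    path: src/accounts/
    api: src/accounts/api.py
    uses: []
  invoicing:
    path: src/invoicing/
    api: src/invoicing/api.py
    uses: [accounts]
```

An agent asked to change billing would then load only src/billing/ plus the API files of accounts and invoicing, instead of the whole tree.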
I would be interested in reading what tools are made available to the LLM, and how everything is wired together to form an effective analysis loop. It seems like this is a key ingredient here.
I can see this being very useful for many admin interfaces where some basic data must be managed by domain experts and UX is not a priority. Many enterprise applications have such parts.
I wonder what the GPLv3 licensing means for such scenarios: could people run Mathesar as one microservice in an ensemble with proprietary services? Companies who don't want to open source their whole product might still be willing to upstream their fixes and improvements to the Mathesar component.
Yep, Mathesar is GPLv3, not AGPL, so there’s no issue running it alongside proprietary services. Companies can absolutely use Mathesar as a standalone service in their stack without open-sourcing their other components. Another example of this is WordPress, which is also GPL, and has a thriving hosting ecosystem.
GPLv3 only applies if you modify and distribute Mathesar itself; it doesn't extend to services that simply interact with it. If a company makes changes to Mathesar and distributes that modified version, then those modifications would need to be open-sourced under GPLv3. But using Mathesar as a microservice in an enterprise stack? No problem.
We’d love to see companies upstream fixes and improvements, of course!
Does AGPL have trouble running alongside proprietary services? I always thought AGPL means that if you host the software, you have to make any changes you made available to the users. So if you host it without changes, there is no problem?
Yep, AGPL can run alongside proprietary services without issues, and if you host it without modifications, you don’t have to share anything. But if you modify it and make it available over a network, you have to provide the source to users.
Mathesar, however, is GPL, so you only need to share modifications if you actually distribute the software itself.
When I was at a big tech company, the interpretation by their lawyers was that by running AGPL software, we would have to open source everything in the network to users. The problem is in the definition of "Modification". Per [AGPL](https://www.gnu.org/licenses/agpl-3.0.en.html#:~:text=To%20%...):
> "The "Corresponding Source" for a work in object code form means all the source code needed to generate, install, and (for an executable work) run the object code and to modify the work, including scripts to control those activities."
This can be interpreted to mean that even modifying a configuration so the software runs on your own infrastructure counts as a modification. Obviously this is a very aggressive interpretation, but the lawyers didn't want us to test this phrase in court, so all AGPL software had a blanket ban.
There is one thing I'd like to know about this general approach of centralising permission data. I guess my question applies to Permify as well as to its various competitors:
When objects and relations change in the application's database, these changes will often have to be reflected in the permissions database as well, and the application must keep the two in sync. But this is probably harder than it first seems, in the presence of errors and transaction rollbacks.
What about this case:
- User creates a FOO in the application.
- The app creates an entry in the "foo" table and assigns the user id as owner, and attempts to store various other data.
- The app also creates an entity in the permissions database and assigns the user id as an owner.
- Then further steps are performed, and one of them fails, rolling back the transaction in the application database.
I would assume that the change to the permissions database cannot be rolled back then, so there is now an inconsistency.
With a standard relational database, the suggested place to write relationships to Permify is inside the database transaction of the client action, such as assigning the user as owner.
If the transaction fails, you should delete the malformed relation tuple from Permify to maintain consistency.
Here's an example of how this might look in code:
```
// CreateDocuments creates the documents inside a database transaction and keeps
// the corresponding relation tuple in Permify consistent with the outcome.
// (Document stands for the application's own model; definition omitted here.)
func CreateDocuments(db *gorm.DB, docs []Document) error {
    tx := db.Begin()
    if err := tx.Error; err != nil {
        return err
    }
    defer func() {
        if r := recover(); r != nil {
            tx.Rollback()
            // the transaction failed, so delete the malformed relation tuple
            permify.DeleteData(...)
            panic(r) // re-panic so the failure is not silently swallowed
        }
    }()

    if err := tx.Create(docs).Error; err != nil {
        tx.Rollback()
        // the transaction failed, so delete the malformed relation tuple
        permify.DeleteData(...)
        return err
    }

    // the database write succeeded, so write the relation tuple to Permify
    permify.WriteData(...)

    if err := tx.Commit().Error; err != nil {
        // the commit failed after all: delete the relation tuple again
        permify.DeleteData(...)
        return err
    }
    return nil
}
```
Although this is an anti-pattern, this approach ensures that if the transaction in the application database fails and is rolled back, the corresponding data in Permify is also deleted, preventing inconsistencies.
You call this an anti-pattern (and rightfully so, since cluttering application code like this is a nightmare), but then what is the better pattern? In an app with >100 entities and >500 types of business transactions, many of which can fail in unexpected ways, will I be forced to keep track of what was changed in Permify and roll it back manually?
If yes, then this is quite a burden and might be a valid reason for not using a separate permissions store. But maybe there are better ways...?
Actually, we have webhooks in our cloud offering to streamline and address this. Since this post is about our open source version, I didn't mention it as an option. However, if you choose to go with the open source version, you would need to handle it manually as you described. Open to any suggestions on this. We're designing functionality to add rollback to snapshots[0], but it likely won't ship in the near future.
One option would be to employ a two-phase commit mechanism that keeps track of all "sub-transactions" and considers a global transaction to be completed when all datastores report back that they are very, very certain that they can commit the changes on their end without any issues. Then it asks each local transaction to actually commit.
XA is one such standard that pops up often when data sources support such a mechanism.
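As a minimal sketch of the prepare/commit idea in Go (not XA itself; the Participant interface and everything else here is made up for illustration):

```
package txcoord

import (
    "errors"
    "fmt"
)

// Participant is a local resource taking part in a global transaction,
// e.g. the application database or the permissions store.
type Participant interface {
    Prepare() error  // phase 1: promise that Commit will succeed
    Commit() error   // phase 2: make the changes durable
    Rollback() error // undo a prepared but not yet committed change
}

// TwoPhaseCommit asks every participant to prepare; only if all of them
// succeed does it tell them to commit, otherwise everything is rolled back.
func TwoPhaseCommit(participants ...Participant) error {
    prepared := make([]Participant, 0, len(participants))
    for _, p := range participants {
        if err := p.Prepare(); err != nil {
            // phase 1 failed: roll back everyone that already prepared
            for _, q := range prepared {
                _ = q.Rollback()
            }
            return fmt.Errorf("prepare failed: %w", err)
        }
        prepared = append(prepared, p)
    }
    // phase 2: all participants promised they can commit
    var errs []error
    for _, p := range prepared {
        if err := p.Commit(); err != nil {
            errs = append(errs, err)
        }
    }
    return errors.Join(errs...)
}
```

The hard part in practice is making each Prepare durable and recovering from coordinator failures, which is what standards like XA address between the transaction manager and the resource managers.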
The journal is named "Journal of Targeting, Measurement and Analysis for Marketing", which may be the scientific equivalent of a clickbait blogspam site (but I haven't checked more deeply).