It is really important that such posts exist. There is the risk that we only hear about the wild successes and never the failures. But from the failures we learn much more.
One difference between this story and the various success stories is that the latter all had comprehensive test suites as part of the source material that agents could use to gain feedback without human intervention. This doesn’t seem to exist in this case, which may simply be the deal breaker.
>> This doesn’t seem to exist in this case, which may simply be the deal breaker.
Perhaps, but perhaps not. The reason tests are valuable in these scenarios is they are actually a kind of system spec. LLMs can look at them to figure out how a system should (and should not) behave, and use that to guide the implementation.
I don’t see why regular specs (e.g. markdown files) could not serve the same purpose. Of course, most GitHub projects don’t include such files, but maybe that will change as time goes on.
There is one feature in Claude Code which is often overlooked, and I haven't seen it in any of the other agentic tools: a tool called "sub-agent", which creates a fresh context window in which the model can independently work on a clearly defined sub-task. This effectively turns Claude Code from a single-agent model into a hierarchical multi-agent model (I am not sure whether the hierarchy goes to depths >2).
I wonder if it is a conscious decision not to include this (I imagine it opens a lot of possibilities for things to go crazy, but it also seems to be the source of a great amount of Claude Code's power). I would very much like to play with this if it appears in gemini-cli.
The next step would be the possibility to define custom prompts, toolsets, and contexts for specific recurring tasks, and to have these appear as tools to the main agent. An example of such a task: create_new_page. The prompt could describe the steps needed to create the page. The main agent could then simply delegate this as a well-defined task, without cluttering its own context with the operational details.
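To make it concrete, here is a purely hypothetical sketch of what such a task definition could look like (the file format, field names, and tool names are all invented; I'm not describing an existing feature of any of these tools):

```
# hypothetical task definition -- format, fields, and tool names are invented
name: create_new_page
description: Create a new page in the web app and wire it into routing and navigation
tools: [read_file, write_file, run_shell]
context:
  - docs/how-to-add-a-page.md   # the step-by-step instructions mentioned above
  - src/pages/
prompt: |
  Follow docs/how-to-add-a-page.md: create the page component, register the
  route, add a navigation entry, then run the test suite and report the result.
```

The main agent would then only see create_new_page as a single tool, and never the operational details behind it.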
Possibly. One could think about hooking this in as a tool or a simple shell command. But then there is nothing managing the situation when multiple tools modify the codebase simultaneously.
But it is still worth a try and may be possible with some prompting and duct tape.
One thing I'd really like to see in coding agents is this: As an architect, I want to formally define module boundaries in my software, in order to have AI agents adhere to and profit from my modular architecture.
Even with 1M context, it makes sense to define boundaries for large projects. These will typically be present in some form, but they are not available to the coding agent in a precise form. Imagine there was a simple YAML format in which I could specify the modules, where they can be found in the source tree, and the APIs of the other modules each of them interacts with. Then it would be trivial to turn this into a context that would very often fit into 1M tokens. When an agent decides something needs to be done in the context of a specific module, it could create a new context window containing exactly that module, effectively turning a large codebase into a small codebase, for which Gemini is extraordinarily effective.
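A rough sketch of what I have in mind (the module names, paths, and file name are of course made up):

```
# modules.yaml -- hypothetical module map for the coding agent
modules:
  billing:
    path: src/billing/
    api: src/billing/api.py      # the public surface other modules may call
    uses: [accounts, invoicing]  # APIs this module interacts with
  accounts:
    path: src/accounts/
    api: src/accounts/api.py
    uses: []
  invoicing:
    path: src/invoicing/
    api: src/invoicing/api.py
    uses: [accounts]
```

An agent asked to change billing would then load only src/billing/ plus the API files of accounts and invoicing, instead of the whole tree.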
I would be interested in reading what tools are made available to the LLM, and how everything is wired together to form an effective analysis loop. It seems like this is a key ingredient here.
I can see this being very useful for many admin interfaces where some basic data must be managed by domain experts and UX is not a priority. Many enterprise applications have such parts.
I wonder what the GPLv3 licensing means for such scenarios: could people run Mathesar as one microservice in an ensemble with proprietary services? Companies who don't want to open source their whole product might still be willing to upstream their fixes and improvements to the Mathesar component.
Yep, Mathesar is GPLv3, not AGPL, so there’s no issue running it alongside proprietary services. Companies can absolutely use Mathesar as a standalone service in their stack without open-sourcing their other components. Another example of this is WordPress, which is also GPL, and has a thriving hosting ecosystem.
GPLv3 only applies if you modify and distribute Mathesar itself; it doesn't extend to services that simply interact with it. If a company makes changes to Mathesar and distributes that modified version, then those modifications would need to be open-sourced under GPLv3. But using Mathesar as a microservice in an enterprise stack? No problem.
We’d love to see companies upstream fixes and improvements, of course!
Does AGPL have trouble running alongside proprietary services? I always thought AGPL means that if you host the software, you have to make any changes you made available to the users. So if you host it without changes, there is no problem?
Yep, AGPL can run alongside proprietary services without issues, and if you host it without modifications, you don’t have to share anything. But if you modify it and make it available over a network, you have to provide the source to users.
Mathesar, however, is GPL, so you only need to share modifications if you actually distribute the software itself.
When I was at a big tech company, the interpretation by their lawyers was that by running AGPL software, we would have to open source everything in the network to users. The problem is in the definition of "Modification". Per [AGPL](https://www.gnu.org/licenses/agpl-3.0.en.html#:~:text=To%20%...):
> "The "Corresponding Source" for a work in object code form means all the source code needed to generate, install, and (for an executable work) run the object code and to modify the work, including scripts to control those activities."
This can be interpreted to mean that even modifying a configuration so the software runs on your own infrastructure counts as a modification. Obviously this is a very aggressive interpretation, but the lawyers didn't want us to test this phrase in court, so all AGPL software had a blanket ban.
There is one thing I'd like to know about this general approach of centralising permission data. I guess my question applies to Permify as well as to its various competitors:
When objects and relations change in the application's database, these changes will often have to be reflected in the permissions database as well, and the application must keep the two in sync. But this is probably harder than it first seems, in the presence of errors and transaction rollbacks.
What about this case:
- User creates a FOO in the application.
- The app creates an entry in the "foo" table and assigns the user id as owner, and attempts to store various other data.
- The app also creates an entity in the permissions database and assigns the user id as an owner.
- Then further steps are performed, and one of them fails, rolling back the transaction in the application database.
I would assume that the change to the permissions database cannot be rolled back then, so there is now an inconsistency.
With a standard relational database, the suggested place to write relationships to Permify is inside the database transaction of the client action, such as assigning the user as owner.
If the transaction fails, you should delete the malformed relation tuple from Permify to maintain consistency.
Here's an example of how this might look in code:
```
// CreateDocuments creates the documents inside a database transaction and keeps
// the corresponding relation tuple in Permify consistent with the outcome.
// (Document stands for the application's own model; definition omitted here.)
func CreateDocuments(db *gorm.DB, docs []Document) error {
    tx := db.Begin()
    if err := tx.Error; err != nil {
        return err
    }
    defer func() {
        if r := recover(); r != nil {
            tx.Rollback()
            // the transaction failed, so delete the malformed relation tuple
            permify.DeleteData(...)
            panic(r) // re-panic so the failure is not silently swallowed
        }
    }()

    if err := tx.Create(docs).Error; err != nil {
        tx.Rollback()
        // the transaction failed, so delete the malformed relation tuple
        permify.DeleteData(...)
        return err
    }

    // the database write succeeded, so write the relation tuple to Permify
    permify.WriteData(...)

    if err := tx.Commit().Error; err != nil {
        // the commit failed after all: delete the relation tuple again
        permify.DeleteData(...)
        return err
    }
    return nil
}
```
Although this is an anti-pattern, this approach ensures that if the transaction in the application database fails and is rolled back, the corresponding data in Permify is also deleted, preventing inconsistencies.
You call this an anti-pattern (and rightfully so, since cluttering application code like this is a nightmare), but then what is the better pattern? In an app with >100 entities and >500 types of business transactions, many of which can fail in unexpected ways, will I be forced to keep track of what was changed in Permify and roll it back manually?
If yes, then this is quite a burden and might be a valid reason for not using a separate permissions store. But maybe there are better ways...?
Actually, we have webhooks in our cloud offering to streamline and address this. Since this post is about our open source version, I didn't mention it as an option. However, if you choose to go with the open source version, you would need to handle it manually as you described. Open to any suggestions on this. We're designing functionality to add rollback to snapshots[0], but it likely won't ship in the near future.
One option would be to employ a two-phase commit mechanism that keeps track of all "sub-transactions" and considers a global transaction to be completed when all datastores report back that they are very, very certain that they can commit the changes on their end without any issues. Then it asks each local transaction to actually commit.
XA is one such standard that pops up often when data sources support such a mechanism.
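As a minimal sketch of the prepare/commit idea in Go (not XA itself; the Participant interface and everything else here is made up for illustration):

```
package txcoord

import (
    "errors"
    "fmt"
)

// Participant is a local resource taking part in a global transaction,
// e.g. the application database or the permissions store.
type Participant interface {
    Prepare() error  // phase 1: promise that Commit will succeed
    Commit() error   // phase 2: make the changes durable
    Rollback() error // undo a prepared but not yet committed change
}

// TwoPhaseCommit asks every participant to prepare; only if all of them
// succeed does it tell them to commit, otherwise everything is rolled back.
func TwoPhaseCommit(participants ...Participant) error {
    prepared := make([]Participant, 0, len(participants))
    for _, p := range participants {
        if err := p.Prepare(); err != nil {
            // phase 1 failed: roll back everyone that already prepared
            for _, q := range prepared {
                _ = q.Rollback()
            }
            return fmt.Errorf("prepare failed: %w", err)
        }
        prepared = append(prepared, p)
    }
    // phase 2: all participants promised they can commit
    var errs []error
    for _, p := range prepared {
        if err := p.Commit(); err != nil {
            errs = append(errs, err)
        }
    }
    return errors.Join(errs...)
}
```

The hard part in practice is making each Prepare durable and recovering from coordinator failures, which is what standards like XA address between the transaction manager and the resource managers.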
The journal is named "Journal of Targeting, Measurement and Analysis for Marketing", which may be the scientific equivalent of a clickbait blogspam site (but I haven't checked more deeply).