Yeah, in all sorts of ways. You can look at it in a Grid of components. You can look at it in a Map, seeing all the relationships. You can look at it via an API. You can have an AI Agent summarize it for you. It's super transparent.
It's absolutely AWS-focused today - but one upside of the approach is that building the models is straightforward, because we can build a pipeline that starts from the upstream's specification and augments it with documentation, validation, etc. We'll certainly be expanding coverage.
You should check out the site again today - I think it will give you at least a high-level sense of what it's like to use System Initiative today.
We didn't recreate the AWS API. Rather than think about it as the API calls, imagine it this way. You have a real resource, say an EC2 instance. It has tons of properties, like 'ImageId', 'InstanceType', or 'InstanceId'. Over the lifetime of that EC2 instance, some of those properties might change, usually because someone takes action on that instance - say to start, stop, or restart it. That gets reflected in the 'state' of the resource. If that resource changes, you can look at the state of it and update the resource (in what is a very straightforward operation most of the time.)
The 'digital twin' (what we call a component) is taking that exact same representation that AWS has, and making a mirror of it. Imagine it like a linked copy. Now, on that copy, you can set properties, propose actions, validate your input, apply your policy, etc. You can compare it to the (constantly evolving, perhaps) state of the real resource.
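A rough sketch of the idea (conceptual only, not our actual data model): the component holds the mirror of what AWS reports plus the linked copy you edit, and validation runs against the copy before anything touches the real resource.

    // Conceptual sketch only - not System Initiative's real schema.
    // A component mirrors the properties AWS reports for a resource; the
    // linked copy is what you set properties on, validate, and apply policy to.
    interface Ec2InstanceProperties {
      ImageId?: string;
      InstanceType?: string;
      InstanceId?: string; // assigned by AWS once the instance actually exists
    }

    interface Component {
      resource: Ec2InstanceProperties; // last-known state of the real thing
      proposed: Ec2InstanceProperties; // the linked copy you propose changes on
    }

    // Checks run on the copy, before anything touches AWS.
    function validate(c: Component): string[] {
      const problems: string[] = [];
      if (!c.proposed.ImageId?.startsWith("ami-")) {
        problems.push("ImageId should look like an AMI id (ami-...)");
      }
      if (!c.proposed.InstanceType) {
        problems.push("InstanceType is required");
      }
      return problems;
    }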
So we track the changes you make to the component, make sure they make sense, and then let you review everything you (or an AI agent) are proposing. Then when it comes time to actually apply those changes to the world, we do that for you directly.
A few other upsides of this approach. One is that we don't care how a change happens. If you change something outside of System Initiative, that's fine - the resource can update, and then you can look at the delta and decide if it's beneficial or not. Because we track changes over time, we can do things like replay those changes into open change sets - basically making sure any proposed changes you are making are always up to date with the real world.
Ryan can give you more details about his own experience. (I'm the CEO of System Initiative) But a lot of it comes from switching to a model where you work with an AI agent alongside digital twins of the infrastructure.
In particular, debugging speed improves because you can ask the agent questions like:
`I have a website running on ec2 that is not working. Make a plan to discover all the infrastructure components that could have an impact on why I can't reach it from a web browser, then troubleshoot the issue.`
And it will discover infrastructure, evaluate the configuration, and see if it can find the issue. Then it can make the fix in a simulation, humans can review it, and you're done. It handles all the audit trails, review, state, etc for you under the hood - so the actual closing of the troubleshooting loop happens much faster as well.
When you say 'digital twins of the infrastructure', do you mean another deployed instance? So if they'd just made a preview environment created on each pull request, they'd have got the same speed-up.
> It handles all the audit trails, review, state, etc for you under the hood.
So there is no more IaC? SI now manages everything?
Nope - I mean we make a 1:1 model of the real resource, and then let you propose changes to that data model. Rather than thinking of it like code in a file, think of it like having a live database that does bi-directional sync. The speedup in validating the change happens because we can run it on the data model, rather than on 'real' infrastructure.
Then we track the changes you make to that hypothetical model, and when you like it, apply the specific actions needed to make the real infrastructure conform. All the policy checking, pipeline processing, state file management, etc. is all streamlined.
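As a toy illustration of what 'apply the specific actions needed to make the real infrastructure conform' means (my simplified sketch, not the actual action engine): compare the hypothetical model against the last-known resource state and derive actions from the delta.

    // Toy sketch: derive actions from the proposed-vs-real delta.
    // Property names are the AWS ones; the action logic is deliberately simplified.
    type Action = { kind: "create" | "update" | "restart"; detail: string };

    interface InstanceModel {
      InstanceId?: string; // missing means the resource doesn't exist yet
      InstanceType?: string;
      ImageId?: string;
    }

    function planActions(real: InstanceModel, proposed: InstanceModel): Action[] {
      if (!real.InstanceId) {
        return [{ kind: "create", detail: "run a new instance from the model" }];
      }
      const actions: Action[] = [];
      if (proposed.InstanceType && proposed.InstanceType !== real.InstanceType) {
        // Changing the type of an existing instance implies a stop/modify/start.
        actions.push({ kind: "restart", detail: "stop, modify InstanceType, start" });
      }
      if (proposed.ImageId && proposed.ImageId !== real.ImageId) {
        actions.push({ kind: "update", detail: "replace the instance to change ImageId" });
      }
      return actions;
    }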
Yes, hitting apply would update the production infrastructure, but what if I want to run automated testing to check a change/new feature? I can't do that on a simulation.
Right - obviously, if you need the actual code deployed to run your test, there is not much anyone can do about that. But let me tell you how you would set that up, from scratch, in System Initiative (assuming you have a working deployment at all).
I assume the use case here is 'I want to deploy the application on every pull request to net-new infrastructure, then run my test suite, and destroy the test infrastructure once the PR is merged or the code is updated'.
You would fire up the AI Agent and ask it to discover an existing deployment of the application. Probably give it a hint or the boundaries you care about (stop at the network layer, for example - you probably don't want to deploy a net new VPC, subnets, or internet gateways). Once that's done, you'll have a model of the infrastructure for your application in System Initiative.
Then you'll turn that into a repeatable template component, by either asking the AI to do it for you, or selecting the related infrastructure in our Web UI and hitting 'T'. You'll add some attributes like 'version' to the template, and plumb them through to the right spot in the code we generate for you.
Then you're going to call that code from our GitHub action on every PR, setting the name and the version number from the branch and the artifact version, naming the change set after the PR as well. You'll let the action apply the change set itself, which will then create the infrastructure.
The next step will be to run your tests against the infrastructure.
On merge you'll have another GitHub action that opens a change set and deletes the infrastructure you just created, so you don't waste any cash.
Notice what I didn't tell you to do - figure out how to create new state files, build new CI/CD pipelines, or anything else. You just started from the actual truth of what you already have, used our digital twins to make a repeatable template out of it, then told the platform to do it over and over again with an external API.
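To make the 'external API' step concrete, the per-PR flow could be driven by a small script along these lines. The client calls, endpoint paths, and environment variables below are hypothetical placeholders - the real API is in our docs; this only shows the shape of the workflow.

    // Hypothetical sketch of the per-PR flow described above, run from CI.
    async function deployPreview(prNumber: number, artifactVersion: string) {
      const base = process.env.SI_API_URL!;    // placeholder
      const token = process.env.SI_API_TOKEN!; // placeholder
      const headers = { Authorization: `Bearer ${token}`, "Content-Type": "application/json" };

      // 1. Open a change set named after the PR.
      const changeSet = await fetch(`${base}/change-sets`, {
        method: "POST",
        headers,
        body: JSON.stringify({ name: `pr-${prNumber}` }),
      }).then((r) => r.json());

      // 2. Run the template, plumbing the name and version through as attributes.
      await fetch(`${base}/change-sets/${changeSet.id}/templates/my-app/run`, {
        method: "POST",
        headers,
        body: JSON.stringify({ name: `my-app-pr-${prNumber}`, version: artifactVersion }),
      });

      // 3. Apply the change set so the real infrastructure gets created,
      //    then hand off to the test suite.
      await fetch(`${base}/change-sets/${changeSet.id}/apply`, { method: "POST", headers });
    }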
Nope. Terraform/OpenTofu state has several big differences.
The first is that Terraform/Tofu can drift. This is why people suffer when a change gets made outside of IaC, and the state file no longer tracks reality. That's because IaC tools are by design unidirectional - change should only ever flow from the IaC to the infrastructure. In SI, this is fine - the resource state can update, and then you can decide if it was beneficial (at which point we just update the component side of the equation, and you're done) or not (at which point you would decide what action to take to revert the change).
The second is how it gets generated. In Terraform/Tofu, it's a side effect of the 'apply' phase - basically a compile time artifact. In System Initiative it's the heart of the system - the code you write is operating on that model, not generating that model. This makes programming it much simpler. You can change the model through our Web UI, you can change it through an API, you can change it with an AI Agent, the resource can change because the underlying cloud provider changes it, and it all just works.
State can drift in SI as well, unless you are subscribing to events from AWS that alert your system as soon as a resource is changed so you can update your side.
>the code you write is operating on that model, not generating that model.
What are you talking about? That model is not reality, because reality is whatever the state of the resource is in AWS. If your model says my S3 bucket is not public but someone changes it in AWS to make it public, who cares what the model says - it's public, and that's what's important. Sure, your system may update itself more frequently than only when I run "tofu plan/apply", but at the end of the day, it doesn't matter.
All I'm saying is that, as an SRE, you have done a poor job of selling this to me. I'm telling you what I would tell my boss if he came to me with this product.
"This is some custom IaC system with AI Agents sprinkled on top. I guess if you want to get rid of SRE team and replace us with their consultants, whatever, I won't be here to care. If you want us as SRE team to use it, nope, it's a waste of money since OpenToFu has much better support. Can you approve my SpaceLift purchase instead?"
> Sure, your system may update itself more frequently than only when I run "tofu plan/apply", but at the end of the day, it doesn't matter.
Correct me if I'm wrong here. In my experience you have to "apply" before state is updated. This would mean we weren't quite operating on the source of truth (AWS in this case).
100% it's a solvable problem with a TF-centric toolchain. But it's still a problem that needs solving.
In my experience with SI it fades to the background. Now, I'm sure there is an edge case where someone edits something outside of SI while I'm trying to simultaneously update it in SI where things might break. I haven't run into it yet.
> All I'm saying as SRE, you have done poor job selling this to me
Can't argue this, but I would say like any other new tool, it's worth checking out. :)
Yes, at the apply stage, the state is updated. All the state is really useful for is finding the resource for the big 3. In fact, I'd argue that for TF they could do away with the state file beyond a mapping like resource "s3_bucket" "thebucket" -> arn:aws:s3:us-east-2:000:0123455, since they pull down the current state of the system as-is and then show you "this is what you want, and given the current state, this is what will change."
> I would say like any other new tool, it's worth checking out. :)
I don't see the need for a couple of reasons:
1) How? If you want me to try something, give me a big "TRY ME" - unless it involves becoming a client, in which case I see you as replacing me, so my motivation is zero. :D
2) I'm on Azure for the most part, so it's useless anyway.
3) You have not shown me how SI is that much better than Terraform. If I'm going to invest time over yelling at Kubernetes, I need to know my time is worth it.
At the end of the day, we all want the same thing: here is defined infrastructure, be it YAML, JSON, HCL, some GUI, an API call to a system, or whatever AI is smoking - plus the ability to see what changes and make those changes. HCL/Tofu is what most of us have picked because it's pretty open and widely supported across all the providers. You have to overcome all that. This blog post reads like: we have this great new Windows Server thing that will blow your Linux Server stuff away completely, with GUI and Siri.
Maybe that's what your customer base needs. However, at the technology companies I work at, we don't need that. Editing outside IaC is done very slowly and deliberately, and it's almost always backported. If not, you will get called out. It would be like a dev writing code with no tests.
Thanks for responding. I love these conversations.
>1) How? If you want me to try something, give me a big "TRY ME" - unless it involves becoming a client, in which case I see you as replacing me, so my motivation is zero. :D
To be clear, I don't see this as having the capability to replace engineers. This is a new way to interact with your infrastructure.
But it also feeds into larger AI conversations as well. (probably out of scope for this conversation :) )
> This blog post reads, we have this great new Windows Server thing that will blow your Linux Server stuff away completely with GUI and Siri.
Now you crossed a line :D
Jokes aside, there is a HUGE community in the TF and now OT space - you can't argue with that. The ecosystem of third-party tools to help support workflows is gigantic. Putting all of that aside, will that be the best way forever? I'm not saying that SI will replace either one of those, but I would say it's a new and refreshing way to tackle a similar problem space.
However! Folks with big IaC deployments can still use all the discovery and troubleshooting goodness, and then make the change however they want. System Initiative is fine either way.
Personally, moving away from IaC is a big yikes - for something so critical to my company, no way would I let myself be locked into your product. I have already been bitten before when a developer productivity startup failed/pivoted (as they often seem to do).
That's cool. For what it's worth, the software is all open source, precisely because it's critical in this way. I realize that's like telling you that you can take care of this puppy yourself if you want. :)
Even if you don't move away from IaC, you can still get benefits from the approach by having SI discover the results, and then do analysis.
> You can make a build that includes our trademarks to develop System Initiative software itself. You may not publish or share the build, and you may not use that build to run System Initiative software for any other purpose.
That feels a bit different from what many developers expect when they hear "open source." Nothing wrong with that, just pointing it out.
Sorry, maybe my last reply was a little harsh - now I understand there isn't IaC under the hood anymore.
I still have major reservations around dropping IaC and just working on a simulation of what is deployed, I don't see how this can work for more complex deployments such as multiple region/AZ deployments, blue/green deployments, cell based deployments etc etc. Seems like dropping IaC would only work for very simple environments.
It works great. If you think of it as 'dropping all the reasons we chose IaC', then yes - that's obviously dumb. If you think of it as 'getting all those benefits, plus faster feedback loops, AI agents, and an easier programming model' then.. not so much.
1. I'm sympathetic to your doubt - I had the exact same doubts, and part of why it's taken us so long to build is we refused to sacrifice what was good about IaC to get "simplicity". If this was ClickOps, they should be allergic to it. But it isn't. Under the hood is a very powerful new primitive - a reactive hypergraph of functions. The UI is there because it's a fast way to compose things together, and to radiate more information than you can get from an editor alone. But everything you do is tracked, it's fully open and extensible by writing code, and it's much easier to use to communicate with your team members who might not know all the intricacies of IaC.
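To give a rough feel for 'reactive graph of functions' (a deliberately tiny toy, nothing like the real hypergraph implementation): values are produced by functions of other values, and changing an input re-runs everything downstream.

    // Toy reactive graph: nodes hold values, derived nodes recompute when any
    // of their inputs change. (Illustrative only.)
    type Node<T> = { value: T; dependents: Array<() => void> };

    function input<T>(value: T): Node<T> {
      return { value, dependents: [] };
    }

    function derived<T>(deps: Node<any>[], compute: () => T): Node<T> {
      const node: Node<T> = { value: compute(), dependents: [] };
      const recompute = () => {
        node.value = compute();
        node.dependents.forEach((run) => run());
      };
      deps.forEach((dep) => dep.dependents.push(recompute));
      return node;
    }

    function set<T>(node: Node<T>, value: T) {
      node.value = value;
      node.dependents.forEach((run) => run());
    }

    // Example: a subnet CIDR derived from the VPC CIDR, and a check derived
    // from the subnet - change the VPC and both update automatically.
    const vpcCidr = input("10.0.0.0/16");
    const subnetCidr = derived([vpcCidr], () => vpcCidr.value.replace("/16", "/24"));
    const subnetOk = derived([subnetCidr], () => subnetCidr.value.endsWith("/24"));
    set(vpcCidr, "10.1.0.0/16"); // subnetCidr and subnetOk both recompute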
2. There is how it is better today, and there is how it can be better in the future. Focusing on just today - frequently the kind of review that needs to be done is by an external subject matter expert. Being able to bring those people into a change set, show them the change you are proposing, and have them inspect and alter it with you in real time is great. An example here is one of our early users wanted to use ECS, but had never used the service before. So they put things together in SI, asked someone who had that expertise to look it over - they could see the architecture, they could change properties, add a few missing things. It was much more straightforward than a back and forth in a PR.
But that's not to say that, in the future, there isn't more to do. We need to have more functionality around who needs to review things, build more specialized views for the review (there's no reason you should be stuck doing a review only in a single view of the architecture), use the snapshots we have of the entire graph to build more insightful ways of communicating what's changed (and what actions will happen when you apply.)
Think multiplayer and powerful review and approval semantics.
To your first question, adding what we call management components is high on the to-do list. They serve this role, in addition to letting you describe workflow across the other infrastructure they manage.
You add support directly in System Initiative, by writing the schema and functions in TypeScript. You can see how in the docs.
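To give a flavor of the shape of it (the names below are made up for illustration - the real schema and function authoring APIs are in the docs): you describe the resource's properties, and write small functions that run against the twin, like the qualification sketch here.

    // Made-up stand-in for the authoring flow - see docs.systeminit.com for
    // the actual APIs. The point is that coverage is added by writing
    // TypeScript, not by waiting on a compiled provider.
    const schema = {
      name: "AWS::SQS::Queue",
      props: {
        QueueName: { kind: "string", required: true },
        VisibilityTimeout: { kind: "number", default: 30 },
      },
    };

    // A qualification runs against the component (the digital twin) and
    // reports whether the proposed configuration is valid.
    function qualifyQueue(props: { QueueName?: string; VisibilityTimeout?: number }) {
      if (!props.QueueName) {
        return { result: "failure", message: "QueueName is required" };
      }
      if ((props.VisibilityTimeout ?? 30) > 43200) {
        return { result: "failure", message: "VisibilityTimeout cannot exceed 12 hours" };
      }
      return { result: "success", message: "Looks valid" };
    }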
The resources can be periodically refreshed, and the data flows through all your open change sets automatically - kind of like an automatic rebase. This is also an area where we're doing work - both to figure out the right intervals and to show drift more clearly.
You can't import terraform state right now. High on the road map (https://docs.systeminit.com/roadmap) is discovery of existing resources, starting with the import of individual things, and eventually building the component backwards from discovering the resources. We think that'll be better than importing terraform state, because it will essentially be a rolling reflection of the output, rather than trying to back into it from terraform.
But we're 100% open to importing it from the state file if that's what folks need.
Doing this with resources is going to be the easy path. Translating the state representation of re-usable modules I expect is going to be more difficult but necessary for the migration path to be useful. A lot of power in Terraform is being able to stack re-usable components, say an internal module representing a service on top of a public module that provides sensible defaults like the terraform-aws-modules collection which then sits on top of resources.
If you can get this to a place where an existing deployment of Terraform modules can be imported into SI as a collection of re-usable templates and then users can easily deploy n+1 of those templates the same way they would do with Terraform I think you will have an easy migration path.
If not importing from the state at least some way of automatically importing those re-usable components and allowing the deployment of n+1 of a reusable component would be helpful to migration from existing patterns.
I would be fine with discovering the existing resources actually, if SI can do that. The reason I mentioned Terraform is because my infrastructure is fully managed by it, so I thought that would be the easiest path, but if SI can automagically discover all the resources, that'd work just fine for us. Thanks, need to give it (SI) a go!
This worked in a previous version of SI (we've built like 4 different versions of it over the years) - but we haven't brought it back into the current version yet. But we absolutely will!
All marketing is marketing that will land on flat ears. In the end, you'll have to try System Initiative, see if it is a fit for your use case today, and if it isn't, if it's worth paying attention to tomorrow. I wouldn't (and you shouldn't either) make a technology decision based on what anyone says on their website or blog. :)
Today the obvious drawbacks:
* Terraform has tons of coverage in their provider ecosystem, and we're not close to that yet.
* We have some enterprise features still to add.
* There is some work to be done around huge infrastructures, both in how to provide easy ways to visualize them and how we scale the underlying graphs.
We have plans for all these things, but it's early days. My advice (not just for SI) - you should always build representative prototypes if you want to understand what a technology might do for you. Your circumstances matter, and your problems are likely unique.
Prototypes are expensive and as such difficult to justify if the technology doesn't look promising. I'm sure you're aware there's a new self-proclaimed miracle tool appearing in this ecosystem every day. My point is that there is a severe lack of information to make an educated decision here.
I think it's fair to say most people will be interested in potentially replacing Terraform with this. Do you have a comparison against Terraform? Is there a guide on how to import resources into SI?
Having a pros/cons page vis-a-vis alternatives that was honest and thorough is something Hashicorp always did well, to their credit. You should, too. It earns trust.
One of my past employers specifically declined to write this up while trying to sell into a crowded space like the one SI is aiming at. In my case, I really got the stink eye for asking about it (how we compare, and whether we document that publicly anywhere) during my first few weeks. It definitely left a sour taste for me.
It's actually pretty good - usually the reason it's not accurate is because enough data isn't being fed to the simulator. That's one of the things that was great about doing it in SI - it wasn't hard to get the data into the simulator.
But if I was AWS, I would also say you should check your IAM against the real world, because if you don't, it's pretty easy to wreck your environment. ;)
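If the simulator in question is AWS's IAM policy simulator, you can drive it directly; a minimal sketch with the AWS SDK for JavaScript (the policy and actions are just example inputs):

    import { IAMClient, SimulateCustomPolicyCommand } from "@aws-sdk/client-iam";

    // Ask the IAM policy simulator whether a policy document would allow some
    // actions, without touching any real resources.
    const iam = new IAMClient({});

    const policy = JSON.stringify({
      Version: "2012-10-17",
      Statement: [{ Effect: "Allow", Action: "s3:GetObject", Resource: "*" }],
    });

    const result = await iam.send(
      new SimulateCustomPolicyCommand({
        PolicyInputList: [policy],
        ActionNames: ["s3:GetObject", "s3:PutObject"],
      })
    );

    for (const r of result.EvaluationResults ?? []) {
      console.log(`${r.EvalActionName}: ${r.EvalDecision}`); // "allowed" / "implicitDeny"
    }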
For other resources that don't have a simulator provided by AWS (e.g. an EC2 instance) how do you ensure that your simulation is accurate? How do you keep the simulations up to date?
    if (authCheck.exitCode === 0) {
      return {
        result: "success",
        message: 'Credentials are Valid'
      };
    }
    return {
      result: "failure",
      message: 'Credentials are invalid. Please check the credentials set on the secret/credentials prop!'
    };
Checking the validity of an STS token requires that you have a bona fide token first. Thus it’s not a “digital twin”; it is the real McCoy. And STS tokens are free.
There is no digital twin I’m aware of that is capable of simulating the real behavior of an EC2 instance. There are just too many variables to consider. To test instance launch and runtime behavior to a meaningful degree of certainty, you have to launch one first. And that means accepting the costs of doing that.
(I notice, too, that you appear to be executing the AWS CLI to do this. I’m not sure if that’s bad or not, but it smells a little fishy.)
We're being intentionally pragmatic here. If you're building a digital twin of, say, an F1 car - the complexity of the simulator has to be very high. It's more like building a mock of physics than just the car.
With infrastructure, it turns out that what you need to know is "did I make a valid configuration", or "does this set of things work together". It's less about making a mock of the results, and more about simulating that the results would have the effect you think they will. So we can't tell you "will your application work on this size of instance" (although if you know that, you could encode that!) - but we can tell you if the options you're setting are correct, if the AMI exists in the region, etc etc.
It’s terribly slow, given that it’s starting an entire Python process, configuring boto3, etc. that’s 2 seconds on my machine, just to run —help. And it’s all to make a single HTTP request (80ms)