ryanryke's comments

> Sure, your system may update itself more frequently than only when I run "tofu plan/apply" but at end of the day, it doesn't matter.

Correct me if I'm wrong here, but in my experience you have to "apply" before state is updated. That would mean we weren't quite operating on the source of truth (AWS in this case).

100% it's a solvable problem with a TF-centric toolchain. But it's still a problem that needs solving.

In my experience with SI it fades to the background. Now, I'm sure there is an edge case where someone edits something outside of SI while I'm trying to simultaneously update it in SI where things might break. I haven't run into it yet.

> All I'm saying as SRE, you have done poor job selling this to me

Can't argue this, but I would say like any other new tool, it's worth checking out. :)


Yes, at the apply stage, the state is updated. All the state is really useful for is finding the resource, for the big three. In fact, I'd argue TF could do away with the state file beyond the mapping resource "s3_bucket" "thebucket" -> arn:aws:s3:us-east-2:000:0123455, since it pulls down the current state of the system as-is and then shows you: "This is what you want, and given the current state, this is what will change."
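To illustrate the point, here's a toy Python sketch (not Terraform's actual internals; all names are mine): the state file only needs the address-to-ID mapping, because plan refreshes the live resource and diffs it against the desired config.

```python
# State only maps a resource address to a real-world ID.
state = {"aws_s3_bucket.thebucket": "arn:aws:s3:::thebucket"}

def plan(desired: dict, fetch_live) -> dict:
    """Return the per-resource changes needed to make live match desired."""
    changes = {}
    for address, want in desired.items():
        live = fetch_live(state[address])  # "refresh": pull current state from the cloud
        diff = {k: (live.get(k), v) for k, v in want.items() if live.get(k) != v}
        if diff:
            changes[address] = diff
    return changes

# Hypothetical live lookup standing in for an AWS API call.
def fake_fetch(arn):
    return {"versioning": "Disabled", "acl": "private"}

print(plan({"aws_s3_bucket.thebucket": {"versioning": "Enabled", "acl": "private"}},
           fake_fetch))
# {'aws_s3_bucket.thebucket': {'versioning': ('Disabled', 'Enabled')}}
```

The point of the sketch: the diff is computed against the freshly fetched live resource, so the state file's only real job is the ID lookup.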

> I would say like any other new tool, it's worth checking out. :)

I don't see the need, for a few reasons:

1) How? If you want me to try something, give me a big "TRY ME" button, unless it involves becoming a client, in which case I see you as replacing me, so my motivation is zero. :D

2) I'm on Azure for the most part, so it's useless to me anyway.

3) You have not shown me how SI is that much better than Terraform. If I'm going to invest time beyond yelling at Kubernetes, I need to know my time is worth it.

At the end of the day, we all want the same thing: here is defined infrastructure, be it YAML, JSON, HCL, some GUI, an API call to a system, or whatever AI is smoking, plus the ability to see what changes and make those changes. HCL/ToFu is what most of us have picked because it's pretty open and widely supported across all the providers. You have to overcome all that. This blog post reads: we have this great new Windows Server thing that will completely blow your Linux Server stuff away, with a GUI and Siri.

Maybe that's what your customer base needs. However, at the technology companies I work at, we don't need that. Editing outside IaC is done very slowly, deliberately, and almost always backported. If not, you will get called out. It would be like a dev writing code with no tests.


Thanks for responding. I love these conversations.

> 1) How? If you want me to try something, give me a big "TRY ME" button, unless it involves becoming a client, in which case I see you as replacing me, so my motivation is zero. :D

To be clear, I don't see this as having the capability to replace engineers. This is a new way to interact with your infrastructure.

But it also feeds into larger AI conversations as well. (probably out of scope for this conversation :) )

> This blog post reads, we have this great new Windows Server thing that will blow your Linux Server stuff away completely with GUI and Siri.

Now you crossed a line :D

Jokes aside, there is a HUGE community in the TF and now OT space; you can't argue with that. The ecosystem of third-party tools to support these workflows is gigantic. Putting all of that aside, will that be the best way forever? I'm not saying that SI will replace either one of them, but I would say it's a new and refreshing way to tackle a similar problem space.

> 2) I'm on Azure

Sorry ;-P


Thanks for the feedback. I'm new to the platform, and certainly appreciate the interaction.

I think I described SI a bit better in another reply, and you can certainly check their website for a better description than I can give here.

I'll try to describe our particular issues at a high level to give you a sense of why this is important to us.

Traditionally, we've managed our customers via TF. I made a big push years back to try to standardize how we delivered infrastructure to our customers. We started pushing module libraries, abstracted variables via YAML, and leveraged Terragrunt to try to stay as DRY as possible. We followed best practices to try to keep state files small for a reduced blast radius, etc.

What became apparent was that despite how much we tried to standardize, there was always something that didn't fit between customers. So each customer quickly became a snowflake: it would have its own special version of some module, or some specialized logic to match their workflow. Then, as the modules evolved over time, the questions started to come up:

- Do we go back and update every customer with the new version of the module?

- Does the new module have different provider/submodule/TF version requirements?

- Did the customer make some other changes to infra that aren't captured?

Making minor changes could end up taking way longer than necessary. Making large changes could be a nightmare.

In working with SI, the mindset has shifted. Rather than manage the hypothetical (i.e., what's written in TF), let's manage the actual. Instead of trying to reconcile in code why a container has 2 CPUs instead of 4, find the issue and fix it. If we want to upgrade something, find it and upgrade it.

I can go into greater depth if you care or have questions, but at a high level this explains the post a bit more.


Thanks for the feedback. My plan is to spend a little more time diving into the details in a follow-up post.

I'll try to explain our experience here in a little better detail though.

In a traditional IaC tool (TF, for example), the flow would go something like this (YMMV):

Update TF -> Plan -> PR -> Review (auto or peer) -> Merge -> TF Reviews State File -> TF Makes changes -> Updates State.

Some issues we could run into:

- We support multiple customers, each with their own teams that may or may not have updated infra, so drift is always present.

- We support customers over time, so modules and versions age, and we aren't always given the time to go make sure that past TF is updated. So version pins need to be updated, among other dependencies.

Each of those could take a bit of time to resolve so that the TF plan is clean and our updates are applied. Of course there are tools such as HCP Terraform, Spacelift, Terrateam, etc. But in my experience they shift a lot of the same problems to different parts of the workflow.

The workflow with SI is closer to the following: Ask AI for a change -> AI builds a change set (PR) -> Review -> Apply

The secret sauce is SI's "digital twin". We aren't just using AI to update code; we're actually using it to initiate changes to AWS via SI. While I would never want a team to make changes directly to AWS without a peer review or something similar, this sits much closer to what the actual infrastructure is, even as changes happen to the infrastructure naturally.

This has allowed us to move quite a bit faster in updating and maintaining our customers' infrastructure, while still sticking as close as possible to best practices.


So basically the product is "custom IaC with an AI agent". Sounds like a great business model if you can convince companies to go for it.

However, as an SRE, pass. I'd rather keep IaC in one of our pre-existing tools with much wider support and less lock-in. Also, since I'm on Azure/GCP, this tool won't work for me anyway, since it's AWS-focused, and when you go multi-cloud, the difficulty ramps up pretty quickly.


Essentially. I'm not sure you could call it IaC specifically, but the same ideas apply.

Regarding lock-in: I don't necessarily think there is anything here stopping you from writing TF and importing objects. Conversely, SI is great at importing resources into its model.

The objects are essentially modeled in TypeScript on the back end, so support for other vendors is possible; it's just a question of whether the models have been created yet. I'll let the SI folks dive into the details there.


It's absolutely AWS-focused today, but one upside of the approach is that building the models is straightforward, because we can build a pipeline that starts from the upstream provider's specification and augments it with documentation, validation, etc. We'll certainly be expanding coverage.


I think being a professional services company imposes a certain workflow on you. For regular software engineering, you'd just make the IaC/code deployable from the developer's machine, and/or on a pull request take the branch's code, deploy it, and post a link back to the PR.


We're really excited about what the future holds with SI. Feel free to ask any questions.


I have been on a small journey to try to understand what SI is. I’ve read your blog posts, listened to the Changelog show with the CEO, watched some demos and joined the Discord. But I still don’t understand what a 1:1 digital twin means. You are mirroring AWS’s api? Can you help me grok what 1:1 means concretely?


You should check out the site again today - I think it will help give at least a high-level sense of what it's like to use System Initiative today.

We didn't recreate the AWS API. Rather than think about it as the API calls, imagine it this way. You have a real resource, say an EC2 instance. It has tons of properties, like 'ImageId', 'InstanceType', or 'InstanceId'. Over the lifetime of that EC2 instance, some of those properties might change, usually because someone takes action on that instance - say to start, stop, or restart it. That gets reflected in the 'state' of the resource. If that resource changes, you can look at its state and update the resource (which is a very straightforward operation most of the time).

The 'digital twin' (what we call a component) takes that exact same representation that AWS has and makes a mirror of it. Imagine it like a linked copy. Now, on that copy, you can set properties, propose actions, validate your input, apply your policy, etc. You can compare it to the (constantly evolving, perhaps) state of the real resource.
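A rough Python sketch of that component/resource relationship (the names are my own invention, not SI's API): the component mirrors the resource's properties, you stage edits on the copy, and the delta against the live resource is what's left to review.

```python
# Live resource, as reported by the cloud provider.
resource = {"ImageId": "ami-111", "InstanceType": "t3.micro", "InstanceId": "i-abc"}

component = dict(resource)               # the "linked copy" of the real resource
component["InstanceType"] = "t3.large"   # a proposed change, staged on the copy

def delta(live: dict, proposed: dict) -> dict:
    """Properties whose proposed value differs from the live resource."""
    return {k: (live.get(k), v) for k, v in proposed.items() if live.get(k) != v}

print(delta(resource, component))
# {'InstanceType': ('t3.micro', 't3.large')}
```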

So we track the changes you make to the component, make sure they make sense, and then let you review everything you (or an AI agent) are proposing. Then when it comes time to actually apply those changes to the world, we do that for you directly.

A few other upsides of this approach. One is that we don't care how a change happens. If you change something outside of System Initiative, that's fine: the resource can update, and then you can look at the delta and decide if it's beneficial or not. Because we track changes over time, we can do things like replay those changes into open change sets - basically making sure any proposed changes you are making are always up to date with the real world.
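The replay idea can be sketched the same way (hypothetical names again, not SI's actual implementation): treat the change set as recorded edits, and re-apply them on top of whatever the live state has become.

```python
# A change set recorded as edit operations rather than a final snapshot.
change_set = [("set", "InstanceType", "t3.large")]

def rebase(live: dict, ops) -> dict:
    """Re-apply recorded edits on top of the latest live state."""
    proposed = dict(live)
    for op, key, value in ops:
        if op == "set":
            proposed[key] = value
    return proposed

# Someone stopped the instance outside the tool, so the live state moved.
live_now = {"InstanceType": "t3.micro", "State": "stopped"}
print(rebase(live_now, change_set))
# {'InstanceType': 't3.large', 'State': 'stopped'}
```

Because the proposed edits are replayed onto the fresh state, the out-of-band change ("State": "stopped") survives alongside the staged one.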


Feel free to reach out and I can show you.

The way I think about it is like this:

We want a representation that is as close as possible to what is actually in AWS. That way any proposed changes have a high probability of success when they are applied. SI's approach keeps an extremely up-to-date representation of what's in AWS.

Why do we need a representation instead of just going directly to the AWS API? Among other things, going direct removes the ability to review changes before they are applied. The representation gives us a safety net, if you will.


Is this representation made available to SI users? Do I have a clear overview of it? I've accepted that it isn't API calls.


Yeah, in all sorts of ways. You can look at it in a Grid of components. You can look at it in a Map, seeing all the relationships. You can look at it via an API. You can have an AI Agent summarize it for you. It's super transparent.

