I have been wanting to try Pulumi out for a while. As an avid terraform user, I welcome the ability to use a proper language instead of the declarative HCL. For those who have migrated from terraform -> Pulumi, what are the biggest cons you've experienced?
I witness the same mentality in sbt users, Turing computable > declarative. I don't get it though.
In both the build system and the deploy system, I want to know the config terminates, and I want it to be easy to understand. The application's specialization is Turing computability. I prefer that footgun to stay isolated there. But maybe there is a use case I don't get.
One of the things I really believe is that you can have the best of both worlds here. Pulumi uses imperative programming languages, but is still "declarative". The imperative programs are executed to build up the desired state, which can then be reliably diff'd and previewed, and can be used to enforce manual or automatic checks for correctness. So you get the expressiveness of imperative programs (loops, conditionals, components, packages, versioning, IDE tooling, testing, error checking, etc.), but still the safeguards and reliability of declarative infrastructure-as-code (preview, gated deployments, policy enforcement, etc.).
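To make the "imperative program, declarative result" idea concrete, here is a minimal sketch in plain Python. It does not use the Pulumi SDK; `build_desired_state`, `diff`, and the resource names are hypothetical, chosen only to show how a loop can build a goal state that is then diffed against the current state before anything is deployed.

```python
# Hypothetical sketch: none of these names come from Pulumi's SDK.

def build_desired_state(environments):
    """Run ordinary imperative code; the only result is a plain dict of resources."""
    desired = {}
    for env in environments:
        # A conditional decides a property, but the output is still just data.
        desired[f"bucket-{env}"] = {"type": "bucket", "versioning": env == "prod"}
    return desired


def diff(current, desired):
    """Compare two state dicts and report what a deployment would change."""
    return {
        "create": sorted(desired.keys() - current.keys()),
        "delete": sorted(current.keys() - desired.keys()),
        "update": sorted(
            name
            for name in desired.keys() & current.keys()
            if desired[name] != current[name]
        ),
    }


current = {
    "bucket-dev": {"type": "bucket", "versioning": False},
    "bucket-old": {"type": "bucket", "versioning": False},
}
preview = diff(current, build_desired_state(["dev", "prod"]))
print(preview)
```

The preview is plain data, so it can be shown to a human or checked by a policy step before any change is applied.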
I also tend to view the perceived benefits of JSON/YAML/HCL "simplicity" as somewhat comparing apples to oranges on a complexity spectrum. If you are only managing a dozen resources, it may be that JSON/YAML/HCL are fundamentally simpler. But when you've copy/pasted tens of thousands of lines of YAML around all over your codebase to manage hundreds or thousands of resources, the value of abstraction, reuse, well defined interfaces, and tooling to manage that complexity feels to me essential to the scale of the problem. And that degree of complexity is no longer just something large organizations are dealing with. Modern cloud technologies (serverless, containers, Kubernetes, etc.) are leading to significant increases in the number of cloud resources being managed, and the pace at which those resources are deployed and updated.
Assembly is a "simpler" way to think about programming, but didn't scale as complexity of application software increases. I believe the same is true about JSON/YAML/HCL and cloud infrastructure.
> But when you've copy/pasted tens of thousands of lines of YAML around all over your codebase to manage hundreds or thousands of resources, the value of abstraction, reuse, well defined interfaces, and tooling to manage that complexity feels to me essential to the scale of the problem.
You're lumping in HCL (and languages like Dhall by extension) with static serialization formats and criticizing them for a characteristic only found in the latter.
HCL is programmable and has a fair model for code reusability through modules, state outputs, for-expressions and other kinds of expressions.
Add in a proper language with types like Dhall and you have a configuration language where you can apply all the transformations you could want with a much higher safety and robustness floor than a turing-complete language that allows you to make all sorts of messes.
It's especially dangerous to have a turing-complete language for configuration once you factor in that the reflex of an inexperienced developer who is more likely to make these messes is to use a tool they're already familiar with even when the tool is actively harmful to their goals, as Pulumi facilitates.
We've worked with a lot of end users to migrate from Terraform, and we honestly do see a lot of copy-and-paste. I agree that it's not as rampant as with YAML/JSON, however, in practice we find a lot of folks struggle to share and reuse their Terraform configs for a variety of reasons.
Even though HCL2 introduced some basic "programming" constructs, it's a far cry from the expressiveness of a language like Python. We frequently see not only better reuse but significant reduction in lines of code when migrating. You can create a function or class to capture a frequent pattern, easily loop over some data structure (e.g., for every AZ in this region, create a subnet), or even use conditionals for specialization (e.g., maybe your production environment is slightly different from development, us-east-1 is different, etc.). And linters, test tools, IDEs, etc. just work.
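As a rough illustration of the "for every AZ in this region, create a subnet" pattern, here is a sketch in plain Python using only the standard library. The function name `subnets_for`, the /24 sizing, and the `flow_logs` flag are assumptions made for the example, not anything from Pulumi or Terraform.

```python
import ipaddress


def subnets_for(vpc_cidr, azs, production):
    """Return one subnet definition per availability zone."""
    # Carve the VPC range into /24 blocks (a hypothetical sizing choice).
    blocks = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=24)
    return [
        {
            "name": f"subnet-{az}",
            "availability_zone": az,
            "cidr_block": str(block),
            # Conditional specialization: only production gets flow logs here.
            "flow_logs": production,
        }
        for az, block in zip(azs, blocks)
    ]


subnets = subnets_for(
    "10.0.0.0/16", ["us-east-1a", "us-east-1b", "us-east-1c"], production=True
)
for subnet in subnets:
    print(subnet["name"], subnet["cidr_block"])
```

Adding a fourth AZ is a one-item change to the list rather than another copied resource block.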
For comparison, this Amazon VPC example may be worth checking out:
It's common to see a 10x reduction in LOCs going from CloudFormation to Terraform and a 10x reduction further going from Terraform to Pulumi.
A key importance in how Pulumi works is that everything centers around the declarative goal state. You are shown previews of this (graphically in the CLI), you can serialize that as a plan, and you always have full diffs of what the tool is doing and has done. This helps to avoid some of the "danger" of having a turing-complete language. Plus, I prefer having a familiar language with familiar control constructs, rather than learning a proprietary language that the industry generally isn't supporting or aware of (schools teach Python -- they don't teach HCL).
In any case, we appreciate the feedback and discussion -- all great and valid points to be thinking about -- HTH.
> It's common to see a 10x reduction in LOCs going from CloudFormation to Terraform and a 10x reduction further going from Terraform to Pulumi.
I don't see this as such a terrible problem. The configurations may have more LOC, but there are not as many surprises. The dependability of declarative configuration makes it rock solid and favorable among operations teams who need to make these kinds of changes all the time.
> A key importance in how Pulumi works is that everything centers around the declarative goal state. You are shown previews of this (graphically in the CLI), you can serialize that as a plan, and you always have full diffs of what the tool is doing and has done. This helps to avoid some of the "danger" of having a turing-complete language. Plus, I prefer having a familiar language with familiar control constructs, rather than learning a proprietary language that the industry generally isn't supporting or aware of (schools teach Python -- they don't teach HCL).
I understand the reason to want this. Having worked closely with developers, lack of familiarity with HCL makes it much less accessible. However, from an operations perspective, I am GLAD that HCL is a very limited language. No imports of libraries all over the place (in your infrastructure configurations, no less!).
> I don't see this as such a terrible problem. The configurations may have more LOC, but there are not as many surprises. The dependability of declarative configuration makes it rock solid and favorable among operations teams who need to make these kinds of changes all the time.
The issue is that your static configs often have lots of boilerplate sections that have to be kept in sync. Further, you can use an imperative language like Python, JS, etc and still write in a completely declarative fashion (or you can use a functional language which tend to be declarative out of the box). Conversely, you can model an AST in YAML (which is what CloudFormation is trending toward) and get the worst of all worlds. Bottom line: don't conflate "reusability" with "imperative" or "static" with "declarative".
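A small sketch of what "declarative, but with reuse" might look like in plain Python: the shared boilerplate is defined once and every service derives from it, so the copies cannot drift apart. `BASE_SERVICE`, `service`, and all field names are hypothetical and only illustrate the pattern.

```python
import copy

# Defined once; every service below shares it, so the copies stay in sync.
BASE_SERVICE = {
    "instance_type": "t3.small",
    "health_check": {"path": "/healthz", "interval_seconds": 30},
    "tags": {"managed_by": "iac"},
}


def service(name, **overrides):
    """Return the shared block with per-service overrides applied on top."""
    block = copy.deepcopy(BASE_SERVICE)
    block.update(overrides)
    # Build a new tags dict so neither the base nor a caller's dict is mutated.
    block["tags"] = {**block["tags"], "service": name}
    return block


# The result is still a static, declarative description: plain nested dicts.
config = {name: service(name) for name in ["api", "worker"]}
config["batch"] = service("batch", instance_type="c5.large")
print(sorted(config))
```

Changing the health check interval in `BASE_SERVICE` updates every service at once, which is the part that hand-maintained static files make error-prone.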
> The issue is that your static configs often have lots of boilerplate sections that have to be kept in sync.
Yes, I agree with this. However, it's predictable. As an operations person, I value predictability and am willing to pay the price of keeping static configs in sync.
> Further, you can use an imperative language like Python, JS, etc and still write in a completely declarative fashion (or you can use a functional language which tend to be declarative out of the box). Conversely, you can model an AST in YAML (which is what CloudFormation is trending toward) and get the worst of all worlds. Bottom line: don't conflate "reusability" with "imperative" or "static" with "declarative".
Hold on, I'm not conflating anything. Saying that "you can write terrible things in any language" isn't anything new. We choose to use languages that provide certain guarantees that we need for the domain that we're working in. For infrastructure, declarative languages are a lot more suitable for the properties they provide (i.e., no surprises, limited functionality, etc.). It's "possible" to use static types in Python, but how many do that?
> Yes, I agree with this. However, it's predictable. As an operations person, I value predictability and am willing to pay the price of keeping static configs in sync.
I think there's wisdom in this at small scales, but as the volume and complexity of your boilerplate grows, I think you lose any advantages. I also think this threshold is quite low (as an ops person and a dev person) since it's not much harder to look at/read the YAML generated by a script vs that which is hand-rolled and committed to git.
> Hold on, I'm not conflating anything.
Are you sure? Because you just said "I am willing to pay the price of keeping static configs in sync" and then "For infrastructure, declarative languages are a lot more suitable for the properties they provide" and then you started to talk about "static types" in Python, which is different than "static" in the YAML sense (YAML isn't statically typed, but it is static in that it isn't evaluated or executed).
I'm not trying to be a jerk, it just sounds like a lot of concepts are being confused. I also wasn't making the argument "you can write terrible things in any language" (not sure if you were attributing that argument to me or if that was a point you were trying to make).
It's fully declarative, but it does evaluate, so it's not static in the YAML sense. It outputs a JSON CloudFormation template (but it could easily output in YAML) which you could inspect visually before passing onto CloudFormation.
It's also statically typed although that's not evident from this file since all types are inferred in this file (however there are annotations in the imported libraries), and while the static typing is a very useful property, it's not what I've been talking about in this thread.
In my opinion, this is no less readable than the equivalent YAML; however, it's capable of doing much more (albeit if your infrastructure is just one S3 bucket, then this is overkill--to really understand the power of dynamic configuration, you would want a more complex example).
I can’t trust my teammates to write code that doesn’t use raw eval()’s all over the place.
Getting them to write Python/JS in the correct way, never mind relying on them to do so, is straight up out of the question.
At least I know in Terraform/HCL they can’t map a config change over the 1000 new instances they spun up because they happened to write their for loop wrong.
> I can’t trust my teammates to write code that doesn’t use raw eval()’s all over the place. Getting them to write Python/JS in the correct way, never mind relying on them to do so, is straight up out of the question.
> At least I know in Terraform/HCL they can’t map a config change over the 1000 new instances they spun up because they happened to write their for loop wrong.
To be clear, the proposal is to use a programming language to generate your HCL-equivalent configs, not to imperatively modify infrastructure. Consequently, you can inspect the generated "HCL" (or whatever the output is) and make sure it looks like the code they would write manually. Further, you can even write automated tests.
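A minimal sketch of that workflow in plain Python, assuming a hypothetical config shape: one function renders the static config, the rendered JSON is what a reviewer inspects, and a second function acts as an automated test over the generated output. None of these names correspond to a real tool's schema.

```python
import json


def render_config(service_names):
    """Generate the static config; nothing is deployed at this point."""
    return {
        "resources": {
            f"{name}-sg": {
                "type": "security_group",
                "ingress": [{"port": 443, "cidr": "0.0.0.0/0"}],
            }
            for name in service_names
        }
    }


def find_violations(config):
    """Return names of resources that expose SSH (port 22) to the whole internet."""
    return [
        name
        for name, resource in config["resources"].items()
        for rule in resource.get("ingress", [])
        if rule["port"] == 22 and rule["cidr"] == "0.0.0.0/0"
    ]


config = render_config(["api", "worker"])
print(json.dumps(config, indent=2, sort_keys=True))  # the reviewable artifact
violations = find_violations(config)
assert violations == [], f"open SSH rules: {violations}"
```

Because the check runs against the generated output rather than the generator, it would catch a badly written loop before anything reaches the infrastructure.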
So, things need to be comprehensible by the humans that work with them. A 10x reduction in LoC / 10x increase in expressibility may or may not be a good thing, but if it captures intent better and with less ceremony and cruft, then it most decidedly is a FANTASTIC thing. Whereas a 10x LoC improvement that makes it harder to glean intent would be DISASTROUS.
Then again, code has to be run in order to analyze its output -- that or code has to be data you can analyze (like a Lisp), but that can be very difficult to reason about.
So my preference would be to have libraries for constructing configuration data. Then you can execute a program to generate the configuration, and that you can use without further ado. The output may not be easy for a human to understand, though it should be possible to write code to analyze it.
So as a user, can I configure this Pulumi VPC stack before it's instantiated? Or do I have to use the defaults first and then use the CLI to change things? Do these CLI changes then get placed into code, or just into state? Does that mean I'm now in a situation where the code doesn't match the state?
Personally I find the Terraform configuration much easier to reason about, I see exactly where resources are declared just by scanning the file. (But I've also used Terraform a lot).
Edit: Ah, maybe I have to configure it via this config.py file [1]? I appreciate what Pulumi is trying to accomplish, but that is certainly not a config format I'd like to be using. Maybe you could use HCL or YAML for it? ;)
Edit 2: Another last thought, I think a lot of the mindset in Terraform comes from Go, where the proverb "A little copying is better than a little dependency" is pretty well adopted. Before I started writing Go as my main language I didn't appreciate that mindset, but after 5 years with Go I've found it more and more appropriate [2].
You're right, the Pulumi example is a project, not a reusable module. There are a few approaches to making it modular:
1) The project does support config. So if you want to change (e.g.) the number of AZs, you can say
$ pulumi config set numberOfAvailabilityZones 3
$ pulumi up
And Pulumi will compare the current infrastructure with the new goal state, show you the diff, and then let you deploy the minimal set of changes to bring the actual state in line with the new goal state. This works very much like Terraform, CloudFormation, Kubernetes, etc.
2) You can make this into a library using standard language techniques like classes, functions, and packages. These can use a combination of configuration as well as parameterization. If you wrote it in Python, you can publish it on PyPI, or JavaScript on NPM, or Go on GitHub -- or something like JFrog Artifactory for any of them. This makes it easy to share it with the community or within your team.
3) We offer some libraries of our own, like this one: https://github.com/pulumi/pulumi-awsx/tree/master/nodejs/aws.... That includes an abstraction that's a lot like the Terraform module you've shown, and cuts down even further on LOC to spin up a properly configured VPC.
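To sketch how options 1 and 2 can combine, here is a plain-Python illustration of a reusable component that takes its AZ count from config and falls back to a default. This is not Pulumi SDK code: the `Vpc` class and `from_config` helper are hypothetical, and only the `numberOfAvailabilityZones` key is taken from the command above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vpc:
    """Hypothetical reusable component: a VPC plus one subnet per AZ."""

    name: str
    number_of_availability_zones: int = 2

    def resources(self):
        """Expand the component into the flat resources it declares."""
        declared = {self.name: {"type": "vpc"}}
        for index in range(self.number_of_availability_zones):
            declared[f"{self.name}-subnet-{index}"] = {
                "type": "subnet",
                "vpc": self.name,
            }
        return declared


def from_config(config):
    """Build the component from stack config, falling back to the class default."""
    zones = config.get("numberOfAvailabilityZones")
    return Vpc("main") if zones is None else Vpc("main", int(zones))


before = from_config({}).resources()
after = from_config({"numberOfAvailabilityZones": "3"}).resources()
added = sorted(after.keys() - before.keys())
print(added)
```

Raising the count from the default to 3 only adds one subnet to the goal state, which is the kind of minimal change a preview would then show.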
I am a big Go fan too, so I very much know what you're saying. (In fact, we implemented Pulumi in Go.) Even with Go, though, you've got funcs, structs, loops, and solid basics. Simply having those goes a long way -- as well as great supporting tools -- and you definitely do not need to go overboard with abstraction to get a ton of benefit right out of the gate.
"The project does support config. So if you want to change (e.g.) the number of AZs, you can say..."
Cool, is it possible to do that without having to use the CLI? Are you doing any sort of state locking here? I've seen ops teams get saved from potentially horrible situations by Terraform's DynamoDB state locking.
"You can make this into a library using standard language techniques like classes, functions, and packages."
That's pretty nice and it seems like it'll get you the same functionality as a Terraform module. Do you have any plans of releasing something like the Terraform Registry to help with discoverability?
Also, do you have any docs on writing providers? I've had to do that a few times for Terraform and getting up and running with that was pretty easy as a Go developer. I wouldn't really want to do that for every supported language though (no offense C#).
I'm seeing that some of this is using codegen to read the equivalent Terraform provider and generate the Pulumi provider from that schema. Is that the preferred workflow here for providers that already exist in the Terraform ecosystem?
> is it possible to do that without having to use the CLI? Are you doing any sort of state locking here?
Yeah it's just a file if you prefer to edit it. By default, Pulumi uses our hosted service so you don't need to think about state or locking. That said, if you don't want to use that, you can manage state on your own[1]. At this time, you also need to come up with a locking strategy. Most of our end users pick the hosted service -- it's just super easy to get going with.
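For anyone managing state themselves, one very simple locking strategy is an exclusive lock file. The sketch below is a generic Python illustration, not something Pulumi provides: `state_lock` and the lock path are hypothetical, and it assumes the state lives on a filesystem that all deployers share. Remote object stores would need a different mechanism.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def state_lock(lock_path):
    """Hold an exclusive lock file for the duration of a deployment."""
    # O_EXCL makes creation fail if the file already exists, so only one
    # process can acquire the lock at a time.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        os.write(fd, str(os.getpid()).encode())  # record the owner for debugging
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


lock_path = os.path.join(tempfile.mkdtemp(), "state.lock")
with state_lock(lock_path):
    try:
        # A second deployment attempt while the first is still running.
        with state_lock(lock_path):
            second_acquired = True
    except FileExistsError:
        second_acquired = False
print(second_acquired)
```

A crashed process would leave a stale lock behind, so a real setup would also need some way to inspect and clear it.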
> Do you have any plans of releasing something like the Terraform Registry to help with discoverability?
I expect us to do that eventually, absolutely. For us it'll be more of an "index" of other package managers since you already have NPM and PyPI, etc. But definitely get that it's helpful to find all of this in one place -- as well as knowing which ones we bless and support.
> Also, do you have any docs on writing providers?
We have boilerplate repos that help you get started:
These packages are inherently multi-language and our code-generator library will generate the various JavaScript, Python, Go, C#, etc, client libraries after you've authored the central Go-based provider schema.
> Is that the preferred workflow here for providers that already exist in the Terraform ecosystem?
Yes. We already have a few dozen published (check the https://github.com/pulumi org when in question). In general, we will support any Terraform-backed provider, so if you have one that's missing that you'd like help with, just let us know. We have a Slack[2] where the team hangs out if you want to chat with us or the community.
I would point out that Dhall solves these problems with one simple fundamental construct: the function.
And still keep it terminating.
You can do it but it means doing more cognitive engineering than "just throw python at it".
Another point: you can have a declarative turing complete language. I would really like to see people bring Prolog-like languages to things like Pulumi and Terraform.
That would also allow convergent concurrent applications, which means we could get proper collaboration. That would be a strong move forward for devops.
> We've worked with a lot of end users to migrate from Terraform, and we honestly do see a lot of copy-and-paste. I agree that it's not as rampant as with YAML/JSON, however, in practice we find a lot of folks struggle to share and reuse their Terraform configs for a variety of reasons.
I would venture to say that it’s not Terraform that makes people copy / paste. It’s the people. Call it lack of knowledge, not enough time, laziness, tight schedules...
Once your customers are on their own, new people join - no knowledge of Pulumi, resources get added / moved / evolve, there will be copy / paste in their Pulumi code too.
Not defending Terraform here. Just adding a point to the discussion.
Some of this is truly on Terraform. The for construct (and looping in general) was only added in TF 0.12, released in May 2019. Older codebases didn't have a real way to support looping so there's more copy paste there. TF supports ternary conditionals, but not true if statements, which makes adding more complicated if logic difficult.
The reality is that all programming languages have significant copy paste codebases using them, but there are features which help reduce the amount of it. Terraform is missing some of those features, and many of the features it does have were introduced in TF 0.12, which is less than a year old.
Yes. But Terraform (hcl) is not a programming language.
It’s interesting that some people bring up sbt as an example of how to use a "programming language" for configuration. The reason why sbt became dominant was the weight of Lightbend (Typesafe). There was no way to get away from it. Frankly, sbt can be an awful mashup of copy / paste too. sbt is so much magic, I would not be surprised to discover that the majority of the folks who use sbt have no actual clue why stuff works the way it works.
I haven’t tried Pulumi yet, I will try when I get the chance. I am eagerly waiting for an opportunity to use it. Hopefully it will surprise me in a positive way. Surely, it can deliver on what it promises. I have very fond memories of Chef and cookbooks in Ruby, it can be done.
Edit: personally, Chef solo (with the right tooling to eliminate the server) was the best experience so far. If Pulumi can improve on that (no agent), I’m looking forward to taking it for a test drive.
> I would venture to say that it’s not Terraform that makes people copy / paste. It’s the people. Call it lack of knowledge, not enough time, laziness, tight schedules...
Well, the problem is that a majority of people don't want to / don't have the time to learn HCL, because it's not the most effective use of their time / not worth the "investment" to do so.
Learning HCL is not very rewarding, unless you are an ops person.
Learning a general purpose language like Python, TypeScript or whatever language your company uses is rewarding both for ops and dev people (or devops people if you like that term) and typically can be used for a much wider set of use-cases.
When introducing a new language, the pros and cons of doing so should always be carefully considered. Unfortunately, for devops tools, new languages like HCL, Jsonnet, Starlark, zillions of YAML pseudo-programming DSLs, etc. are often introduced very lightly, mentioning a handful of use cases where the new language shines, but ignoring the cons and intrinsic costs (learning curve, new tools, editor integrations, package manager, etc. to be built).
Terraform works great for teams where you have a strict separation between ops and dev people. The ops people will spend their time learning HCL, the dev people will learn Python, TypeScript or whatever that is.
However if you are trying to truly embrace a "DevOps" model Terraform shows its flaws. Developers will either still heavily rely on ops people to "help them" even for trivial infra changes or they will write sub-par copy pasta HCL code that tends to be verbose.
TF 0.12 may have a bunch of new constructs which make it easier to reduce duplication, but the boilerplate that is required to create an actual reusable module with variables and import it (and the overall awkwardness of the module system/syntax compared to any other language) vs the simplicity of creating a reusable function/file in Python/TS is like night and day.
Furthermore the subpar editor support for TF makes it actually hard to follow references between modules and safely refactor code, so there is a much lower threshold at which an abstraction appears "magic"/incomprehensible in HCL, compared to typed TS/Python where you can easily follow references.
Source: ~2 years' worth of Terraform (incl. 0.12) and ~1 year's worth of Pulumi use within multiple companies and teams.
Looking at your Terraform and Python scripts I see two different scripts doing different things with different abstraction levels and different configuration toggles.
It's ironic that you sell as a plus that Python allows you to easily loop over data structures and make resources conditional, because pretty much all your Terraform resources there are conditional (with a few looping over lists for DRY purposes), while few of the Python resources are.
Many of the lines saved for declaring identical resource types are just because either the Terraform resource is declared with unnecessary values or because the Python one has a default value, which can be provided as well in Terraform.
But yeah, the bulk of the difference is that the scripts are doing different things by declaring different sets of resources.
> Plus, I prefer having a familiar language with familiar control constructs, rather than learning a proprietary language that the industry generally isn't supporting or aware of (schools teach Python -- they don't teach HCL).
Which comes back to my point about inexperienced (or the "10x" ones that cut corners until the table is round and then leave) developers preferring familiarity over using a specialized tool that takes into account common pain points, further fragmenting the space through "worse is better". I am certain I will die employed cleaning up ORM messes left by developers that didn't want to learn SQL despite having a whole field of mathematics backing it; so if you're successful, odds are I will also end up fixing, some day, the "declarative output" a Pulumi script produced on a developer's computer that is not reproducible anywhere else because it makes a request to his home server and mutates an array of resources somewhere depending on that response, the current time, the system locale and the latest tweet by Donald Trump.
"Many of the lines saved for declaring identical resource types..."
Yeah, it seems a bit silly to say that a benefit is saved lines of code, yet the Terraform example is set up to do quite a lot more than the Pulumi example. The resources are just there and turned off with the "count" configuration. The Pulumi example isn't doing any of the RDS, Redshift, Elasticache, Database ACL, VPN gateway, etc. things. This example is a pretty substantial module and I'd guess the LOC would be pretty similar between the two if the functionality were closer.
> It's especially dangerous to have a turing-complete language for configuration once you factor in that the reflex of an inexperienced developer who is more likely to make these messes is to use a tool they're already familiar with even when the tool is actively harmful to their goals, as Pulumi facilitates.
"Turing complete" is a red herring. You can write a program in Dhall that will continue to run long after we're all dead. But this doesn't happen in practice and/or when it does we notice something is wrong fairly quickly and correct the problem. And because these infra-as-code-and-not-configuration solutions generate configuration, if you do have a loop that doesn't terminate or similar, it's not a problem because your program never deploys any changes.
As for making messes, our experienced developers make more of a mess with static configuration because it's fundamentally impossible to manage large static configurations with their inherent repeatable segments that must be kept in sync. The static configuration players try to solve for this by introducing hacky mechanisms for reuse (macros and nested-stacks in CloudFormation, text templates via Helm for Kubernetes, etc), but these fall over very quickly as hacks do.
> "Turing complete" is a red herring. You can write a program in Dhall that will continue to run long after we're all dead.
Avoiding the halting problem is not the reason these languages are better for the task. It's the limitations that come with being turing incomplete: they prevent us from doing a lot of stupid stuff without realizing it and from doing "hacky workarounds" without properly understanding the problem we face.
> As for making messes, our experienced developers make more of a mess with static configuration because it's fundamentally impossible to manage large static configurations with their inherent repeatable segments that must be kept in sync.
Or don't do static configuration and just use something like Terraform where you can just reference a resource and pass it around.
> It's the benefit of having limitations that come with being turing incomplete that prevent us from doing a lot of stupid stuff without realizing it and doing "hacky workarounds" without properly understanding the problem we face.
You'll have to articulate your said benefits to be sure, but I would wager that the principal reason to be turing incomplete is to address the halting problem and that the benefits you're thinking about come from other properties of the language (functional purity, immutability, limitations on I/O, type safety where applicable, etc).
Notably, there are lots of hacky workarounds employed in HCL and YAML because people don't understand the problem properly. The problem requires that we can generate arbitrary static configuration from a fixed set of inputs. If your organization is so inept that they keep adding in infinite loops and/or I/O, then by all means, try something like Dhall or Starlark (unfamiliar vs not-type-safe, pick your poison); however, if this is a consistent problem in your organization you probably need to replace your humans because these programs aren't hard to write correctly.
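As a sketch of "generate static configuration from a fixed set of inputs", here is a plain-Python generator written as a pure function, plus a fingerprint that lets two runs be compared. The function names and the config shape are hypothetical. The point is only that purity and reproducibility are properties of how the generator is written and checked, separate from whether the language is Turing complete.

```python
import hashlib
import json


def generate(inputs):
    """Pure function: the output depends only on `inputs` (no I/O, clock, or env)."""
    return {
        f"{inputs['project']}-{env}-queue": {"type": "queue", "retention_days": days}
        for env, days in sorted(inputs["retention"].items())
    }


def fingerprint(config):
    """Hash a canonical JSON rendering so two runs can be compared byte for byte."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


inputs = {"project": "billing", "retention": {"prod": 14, "dev": 1}}
first = fingerprint(generate(inputs))
second = fingerprint(generate(inputs))
# A generator that read the clock or the network could fail this check.
assert first == second
print(sorted(generate(inputs)))
```

Running the same comparison on two different machines in CI is one way to catch a generator that quietly depends on its environment.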
> Or don't do static configuration and just use something like Terraform where you can just reference a resource and pass it around.
Because this only addresses reuse at the resource level. You can do the same thing in CloudFormation; it's not adequate. For example, not everything is a resource. You ultimately need the ability to generate arbitrary static configuration. Terraform probably has lots of other disparate features that collectively address a good portion of the solution space, but programming languages have a unified concept ("functions") that satisfies the whole solution space and programmers are already familiar with them. Terraform's job should be taking static configs and applying them to infrastructure--let a real programming language generate those configs, or at least offer a dynamic configuration language that is designed with a proper understanding of the problem (to use your words).
I do not doubt Pulumi is more expressive, but that's also my point. It will be interesting to see how it works out. Sbt won Scala mindshare, there is definitely a strong fanbase for expressivity.
None of my Terraform projects are 10k lines long. I find it's reusable and at almost the right level of abstraction (Typed templates). I tend to go for a minimum expressivity necessary for DRY. So far I have not found Terraform lacking for a single project, but I have found it lacking for expressing higher order infrastructure (infra code intended for multiple projects).
I've never managed a project with thousands of heterogeneous resources though. I question whether that's really a thing that a single team would do.
From the first look of it, I would still prefer Guile and Guix deploy or Nix with NixOps. All these systems (like Pulumi, Terraform, Ansible and many more) are not really new or innovative; they just re-invent the same wheel with different names and jump between declarative and imperative syntax.
Guix and Nix are both innovative ways to build reproducible, secure production deployments and platforms without side effects, and to get rollbacks and transactions for free.
Not only that, but even with a Turing-non-complete language that lets you do useful things (think of Dhall or whatever it's called), chances are pretty good that you can still take forever to terminate if you really try -- you can provably terminate and you can provably not terminate in a lifetime.
Granted, no one is going to really do that. And there are good reasons to want provably-terminating programs (e.g., in DTrace, eBPF, ..., because probe actions have to not just terminate, but also run very fast). But for infrastructure deployment? I think Turing complete is fine for that.
One idea I've entertained is to use jq as a configuration language and have its output be a JSON text describing a fully-constructed configuration. Yes, jq is Turing complete, but it's so damned convenient!
One of my use cases was that I just wanted to extend the workflow of my Terraform declaration and add a couple of log statements here and there. I didn't want to write Go to write a plugin for that.
With Pulumi, I was able to just sprinkle in a couple of console.log statements and my "extension" was done.
Agreed, I don't really see the upside of getting a "proper" programming language for this.
I just got done with a client of mine migrating a considerable amount of infrastructure code that used Troposphere (Python -> CloudFormation) into Terraform 0.12. The old code was very difficult to reason about, had all sorts of inter-related components that caused infrastructure changes to be scary, and really there seemed to be no benefit of being able to write Python for it.
They now have over 100 services in Terraform 0.12 and they feel confident even making network routing changes in prod during the middle of the day with the new system (something that was unheard of before).
I've found it much easier to write declarative configs in Terraform over the years. Back when I was at Engine Yard we used a project called fog to write infrastructure code in Ruby. It was nice for the time (must have been 2012 or so). But again, I really don't think the problem is that I need a programming language to define my infrastructure. I need a way to declare my infrastructure and manage that state so that I know that what I declare is what is running.
I totally can see teams choosing to write infrastructure code in TypeScript, and Go, and Python. And now you have a mess.
The idea is reuse. YAML (and probably Terraform, though I can't speak to it directly) doesn't give you many facilities for reusing blocks of config, especially if they vary subtly in some parameterized way. CloudFormation gives you some reusability in the way of nested-stacks and macros, but it's seriously heavy-handed.
We do use Troposphere in a handful of cases, and it has its own problems, mostly in that it makes it hard to write declarative Python code with it (which is generally what you want--declarative code but with more expressive power than YAML). I have a prototype of an improvement to Troposphere that I built for my own amusement, and I think I'm on to something:
Note that this example is type-safe and declarative while Troposphere is not.
Basically I don't think Troposphere is a good representation of what infrastructure-as-code(-not-yaml) could look like. Not sure about Pulumi as I haven't tried it. But I know the answer isn't YAML, it's not hacking an AST on top of YAML a la CloudFormation, it's not a different static dialect with its own dynamic hacks a la HCL, and it's not generating YAML with text templates a la Helm.
That's what Terraform modules are for. If I need to bundle up things for, say, multiple environments, I can just write it as a module and configure it through the variables I've exposed.
I can't speak to Terraform because I haven't used it, but I'm skeptical that modules are sufficient for many of the same reasons that CloudFormation templates are insufficient. I think at some level you need the ability to programmatically generate arbitrary static configuration, and you end up needing something very like a real programming language to do that.
If you need something very like a real programming language, you should just use a real programming language instead of a tool that accidentally reinvents programming language concepts and the corresponding unnecessary learning curve that that implies for developers. I think the reason these tools are surviving in the "marketplace" is because the market is inexperienced and conflates "generating static configs (to be passed into a tool that can apply the configs) with an imperative language" with "imperative provisioning of infrastructure". The market sees static/declarative solutions like Terraform and CloudFormation as the only alternatives to "imperative provisioning of infrastructure".
I think as our industry gets more experience and tools for generating YAML/static-configs improve, it will be clear that these are the ways forward.
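To make the "generate static configs with a general-purpose language" idea concrete, here is a minimal sketch in plain Python. The schema (`name`, `replicas`, `tags`) and the function names are hypothetical and not tied to Terraform, CloudFormation, or Troposphere. The program only produces data; a separate tool would be responsible for applying it.

```python
import json


def service(name: str, env: str, replicas: int = 1) -> dict:
    """Return one reusable, parameterized block of config (hypothetical schema)."""
    return {
        "name": f"{name}-{env}",
        "replicas": replicas,
        "tags": {"service": name, "environment": env},
    }


def render(envs: dict[str, int]) -> str:
    """Build the full static config: one 'api' service per environment."""
    resources = [service("api", env, replicas) for env, replicas in envs.items()]
    # The program's only output is data; nothing is provisioned here.
    return json.dumps({"resources": resources}, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(render({"staging": 1, "prod": 3}))
```

The subtle per-environment variation lives in ordinary function arguments instead of copy/pasted blocks, and the output is still a static document that can be diffed and reviewed.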
"I can't speak to Terraform because I haven't used it"
Well, then maybe you should try it before you decide it doesn't work! I'll definitely be giving Pulumi a try, but honestly Terraform works really well and I'm happy using it.
It's a nice idea to be able to have a programming language you're familiar with available to do infra work, but deploying and managing infrastructure is a totally different problem domain than writing an application. If the concern is letting developers that don't have much ops experience architect an infrastructure, or you have recent graduates that "don't know HCL", then you have quite a higher learning curve than you actually think you do.
I've never once thought "I really wish I could just use TypeScript for configuring my infrastructure". If that's you, great! Call me when you hit scale and your ops team needs to refactor it.
> Well, then maybe you should try it before you decide it doesn't work!
Based on the assumptions I called out about it, it necessarily can't work as well as a general purpose language. I intend to try it to see if my assumptions are correct or not.
> It's a nice idea to be able to have a programming language you're familiar with available to do infra work, but deploying and managing infrastructure is a totally different problem domain than writing an application.
It's not a hard domain. Developers can write a program that evaluates to YAML--it's not hard. The hard part is applying that YAML to the infrastructure in an efficient way, but programmers don't need to worry about this because Terraform, CloudFormation, kubectl, etc. do it for them.
> If the concern is letting developers that don't have much ops experience architect an infrastructure, or you have recent graduates that "don't know HCL", then you have quite a higher learning curve than you actually think you do.
"Learn Ops + HCL" is a bigger learning curve than "learn ops". Notably, learning two things at the same time is a bigger problem than learning them both individually. But you're right that using a standard programming language isn't a substitute for learning how to architect infrastructure--I never said it did--only that it removes the unnecessary complexity and unfamiliarity imposed by HCL.
> I've never once thought "I really wish I could just use TypeScript for configuring my infrastructure". If that's you, great! Call me when you hit scale and your ops team needs to refactor it.
Ha! If the industry hasn't moved past Terraform in 5 years time, it will only be because HCL has adopted enough of the general purpose programming language featureset as to remain competitive. That's certainly its trajectory.
Are you really arguing that deploying and managing infrastructure is not a hard domain? Please. The very existence of tools trying to solve these problems speaks to this difficulty.
I'm going to drop it at this point because it's starting to feel like you're either trolling or have never run a substantial application in production before.
It seems like you’re violently agreeing with me. Like I said, rectifying the infrastructure is the hard part and generating YAML is not. Let the tools do the hard part. They’re good at that. Not so much at the expressiveness and reusability part.
> Ha! If the industry hasn't moved past Terraform in 5 years time, it will only be because HCL has adopted enough of the general purpose programming language featureset as to remain competitive. That's certainly its trajectory.
Quite a statement to make for somebody who has never used Terraform.
Yeah, I meant that Terraform is more rigid with respect to reusability than a general purpose language like Python. This isn't a controversial point; the controversy is whether that's a bug or a feature. I think it's a bug, and to the extent that Terraform is becoming increasingly flexible, I would say I'm vindicated in my position.
I think the person above is mostly pointing to dynamic blocks, since those do allow abstracting blocks of config and also allow subtle differences.
I want to point out that ever since Terraform 0.12, all Terraform params, modules, etc. are transformable into objects. What I mean by that is you can use functions like yamldecode or jsondecode to decode any YAML or JSON into Terraform objects and pass them into modules, resources, or locals. Also, the "real" programming language argument is always funny to me when ops are going to be much more comfortable with HCL. JSON/YAML are not in the same class as HCL. YAML can maybe do nesting with &, but JSON/YAML can't just create a simple config-style interface, where the user-facing side is a simple YAML file and the Terraform code underneath takes it in and passes it to its modules.
> I want to point out that ever since Terraform 0.12, all Terraform params, modules, etc. are transformable into objects. What I mean by that is you can use functions like yamldecode or jsondecode to decode any YAML or JSON into Terraform objects and pass them into modules, resources, or locals.
This sounds like an important development for Terraform, but it also sounds like it's proving my theory that TF is reinventing a standard programming language badly.
> Also, the "real" programming language argument is always funny to me when ops are going to be much more comfortable with HCL.
Maybe (and I think it's a big maybe), but if you want to empower developers to do their own ops (read "DevOps"), then HCL is a strict loss over a language they're already familiar with.
> JSON/YAML are not in the same class as HCL. YAML can maybe do nesting with &, but JSON/YAML can't just create a simple config-style interface, where the user-facing side is a simple YAML file and the Terraform code underneath takes it in and passes it to its modules.
Right, as previously mentioned, HCL is accidentally reinventing a programming language. The YAML-based infra-as-code solutions also reinvent programming languages but they build them by building out programming language features on top of YAML (instead of extending the language layer itself) or by generating YAML via text templates. If I had to choose between these, I would pick HCL for sure, but thankfully I can just generate static configs with Python or similar.
One use case that TF really frustrates me with is anything conditional: if I want to do blue/green the best I can do is duplicate the entire infrastructure and wrap my terraform invocations with some brittle script that repeatedly runs TF with various targets. The script only allows me to define the target state, but it is quite common that I care how that final state is reached.
Create-if-not-exists is also really poor in TF (probably by design). If you want to reuse your TF configs for different environments in the same account, you either have to ensure nothing collides name-wise, or split your TF into immutable infrastructure and really-immutable infrastructure.
The idea of having to use a 3rd party tool to generate config for my 3rd party tool that generates config just seems Wrong.
At this point I'd rather write idiomatic but repetitive Terraform and know that I will be safer on upgrades, as long as I continue to refactor and clean up my modules for the new functionality Hashi is releasing.
The model is "use a programming language to generate static configs", so you can't provision infinite VMs--your program would OOM or run out of disk because you're trying to generate an infinitely-large config file. If you're seriously concerned about this (and you shouldn't be), you can use something like Starlark (https://go.starlark.net), which is a Python dialect that prohibits unbounded loops, I/O, etc. Note that if you use Python with type hints, you can actually get more safety than YAML.
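As a rough illustration of the "Python with type hints" point, here is a minimal sketch using only the standard library. The `Vm` schema and the `fleet` helper are hypothetical. Note that type hints are not enforced at runtime; the extra safety comes from running a static checker such as mypy over the program before the config is generated.

```python
import json
from dataclasses import asdict, dataclass
from typing import Literal


@dataclass(frozen=True)
class Vm:
    """One VM entry in the generated config (hypothetical schema)."""
    name: str
    size: Literal["small", "large"]  # a static type checker would flag e.g. "huge"
    disk_gb: int = 20


def fleet(prefix: str, count: int) -> list[Vm]:
    # A bounded loop: the program has to finish before any config exists,
    # so the result is always a finite list.
    return [Vm(name=f"{prefix}-{i}", size="small") for i in range(count)]


def to_config(vms: list[Vm]) -> str:
    """Serialize the typed objects into a plain static JSON document."""
    return json.dumps({"vms": [asdict(vm) for vm in vms]}, indent=2)


if __name__ == "__main__":
    print(to_config(fleet("web", 3)))
```

A misspelled field name or a wrong value type is caught by the checker before anything is rendered, which is a check plain YAML does not offer on its own.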
You've already lost big if you ever need a debugger to understand code. Especially if your beginners have to use it just to wrap their head around wtf is happening.
And having to run your "config files" just to see what they're doing is the huge downside of using programming languages as config. In other words, it's kind of circular to point out that using code for config puts you in a situation where you might have to use a debugger to know what your config does.
You’ve never told (or been) a new employee to step through the code to figure out what isn’t in the docs?
There’s a middle ground that you see with a number of test frameworks, Gulp, SCons and I suspect with Pulumi, where the outer shell of imperative code is in practice declarative, and the inner bits are properly imperative.
Part of the code says why to do something, and the rest says how to do it. For instance I don’t see a huge conceptual difference between disabling or filtering a test, versus not deploying a service because it is running, current, and healthy.
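The split between an outer layer that says what should exist and an inner layer that decides what to do about it can be sketched in a few lines of Python. This is only an illustration under assumed names (`plan`, the `image` field, the resource names); it is not how Pulumi or any other tool is actually implemented.

```python
def plan(desired: dict[str, dict], current: dict[str, dict]) -> dict[str, str]:
    """Compare desired and current state and return one action per resource."""
    actions = {}
    for name, spec in desired.items():
        if name not in current:
            actions[name] = "create"
        elif current[name] != spec:
            actions[name] = "update"
        else:
            actions[name] = "skip"  # already present and up to date
    for name in current:
        if name not in desired:
            actions[name] = "delete"
    return actions


# The "declarative outer shell": ordinary code that only builds up data.
desired = {f"web-{i}": {"image": "web:2"} for i in range(2)}

# A stand-in for what is actually running right now.
current = {
    "web-0": {"image": "web:2"},
    "web-1": {"image": "web:1"},
    "old": {"image": "x"},
}

if __name__ == "__main__":
    print(plan(desired, current))
```

Here `web-0` is skipped because it already matches, which is the same decision as not redeploying a service that is running and current.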
If you proxy to replace it might not have to stay that way. But the Lava Flow Antipattern is what happens when momentum fails before the work is complete.
No built-in remote state backend storage (s3, gcs, etc) locking except their proprietary hosting. Obviously they need to make money, but this makes driving migration from terraform to pulumi a lot more difficult.
I've got a PR open to implement this, but due to S3 CAP limitations it hasn't been merged. The current state is that it could be merged for non-S3 backends, and a DynamoDB-type lock could be added for S3 backends. I don't have much interest in continuing to push it forward (it's been nearly a year of "I'll get back to you next week"), although their new VP has been more responsive.
Alternately I was considering just implementing the server API to do state storage & locking, but we have implemented some workarounds that are good enough for now.
I just want to say thanks for your initiative on opening https://github.com/pulumi/pulumi/pull/2697. (I've seen you active on a few other issues and in the Slack channel too)
I think it's really unfortunate (whatever the reason) that the team was this slow to provide meaningful feedback on the PR.
Yes, it's nice to write it in code, but you can achieve similar with raw Terraform and there are more documented examples for that, plus if it goes wrong, you can find others facing the same issues, usually with resolutions/workarounds etc.
I wanted to find Pulumi more useful than I really did.
I've used Terraform quite a bit (though less in the past year), and the things that ended up being the biggest headaches for me were:
1. State quickly gets unwieldy and terraform plans get very slow. I think the official recommendation is to split it up to separate instances using separate state files, but that adds complexity of finding the right seams and sharing any needed data between them.
2. Desperately needs a good way to do singletons. Many modules end up fitting the pattern that you need 1 each of some underlying resources (iam role, iam policy, s3 bucket, kms key, whatever) that could be shared by every instance of the thing you're creating. You have to choose: either keep the code clear and organized by putting those things in the module next to the resources that rely on them, which duplicates them N times unnecessarily (and adds to the problem of bloated, slow state), or keep the shared resource outside of the module, which makes the code less clear, bloats the global namespace/tf files, and requires passing around even more state variables to the module.
3. Tools for abstractions and structuring the app are lacking in general in other ways. Modules are the only thing you have and they were intended to be standalone for sharing code externally and are clunky when using them only within your own app to organize code. There's no way to import shared constants or even group related variables together into something like structs, so you end up needing to pass around the same handful of individual variables into almost every module and resource you create.
4. Your whole terraform config is only as stable as the worst implemented resources. I think this problem is even worse for smaller, community support-only providers but even within AWS, it's not uncommon to come across bugs in how different services are integrated into terraform. Sometimes useful parameters are missing, or terraform doesn't validate them the same way as AWS itself causing something in your apply to fail halfway through, or things are implemented such that the subsequent tf plan still doesn't show up clean after a successful apply. TF has gotten more stable over time, but there still seems to be an endless long-tail of leaky abstraction type issues like these.
This turned into kind of a long list of gripes, but I'd be very curious to hear from anyone who has used both Pulumi and Terraform in production how Pulumi compares on these painpoints in particular.
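For point 2, the singleton pattern is easy to express once the config is built by a general-purpose language. The sketch below is plain Python with hypothetical resource names and a made-up config shape; it says nothing about how Pulumi or Terraform handle this in practice, only what the idea looks like.

```python
from functools import lru_cache

# The generated config, keyed by resource name (hypothetical shape).
resources: dict[str, dict] = {}


@lru_cache(maxsize=None)
def shared_role() -> str:
    """Declare the shared role on first use; later calls return the cached name."""
    resources["worker-role"] = {"type": "iam_role"}
    return "worker-role"


def worker(name: str) -> None:
    # Each worker references the shared role without redeclaring it.
    resources[name] = {"type": "worker", "role": shared_role()}


for n in ("worker-a", "worker-b", "worker-c"):
    worker(n)
```

The shared role sits next to the code that depends on it, yet it appears only once in the output no matter how many workers are declared.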
> 1.
Start with one per environment. As it grows, you can identify the reusable patterns and move those to their own state files and import them wholesale with terraform_remote_state. It's code, treat it like code.
> 2.
A module for shared resources, a module for resources that use them. You don't bloat anything if you just namespace it and then pass the shared resources' output as an entire object to a parameter in the resources that share them.
> 3.
Same answer. You can create objects for your outputs where you can easily namespace things.
> 4.
This is true. But there's no better way, for obvious reasons. It's a high-level abstraction; the only alternative is to use low-level abstractions, at which point you're just doing shell scripts. Which, by the way, Terraform has pretty clever ways to integrate. It's not pretty, but it's isolated ugliness, much like unsafe Rust/C#: you use it only where you really need to instead of having to go full ugly.
Also, you can always submit pull requests to Terraform providers. Doesn't guarantee the maintainers will be active enough to respond and upstream it, but it will solve the issue cleanly for you and other people at least in the meantime.