Monorepos and Forced Migrations

adrianmonk · on Oct 13, 2021

The article didn't mention the big advantage (in my mind) of forced migrations: the migration actually gets done.

I've worked at places that have done it both ways (forced migrations vs. ways to do migrations later when convenient), and when you can do the migration later, it often ends up being much later, sometimes infinitely later (i.e. never).

Do you want all the pain of migrating now? Or do you want to spread the pain over a longer period of time?

If the breakage is due to incompatibility (the API behaves differently, etc.), then breakage that the migration caused is probably still going to happen if you do the migration later. So you're not reducing the total amount of breakage/pain.

And if you spread it over a longer period of time, you are increasing the maintenance burden on the team that needs to provide compatibility with the old and new ways.

lmm · on Oct 13, 2021

Probably the technically best client I worked for had a 6 month rolling deprecation cycle. Every library must release at least every 6 months (or, in the case of an external library that hasn't made a release, the dedicated internal maintainer for that library must assert that the library is still being maintained). If you depend on something that's more than 6 months old, your build fails.

Everyone was forced to do their migrations (exemptions would generally be approved but only once). But they had fair warning and could schedule it to fit in with the rest of their work.

tehjoker · on Oct 13, 2021

I don't think they're wrong about the inefficiency of it all being psychically damaging though. I wonder how many dependencies the average project has and how many projects each engineer has to get a rate of breakage that high.

anothernewdude · on Oct 13, 2021

Or the migration doesn't get done and Google kills yet another product.

humanrebar · on Oct 13, 2021

It's worth noting that the one version policy is a technical requirement for linking compiled languages together. Yes, there are organizational reasons to require exactly one version of a project. But compiled languages, especially C++, will do Very Bad Things if you have versioning inconsistencies. It's possible to be intentionally ABI compatible in certain ways for certain lengths of time, but scaled over time, number of concurrent versions of a project, and number of downstream uses, you're just waiting to be burnt hard.

That being said, it's possible to do version pinning carefully at the executable level, just so long as each executable has exactly one version of each dependency in its dependency graph. But then you have a combinatorial explosion of dependency sets to evaluate and support. Theoretically possible, but not very practical. ABI stable languages and languages that ship as source code have a much smaller analysis space since compatibility can be assumed in many more situations.

bombcar · on Oct 13, 2021

You also have to worry about programmers being api compatible- if someone moves from project A using lib v1.3 they may not realize project N is using lib 1.4 with some differences.

anothernewdude · on Oct 13, 2021

You can pin multiple versions of the same dependency, just not with the shitty ldd you're thinking of.

Or avoid dynamic linking altogether.

Rust lets you have multiple versions of the same crate living in the same executable.

humanrebar · on Oct 13, 2021

Statically linking doesn't necessarily fix this. Nor does ensuring only one version is actually linked.

The problem starts when different dependencies have different opinions about ABI-important details. Like layout of data structures, layout of v-tables, use of concurrency primitives, or even stack utilization.

I don't have as much depth of knowledge in cargo (rustc probably can't avoid these problems). From what I can tell, it prefers resolving dependency sets per executable, which means it would be opting for the combinatorial explosion of compatibility sets problem. That's a valid tradeoff to select, though I am extremely skeptical that an organization with 30k Rust projects would find that dependency juggling is a non-problem.

eterevsky · on Oct 13, 2021

The upside of having a monorepo with just one version of everything is that all the code is compatible with each other and easy to understand across projects.

Among other things it prevents diamond-dependency problem and other dependency issues. Google binaries are already regularly hitting the upper size limits for linker. If you needed to link 5 versions of each library it would greatly exacerbate the problem.

Having a monorepo also greatly benefits maintaining security. You only need to fix a security issue in the trunk, and the fix will automatically be applied in every binary that uses your library upon its next rollout.

> all to optimize consumer behavior to boost ad revenue.

I've been working for Google for more than 12 years and never ever have I thought about optimizing ad revenue. Not just that, but I also never heard about this consideration in any meeting that I attended over the years.

> Did I mention the report was commissioned, in part, by Google? (End of Aside.)

I don't see how this in any way invalidates the conclusion. It's not like this report was commissioned for marketing purposes.

> note git is not allowed at Google

This is very misleading. All Google open source projects, including Chromium, AOSP and all of github.com/google use git. What is not supported is a git frontend to the internal VCS. This is mostly due to the fact that support of any additional tool required staffing, so only a limited number of configurations/tools can be supported.

disgruntledphd2 · on Oct 13, 2021

> What is not supported is a git frontend to the internal VCS. This is mostly due to the fact that support of any additional tool required staffing, so only a limited number of configurations/tools can be supported.

It's super weird to me that Google wouldn't staff this. Given how much of the external world uses git, I would have thought that keeping new hires productive would have made this staffing requirement a no-brainer.

But what do I know, I don't work for Google. I'd be super interested if anyone knows why this is the case, though.

gipp · on Oct 13, 2021

There was at one point a git interface. It was extremely cumbersome to use compared to the perforce/mercurial ones and was eventually deprecated. I can't speak to how much of that is staffing limitations versus underlying technical reasons for poor compatibility.

At any rate, it's extremely simple to move to Mercurial from the git world, particularly with trunk-based development.

eterevsky · on Oct 13, 2021

I'm also slightly disappointed by the decision not to support git, but honestly it doesn't really affect productivity.

You have two interfaces to the version control system: one native, that evolved from Perforce, and the second one based on Mercurial, which is pretty similar to what can be done with git.

At some point there was a decision to be made whether to support Mercurial or git, because they have very similar feature sets. I don't quite know why they made the choice they made.

To clarify, all of this only affects the client side of the VCS. It is not really possible to use either git or Mercurial for repository storage without some major trickery.

summerlight · on Oct 13, 2021

> I don't quite know why they made the choice they made.

It was mostly technical one. Piper itself is a distributed file system which has significant incompatibility with git's internal model and git didn't provide a good extension model to hack with. So to provide deep integration, forking was the only viable option at the moment.

Perhaps the equation might be different now since MS did some contribution for their own use and git maintainers have become less reluctant to accept features for monorepo use cases, IIRC. Still, it's harder to retrofit the existing distributed file system into the core git object model than to develop a new one designed with that in mind.

disgruntledphd2 · on Oct 13, 2021

OK, that makes a lot more sense. So the One True Repo is Perforce (it's hilarious to me that Google use Perforce[0]), and there are shims to use it as if it was Mercurial or Git, but the Git one got killed and hasn't been replaced.

Thanks so much for the extra information!

[0] I have no real opinion on Perforce, but I always thought that the reason game studios used it was to deal with binary assets, and I'm surprised that Google would have lots of those.

eterevsky · on Oct 13, 2021

It's not Perforce. It was Perforce 20 years ago. Since then it was completely rewritten, while preserving some backwards compatibility.

Here's an article that I've found: https://cacm.acm.org/magazines/2016/7/204032-why-google-stor...

lamontcg · on Oct 12, 2021

This is really overreaching philosophically when it suggests that external software is "freedom" while monorepos aren't.

What the author seems to be objecting to is that teams that manage upstream deps make breaking changes and those changes become interrupt-driven work.

From the perspective of the upstream devs that makes their lives tractable, because people get off of old versions and they don't have to support it, and security teams do not have to audit old code, and that prioritizes burning down tech debt and not putting it off endlessly.

The work that the author is complaining about is work that probably needs to happen one way or another. The monorepo as a forcing function to burn down tech debt seems to actually be working there. It would probably be a 10x bigger nightmare of years of tech debt if they didn't do that.

There might be tweaks that could be made to the process so that there were windows every year where breaking changes could be made so as to batch up the breaking changes coming down the pipe -- and require a variance if someone needed to push an emergency breaking change due to an external requirement.

bradleyjg · on Oct 13, 2021

Maintaining backwards compatible isn’t impossible or intractable. It requires thought upfront and then imposes constraints going forward. Two things many programmers don’t like, but are nonetheless generally good things.

A company shouldn’t make it impossible to make breaking changes but it shouldn’t make it easy either.

Semver is a good tool to build norms around. If your two year old project is either 0.x or 11.x you should probably be a bit embarrassed.

lamontcg · on Oct 13, 2021

Sure but that has nothing to do with monorepo-vs-manyrepo. Those are good things to do in any case.

And I suspect you're going to find that inside of Google these are 10+ year long projects which were not designed well enough for that (because honestly nobody does that). And the breaking changes are most likely largely all well intentioned. Without at time machine its just work that needs to get done in one way or another.

SemVer also has its problems, and is not a panacea, for example:

https://hynek.me/articles/semver-will-not-save-you/

And I'm positive Google is aware of SemVer.

zzo38computer · on Oct 13, 2021

Yes, semantic version numbers won't solve everything. But, another problem is too many dependencies, which is independent of the version numbers being used. Sometimes dependencies can conflict, although sometimes there is the way to do it without conflicting. I try to avoid too many dependencies in my programs.

I still think that semantic version numbers are still helpful (for released versions; unreleased versions can just use the hash), even though it doesn't always help.

If you do depend on undocumented behaviours (or behaviours that are documented to be changed in future), you can specify that you require only one specific version, or a range of versions that they have been tested with.

They say "version numbers are unique, orderable identifiers of software releases", and I agree, but you can still do that and still use semantic version numbers too.

lamontcg · on Oct 13, 2021

You gotta realize we're discussing Google and you're talking about things which should hopefully be fairly obvious.

And while reducing deps makes sense to any developer experienced enough to have been bitten hard by it, Google faces the problem that due to its size there will be lots of highly specialized code dealing with issues that would too trivial for nearly any other businesses to bother extracting out into dependencies. You should always make things as simply as they can be and no simpler. At Google's scale simple things will be necessarily highly complicated and specialized.

You're also still talking about issues of concern to you which don't have any bearing at all on monorepo-vs-manyrepo. Fewer deps are clearly better in either model.

erik_seaberg · on Oct 13, 2021

You don't have to maintain the old code forever if you implement the old API using the new API. If you find you can't do that, your dependents can't use it either.

cphoover · on Oct 13, 2021

I think if you are going to critique the explicit strategy of "One version" and forced upgrades, an acknowledgement of the many issues with lazy or incremental upgrade strategy implicit to semver is also warranted. One of my biggest gripes about semver, for instance, is how often supposed non-breaking changes in semver do in fact break client/consumer functionality. Semver as far as I can tell is generally managed by people and is a potential point of human induced error. Perhaps this is mitigated through tooling and integration testing in sophisticated organizations, but in my experience it is broken in much of the OSS projects and less savvy organizations I have worked with.

Still, I find this writeup to be a good glimpse into culture at Google that one engineer found to be, at least, more complicated than the rose-colored glasses through which we usually read about Google's engineering practices. This criticism also mirrors what people have said about Google's tendency to kill off their products that people depend on, which causes reticence on the part of users to adopt new products. (related: https://killedbygoogle.com/)

skissane · on Oct 13, 2021

> One of my biggest gripes about semver, for instance, is how often supposed non-breaking changes in semver do in fact break client/consumer functionality.

Something I've seen happen – management issues a directive, "Every team must use SemVer it is our standard". Now every team has version numbers 1.x.0. Almost nobody ever bumps the major version above 1 – people just make breaking changes on the minor version. There is no way to enforce the requirement to bump the major version, so no way to tell if people are doing it when they are supposed to (I know there is some tooling to detect backward incompatible API changes, but a lot of places don't use any such tooling, and even if they did, it will always be incomplete, there will always be backward incompatibilities it won't be able to detect.) So long as every team's version numbers look like SemVer, management can be told "Directive implemented, everyone is using SemVer now"

cphoover · on Oct 13, 2021

Yea or projects that are forever on 0.x.x pre release so any change can be breaking according the spec (as it is considered experimental) even after having many adopters. I can think of at least a few OSS projects like this.

ntlang · on Oct 13, 2021

https://0ver.org

cphoover · on Oct 13, 2021

This made me chuckle, thanks for sharing.

skissane · on Oct 13, 2021

OSS projects can get away with it, but in some other environments – someone will review the release bill-of-materials, see the 0.x.x version, and complain "why hasn't team X adopted SemVer properly?"–or even worse, "Oh no, we are shipping an experimental pre-release version in the next release to customers!". Change the 0 to 1, complaint resolved – even if it is never changed again. No one can tell whether it should be 1 or 2 just by looking at a list of version numbers, they'd need to understand what the actual changes are in each component version, and they probably aren't going to do that.

I wonder why OSS projects bother with 0.x.x. If you are never going to change the initial 0 to something else, why not just drop it? So you can do whatever you want while paying lip service to SemVer? Why not just do whatever you want, and ignore SemVer entirely?

zzo38computer · on Oct 13, 2021

> I wonder why OSS projects bother with 0.x.x. If you are never going to change the initial 0 to something else, why not just drop it?

Possibly due to being unsure, thinking too many changes will be made and they don't want to drop it.

> Why not just do whatever you want, and ignore SemVer entirely?

If your version number is a single number which always changes, then it will still be compatible with SemVer (since it will be the first number, and the other two numbers are then zero), although not very well in case sometimes it still is compatible you will not be able to upgrade without checking this.

skissane · on Oct 13, 2021

Always incrementing the major version, whether there is a backward incompatible change or not, complies with the letter of the SemVer 2.0.0 spec (but obviously not the spirit) – clause 8 says "Major version X (X.y.z | X > 0) MUST be incremented if any backwards incompatible changes are introduced to the public API", but nowhere does the spec actually say "Major version X MUST NOT be incremented if no backwards incompatible changes are introduced to the public API".

However, if you do that, some people will start saying "this component is really unstable, they keep on making backward incompatible changes to it". Some people just look at the version number.

I don't like SemVer because I think it encourages focusing on the version number instead of the actual version contents, and because it puts all the focus on the backward compatibility of the public API, when that isn't always the thing consumers of a component should be focusing the most on. If I remove some obscure long-deprecated public API which nobody was using anyway, then following SemVer strictly, I must increment the major version–even if that is literally the only change in that release. Meanwhile, I can radically rewrite the implementation, even create a brand-new backward-incompatible next-generation API – but if I also include a compatibility shim which maps the old public API on to the new one, and if (to the best of my knowledge) that shim is complete (even if it actually contains subtle regressions my testing has failed to uncover), then following SemVer strictly that would be a new minor version only. Consumers should be much more concerned about the second release, but SemVer will mislead them into paying more attention to the first instead.

zzo38computer · on Oct 13, 2021

You do not have to use SemVer if you do not like it, and your considerations are valid and are good points. However, there are other considerations.

Allow marking old version numbers as "regressive"; any regressive version number is never considered compatible with any other version, whether regressive or not.

Allow a version to have multiple version numbers that alias each other; that might be appropriate in the case of radically rewriting the implementation in a way that is supposed to remain compatible.

However, that won't solve everything (nor does it even come close). It is necessary for package maintainers to notice if a version has security problems that older and/or newer versions don't, and to do whatever is appropriate in the given situation. Sometimes a program is compatible with multiple versions of a package even if they are not otherwise compatible (it is related to what you describe). Sometimes a program depends on internal details (or bugs) which are subjected to being changed. Sometimes there are other reasons why a user might not want to upgrade (including hardware compatibility). Sometimes conflicts are possible; depending on the situation there might or might not be a way to resolve this.

So, like some other messages mentions, using semantic version numbers won't solve everything, but they won't break everything either. Use or don't use semantic version numbers according to your choice, but whichever way you choose should be documented, and anyone using them should be aware of these considerations.

FOSS is helpful that you can fork software and modify it for your use if needed, in case you want some but not all of the changes that have been made, or if you can make your own improvements.

Jensson · on Oct 13, 2021

Likely this person hasn't worked in a large old codebase outside of Google. Migrating dependencies is a large part of the maintenance work in every large old codebase with lots of dependencies.

> The last two words are the crux of this issue: monorepos deliberately centralize power.

No it doesn't, it democratizes decisions. In a monorepo when someone breaks an API you have the power to roll that breakage back. In a typical environment you will have to deal with that breakage sooner or later anyway without any say on the matter. So a monorepo makes it much harder and costly for library maintainers to break downstream dependencies, while in a typical environment they can break their API every new commit and create a ton of work for everyone else without a second thought.

"But I don't have to migrate without a monorepo!"

Yes you do, but without a monorepo the requirement to migrate wont come from other engineers, but will come from some guy high up who demand that now everyone must switch to at least version X to reduce security issues and the huge maintenance burden of maintaining Y separate versions at once. So if you think that engineers in non monorepo environments doesn't spend a ton of time migrating dependencies then you are wrong. They migrate just as much, but they happen in bigger chunks and gets way more complicated.

kitd · on Oct 13, 2021

So a monorepo makes it much harder and costly for library maintainers to break downstream dependencies, while in a typical environment they can break their API every new commit and create a ton of work for everyone else without a second thought.

I worked on a microservices project where downstream projects put "contract tests" into upstream dependencies. Ie, you can change your dependency if you want, but it has to match the contract/API we agreed with you or your change doesn't go in.

joshuamorton · on Oct 12, 2021

This sort of mixes up a number of issues:

The whole git/mercurial thing is, as I understand it, related to the extensibility of the systems. Mercurial is hackable and extensible in ways that matter, git isn't (or at least wasn't)

The one version policy only really matters for third party dependencies. If you're making a change to a library that is entirely within the monorepo, you can do a three phase migration (add the new functionality, migrate users, deprecate the old functionality). Google is exceedingly efficient at this, and the vast majority of the time these migrations can be done without any effort by local owners.

For third party deps, you can't usually do that, but there's a workaround[2].

I also think that this post and the one here[1] are amusingly both on the front page at the same time. This one decrying the exact sort of mandate based approaches that "make SRE scale".

Also worth mentioning that

> The person making the new API is expected to change client code to match, but is not responsible for ensuring the change does not break the client.

Is just explicitly untrue. There are policies that are clear about this (this generally comes down to "we cannot break your tests, but if it isn't tested, we can't know we broke it")

[1]: https://news.ycombinator.com/item?id=28825352

[2]: https://opensource.google/docs/thirdparty/oneversion/#tempor...

dekhn · on Oct 12, 2021

google employed the folks who wrote the git clients and it was always technically possible to have implemented the monorepo with a git client, but one of the developers (the author of jgit) was opposed to this and prevented it from happening (mainly because his requests for headcount were always denied).

As the ex-maintainer of numpy and scipy third_party at google, I really appreciated the one-version-policy. It was the only way to sanely handle upgrades of thousands of different codes (and google has thousands of different codes that depend on numpy). It certainly helped us managed the complexity associated with mixing various versions of numerical libraries.

joshuamorton · on Oct 12, 2021

> google employed the folks who wrote the git clients and it was always technically possible to have implemented the monorepo with a git client

I think we're talking past each other a bit here. Piper's implementation could have been git-compatible originally yes (I think this is what you're saying?). The thing this article complains about is the turndown of the unofficial, unstaffed (20% at best) git-wrapper client around piper. That was a hacky not-fully-git-like but still well loved (I used it for a while!) tool. Doing it right would have been a huge undertaking and, as I understand, would have required building a totally new thing, while mercurial was hackable and extensible and could be used as a base.

dekhn · on Oct 12, 2021

I'm not talking about the git wrapper. I'm talking about native "git talks to piper" support, with sparse (read-through) for parts of the tree that aren't checked out. It wouldn't have been a huge undertaking, that was never the case. It's what's documented, but it wasn't actually truth. I worked with the individuals involved at the time the decisions were made and the rationale was much weaker than you think.

dan-robertson · on Oct 12, 2021

I think it’s possible for two things to be true:

- monorepos are a good way to organise software, especially if there are many internal libraries or dependencies

- It is bad if your dependencies are always churning and breaking and disappearing underneath you

It sounds like the actual problem might be that there are too many dependencies or that they change more frequently. I think the OP was suggesting that the latter of these is the case.

joshuamorton · on Oct 13, 2021

I would agree with this for the most part, but I'd also note that I think the case the author describes is uniquely bad and that probably necessitates some investigation. IDK, maybe their org doesn't follow the rules, or maybe the product they're responsible for doesn't have enough unit tests, so breakages can slip through, or maybe their (or my) bullshit radar is tuned badly and one of us is out of touch with how most people experience the churn.

(I should also mention that there are recognized issues with the cost of churn at Google, but this isn't really one of them)

jalk · on Oct 12, 2021

It sure is nice that google has tooling for updating code in their monorepo. I just wish that they would remember that those tools do not extend outside of google - Protobuf team I'm looking at you. I have spent countless hours in protobuf JDK dependecy-hell. So much for being dilligent updating schemas in a backwards compatible way, when the library itself will just kick you in the balls if there is an update.

dehrmann · on Oct 13, 2021

And you'd think they'd separate major versions by package so they can coexist (a la commons lang), and you can just pin each major to the latest. Nope. Breaking changes across minor versions.

jrockway · on Oct 13, 2021

I think this is an article about writing tests. Some code the author depended on changed the behavior of their app, while they were focused on releasing a critical fix. They deployed the critical fix and noticed the change way too late in the process, thus wasting an entire workday getting it fixed. Well, that's what tests are for -- so that when some person you don't know edits a dependency, they get a message "you can't check this in until you debug this code you are affecting". Maybe you don't need that for your one-person weekend project, but at a company with tens of thousands of developers working in the same repository, it can sure prevent a lot of problems.

anothernewdude · on Oct 13, 2021

Some guy who updates the dependency isn't fixing your tests. They're breaking whatever and not caring.

Jensson · on Oct 13, 2021

If they break your tests at Google then you just roll back their code. After you've done that a few times they will fix your code or learn not to break your tests. This isn't hostile, this is what the coding guidelines tells you to do, if someone broke your tests you roll it back.

jrockway · on Oct 13, 2021

Yup, exactly. I used to do third-party reviews at Google and people pretty much always ran the presubmit tests and the upgrade CLs had the dependent changes and approvals from those owners. I never encountered any breakage dramas (only people mad that they couldn't add a new programming language to google3); people were generally excellent citizens even though there was no "be an excellent citizen to get promoted" carrot around. (Hell, I did the third-party reviews out of the goodness of my heart, and I doubt any promo committee ever said "hey, we hate jrockway's project work and peer feedback, but he did rubber stamp that upgrade to foobarbaz so let's promote him".)

Things have probably changed since I left, but I bet the author's experience is pretty rare these days too. Google's system works quite well for their scale.

skissane · on Oct 13, 2021

> Rather than be able to pin something to a version that works, the culture is that first someone breaks you (or says, “this might break you, please figure out if it will and act accordingly”), then you have to dig around for the change that caused it, find the right person to approve a rollback, and then figure out how to prevent them from breaking you again

Shouldn't the person/team who makes a change take ownership of fixing any breakage it causes, rather than passing the buck to their customers?

It sounds like the real issue isn't anything as technical as monorepos-vs-polyrepos or software versioning policies, but rather a management and culture problem?

cgrealy · on Oct 13, 2021

> Shouldn't the person/team who makes a change take ownership of fixing any breakage it causes, rather than passing the buck to their customers?

That doesn’t feel scalable to me. I’d imagine some dependencies at Google run to hundreds of clients.

To take an OSS view, some packages run to hundreds of thousands. Imagine if the maintainers of react had to fix all their clients.

ratorx · on Oct 13, 2021

With React, it’s not possible because there’s no systematic way to change client code. With a monorepo, it’s more tractable.

It’s still not scalable in the sense that the downstream can make any change they want. They are restricted to changes they can either manually or automatically apply to client code in a reasonable amount of time. But that’s the trade off for being a heavily depended on library (which a lot of the time is an explicit decision).

cgrealy · on Oct 14, 2021

> With a monorepo, it’s more tractable.

Is it really though? Sure, it's technically possible, but it feels like a massive burden to place one on dependency.

Not only do you have to maintain your own lib, but you have to understand how every other project uses it.

account42 · on Oct 13, 2021

Sounds like a very good motivation to keep breaking changes to an absolute minimum.

underwater · on Oct 13, 2021

Google's architecture isn't designed for individual developer productivity. It's a global optimisation. Google appears to value continual evolution of products, infrastructure and code, and the monorepo successfully achieves that.

In my experience the alternative tends to lead to divergence of codebases and products. We've all seen companies that have some great products counterbalanced with a long tail of products that behave completely differently, or are orphaned, or take decades to move to new infrastructure or integrate with other products.

markbnj · on Oct 13, 2021

>> I can’t help but imagine whether Google Cloud’s low market share is related to this. If the internal culture is that clients must act multiple times a month to deal with impending breaking migrations, at least some of that attitude likely falls on external customers.

We've been GCP customers for five years and I can't recall offhand an instance of a forced migration due to a breaking change. They give a lot of lead time and plenty of notices, and then continue to support the old thing for a good period of time. It makes sense to me that they would treat outward facing APIs differently given that they span many organizations by definition.

altmind · on Oct 13, 2021

https://news.ycombinator.com/item?id=24165445

caller9 · on Oct 13, 2021

https://cloud.google.com/blog/topics/inside-google-cloud/new...

xeorfuzzion · on Oct 12, 2021

I thought google hasn't use perforce for years but something called piper[1]. Aren't migrations good on some aspects such as force engineers to stop using insecure libraries? I would say you could do internal migrations without having to change external apis, maybe google should care about the customer more? or is there some other reason for these external forced migrations?

[1]https://research.google/pubs/pub45424/

dekhn · on Oct 12, 2021

that's correct, the monorepo stopped being served by perforce many years ago (the efforts to keep it scaling were heroic). Instead, google made something which appears to be API-compatible with perforce (so the tooling continued to work) but is backed by Google technologies with the source hosted in the datacenters (which is kind of a circular situation).

orf · on Oct 13, 2021

Isn’t that circular dependency kind of risky? What if a bad deploy knocks offline something required to complete the rollback process for that bad deploy?

I’m sure there are multiple layers of redundancy, but it seems like you could get into a Facebook style situation where you need to go open a (virtual) cage somewhere to flip a switch.

dekhn · on Oct 13, 2021

Yes, it's risky but given all the alternatives, it makes the least nonsense. It works because there is massive redundancy and datacenters aren't all updated at the same time.

When I worked at Google I worked in hardware platforms and I spent a lot of time working with hwops people, who have to go physically out to some server and press buttons (whilst chatting the results to me). If corp gmail, chat, and everything else was down, I might have some trouble getting in touch with those folks and verifying the right fixes (which is why the FB situation is so crazy).

ramraj07 · on Oct 13, 2021

Until something goes bad in a silly place like it did in Facebook and everything disappears everywhere for a good while. Because google is everything.

dietr1ch · on Oct 12, 2021

> I thought google hasn't use perforce for years but something called piper[1].

Yes, but for I guess the author wanted to keep things simple for outsiders.

Piper is supposedly not much different than Perforce. Forgetting implementation details is just a centralized version control system that holds a monorepo and everyone* just test and build from trunk/master/head.

Normal_gaussian · on Oct 12, 2021

> Aren't migrations good on some aspects such as force engineers to stop using insecure libraries?

Sure, but how many times have you deployed an update that you were sure was security? Most libs I work with are 90+% features or ease-of-use releases, the security is hand wavy to do with dependencies, or a very rare "oops".

mattnewton · on Oct 12, 2021

This is mitigated somewhat by having integration tests (which shine in migration-like situations like this despite their many other problems). You can write tests that defend you from some of the breakages at Google and guarantee they will be run.

pkhuong · on Oct 12, 2021

And the ability to run automatically run tests on any part of the code base because everything uses Blaze. You can schedule your wide-reaching changes on the TAP train and see what breaks across the whole monorepo.

In most cases, the library authors are responsible for driving the migration. This is different from third-party libraries adding causing pain to everyone else because they decided to change an interface without sending PRs to migrate all their users.

ec109685 · on Oct 12, 2021

It's fun to see an insider's view. This presentation from their SRE book paints a very different picture: https://www.youtube.com/watch?v=SqU8TZDnFFA

Especially around who the burden falls on when a backwards compatible change needs to be made.

Agree with the point in the post that their mentality leaks in the way they support products.

ec109685 · on Oct 16, 2021

I meant backwards incompatible change.

ramraj07 · on Oct 13, 2021

It looks like OP is mad that they have to spend 2 DAYS a month running down permissions or overrides due to broken dependencies. I mean they are supposed to do what else exactly? Just heads down keep writing code and nothing else?

The question is if they are being overworked for this or if theyre mad that their cushy well paid perk filled 9-5 job is inconveniencing them twice a month to get out of their seat. Like what?

The only times I’ve seen that work is when absolutely no one actually runs the code. If you want to write code at places where it touches billions of people (for better or worse), then this seems like a better way to do things than what other companies seem to do.

brigade · on Oct 13, 2021

I would be annoyed if 10% of my development time was consistently wasted because no one cares about building a foundation that doesn’t shift like quicksand.

Jensson · on Oct 13, 2021

Alternative is spending 50% of your time in meetings where you have to motivate what the code you write should look like in order to avoid people spending 10% of their time writing migration code.

And it isn't like those with dependencies has no recourse, if they think one of the dependencies breaks too often they just remove the dependency and live without it. The library maintainers of course wants to avoid that so they have strong incentives to not break API's. And they can't break API's without first notifying or fixing their users either since then their code will just get rolled back, at Google it is typically much more work to break an API than for the downstream dependencies to fix those breakages.

cphoover · on Oct 13, 2021

This doesn't really address the problems OP brought up... like centralized authority, rebuilding for rebuilding's sake, and priorities interrupted because of sporadically imposed deadlines from other teams... OP mentions a timeline for fixing another critical issue was delayed because of an immediate requirement for upgrading a dependency. I can see how that could be frustrating, especially if not communicated with due-notice ahead of sprint/agile planning.

Just because OP is payed well doesn't mean they can't point out frustrations borne out of what they view as inefficient process.

ramraj07 · on Oct 13, 2021

I’m not convinced that it’s inefficient though. Google is quite likely the company that’s running the most amount of code in the most amount of computers compared to ANY company in the planet, and their system works. I’ll be curious how Microsoft works especially with internal APIs because they’re kinda the one competitor in terms of mammoth code bases and are famous for keeping backwards compatibility for external interfaces at the least.

Minimally they also use monorepos, so there’s that.

dehrmann · on Oct 13, 2021

> I can’t help but imagine whether Google Cloud’s low market share is related to this. If the internal culture is that clients must act multiple times a month to deal with impending breaking migrations, at least some of that attitude likely falls on external customers.

Not GCP, but Google runs some of its open source projects like this. I've been burnt by breaking Guava and Java Protobuf changes multiple times. Especially with Guava, I strongly recommend that library developers avoid it.

flerchin · on Oct 13, 2021

I've been pushing our org towards this for my entire career, and will probably continue to do so. These are problems I aspire to have.

marginalia_nu · on Oct 13, 2021

The wisdom of monorepos is highly contingent on what sort of operation you're running. I've worked in organizations where some higher up read some tech blog that praised them as the salvation of programmer kind and pushed them through, but it turned out to be a really bad fit for the stuff we were doing.

If you have something you might characterize as one large suite of applications, they do bring benefits.

If you have many independent products, as is common in the B2B-space, especially with several versions in production at different customers that have different customization demands, it can be an absolute nightmare. Team A and D urgently required feature X as some regulation went through in their market and their customers had gambled it wouldn't, now Team B, E and G can't produce anything meaningful for six weeks because they need to build a downstream workarounds to circumvent feature X because their customers aren't allowed to have it because it violates GDPR.

videlov · on Oct 13, 2021

At my previous job I was part of the team managing a decently sized monorepo. In my observation the discussions of monorepo vs. multiple repos are often emotionally charged. I explain this to myself with the argument of independence and isolation that is often presented, advocating for multiple repositories.

In those discussions I often bring up that it is important to consider the engineering organization as a whole and think of it as a system in it's own right.

Others have already highlighted the monorepo benefits associated with managing dependencies. When it comes to (forced) migrations - we are all familiar with the accumulation of technical debt at organizations over time. I hypothesise that absence of (forced) migrations plays a role in causing it.

In my perspective the majority of monorepo shortcomings are related to tooling. On the build side I have found Bazel to be a fantastic piece of software. The source control question is another story. This was actually one of the original reasons for me to co-found a devtools company - Sturdy (YC W21)[0] - taking a step back and thinking over the developer experience.

[0] https://getsturdy.com/

jarofgreen · on Oct 13, 2021

So if I worked at Google and in my library/service provided a versioned API what would happen? Would I be told off? Do people do this?

Eg, /v1/dosomething, I need to make a breaking change but I can still provide the old service too so I leave v1 alone and instead add /v2/dosomething

sagarm · on Oct 13, 2021

The article is talking about direct source dependencies, not service dependencies. Services have to be backwards compatible or provide a migration period (of course, there's no real alternative). Sort of similar to your /v2/. If you as a service provider really want to support N versions of your service, you could in theory have multiple instances of your service on different versions. Sounds like an ops and usability nightmare to me though.

Clients must be rebuilt every few months so services do not have to support old clients indefinitely, so as a service provider you can rely on a reasonably well constrained migration period.

kohlerm · on Oct 13, 2021

In big companies you will need some global coordination unless you have lots of disconnected products. Trying to build something integrated and ignoring thar global coordination is needed just does not work

whoisthemachine · on Oct 13, 2021

This sounds similar to an experience I had at a much smaller, less mature company, but without a monorepo. We did, however, have a "platform" team which created many packages which all of the other teams depended on which were often forced on the teams. Unfortunately, the reliability of their packages varied greatly, so unless your applications had great integration tests (our team's did), you would take the update and find out in production it caused you issues.

I think another poster [0] hit the nail on the head when they identified the problem as likely poor dependency behavior.

[0] https://news.ycombinator.com/item?id=28846514

anothernewdude · on Oct 13, 2021

A thought comes to mind. How many of Google's dead and discontinued products are down to the increased burden of maintenance Google push on them.

Jensson · on Oct 13, 2021

Every service is a huge maintenance burden if you take security seriously.