
They made a typo. Then they didn't test on their local machine. Then they didn't test on a dev server. Then they didn't test on a staging server. Then they didn't test on a live server. At no point did they do any form of testing.

Judging by the mistake, I wouldn't be surprised if they edited the file live on the server.



A production config file is typically an exception to the rule. If you have a development team, you probably don't want every one of them knowing the production passwords; if you have sysadmins, they'll want to make sure none of the devs do.

For what may be a thoroughly tested and properly deployed release, what happens when a sysadmin needs to update a database password? In many cases, without custom tools or a formal process, they'll just pop the config file open with vi and rsync the change out to the web cluster. I'd bet that's what happened in Tumblr's case: a sysadmin did it, probably to avoid a full production redeployment for a simple config update. They'll put resources into formalizing this now that they've been bitten.
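
Just to illustrate what I mean by formalizing it — purely hypothetical, since we don't know anything about their actual setup — even a tiny push script that syntax-checks the config before syncing it would catch a typo like this. I'm assuming a PHP config file and an rsync-based push here; the host names and paths are made up:

    # Hypothetical config-push helper: lint the config before rsyncing it out.
    # Host names and paths are invented for the example.
    import subprocess, sys

    WEB_HOSTS = ["web01", "web02", "web03"]   # assumed cluster
    CONFIG = "config/production.php"          # assumed path

    def lint(path):
        # 'php -l' exits non-zero on a syntax error, which stops a broken
        # config file right here instead of on the live servers.
        return subprocess.run(["php", "-l", path]).returncode == 0

    def push(path):
        for host in WEB_HOSTS:
            subprocess.run(["rsync", "-a", path, f"{host}:/var/www/{path}"], check=True)

    if __name__ == "__main__":
        if not lint(CONFIG):
            sys.exit("Refusing to push: config failed syntax check.")
        push(CONFIG)

The point isn't the specific tool; it's that the file can't reach the cluster without passing some check first.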


In my opinion, of course, sysadmins shouldn't be touching configuration that affects production code. You can keep config files away from the development team and restrict access to the actual file to a few select individuals if you need to. Sensitive data can be kept separate.
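
To sketch what I mean by keeping sensitive data separate — this is just an illustration, and the variable name and path are invented — the app can read credentials from the environment or from a file only the service account can read, so the config the devs see and deploy never contains a production password:

    # Hypothetical example of separating secrets from deployable config.
    # The variable name and secrets path are invented for illustration.
    import os

    def db_password():
        # Prefer an environment variable set by the ops team...
        password = os.environ.get("DB_PASSWORD")
        if password:
            return password
        # ...otherwise fall back to a file readable only by the service account.
        with open("/etc/myapp/db_password") as f:
            return f.read().strip()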

> For what may be a thoroughly tested and properly deployed release, what happens when a sysadmin needs to update a password for a database?

They coordinate with someone on the development team to deploy it. A sysadmin touching source code is as bad as a developer making changes on the networking side, especially if the two aren't talking to each other.

This is how we work. Any changes made by networking are first vetted by me, for example, for the systems I'm responsible for. I work with them to make sure deployment happens at the proper time, and we handle any problems on our end. The networking team doesn't touch anything we work on, and vice versa. Communication is key.


Coordinating between teams for every change to a production config file is a hard sell, but yes, it's the only really solid way to make sure stupid mishaps like this don't happen. This is how we did it at my last company as well, in addition to using production branches.

Most companies pin down their processes in response to issues that pop up.


Yes, but of course this is all speculation: we don't know how this change managed to make its way to production. The fact that it happened is unfortunate, but that doesn't mean the entire engineering staff is incompetent.


Of course. =) But it's fun to speculate while I'm busy doing chores around the house. As for incompetence, I didn't mean to imply that. Rather, my focus was on the fact that the mistake wasn't just a typo: a *lot* of mistakes were made.


Is it possible that someone accidentally pushed a file live?


Of course. =) I don't know what their deployment strategy is like. My point was that they made a series of mistakes; it wasn't just a typo. Even if they did push a file live, they still skipped numerous steps, like testing.

I haven't read the update, so I really don't know.

I'm not suggesting they're morons; rather, they made a series of mistakes. Every programmer makes mistakes. The key is to have a strategy to catch them. Something as simple as a staging server, where an error like this could have been caught, plus a deployment strategy where nothing reaches production without going through staging, is a good idea.
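
As a rough sketch of that kind of gate — the staging URL and deploy command here are placeholders, not how any particular shop does it — the production step can simply refuse to run unless a smoke test against staging passes:

    # Hypothetical deploy gate: production is unreachable unless staging passes.
    # The staging URL and deploy command are placeholders.
    import subprocess, sys, urllib.request

    STAGING_URL = "https://staging.example.com/health"

    def staging_ok():
        try:
            with urllib.request.urlopen(STAGING_URL, timeout=10) as resp:
                return resp.status == 200
        except OSError:
            return False

    if __name__ == "__main__":
        if not staging_ok():
            sys.exit("Staging smoke test failed; not deploying to production.")
        subprocess.run(["./deploy.sh", "production"], check=True)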



