> Respect code and systems that came before you. There are reasons for every code and every guard that exists in production
Sometimes there's bad code in prod, or code that doesn't need to exist anymore. You should try to understand when something is there for a good reason versus a bad reason. Cut out the bad code, keep the good code.
I've seen a lot of cases where people assume that current code is untouchable or designed very intentionally, so they wrap it and then you end up with a tower of abstractions built on a poor base.
So, I think as a senior engineer you should try to understand the difference between code that should stay and code that should go.
Took over a system in Aug '18 with someone else. Previous 'senior engineer' had left after a year to go be a CTO someplace else. Everything seemed wrong - bad smells everywhere, spidey-sense a-tinging every day.
In October, we found that code put in place had been losing data since April - we had 6 months of pure data loss. Data that people assumed was 'working' because ... they hit a button and looked at the screen, and it was there. HOWEVER... after the midnight job run, the data was gone. No one ever looked the next day. Until October, and then it was way too late.
The whole "respect the existing code" has a HUGE assumption in it - that things are actually WORKING CORRECTLY in the first place. It also often assumes some of the original people who wrote the code you're dealing with are still on the team and and to answer questions.
THAT particular project above - my gut was "rebuild" - and we didn't. 16 months later, we are still finding code that is simply broken - data is lost, things are breaking, etc. But the whole "well, you're just someone coming in new - all devs just want to start from scratch and you shouldn't do that!" reared its head, and that idea was pushed aside.
I'd turn it around. Someone needs to be able to demonstrate - usually via automated tests I'd think - that things are, in fact, working as expected/assumed. If not, study/understand, then feel free to rebuild/throwaway.
I've taken part in another project with hundreds/thousands of tests, and while I hit code that looks/smells weird sometimes, some of the original people are on the team still, answer questions about 'why', and the tests pass. Even though it sometimes looks 'bad', it's functional and we can demonstrate its functional and working.
This is a HUGE difference, and that assumption about "it's working correctly right now" shouldn't be taken for granted.
I think respect does not imply assumption of correctness. It's more like: there's a reason why the code looks like it does. The reason could be time pressure, some crucial information missing at the time when it was being implemented, etc. By all means try to improve it but don't look down on a person who created it. It's easy to judge from perspective of 6 or more months.
"The reason" can also be lack of basic competence in previous workers. That can be extended to the rest of the team or management around the worker(s) for not providing a guidance/help.
"reason" can be "cause/effect" reason, but also "justifiable rationale". The files look like they do because someone typed keys and hit 'save' - cause/effect. But there's often no justification for how some code exists in its state that has to do with issues related to the business/logic itself. External factors - time (as you mentioned) or missing info (as you mentioned) - that missing info should have been documented somewhere.
I put out a lot of bad code in my early days. Seniors who came in afterwards ... yeah, the only or primary reason they could get from my code was "he wasn't very good at this". And they were correct. I got better, but primarily through trial and error (lots of them).
In thinking of some specific projects from 98/99 - I was demonstrably not good. I got stuff done, but it was generally very inefficient. But... it worked (to the level of understanding of everyone on the project). Even then, people weren't coming in after me to make basic stuff work, but they did help make it better.
If you're coming in to a project and there's no unit tests, no sample data, no repeatable build process, no documentation, no testing or verification process, and no previous team members to actually talk to to get questions answered, there's little reason to have any assumption of correctness. There may very well be 'reasons' for the lack of all of those factors above, but nothing makes me assume the code is correct. And... if I'm brought in to a project like that, "respecting" the code or the people before adds no benefit to the project, and may detract from getting stuff done.
> "The reason" can also be lack of basic competence in previous workers.
That type of wording belies a destructive approach to the problem. What I'm told is "lack of competence" is often someone not being trained properly, or leadership that decided to make a deadline no matter what, or a junior dev that got called a senior dev to fill a seat at a cheaper rate, or someone who knew they didn't have the skills but didn't want to admit it and lose their job.
The other thing I often find is that regular complaining about other people's code is usually projection. The less forgiving someone is the more likely that their code and architectural decisions suck even more than the people they're complaining about.
There's a senior dev at my job who openly craps on other dev teams as incompetent - to the point that he literally says, "those guys are fucking stupid and their code sucks". This guy never documents his own work, makes inflexible architectural decisions that don't take anyone else into account, and generally causes hours worth of workaround code for me every month.
Often, but not always. When people generalize, it's an equal error to conjecture that the premise for the generalization is always false in the other direction. Lack of skill/experience is just a nicer way of saying incompetence, and sometimes this is a proximate cause, and sometimes this is actually a root cause, in cases where the person was a genuine bad actor who fooled others into thinking they were qualified for the position.
Now of course there are many times where this is used as a crutch to blame system problems, but I've found a trend towards the idea that it's never due to lack of skill/experience on the part of an individual, and always due to some other root cause. That seems equally incorrect, and good judgement about the cause should be agnostic to how one feels about the idea of saying an individual failed at their jobs (which usually isn't a pleasant feeling for most.)
On the upside, one should actually expect people to fail at their jobs occasionally within certain boundaries, in order for them to grow. So failures to due to 'incompetence' (not negligence) should be often accepted and blameless, given they should be happening occasionally if you are pushing people past their limits. The best way to ensure someone doesn't make large categories of mistakes is to let them make one with a small blast radius, and having them take responsibility for addressing the failure and remediations. (This isn't at odds with blamelessness -- most ethical people will accept responsibility for their failures and correcting them, if they are able to fail in an open, respectful environment)
"often someone not being trained properly, or leadership that decided to make a deadline no matter what"
> That can be extended to the rest of the team or management around the worker(s) for not providing a guidance/help.
I indicated this isn't always/necessarily one person's fault.
re: your senior dev. he might be correct that their code sucks, but he's also being an ass in his own right.
I've told this story a few times to folks where... in 2017, I was contacted by someone who said "the system stopped working". Turns out it was something I'd built in 2002/2003. It's eye-opening and humbling to go back and fix your own mistakes from 15 years earlier.
> The less forgiving someone is the more likely that their code and architectural decisions suck even more than the people they're complaining about.
I don't disagree. It's why I tell people to document as much as they can - document the meetings behind decisions, document the rationale for cutting corners. It will help YOU later on when you have to go back and can't remember why you did XYZ. (See above reference to 2017/2002).
I fully realize my stance comes across as offensive in some situations, and I do not always project this stance. I'm fully aware people are forced in to bad choices some times - I've done it myself (on both sides of the table).
However, I'm also often coming in to situations where I have nothing more than running code and a client wanting stuff done. My scenario above about no tests, no sample data, no testing process, no documentation and no previous dev to consult with... it's probably been more than 50% of the projects I've worked on in the past 20+ years. Clients/employers also read some of the same blog posts, and will sometimes throw out the "you shouldn't just rewrite stuff!" line (often while simultaneously telling me stuff is broken but "used to work 2 years ago"). These are sometimes 'dailywtf'-level projects, and if the primary goal is to make sure data and processing works as people need and expect today, large/wholesale overhauls are often necessary.
I've been on a short term project with a team the last few months and this is 100% different from many previous projects. There's 30+ tech folks, I'm on a team of 8, a mix of jr/sr folks, decent communication, a moderately large (and up to date and growing) test suite, test data, some documentation (could be improved), but generally all good people. I've seen some evidence of the "I'm the awesomest" dev work in the code repos, and those people are no longer around.
I'm not coming in to this project saying "everyone before me sucked ass and I'm a dev god". mostly because it's nowhere near true, but the process of getting stuff done is more team oriented - there's people to help, people to review, a good culture around the project, etc. There are problems but nothing like some of the other scenarios I listed above. So... this is the sort of scenario where it's a appropriate to respect the other folks, previous code, process, etc - it's all there and there's a defined way of working and getting better. Not all projects are like that.
FWIW, to the extent I find myself complaining about "other people's code", it's usually just a prelude to complaining about the processes which led to it. Why is someone being told to do XYZ without any context? Why are they excluded from decision making meetings, but given full responsibility for hitting a deadline they can't possibly meet? These aren't questions for the dev who wrote the code, but for the project management which led to that.
Example: Was brought in on a project with horrible code a few years back. Utter garbage - everything was NIH - they'd built everything from scratch. Digging a bit further, I find out that all dev and server boxes are limited from hitting the internet. No package managers work for anyone, so they build everything from scratch. Exceptions could be made - in writing - and within a few days or week or so, you might have someone pulldown and compile resources from a 'blessed' connection, then give you that code to run yourself. Things like encryption/decryption algorithms were being written by hand because it was the only way to hit deadlines in that working environment. Yes, it's literally insane to try to work that way, but I had a team of people who all did that, and felt trapped. It wasn't their fault entirely (excepting that they should have all just quit at once, or threatened to) but day by day things just got worse. Poor management was the root, not the devs - the bad code was just a symptom.
> document the meetings behind decisions, document the rationale for cutting corners
... document the API, document the project goals, document the company values, document checklists and routines, document processes and personal instructions. DOCUMENT THE GOD DAMN HISTORY OF FAILURES AND INCIDENTS!
And then rely in job on all those documents, not on talks and experience or memory of some key engineers. It hurts when you start, people tend to blame all those unnecessary staff, but it pays off in so high rates later. Often it's possible to say if a company will be successful on long run only by seeing its documentation.
It's not an assumption of correctness. It's an assumption of purpose. Chesterton's fence[1].
Let's say you inherit a large codebase, and it in fact does not have any tests, and is relatively complex and convoluted. Obviously that's not going to be very maintainable without testing, and it may make sense to do a rewrite...
But first, write tests. Test your assumptions. Read the commit history and see how the code base evolved over time. Maybe the reasons it was built the way it was are dumb ("I like naming my variables after my favorite flavors of pie!") or maybe it was done for really good reasons ("The database library breaks with odd numbers of connections, so the pool always has to be an even number."). The issue is that you simply don't know and risk repeating your ancestors mistakes unless you do some investigation.
You're assuming there's always 'commit history' to review :)
I completely agree you need to do investigation, and document what you can about the system. Some of that documentation will take the form of tests. Without a doubt.
The 'assumption' I refer to is either clients or other devs assuming something is correct. I've lost track of the number of times I've heard "it was working fine until 2 days ago", when, in fact, it was never actually working, just not throwing a visible error until 2 days ago.
I will look at the fence - have seen reference to it before.
"until the reasoning behind the existing state of affairs is understood." - you may also need to realize the original rationale may never quite be understood. I've hit this a few times, and we've ended up just scrapping a particular set of functionality because no one could actually tell why it was there any more - everyone involved who may have used it or wanted/needed it is gone, and it's useless (or is now a blocker for other progress).
I think you are making some great points here. I have seen code commits are mostly right click in Eclipse and commit all modified files irrespective of if they are local config changes or preferences.
The senior devs in projects I worked have made 10 level deep 'strategy patterns' for future enhancements which still had hardcoded values underneath and absolutely non-extensible.
So in my books senior developers knew what they were doing comes with huge assumption. I have noticed many sr engineers simply had delusions of grandeur far beyond their programing skills.
> So in my books senior developers knew what they were doing comes with huge assumption.
This is another big assumption, and... especially given that we know in our industry that 'title inflation' is a real thing... "sr" just doesn't seem to mean much. I'm seeing lots of job postings today that call for "sr foo engineer", calling for 3+ years of experience. My definition and expectations of 'sr' are far different.
Interestingly, I've only been in a couple of places that even defined what they actually meant by that title - what was expected was written down. It was still open to interpretation, but a baseline to judge you against, and to give jr folks something to shoot for.
> which still had hardcoded values underneath and absolutely non-extensible.
Don't even get me started on people that just learned the 'final' keyword and abuse the hell out of it. :)
That's why I think code that seems "weird" should always be accompanied by a comment explaining why it is the way it is and why a seemingly more obvious approach wasn't taken.
Better might be a comment linking to a ticket or wiki page or internal README with far more explanation. What's the technical reasoning, and what was the business impetus for requiring this done this way at this time?
Again, I've been on both sides of this fence, but the "never rebuild" mantra has a lot of assumptions built in, chief being that things are actually working correctly. If they are, and there's no tests around it, memorialize the existing state with your own tests at the very start. It's what we did on the August project referenced above, and it took weeks (months, really) to have a modicum of basic tests ensuring some base level of understanding for our team (just 2 of us).
I think you are missing the point. Respecting systems doesn't mean 'not touching' or 'working around' existing systems. It just means 'understand it, before you change/rewrite it'. Otherwise, your rewrite might be even worse.
I have seen people proclaiming a rewrite with a newer technology would be a good idea who didn't have the slightest clue about the system they were trying to replace (except that it was build with some outdated technology) and I am 85% sure, that their rewrite would have been worse. When I confronted them with my claims, they gave up quickly.
As a Senior Engineer, you should know what you are doing. If you see obviously bad code, thinking about a rewrite is no sin. Just try to understand what caused the current system to look like it does.
Fair point: sometimes things are, in fact, bad enough that a rewrite is warranted. IMHO, this is where the "senior engineer" mantle becomes less about engineering and more about interpersonal / organization-navigational skills.
Let's say I'm a manager at said company. An engineer who just joined my project - which our previous senior engineer, who I trust (fairly or not), built - tells me it needs a complete rewrite. My default reaction is likely skepticism: the received wisdom is that rewrites are seldom justified; your assertion contradicts both this wisdom and my previous senior engineer. What do you expect my answer will be?
As the engineer making this recommendation, you'll need to do a bit of showing why. Part of this is technical: write some tests that show the problem, quantify the data loss, etc. Show that you've carefully balanced rewrite vs. other more incremental approaches to fixing the problem. Prototype part of a newer system quickly and show that it can be done.
The harder part to motivate - and often the more important part when it comes to "how do I get people to say yes?" - is non-technical. What's the business impact? What's the user impact? Is this an existential risk to the company if not fixed? How so? How do we assure our clients / users / etc. that time spent here is more important than time spent on features they want? Can you get someone with significant decision-making power to agree? How will you de-risk the rewrite and eventual migration, neither of which are trivial? How can you get just enough buy-in to secure the time needed to answer these questions, show quick progress, and reassure others that this won't be an even bigger catastrophe?
if they're still around, you can ask questions, and get a much better handle on why it's the way it is, and perhaps avoid earlier mistakes, and come up with a better strategy. maybe even just actually document the 'whys' and leave it as is.
It's very situation dependent. I try to advocate for the old code when I can, but I've had similar experiences. The most recent example was a multithreaded app started by a person who understood neither mutexes nor pure functions. Doing a ctrl+f for "sleep" for quickly showed that code was doomed to always be breaking in inexplicable ways. Give code the benefit of the doubt as much as possible, but there is a point where old code deserves more suspicion than respect.
I think the main thrust is respect the code and understand why it was there. Don't start with the possible initial feeling that a particular piece of code or guard is not needed. If you do find that it's not needed, remove it or fix it.
I'll second manojlds's comment. If there is bad code/design decision on production - it not necessary means it was bad all the time. Software grows, requirements changes and it is very likely there is a reason why the code is as it is now. Even if it does not solve the problem efficiently today.
People also tend to forget human and business impact. The code might look different if there is 1 week or 4 weeks given for implementation. Timelines, pressure from management/stakeholders to deliver faster has impact to made solutions and implementations.
That is when respect is needed - having empathy to understand why this part of software looks like it is now.
Yup. That was my inspiration, and sadly, I have been on the side as well where things started from scratch, and ended up blowing up couple of months later.
> you should try to understand the difference between code that should stay and code that should go.
I think that’s exactly what the rule is saying, I think almost everyone would agree. The problem is assuming that code is bad, or even actively looking for “bad”. If it’s in prod, that usually means it worked and was needed. Engineers in my experience do like to assume things are bad and actively look for reasons to re-engineer things without properly accounting for the number of ways things are going right. During my career, I’ve watched two separate large teams at two separate companies decide to rewrite a major codebase because they thought there was too much bad, and they thought there was a tower of abstraction. The result was they both dramatically underestimated the time needed to rewrite, and after having wasted literally tens of millions of dollars, ended up with roughly the same code quality as before and a couple of years of unnecessary down time. It’s more difficult to understand the code that came before and the reasons why it’s there, and it’s easier to judge it bad and decide to rewrite it without completely understanding it, that’s why the default should be to assume the code is there for a good reason, and try harder to understand the reasons.
That rule should, of course, be balanced with this rule, both are true:
> Simplify code, systems, and architectures relentlessly
I personally suspect that writing down and maintaining all the requirements and lifetimes for the codebase separately from the codebase, and having a regression test suite with complete coverage, is the way to safely deprecate features and continue to simply old large complicated codebases. But I’ve never seen that work in practice, yet...
I don't think the OP meant "don't touch old code", but this sentence, to me, implies that "reasons" means good reasons:
> There are reasons for every code and every guard that exists in production.
Sometimes the reasons are not good ones. Maybe that's exactly what the author meant. The rule doesn't read like that to me, though.
And yeah, if you're rewriting something, don't rewrite the whole damn thing! Good code is modular and you should be able to pick it apart and replace components that are causing problems. If it's not causing a problem, don't rewrite it...
However if you're going to be adding features to a section of code (or otherwise working on it a lot), it's a good time to try to understand if you can do a bit of refactoring along the way to clean it up. By default, all code gains code smell over time and also becomes more robust. It's a tight rope to walk.
It is true, sometimes the reasons are not good ones. But, if it is in production, one should assume the reasons are good ones, there's already some evidence. The point is mainly that the burden of proof is on the reader to demonstrate the reasons are not good before modifying it... especially because you're right: code smell is the default product of time. We have to be careful about assuming that code smell indicates something is wrong. My couple of decades of experience is that people assuming reasons aren't good and jumping into modifications is a bigger acute problem than being more conservative than necessary. Accumulating the tower of abstractions is no doubt a problem, but it happens more slowly and is caused by a multitude of bad practices, it rarely breaks production or causes downtime, unlike refactoring too eagerly because of smell or opinion rather than demonstrated necessity. It is indeed a fine line, I completely agree!
Replacing something simply because its not code you authorised or written isn't respectful. Replacing something because its not on a shiny platform, took too long to figure out, or simila, again isn't respectful.
replacing broken code, fixing bugs, changing architecture to adapt to another situation is fine.
> Respect code and systems that came before you. There are reasons for every code and every guard that exists in production
Sometimes there's bad code in prod, or code that doesn't need to exist anymore. You should try to understand when something is there for a good reason versus a bad reason. Cut out the bad code, keep the good code.
I've seen a lot of cases where people assume that current code is untouchable or designed very intentionally, so they wrap it and then you end up with a tower of abstractions built on a poor base.
So, I think as a senior engineer you should try to understand the difference between code that should stay and code that should go.