I was given a piece of advice very early on in my career that I've always been grateful for, which is fundamentally the same as this. IF and FOR are both code smells.
One case of this is just simplifying loops with some functional goodness
var listOfGoodFoos = new List<Foo>();
for(var i = 0; i< listOfAllFoos.Count; i++)
{
if(listOfAllFoos[i].IsGood)
listOfGoodFoos.Add(listOfAllFoos[i]);
}
return listOfGoodFoos;
VS
return listOfAllFoos.Where(x => x.IsGood);
But perhaps a more interesting point is it can also be a a sign of DRY gone wrong - two things that aren't actual repeats of each other, but just similar code, are smushed together with lots of IFs hang around to make it actually work.
A connected piece of advice was "it is easier to push things together than pull them apart" so err on the side of assuming two bits of code are not the same, knowing you can refactor them together later (probably only a few hours later) if it turns out they are in fact pretty similar.
Does it come down to having a deeper understanding of what is available? In your example, I guess a new engineer may not know that Where() is available, but a seasoned engineer should know.
I also think that this why the social aspect of code review is so important. It is a knowledge sharing mechanism.
> I guess a new engineer may not know that Where() is avaiable, but a seasoned engineer should know
I'd guess the opposite. Its only in the last few years that many mainstream languages have adopted this sort of functional syntax. The new engineer probably learned it right away from the docs, while the seasoned guy churns out smelly-old for loops without a second thought.
GP makes a great point about different loop logic being 'smushed together', that seems to be the most common kind of ugly code (and I've been guilty of it). Perhaps that is psychological. for/while makes loops appear expensive, while the functional code hides that detail and the programmer doesn't think of it.
Might depend on how trendy the new feature is. Because I still see newish Java programmers who apparently don't know that java.date.util exists, and who also don't know that rolling your own date math is only done by date manipulation library authors and connoisseurs of slow moving train wrecks.
A second, related but harder to pin down, issue exists here. I see relatively inexperienced developers who, for instance, get a little confidence with Javascript chaining and callbacks, and work themselves into corners where a given filter method (like the 'Where()' example) doesn't quite do what they need, and suddenly they have no idea what to do. So they hack in something awful in the middle, forget that it is a weird homegrown something instead of a library function, and see profoundly strange problems later.
Both of those are things programmers grow out of, hopefully. But they seem to be instances of having APIs with large surface areas causing what amounts to research failures.
For loops, even in C# are used in performance sensitive contexts. I believe things have changed in versions of .NET and if you're using things like Mono that also have its own versions.
Foreach generally (or used to) generate more garbage/allocations. Foreach against some types in some versions of C# (ex: structs) produces no allocations. There are also some ugly mutation related things you could do using for instead of foreach, but I'd call that bad code.
The same things are true on many platforms - foreach is generally preferable for non-peformance sensitive code. In some languages and libraries, foreach constructs will do things similar to auto pointers or reference counted pointers, which makes them safe enough, but also slower. The point is that foreach can take up more memory, cause garbage, or cause allocations/free on some platforms and languages which is not always desirable. For is therefore a better choice if you don't want this behavior. Mostly though, using for instead of foreach is micro-optimizing in these cases. Where I'd usually do it is somewhere critical in my code, like something called a lot per frame game loop for example. If I caught someone using a foreach that generated lots of allocations and is called a lot of times in these contexts, I'd probably kill them (and have).
So in short, I guess I'd agree that "many" people would do it, but it's context dependent which include performance goals, target platform, language limitations, and more. Mostly, it's better to just write the code that is safe and works, and go back and fix any bad decisions like this. If it's obvious though, I don't have a problem with the optimization from beginning.
Having made my fair share (if not more) of terrible (and terribly embarrassing) errors in C using the for-loop, I love foreach and its brethren in other programming languages. I only write classical C-style for loops very, very rarely these days.
Absolutely - it's why the advice was so good as it led me to trying to discover alternative patterns that remove the ifs.
On our team I try and code reviews to both be about code quality and learning. Reasonably often we'll go down mini rabbit holes about different ways things could be written and the various merits of them. We also try and have more than two people reviewing to spread the learning even more. That can make it worthwhile rewriting good sections of code to be worse just to discuss why the original way was better.
As a junior I learnt a great deal from reviewing the senior devs' code.
Here's one thing to keep in mind. They both end up doing the same thing, but it's more obvious that the Where implementation is not doing anything subtly different from what you think at first glance.
Without reading the for implementation carefully it's much harder to verify:
* You're looping from start to finish
* Over each element
* You're only adding the items to the new list that match the conditions.
That all said, I think this swap is only a good idea if the thing you're swapping with is commonly used. Which admittedly is a chicken-and-egg problem, but does largely exclude non-library function transformations or one-off transformations.
Depends which expresses the actual intent more clearly. To my mind, that it's the list of foos that are good is what's important, and the loop and conditional are implementation details.
In a way, yes. On the other hand, it boils down the code that some poor soul of a maintenance programmer needs to deal with to a single line which is more readable, too. Win-Win.
I don't disagree. But then again I've had peers review code similar to this and flag an issue saying it's too clever because that poor soul might not know what "where" is doing or that the use of lambdas is "too complex."
That is certainly something I've also come across. This kind of functional style is very readable once you understand it, but it does have a learning curve just like learning what "for(var i = 0; i< length; i++)" means.
I actually did a little presentation to the team about how to refactor a for loop to an self implemented where/filter function to explain the connection. For me, realising that all the where was doing (or at least potentially, there might be some behind the scenes optimisations going on) was passing in the condition for an IF in a FOR loop suddenly clarified everything for me.
In case anyone's interested, this is these are the code steps (javascript)
var people = [{name: 'bob', gender: 'male'}, {name: 'dave', gender: 'male'}, {name: 'sarah', gender: 'female'}];
var getAllMen = function(population) {
var returnList = [];
for(var i = 0; i< population.length; i++){
if(population[i].gender === 'male'){
returnList.push(population[i]);
}
}
return returnList;
}
var males = getAllMen(population);
___
var getAllMen = function(population) {
var returnList = [];
for(var i = 0; i< population.length; i++){
if(isMan(population[i]){
returnList.push(population[i]);
}
}
return returnList;
}
var isMan = function(person){
return person.gender === 'male';
}
var males = getAllMen(population);
___
var filter = function(anyOldList, filterCondition){
var returnList = [];
for(var i = 0; i< anyOldList.length; i++){
if(filterCondition(anyOldList[i]){
returnList.push(anyOldList[i]);
}
}
return returnList;
}
var isMan = function(person){
return person.gender === 'male';
}
var males = filter(population, isMan);
var females = filter(population, function(person){
person.gender === 'female';
});
var females = filter(population, person => person.gender === 'female');
var females = filter(population, x => x.gender === 'female');
I definitely agree with this, but note that the semantics in your code are different from those in Linus's example - your code filters out ALL matches.
This is actually an important point. Quite a few times I've written something that started as a list comprehension, changed it to a lambda later in the sprint and finally to a loop when the requirements became more detailed. - The real world is unfortunately full of business requirements that think that they are special flowers.
Does your coding style not involved lining function arguments (and other things) that are broken into two lines up vertically? Converting between 8 spaces and 4 would screw this up.
My preferred whitespace code-style is to use tabs for indentation and spaces for alignment. Under no circumstances should two consecutive lines be aligning something (with spaces) and have different levels of indentation (with tabs), I can't imagine why you would ever need to do this, so this has never been an issue for me.
My preferred whitespace code-style is to use tabs for indentation
What does "I convert it back to 4 space tabs before I commit anything" mean, then? If you're putting \t tabs in your file, 4 space tabs vs 8 space tabs refers to the way your editor renders \t. What are you changing before committing, your editor settings?
Edit: oops, just realized you're not the person I was responding to. The convert it back before I commit anything comment suggests he/she is using spaces. Or is very confused.
Sorry, I should have mentioned that I was not the GP. Even if you use spaces rather than tabs, the idea is the same. Do you mean something like the below? Note: \t represents tabbing (not necessarily the number of tabs, just spacing using tabs), and _ represents spaces.
int f(int x,
\t\t__int y) // one 4-width tab, pad with spaces
{
\t\t// do stuff;
}
int f(int x,
\t\t\t\tint y) // one 8-width tab, whoops
{
\t\t\t\t// do stuff;
}
In this case, you should not be using tabs, since logically you haven't entered a new block yet and as such should not be increasing the indentation. It should be
Even if you use spaces rather than tabs, the idea is the same
It's not, because there does not exist an editor that is smart enough to recognize that these 4 spaces are indentation tabs while these 4 spaces are spacing spaces. It's not impossible, but show me an editor that does it today (especially with C++) and I'll eat my hat :D
I agree with you that if you're using actual tabs for indentation it works fine.
>It's not, because there does not exist an editor that is smart enough to recognize that these 4 spaces are indentation tabs while these 4 spaces are spacing spaces.
Oh yes I see your point. All the more reason to use tabs for indentation and spaces for alignment. :)
You don't know what that Where() costs you, though. The abstraction in your second snippet does _at least_ as much work as the code in the first snippet _and_ has an unknown level of extra overhead.
Abstractions are not free. Because we can almost always afford the luxury of abstractions, some people forget that there's an engineering tradeoff there. There's always something underneath that nice, pretty-seeming abstraction that has to do the dirty work. Because that abstraction has to be pretty and handle generic situations, it (not always but often) can't be optimized to be more efficient in specific situations.
Zero-cost abstractions are a thing though. Compiler magic can even fuse together multiple calls like this on the same collection into the equivalent regular loop. So it can in fact be faster rather than slower, in the right context.
Like most things, it's all contextual. Were I your manager and we were working on a business app, this might pass code review just fine or I might even suggest what you did here if you had showed me the original.
Another piece of context in a business app might be - "Do we need these values now or later?" That is, should this be lazy, stream results, something like that? I am assuming this is supposed to be C#, but still speaking somewhat generally. You might need to instead yield rather than looping and returning an intermediate list. Better yet, I'd also look for you to return something like a read-only list (ex: IEnuemrable in C#) since this is a query result and that can be done with the first version.
Unfortunately if we had been writing a game though lets say, I would have failed you miserably for the latter version and added 1 mental strike to the "send you packing" count. Why? For loops are most certainly not code smells on most platforms. As a few people said, the mechanics of "Where" in some languages may just be cloaking a for loop. More importantly, for loops tend to be much more performant, can be inlined easier, and generally produce less garbage than lambdas and other similar language constructs. It may different than I last looked, but back when I used to touch C#, for was most certainly preferable for anything performance sensitive provided you didn't need laziness, and if you did, there's still at least "yield" (though at that point, we're approaching what Where can does anyway). I could go on about cache lines, trashing, GC pauses, and so on, but hopefully you get the idea.
Overall, I am afraid I must disagree and point out that your advice you treasure is too generic. FOR and IF are very valuable in certain contexts. I think the real advice someone might have been trying to give you is to aim for more functional constructs, that is functions that are pure/idempotent/referentially transparent, compose, and operate on sets of data where possible rather than individual points of data. Like any advice, this isn't universal and quickly unravels in many situations such as performance-sensitive contexts like game programming, real-time programming, and so on, or may in fact just be the best practice of the target language.
I absolutely agree with you about performance stuff, if you are in a position where milliseconds matter this may well not be the right way of doing things.
Also, to open up another interesting area, Linq operating on IQueryable objects can do some pretty interesting stuff in the background, merging sql queries and the like. I've never looked into if it does anything similar for IEnumerable - although I will should the situation come up.
FOR and IF are indeed valuable, but that doesn't make them not smells. For me a code smell is not an anti-pattern - it is just an indication that something has the potential to be wrong. If you are writing lots of IFs and FORs in your code that might well be fine, but it has a good chance of causing problems. Smells, for me, just mean "justify this decision" - not "never do this".
Things like FOREACH are good sane defaults, so the justification only happens based on what you are doing like in the performance example.
IQueryable actually is a superset of IEnumerable (implements it - https://msdn.microsoft.com/en-us/library/system.linq.iquerya...). It is generally useful if you want the laziness and ORM/ORM-like features. If you are using LINQ with SQL, then it would be better to return something like that for many use-cases. IEnumerable is good for general use of course and when you want to make the assumption that you "don't need" all that stuff which is pretty good one for more general cases where you just want read-only results or lazy read-only results (via things like yield).
I agree about potential. Nested FOR and IFs though raises that potential hugely. There is rarely a reason to be nesting more than a few levels deep. In many decades of programming, of course I have deeply nested, yet I have refactored it each time. I have managed to avoid doing excessive nesting the entire time as have most of my colleagues. I am sure someone can find a piece of code somewhere written for something that simply cannot be refactored well or justifies itself otherwise, but I'm not afraid to say statistically that is easy to avoid and hard to justify. I think the nesting issue just raises a bunch of flags including: performance, composability, reusability, clarity, maintenance, and cloaking. So in that sense I do feel it is what you describe as an anti-pattern more than a code smell.
That is more imperative programming (your first example) vs. functional programming (your second example). The problem with functional programming is that you need to know how the functions are implemented to be able to estimate performance.
Don't write the block to the if without curly braces on the next line. It's a common source of bugs when the code has to be extended (possibly by someone else), to forget to add the braces.
It's not simplifying it though, it's just hiding the complexity in syntactic sugar. I prefer the first way of doing things vastly over the second. Yeah, it's more code, but it's also more or less "what is actually happening", instead of an euphemism which has to be unpacked.
I disagree, I believe it is properly using abstractions to write simpler, more straight-forward code. I think of syntactic sugar as 1-to-1 replacements. For example, the -> in C and C++ is syntactic sugar for a dereference followed by a field access (ptr->field; (*ptr).field). If it's not a 1-to-1 replacement, then it's more likely to be an actual abstraction.
The first example is code duplication, and almost every line is suspecious.
var listOfGoodFoos = new List<Foo>(); // is listOfGoodFoos a good name?
for(var i = 0; i< listOfAllFoos.Count; i++) // is the loop condition correct?
{
if(listOfAllFoos[i].IsGood) -- the only interesting line is burried in here
listOfGoodFoos.Add(listOfAllFoos[i]); -- is i the right index variable? (vs. j, k, n, especially in nested loops)
}
return listOfGoodFoos;
The second example is obviously correct, and highlights the condition.
A quote from Tony Hoare comes to mind:
There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies.
Actually, it's written exactly as often as the original construct.
The idea that writing more than 5 letters, or a for loop here and there, is this giant cesspool source of errors is totally alien to me. Of all the ways in which my programming sucks, this was never among them. So thanks for you permission, I'll use it wisely :P
Yep. As long as the abstraction is tested and functions as advertised it can be very useful.
I'd much rather see a few lambdas in a chain of function calls than blocks of conditionals and loops. Every keyword or bit of code that has to be typed is a potential source of errors, and abstractions of this sort help minimize that.
All programming languages are "euphemisms" in that sense - even C has a lot of layers between it and how modern hardware behaves. It is simplifying if it expresses the important aspects of the result rather than the implementation details of how the machine calculates it.
Sure, but when programinng, what the machine does and how long it takes, sometimes does matter. I have to think of all those jQuery examples floating about out there that just do $('#foo') in several places -- sure, it looks simpler than putting it once into a variable, but from the viewpoint of what the computer ends up doing it's utterly ass-backwards.
There's no reason a `.where` should be slower than an explicit loop though. Indeed it should offer more opportunity for future performance improvements, because it doesn't constrain the implementation with irrelevant details - the compiler is free to e.g. make the loop run backwards, or split the collection into chunks and parallelize, if it figures that that would be faster.
At some point that breaks down though doesn't it. I mean from one perspective a conditional in high level code doesn't describe "what is actually happening, either." That's especially true if an optimizing compiler or interpreter mangled it up.
From another perspective your argument can be applied to any function call in library code.
In either case I don't think your position is that strong.
> I mean from one perspective a conditional in high level code doesn't describe "what is actually happening, either."
Sure, and from another perspective even the most efficient code does nothing to stop the heat death of the universe and is therefore functionally equivalent to doing anything else or nothing at all. However, that's just splitting hairs. It's not really an argument anyway, but rather a preference.
The question is more whether it's a reasonable preference to think syntax X is helpful but syntax Y is hiding things. Thinking deeper might show that the issue is actually familiarity, and the two should be rated the same in terms of abstraction.
One case of this is just simplifying loops with some functional goodness
VS But perhaps a more interesting point is it can also be a a sign of DRY gone wrong - two things that aren't actual repeats of each other, but just similar code, are smushed together with lots of IFs hang around to make it actually work.A connected piece of advice was "it is easier to push things together than pull them apart" so err on the side of assuming two bits of code are not the same, knowing you can refactor them together later (probably only a few hours later) if it turns out they are in fact pretty similar.