Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
GCC 6 Will Warn You About Misleading Code Indentations (phoronix.com)
162 points by frostmatthew on Jan 10, 2016 | hide | past | favorite | 60 comments


> "Specifically, a warning is issued for if, else, while, and for clauses with a guarded statement that does not use braces, followed by an unguarded statement with the same indentation."

> "The warning is not issued for code involving multiline preprocessor logic ... The warning is not issued after a @code{#line} directive, since this typically indicates autogenerated code"

Seems like a good thing to have, and pretty conservative about not causing false positives. In particular, constructs like double-iteration should be fine:

    for(y=0; y<ymax; y++)
    for(x=0; x<xmax; x++)
    {
        //...
    }
Which I checked because I like this construct and people very rarely think about it.


That pattern still seems like obfuscation to me. Why not indent the inner for loop, to make the control flow more obvious?

For instance, consider how "break" or "continue" inside the braces would work.


The reason for doing it that way, is to emphasize when two loops together are conceptually iterating over a single thing, and have some symmetry, like iterating over the cells in a 2D grid. It makes it possible to line up the elements of the two loops, emphasizing the similar structure.

I agree that it would be very bad form to use "break" inside a block like that. But in the cases where this construct applies, it would still be bad form to break if the outer block had curly braces and indentation, because skipping the rest of a row isn't a natural operation in the sort of way that skipping the rest of a collection is.


Even with that explanation I am not convinced of the soundness of that notation. For example, if you think you have a particular row that is causing you some problem and want to add a print statement for the row number, your code becomes, if you are too quick and forget to add braces:

    for(y=0; y<ymax; y++)
    printf("y=%i\n",y);
    for(x=0; x<xmax; x++)
    {
        //...
    }
And it will not work anymore as intended.

Granted it's a small mistake and it will not take more than 5 minutes to fix, but it still add a small load to your already busy mind.


This doesn't seem like a realistic mistake to me. The printf is just at the wrong indent level; this is the same mistake as:

    for(i=0; i<imax; i++)
    printf("i=%i\n",i)
    {
        //...
    }
Which also compiles and which is wrong in the same way. I don't recall ever having made or having seen this.


Well, not really, because in the example you just gave, you would put the printf right between the braces as it should be. In the case originally discussed, you have no braces so you would have to add them, and that can be forgotten.


Totally agreed, but in this case if the author had declared the variable in the for-loop:

  for(int y=0; y < ymax; y++) 
     ...
and didn't shadow y from an enclosing scope, that would have given the compiler a chance to catch that error.


If the option is between "something we personally like but has edge cases or requires other understanding that could confuse some readers" versus "something everybody will understand," the first case should never happen.

It's almost like everybody forgot about https://matt.sh/howto-c#_formatting already. :(


However one can say they are not quite the same thing, as you mentioned, how "break"s work, so it is obfuscating things a bit. But even performance wise: in order in which you iterate over the arrays can make a significant difference because of cache effects. So it is important to make obvious which one is the nested loop and which one is the outer one.


> The reason for doing it that way, is to emphasize when two loops together are conceptually iterating over a single thing, and have some symmetry, like iterating over the cells in a 2D grid. It makes it possible to line up the elements of the two loops, emphasizing the similar structure.

I'm not convinced. Even iterating over a 2D grid, one loop is iterating over the columns, and the other the rows, and indentation (and braces) can help see which is which.


I don't see how 'continue' would be confusing, it would stop the current calculation and start the next one, same as with normal loops.

The behaviour of 'break' might be confusing but then that's always the case in nested loops, no matter how you notate them.


I would argue that `break` intuitively makes you go back an indent, so if you don't indent after the first `for`, one can assume `break` just exits both loops (goes back from lvl 1 to lvl 0 indent)


This doesn't quite work, since break skips over if statements (and most breaks have at least one if statement around them). So you have to look glance at the keyword at the elbow of each indent anyways, to know where it's going.


While I understand that it's mostly a question of aesthetics, I don't like this also because it might break the expected semantics of continue/break in such a loop. When I iterate over things more complex than a single variable where the conditions in a for() would become ugly, I tend to put the domain-specific "increment" into a small static function:

    static int
    next_pixel(*x, *y, width, height) {
        (*x)++;
        ...
        if (last pixel...)
            return 0;
        return 1;
    }

    /* later on... */

    do {
        /* do something with coordinates x, y */
    } while (next_pixel(&x, &y, WIDTH, HEIGHT));
This is, of course, not only restricted to 2D arrays, but could cover walking of a tree or a filesystem...


I apologize for the stupidity of my question but shouldn't that be

    for(y=0; y<ymax; y++)
    {
    for(x=0; x<xmax; x++)
    {
        //...
    }
    }
?

Edit: Oh, I get it, {} are only "mandatory" if we have more than one instruction, the second for loop gets counted as one instruction and hence the omitting of {}. Thanks!


In C, control flow like if, for, while, do, else, etc., take the rest of the line of code (i.e., till the next semi-colon) and runs it under the conditions of the clause.

braces—{}—allow you to take multiple lines of code and have them be guarded at the same time rather than having to repeat the condition. GP is taking advantage of the fact that the next for clause will be considered a single statement, and therefore the second set of braces are not technically needed.

However, omitting braces for single-line snippets has caused plenty of havoc in the past (cf. Apple's gotofail). Some would say it is better to always include braces in control flow.


It works without the additional braces. Consider this:

    for (i=0;i<imax;i++)
        printf("%d\n", i);
You can omit the braces if the body of the loop is a single block, just like an if statement. You can also do this:

    if (a == b)
        if (b == c)
        {
            do_thing();
        }
Same deal with the for loop. The nested loop is a single block.


If I were doing it that way, I wouldn't give the outer blocks' curly braces their own lines. For example in C#, where pretty much every file has a single "namespace" block containing a single "class" block, I write it as:

    namespace SomeNamespace {
    class SomeClass
    {
        //...
    }}
Which avoids having shifted the whole file an extra indent level to the right. But I haven't seen anyone else doing this.


I'm not writing C# but yeah: AS3 and Java both have the Package{class{ two-step before you actually write any code. I don't even indent inside class:

  package{
  class{
  main{
    now we indent
  }
  }}


You don't indent after a package e.g.

https://github.com/apache/spark/blob/master/core/src/main/ja...

And it's fair enough to indent with a main since it is so infrequent. Ideally you only have one main entry point.


No, the two mean the same thing in C, e.g.:

    if (x)
        y;
Is the same as:

    if (x) {
        y;
    }


Just in case anyone who has never used C (or Java, or C# etc etc) is confused, it's not the indentation that is causing this meaning here:

    if (x) y;
also has the same meaning.


Why would it be? Certainly not for readability. I think it was posted as intended.


Ew! That's not any better at all...

Man's gotta have an opinion. At least C doesn't have optional semi-colons.


If I see this, I'll immediately think it's a bug, the difference between x and y is not even obvious at first sight, it looks like the line was duplicated accidentally.


What about this ?

    for(y=0; y<ymax; y++) {
    for(x=0; x<xmax; x++) {
        //...
    }
    }


I'd put the two loop-headers on the same line in that case.


I wish more languages would have what are essentially compiler enforced style guidelines (e.g. indentation, variable/function naming, file naming). So much time is wasted debating style, dealing with code review comments about style and making/using related tools.


I agree, although I think it should be done by a tool separate from the compiler. Purely because I'd have a stable compiler first and more code = more bugs.


PVS Studio support similar warning V640 http://www.viva64.com/en/d/0258/


I wonder if it'll be able to make sense of our macro-abusing XML-in-C: https://github.com/libguestfs/libguestfs/blob/master/src/lau... (Scroll up a couple of pages to see the definitions of the macros)


This would certainly be a useful feature, especially for those learning C, but I wonder whether checking stylistic things should really be in the compiler or part of a separate tool.


> I wonder whether checking stylistic things should really be in the compiler or part of a separate tool.

These checks absolutely should be part of the compiler (and this isn't a stylistic warning; it's a correctness warning). Our software ecosystem would have been in a far worse place if C compiler developers had decided to not support the -Wall switch in favor of lint(1). There's a lot more friction involved with using an external tool compared to just modifying CFLAGS.


The genie is out of the bottle already, gcc quite sensibly will warn you if it sees an assignment operator in an if statement. I'm sure it's saved countless man hours and pretty awful bugs since it was introduced :-)


Specifically, it doesn't warn you if the assignment statement is in parenthesis, so while ((xs = xs->tail)) doesn't warn. I really like how GCC handles that.


I just noticed this comment. OK, you've definitely got a good point. That is elegant.


I hope it doesn't warn for things like

if(0 != (i = getValue()))


It doesn't since the assignment is in brackets. You can also write

    if ((i = getValue()))


Why not be absolutely sure?

  if (0 != (0 != (i = getValue())))


I am aware of the redundancy, this was about the new warnings. If you can't do if(i = getValue()) without warning, I was wondering if assignments as a part of a larger expression would pass.


I hope it does!


Why would that be a warning? It's quite clear what that code is doing and is a reasonably common C idiom.


Yeah, fair enough. It's not bad - I had to think about my objection to it and it's probably largely unfounded.


the problem being refered to here is :

> if (i = 0)

instead of

> if (i == 0)


Any tool that does syntax checking of a c++ code (correctly) is rerunning the entire compilation process except for the generation of object code and optimization. Since build times are already an issue for C++ projects (even unoptimized), running the compilation process twice (or more) would not make developers happy. Also, I suspect that C++ devs would be philosophically opposed to spending clock cycles on recomputation ;).

Clang lets you write plugins (which at least solves the code reuse problem), however, these are currently strongly coupled to the compiler version itself. See for example, the chromium guide to writing clang plugins "Don't Write A Clang Plugin"[1]

https://chromium.googlesource.com/chromium/src/+/master/docs...


You don't need to do a full semantic analysis (e.g. type checking, const checking) for this kind of check, do you?At worst you'd need to expand macros and build an AST with information about white space preserved.


This doesn't represent a style check, but rather a likely indication of incorrect code. By contrast, the compiler won't warn for this code:

    if (condition) {
        foo();
    bar();
        baz(); }


Nice, maybe this will finally quiet the wrongheaded opinion that braces should always be used.


I do not think that the opinion of mandatory braces is wrong - in a language that does have braces.

If not for anything else, then for the fact that yi{ will _always_ yank the current block for me. Omitting braces will ruin this.


Seems like simply using astyle would be an improvement over catching this at compile-time.


The point of the warning is to find bugs, not enforce a particular coding style.

Running astyle would make it worse because you would hide the bug.


Having used astyle for exactly this, I simply have to disagree. Of course, you have to look at the code between running astyle and compiling it.


So whitespace matters huh?


No more "goto fail;"?


clang-format is better


And thus, every boneheaded project that ships with -Werror enabled will inflict an excrement tornado of FAIL on its grateful users and packagers.

THIS IS WHY YOU SHOULD NOT USE -Werror. NOT EVER. (And if you can't bring yourself to do anything about mere warnings, you shouldn't be in software engineering.)


From the article:

> This new warning isn't enabled by default.

And even if it were, it's a matter of adding -Wno-misleading-indentation to make builds green again. Not a big deal. And besides, it's likely that fixing those warnings could be done mechanically. (and it's hardly true that every project that ships with -Werror has such misleading indentation somewhere in its code)


And even if it were, it's a matter of adding -Wno-misleading-indentation to make builds green again

How does that help the person who is building the release you shipped six months ago with a new compiler?


You should have -Werror enabled on debug builds when developing and disabled on release builds.


Huh. I would think it to be the other way around. When debugging, let me comment stuff out without erroring due to an unused variable or an unused parameter. Let me use a C-style cast to check the bit representation of an object. Then when building the release build, enable -Werror to make sure that I'm not sweeping anything under the rug.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: