How to avoid the assignment statement

cperciva · on July 12, 2010

A C programmer's reparsing:

"Initialize right away" --> "Make it hard to figure out what your variables are by mixing up code with their declarations."

"Construct new values" --> "Manually do what the first pass of the compiler already does. Except don't use the const keyword."

"Make functions, not procedures" --> "Don't check for errors. Don't even make it possible to check for errors."

"Make your objects immutable" --> "Let's see how much time we can spend in malloc!"

"Use purely functional data structures" --> "Arrays and hash tables are often the optimally performant data structures... but you shouldn't use them."

"When you can't help it" --> "If all your other attempts to sabotage your code fail to provide the required performance degradation, you can always slow your program down by copying data around unnecessarily."

In all seriousness, just like the famous "GOTO considered harmful", the author's "assignment statement is harmful" is an assertion which is valid in some situations -- quite possibly most or all of the situations he personally has to deal with -- but most definitely not valid in all circumstances.

mahmud · on July 12, 2010

In his book "Zen of Assembly Language", Michael Abrash has a chapter named "Billy, don't be a compiler" and tells the story of a man who coded Assembly like a C compiler.

I apologize in advance, but I can't help but extend to you the same advice, cperciva.

"Make it hard to figure out what your variables are by mixing up code with their declarations."

In the right language, binding of variables is usually done by a LET construct, and within its scope, where the variable is visible, the code is indented inward. When you nest LETs you can see the liveness of a variable.

"Construct new values" --> "Manually do what the first pass of the compiler already does. Except don't use the const keyword."

He misspoke. He meant construct new variables. I am sure you would agree with him that a 'promiscuous tmp' is a royal pain in the ass when you're stepping through bad C code.

"Make functions, not procedures" --> "Don't check for errors. Don't even make it possible to check for errors."

Again, you're thinking at an abnormally low level of abstraction. Results are not for error handling; that's what the condition system is for (some erroneously call it an exception handling system).

"Make your objects immutable" --> "Let's see how much time we can spend in malloc!"

That's what the GC is for. When you make objects immutable you're buying yourself the peace of mind that comes with referential transparency.

"Use purely functional data structures" --> "Arrays and hash tables are often the optimally performant data structures... but you shouldn't use them."

There are ways to implement functional arrays and hashtables, but none too apparent without getting into academic wankery. I agree on this point.

philwelch · on July 12, 2010

Imperative exception handlers, like in C++, are a lot like glorified longjumps. When learning Erlang I was pleased that a more functional language can handle exceptions much more neatly--if a given expression causes an exception, the value of that expression is the exception, and that recurses in a certain way. It's conceptually less like exception handling in an imperative language and much more like an automatically mediated means of returning error codes.

on July 12, 2010

[deleted]

loup-vaillant · on July 12, 2010

Language specific? Could you specify the languages you have in mind? I used a C/C++/Java syntax, but I meant any language. Or at least the ones with garbage collection.

mfukar · on July 12, 2010

I will actually make a blog post of it. There are a few more things to be said, and too little space here.

loup-vaillant · on July 12, 2010

Great. Please send me an e-mail when you're done.

btmorex · on July 12, 2010

I agree with most of what you said, but:

"Initialize right away" --> "Make it hard to figure out what your variables are by mixing up code with their declarations."

No. Putting all declarations at the top of a function makes it hard to figure out. If you can use C99, you should keep declarations and initializations together and hopefully as close to where they are actually used as possible.
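A minimal sketch of the difference, assuming a C99 compiler. The function names `sum_c89` and `sum_c99` are made up for illustration and do not come from the article:

```c
#include <stddef.h>

/* C89 style: every declaration sits at the top of the function,
   away from the code that first gives it a meaningful value. */
int sum_c89(const int *values, size_t count)
{
    size_t i;
    int total;

    total = 0;
    for (i = 0; i < count; i++)
        total += values[i];
    return total;
}

/* C99 style: each variable is declared and initialized where it is
   first needed, and the loop index is scoped to the loop itself. */
int sum_c99(const int *values, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}
```

Both functions compute the same result; the only difference is how far a reader has to look to pair a variable with its first value.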

One thing I'd add about the article: naming a function "move" and then having it return the sum of two vectors is bizarre and worse than the original implementation. At least, move as part of the Point class actually moved the point.

cperciva · on July 12, 2010

In a 50 line function with 5 variables, I find it much easier to look at 5 lines of variable declarations to figure out how I declared a variable than to scan through up to 50 lines of code.

Qz · on July 12, 2010

Stop writing 50 line functions.

loup-vaillant · on July 12, 2010

So, declaring variables at the top of the function makes it easy to find them. Now how do you find the point of initialization?

    declare x    |
    ...          |  ...
    initialize x |  declare & initialize x
    ...          |  ...
    use x        |  use x

In both cases, you will have to scan the whole function, or hit the "search" hotkey. Or maybe sometimes you just need to know where `x` was declared, but not what value it holds?

mahmud · on July 12, 2010

> One thing I'd add about the article: naming a function "move" and then having it return the sum of two vectors is bizarre and worse than the original implementation. At least, move as part of the Point class actually moved the point.

This makes sense in MVC, where the controller is responsible for updating point coordinates and the view is responsible for the actual rendering of the points on the display.

Also, in most graphic systems, the screen refresh is done by a single call made from within the interaction loop or by a designated callback. That's because all display updates are made to a back buffer, which then gets written to the framebuffer at a regular clock-driven interval.

loup-vaillant · on July 12, 2010

> naming a function "move" and then having it return the sum of two vectors is bizarre and worse than the original implementation.

I kept the name for the sake of consistency. I agree that's a mistake. The problem is that a `Point` should be moved with a `Vector`, and not another point. But introducing a `Vector` class would have lengthened my tutorial. Maybe I should use two `floats` instead of a `Point`.

th · on July 12, 2010

That example bothered me as well mostly because when the move function is placed outside of the class it is now performing a different action. The new function no longer really moves a point so much as it adds two points (regardless of the point/vector issue).

I think many of the naming conventions commonly used in OOP for mutable objects are not as appropriate for their immutable counterparts.
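One way the naming could look for an immutable point, sketched in C. The `Vector` type and the name `point_translated` are hypothetical (the article used a single `Point` class and the name `move`); the past participle is only one possible convention for "returns a new value":

```c
/* Hypothetical value types; neither is taken from the article. */
typedef struct { float x, y; } Point;
typedef struct { float dx, dy; } Vector;

/* Pure function: builds and returns a new Point. The argument is passed
   by value, so the caller's original point cannot be modified. */
Point point_translated(Point p, Vector v)
{
    Point result = { p.x + v.dx, p.y + v.dy };
    return result;
}
```

The name says what the result is rather than what happens to the receiver, which seems closer to what the function actually does.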

tkahn6 · on July 12, 2010

Agree.

When I was going through K&R I was always thinking to myself, "did they just know all of the variables they would need beforehand?".

mfukar · on July 12, 2010

Maybe not. Probably not, in many cases.

Hey, have you seen function prologues in assembly? How does it know the amount of stack space it's going to need beforehand?

philwelch · on July 12, 2010

Error checking can be done in a function-composing style if your functions check for error on their inputs, recognize error-indicating values on their parameters, and consistently output error-indicating values.

There is definitely waste involved in this--when composing functions this way, you have extra function calls and error-code checking on your parameters rather than checking data along the way and bailing out if there's a problem. Your functions have to check the arguments rather than having their preconditions enforced outside of them, which makes the composed functions less elegant. I don't know how practical it is in the end, but pushing most of the error-checking complexity into lower level functions and writing your upper level code in a compositional style sure seemed elegant every time I tried it.

cperciva · on July 12, 2010

> Error checking can be done in a function-composing style if your functions check for error on their inputs, recognize error-indicating values on their parameters, and consistently output error-indicating values.

True -- but this is only possible if you accept layering violations. To take an extreme example, how is printf() supposed to know that if a string parameter is NULL then it should error out without printing anything?

loup-vaillant · on July 12, 2010

Your example is wrong. Function parameters should be guaranteed to be error-free in the first place. If you decided that a NULL string was an error condition, you should test for that before calling printf.

You should test return values, not parameters.

philwelch · on July 12, 2010

If you have to check your return values yet you can't check your arguments, how do you compose functions? foo(bar(x)) requires foo to check its argument, since bar(x) could return an error code and foo() needs to know how to deal with it.

You don't have to check all possible instances of spurious arguments--you should be allowed to specify some preconditions--but you definitely have to check for error codes!

Example: Suppose we're writing a basic filesystem. We have the following primitive functions:

unsigned long getino(char *pathname): For a given device and path, get the inode number of the designated file. Return 0 (there is no inode number 0) in case of error.

MINODE *iget(unsigned long ino): For a given inode number, return a pointer to a data structure in memory containing the INODE data, allocated if necessary. Return NULL in case of error.

You can either test your return values from these functions, in which case you will do this:

  unsigned long myIno = getino(path);
  //error check myIno
  MINODE * myMINODE = iget(myIno);
  //error check myMINODE

or put some argument checking in iget() and do this:

  MINODE * myMINODE = iget(getino(path));
  //error check myMINODE

How much argument checking has to go into iget()? Exactly this much:

  if (!ino) return NULL;

It seemed like a neat idea at the time, and considering I called that particular composition of functions a dozen times in that project, there was a lot less repeating myself in teaching my functions to understand error codes.
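Put together, the idea might look like the sketch below. The `MINODE` layout, the stub bodies, and `path_exists` are all assumptions made so the example compiles on its own; they are not the original project's code. `const char *` is used only so that string literals can be passed:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the real in-memory inode structure. */
typedef struct { unsigned long ino; } MINODE;

static MINODE root_minode = { 2 };

/* Stub lookup: only "/" exists. Returns 0 on error. */
unsigned long getino(const char *pathname)
{
    if (pathname == NULL || strcmp(pathname, "/") != 0)
        return 0;
    return root_minode.ino;
}

/* iget() recognizes getino()'s error value and passes the error along. */
MINODE *iget(unsigned long ino)
{
    if (!ino)
        return NULL;              /* propagate the upstream error */
    if (ino == root_minode.ino)
        return &root_minode;
    return NULL;                  /* no such inode */
}

/* The caller composes the two calls and checks for failure once. */
int path_exists(const char *pathname)
{
    MINODE *mip = iget(getino(pathname));
    return mip != NULL;
}
```

The single `if (!ino)` line in `iget()` is what lets every caller skip the intermediate check.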

cperciva · on July 12, 2010

  unsigned long getino(char * pathname):

  For a given designated device and path, get the
  inode number of the designated file. Return 0
  (there is no inode number 0) in case of error.

Completely off-topic, but there is a bug in this prototype: Inode numbers have type ino_t, not type unsigned long.

(Ok, two bugs, if you count the fact that there can theoretically be an inode number 0. And three bugs if you count the fact that pathname should be declared as a (const char *). But the ino_t vs. unsigned long bug is really bad if you care about portability.)
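A sketch of a prototype that avoids all three problems, assuming a POSIX system. The out-parameter signature is one possible design, not the only one, and `stat()` is used here purely as a stand-in for the real lookup so that the example runs:

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Returns 0 on success and stores the inode number in *ino_out.
   Returns -1 on failure and leaves *ino_out untouched, so inode
   number 0 no longer has to double as an error value. */
int getino(const char *pathname, ino_t *ino_out)
{
    struct stat sb;

    if (stat(pathname, &sb) != 0)
        return -1;
    *ino_out = sb.st_ino;
    return 0;
}
```

The cost is that the function no longer composes directly as `iget(getino(path))`, since the error status and the inode number travel separately.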

philwelch · on July 12, 2010

You're right on all counts, and if I were writing a real filesystem rather than a class assignment I would have taken these issues more seriously.

(edited to add paragraph:) Nothing I've said in this discussion should be taken as a strongly held opinion about anything--I'm rather shy about sharing these thoughts in the first place, but considered this particular example illustrative of a possibility I take seriously.

You do not want to see the other horrors that particular professor of mine has perpetrated, incidentally.

loup-vaillant · on July 12, 2010

In general, I prefer not to have to check for arguments. In your case, the right™ way to do it is probably monads: they help you separate the error checking from the "normal" path.

But that's C code, so I have to agree with you.

cperciva · on July 12, 2010

You're disagreeing with the wrong person here.

loup-vaillant · on July 12, 2010

I disagreed with your "extreme example", on the grounds that checking arguments instead of return values is a bad idea.

cperciva · on July 12, 2010

Yes, I know. I also disagree with philwelch's suggestion of checking arguments -- that's why I presented that extreme example.

loup-vaillant · on July 12, 2010

OK, let's pretend I was tired. I was disagreeing with the wrong comment…

philwelch · on July 12, 2010

The easiest way is to treat printf() as a procedure rather than a function, and to restrict your use of compositional style to cases where it's still a good idea.

nostrademons · on July 12, 2010

You've just invented monads. ;-)

loup-vaillant · on July 12, 2010

Thank you for your feedback, I will amend my entry.

> "Make functions, not procedures" --> "Don't check for errors. Don't even make it possible to check for errors."

I will make my tutorial clearer on this point: errors can be embedded in results. Option types can do wonders. Of course, this is not easy to do in C.

My opinion is that programming in C means either dealing with legacy code, seeking utmost performance, or being nuts. I think that leaves plenty of room for reasonable programming. I never said that my assertion is valid in all circumstances. Actually I agree with you here. But I would be extremely surprised if my advice couldn't be followed by most programmers, most of the time.

mfukar · on July 12, 2010

"Of course, this is not easy to do in C."

Actually, it's quite simple. The issue is that debugging becomes a bitch, and error codes are hidden even from the developer inside nested function calls - i.e. you can't stop/recover from errors right away.
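A minimal sketch of what an option type could look like in C. `OptionDouble`, `some`, `none`, `safe_div` and `option_add` are hypothetical names chosen for this example:

```c
#include <stdbool.h>

/* Hypothetical option type: either holds a double or holds nothing. */
typedef struct {
    bool has_value;
    double value;
} OptionDouble;

static OptionDouble some(double v) { OptionDouble o = { true, v }; return o; }
static OptionDouble none(void)     { OptionDouble o = { false, 0.0 }; return o; }

/* Division that reports failure in its result rather than a side channel. */
OptionDouble safe_div(double num, double den)
{
    if (den == 0.0)
        return none();
    return some(num / den);
}

/* A function that accepts options passes "nothing" through unchanged. */
OptionDouble option_add(OptionDouble a, OptionDouble b)
{
    if (!a.has_value || !b.has_value)
        return none();
    return some(a.value + b.value);
}
```

It also shows the drawback: in `option_add(safe_div(a, b), safe_div(c, d))`, by the time the caller sees an empty result it can no longer tell which nested call failed.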

ajuc · on July 12, 2010

> Make functions, not procedures

I can't agree - you don't need results of functions for checking errors - in many higher level languages you can return multiple values, you can use exceptions, even in C++ without exceptions you can do this by in-out parameters, for example:

http://www.nopaste.pl/rtt

I think it's cleaner than making nested if then else-s over every invocation of your functions.
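The in-out parameter approach might look something like this in plain C. This is not the code behind the link above; `Error`, `divide` and `average_ratio` are made-up names for illustration:

```c
typedef enum { ERR_NONE = 0, ERR_DIV_BY_ZERO } Error;

/* Each function takes an in-out error parameter. If an error is already
   set, the function does nothing, so calls can be chained without a
   check after every single one. */
int divide(int num, int den, Error *err)
{
    if (*err != ERR_NONE)
        return 0;
    if (den == 0) {
        *err = ERR_DIV_BY_ZERO;
        return 0;
    }
    return num / den;
}

int average_ratio(int a, int b, int c, int d, Error *err)
{
    int first = divide(a, b, err);
    int second = divide(c, d, err);

    if (*err != ERR_NONE)       /* one check covers both calls */
        return 0;
    return (first + second) / 2;
}
```

The caller initializes an `Error` to `ERR_NONE`, passes its address down, and inspects it once at the end.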

zweinz · on July 12, 2010

Seems to me like you should just be coding in a functional language. Coding in Scheme might be easier than turning C into Scheme.

loup-vaillant · on July 12, 2010

I agree. But the little code that I write[1] outside of my day job is not in C. Also, I didn't mean C, but any language with garbage collection. I used C syntax mainly to reach mainstream imperative programmers.

[1]: http://www.loup-vaillant.fr/projects/ussm/

10ren · on July 12, 2010

A useful summary (eg. I found my experimental code from last year a lot easier to understand when it used functions rather than procedures) - though I couldn't help laughing when he started saying "mistakes", "correct" and "incorrect".

loup-vaillant · on July 12, 2010

Ah… Do you think I should change my wording? Because if it made you laugh, it could also discredit my point in the eyes of those who don't agree with me in the first place.

I'd like to correct that, if possible.

10ren · on July 12, 2010

The wording comes across as dogmatic; yet the article isn't actually dogmatic - it sincerely acknowledges cases where the functional approach isn't suitable (even including the really cool algorithm of quicksort - BTW you might mention games as another one where mutable state is convenient). Showing respect for alternatives (where due) makes an article seem objective, impartial, intellectual and truth-seeking (rather than pushing an agenda; or trying to sell something). It would probably be beneficial if the wording reflected your intent.

I did find it distracting, because (to be frank) people don't like to be told what they should be doing; laughter is a way to defuse the affront. Another approach is to appeal to those (few) people who are keen to learn about techniques that will help them. And I really think that that's what you were going for. Changing the wording would help achieve that aim, in my opinion.

So... yes.

btw: I really liked your explanation of linked lists being cheap to modify; and the article in general is very well done and communicates very effectively - I'd say it's of textbook standard.