Hacker News
This code smells of desperation (os2museum.com)
250 points by EamonnMR on Aug 6, 2023 | hide | past | favorite | 89 comments


> But the code in WIN87EM.DLL looks very much like the result of changes made in desperation until it worked somehow, even though the changes made little or no sense.

This is how the characters in Coding Machines realized something was up: assembly instructions involving carry bits that made no sense, which they later realized were how an AI writes code: https://www.teamten.com/lawrence/writings/coding-machines/

> It took us the rest of the afternoon to pick through the convoluted jump targets and decode four consecutive instructions. That snippet, it turns out, was finding the sign of an integer. Anyone else would have done a simple comparison and a jump to set the output register to -1, 0, or 1, but the four instructions were a mess of instructions that all either set the carry bit as a side-effect, or used it in an unorthodox way.


That reminds me of a case where an evolutionary algorithm was applied to FPGA circuits ("programmable" circuit layouts) with the goal of detecting the presence or absence of a particular tone. [0]

One of the results was a bizarre circuit that wasn't really digital anymore, because the pieces were arranged to exploit ways in which the digital circuit was imperfect, forming a system that was actually analog and idiosyncratic to the test environment.

[0] https://www.damninteresting.com/on-the-origin-of-circuits/


This sounds like the kind of thing where Raymond Chen would write up a historically completely sensible rationale for why that code is the way it is.


Most likely culprit: AutoCAD.

AutoCAD did all manner of nasty things with floating point numbers in order to stash extra data into them. Denormals, NaNs and the like were painfully common. You had to make sure your trap handlers were fast or AutoCAD performance would suck and everybody would slag your computer.

AutoCAD was one of the banes of existence for the FX!32 guys.


Did AutoCAD ever run inside Windows 3.xx? From my vague recollection, it was a DOS program in those days that was launched outside of Windows, or am I missing something?



If git had existed, one would anticipate the commit log for the code would read like a Lovecraftian descent into madness as the coder makes increasingly unhinged pleas to the Great Old Ones to accept the unit tests.


Having read plenty of version control system logs from around the turn of the millennium (i.e. when things were _far_ less crazy than 1987), I'd wager most of it would come in the form of commits marked “implemented EGA driver, faster AI in Reversi, improve FPU exception code”, aka “I'm done for today, committing”.


Which makes sense, in a way. Most (all?) of the popular version control systems back then were centralized, and required the central server to even make a commit. Even Subversion, a "better CVS", required this. When you have to wait several seconds (or more) to make a commit, and you couldn't edit things to tidy up history after the fact, you tended to make commits much less often.

I kinda take git for granted now, where commits happen in a fraction of a second. Sometimes it's hard to remember what a pain it was using CVS, and even SVN.


Wish this was more common. I really want to build a collection of such "unhinged religious text" commits. Right now I only know of one example: the mpv locale commit.


For context, since I hadn't seen this before: https://github.com/mpv-player/mpv/commit/1e70e82baa9193f6f02...

(advisory: _lots_ of profanity)


I didn't want to link it because if I did I'd feel the need to read the entire commit again. Which I just did.

It really is a religious text. It even granted me absolution. When I read it I realized I wasn't alone in thinking the C standard library is total garbage. Sometimes I'd post my opinions on the matter and people would argue with me but there's no arguing with that commit. I become increasingly convinced we should all just start over from scratch every time I read it.


Sometimes I think a "better C" is just C, but with a rational standard library. Then I remember that we're not allowed to have nice things.


I agree completely. I started writing freestanding C exclusively and it's a much better language compared to hosted C. There's zero cruft for you to deal with. It's just C and whatever you make of it. Nothing holding you back.

Some years ago I started a liblinux project so I could have lightweight Linux system calls but then I discovered the kernel's nolibc headers which are vastly superior. Now I'm working on a Lisp written completely in freestanding C with zero libc bullshit. Turns out you can get surprisingly far with a statically allocated array of bytes.


The only facilities I really need, 80% of the time, are FILE* — but I much prefer ‘fopencookie’ as the API: I can have a uniform “FILE*” API, and provide a default string-wrapper–to-user-allocator API. (This is when I’m writing library code.)


I vastly prefer using the open, close, read and write system calls. They are actually extremely easy to work with, easier than stdio functions. No buffering unless you explicitly add it: when you write data out, you know it's been written. Sizes are always known so you can actually implement proper memory and string structures in C literally from the ground up. They just return a negated errno constant to signal errors which means there's no brain damage like thread local global errno variables anywhere in your code. It's such a good feeling.

Buffering is the only real reason to build on top of those system calls. I wrote a very simple read buffer for my language's parser. It just sort of appeared organically during development. Haven't bothered to buffer writes.


Not knowing anything about x87 programming, but assuming the code is rational, I would guess this causes the code to work with a handful of _really_ broken or flakey FPUs. Or perhaps just one popular one.


I've written to him to ask why huge pages can't be used by 99% of software targeting Windows, even though they speed up most software by 10-20%, but he hasn't written anything about it yet :(


What do you mean “cannot be used”? Applications have to be coded specifically for huge/large pages. It’s similar on Linux, only certain applications support huge pages.


For example, applications like Microsoft Word or World of Warcraft can't use huge pages, because users won't mess with security policies, then reboot the machine, then immediately start the application as administrator and never close it. Yet those are the steps required of the end user for an application to use huge pages on Windows.


Are you positive that those applications even have large-page support?


It doesn't make much sense for developers to spend time adding features that users cannot use.


If there was one, he would have written it by now.


It seems he ran out of interesting historic things to tell a few years ago.


The Microsoft code leak mentioned by one of the comments has been out there for years, so I might as well paste it here to cut down on some of the speculation. Fair use - commercial value is zero, historical value for analysis and criticism is high.

The relevant code comments seem to be

"Fix timing problem??"

and

"486 bug - must wait till after last "out f0" to clear fp exceptions or IGNNE# will be permanently active."

    public __fpIRQ13
    __fpIRQ13:
     cli
    
     WASTE_TIME  70
    
     push    ax
     xor     al, al
     NULL_JMP
     out     0f0h, al        ; reset busy line.
     NULL_JMP
     mov     al, 65h
     NULL_JMP
     out     0a0h, al        ; EOI slave irq 5
     NULL_JMP
     mov     al, 62h
     NULL_JMP
     out     20h, al         ; EOI master irq 2
     NULL_JMP
     pop     ax
    
    
     sub     sp, 2
    
     push    bp
     mov     bp, sp
    
     fnstsw  [bp+2]
     WASTE_TIME
     push    ax
     xor     al, al
     NULL_JMP
     out     0f0h, al        ; reset busy line.
     NULL_JMP
     pop     ax
    
     pop     bp
    
    ;       fnclex                  ; 486 bug - must wait till after last
        ; "out f0" to clear fp exceptions
        ; or IGNNE# will be permanently active.
     WASTE_TIME
     push    ax
     xor     al, al
     NULL_JMP
     out     0f0h, al        ; reset busy line.
     NULL_JMP
     pop     ax
    
    ;       fnclex                  ; 486 bug - must wait till after last
        ; "out f0" to clear fp exceptions
        ; or IGNNE# will be permanently active.
     WASTE_TIME
     push    ax
     xor     al, al
     NULL_JMP
     out     0f0h, al        ; reset busy line.
     NULL_JMP
     pop     ax
    
     fnclex                  ;Now this is safe.
     WASTE_TIME 70           ;Fix timing problem??
    
     jmp     __FPEXCEPTION87P


Really interesting that it was a 486 bug, given the provenance listed in the article. Windows 3.0 was, indeed, released after the 80486 was. I am not sure why the reset-busy code was repeated 3 times; I assume the bit must have been somewhat sticky.

"If an unmasked exception occurs when the numeric exception bit in CR0 is clear and the IGNNE# pin is active, the performance of the FPU will be retarded as long as the exception remains pending."

https://www.cs.earlham.edu/~dusko/cs63/prepentium.html

I wonder if that has anything to do with it all.


The only way to have more fun than abstracting broken software is abstracting broken hardware.

I can imagine somebody spent months on those few lines of assembly.


I suppose it depends how the hardware was broken.

If you just needed a delay, this is bad code that's just been randomly iterated until it 'works'.

On the other hand, if the hardware does require such an incantation then it's impressive that someone managed to wade through the brokenness.

I'm inclined to believe it's the former though.


it looks suspicious to me, like the kind of thing you find on page 30 of a processor errata document, something like "single writes to external device fail 0.5% of the time due to a register clearing bug on mask revisions prior to version 23. recommended workaround: write twice."


I can tell from this post that you have probably never had to work with or emulate broken hardware (which is to say, every piece of hardware ever). At some point you just stop trying to be sane and go with what works.


Otoh, these operations are too specific to come up with by random iteration. I believe it was some hardware nonsense that was arcane and not something you'd stumble into by random iteration alone.


Is it too specific or is it Adams' sentient puddle?

I've done similar where you misunderstand something and what you write doesn't work, you strip it down to a minimum working version add on bits which break it, fiddle about with iterating the new bits and afterwards you have an accretion that 'works' but you don't know why.


The delays introduced by the repeated PUSH/POPs would be quite short, even on the 8088. How would you propose making such high-precision waits for the external x87 chip? (Assuming they were needed at all.)


if it was just 100 push pops, that would be fine for a delay.

if it's 20 push pops foo 10 push pops bar foo 20 push pops foo foo 10 push pops.

to achieve the same delay.

it's like: a=1 b=2 result=0 print 1+2 return

it isn't wrong, but it's indicative of someone who doesn't understand what's going on. you wouldn't describe it as good code.


A non-obvious reason for using PUSH/POP instead of a loop could be to generate as many different bus cycles as possible.

We get instruction fetch, memory read, memory write, and probably idle -- and an I/O write (0F0h). We should get the idle cycles because PUSH/POP instructions are single-byte and require a little time for decoding and the 286 BIU fetched 2 bytes at a time so the instruction buffer should get full.

Maybe they wanted that on top of the delay (which is obviously there because they didn't want to use the FWAIT instruction) -- maybe some 80287 chips had internal state machines that could get confused and needed some help?


It /could/ be that.

But I think that's an overly charitable interpretation based on the evidence.

If the source code turns up with detailed comments about why it's like it is. Fine. In the absence of that, I'm not buying it.


I also don't think it is. I think it is just a short and simple delay -- each push/pop pair is only two bytes AND generates 2 word transfers AND doesn't disturb any registers. They probably had a macro or repm loop for it.


But why the redundant FNCLEX and io writes then?

As I said, if it were just 100 push pops, fine. But it isn't; it's push pops mixed in with other seemingly redundant code.


This is what you’d run across in codebases before the internet, let alone Stack Overflow.

People didn’t have code to copy and paste — so they randomly wrote it like monkeys until it worked, based on their understanding of one page of a manual, which was literally the only documentation or description anywhere of how the system they were working with worked.

Source: I was there :)


Add to the mix bug reports like, "This worked on the Gateway when the Epson was freshly plugged into the LPT port but crashed after the Epson had printed 5 pages. If we remove our sound card, then no more problems..." Microsoft's strategy was to support legacy and buggy hardware -- this reduced friction for OEMs and helped expand the market, but it also caused a lot of trouble.


FPUs in the early x86 family are weird. They were typically separate chips, so you could have an 8088+8087, 80286+287, 80286+287XL (which was actually an 80387), 80386+387 (SX and DX models for 24 or 32 bit bus), 80386+287[1], 80386 or 486+Weitek[2], 80386+Weitek+387, or 80486SX+80487, where the co-processor was a full CPU that disabled the main chip. And then there were the clones doing creative things, such as the Nx586+587[3], which because of its lack of an on-board FPU was often mistaken for a 386 by software and lost the advantage of its Pentium-class ops.

So I'm not surprised the exception handler is a mess. It's a domain built entirely out of corner-cases.

[1] https://old.reddit.com/r/retrobattlestations/comments/hj12ck...

[2] https://micro.magnet.fsu.edu/optics/olympusmicd/galleries/ch...

[3] https://en.wikipedia.org/wiki/NexGen


A friend and I each bought 387s (which was, physically, a separate chip) for our 386s circa 1992. IIRC, I had an 80386 at 25 MHz with 4 MB of RAM.

I remember a tank game called Scorched Earth where you would have to set angle & power to try to hit the other person's tank. Some ordnance took 10-15 seconds to fire & complete because the FP ops ran on the CPU. Once the 387 was installed, the calculation finished almost instantly. That's about all I remember my FPU being good for. LOL good times!


The funky bomb or death's head? Especially when it was a large map with lots of ground to destroy.

My siblings and I had a house rule not to use either when playing on our 286, because it took a minute or so to complete...


It is clearly written to not use the (F)WAIT instruction -- the "dumb" code is there to make sure the previous 80287 instruction has completed.

The first time-wasting code is long because it has to last longer than the slowest 287 instruction takes to complete after signaling an error. The other time wasters are shorter because they come after known instructions that are faster (FNSTSW just stores 2 bytes to memory, FNCLEX clears some bits inside the 287). Note also that they are FNSTSW and FNCLEX, the no-wait forms -- that means there is no implicit (F)WAIT instruction before the real 287 instruction.

Why two FNCLEX? I don't know.

Why 4 writes to port F0? Probably in case the FNSTSW and FNCLEX instructions lead to errors.


> Why two FNCLEX?

There is a behavior on some CPUs where "out 0xf0" can leave IGNNE# active, but you can clear it after the "out" by running "fnclex".

Why are there two of them? Either the "out 0xf0" is affected by IGNNE# being active, or maybe the original draft had one "spin, out, fnclex" and that whole block of code was just copy+pasted when they added the second one.


You should write that answer as a comment on the blog post. The author of the blog is very thorough and likely to take an interest in it, if there’s anything to it.

(As an aside, why are we assuming 80287 and not 8087? I know nothing about either, so it’s quite likely that I missed obvious hints. EDIT: Ah, I guess because it’s the int 13 handler specifically.)


I did. Stuck in moderation. Correct, int 13h.


Wasn’t 13h disk services? Guess they got shared? Or is it hardware interrupt 13h mapped somewhere else through the interrupt controller?


It was actually IRQ13 -- one of the input pins on the slave interrupt controller. It was typically mapped to INT 75h. I don't know if Windows remapped it.


Should be int 16h I think.


Int 16h was the BIOS keyboard services.


Right, thanks.


That was my first thought too - they were trying to synchronise the CPU and FPU.

The mention of not using the wait instruction reminded me of this other post on the same site: https://www.os2museum.com/wp/learn-something-old-every-day-p...


How does an FPU get out of sync with a CPU? Wouldn't the CPU automatically wait for the FPU logic to complete (just like with every other instruction)?


This is the Intel recommended exception handler:

• Store the NPX environment (control, status, and tag words, operand and instruction pointers) as it existed at the time of the exception.

• Clear the exception bits in the status word.

• Enable interrupts on the CPU.

• Identify the exception by examining the status and control words in the save environment.

• Take some system-dependent action to rectify the exception.

• Return to the interrupted program and resume normal execution.

and one of the examples for writing the handler is this:

   SAVE_ENVIRONMENT PROC
   ;
   ; SAVE CPU REGISTERS, ALLOCATE STACK SPACE FOR 80287 ENVIRONMENT 
     PUSH BP

     MOV BP,SP
     SUB SP, 14 
   ; SAVE ENVIRONMENT, WAIT FOR COMPLETION,
   ; ENABLE CPU INTERRUPTS
     FNSTENV [BP-14]
     FWAIT
     STI
   ;
   ; APPLICATION EXCEPTION-HANDLING CODE GOES HERE
   ;
   ; CLEAR EXCEPTION FLAGS IN STATUS WORD 
   ; RESTORE MODIFIED 
   ; ENVIRONMENT IMAGE
     MOV BYTE PTR [BP-12] , 0H
     FLDENV [BP-14] 
   ; DE-ALLOCATE STACK SPACE, RESTORE CPU REGISTERS
     MOV SP,BP
     POP BP
   ;
   ; RETURN TO INTERRUPTED CALCULATION
     IRET 
   SAVE_ENVIRONMENT ENDP
Make of that what you will. I am somewhat curious about what happened here.


The 8087 and 8086/8088 (and 80287/80286) perform a complicated dance to cooperate on the execution of FPU instructions. The (2)87 has an output pin that is active when the FPU is busy. It is connected to an input pin on the CPU and the WAIT/FWAIT instruction waits until that pin isn't active anymore.

The 80386 with the 80287 or 80387 did the same thing.

The dance also involves the FPU snooping on the address and data buses (multiplexed on the same pins on the 8086/8088). The FPU can see which memory reads are instruction fetches, and the CPU helpfully tells the world what it does with the instruction prefetch queue (it gets flushed sometimes). The FPU maintains its own copy of the prefetch queue, so it sees the same instruction bytes as the CPU and in the same order. The CPU also helpfully signals when it takes the first byte from the queue. This is how the FPU knows when to look for an ESC opcode (all the 80(2)87 opcodes are ESC opcodes -- the CPU doesn't know what they are).

If the ESC opcode has a memory operand, the CPU performs a dummy read. The FPU snoops on that and sees what address the CPU uses. If the FPU instruction involves reading from memory, it also stores whatever data the memory returns. If the FPU needs to write to memory or read more memory, it requests bus access from the CPU. When it gets it, it issues the memory requests it needs and then relinquishes the bus.

The 486 had the FPU integrated and no longer had to do any of that.

This also opened up the possibility of fp comparison instructions that set the CPU flags directly. Prior to that, you would execute an fp comparison instruction on the FPU, write the FPU status flags to memory with another fp instruction, read them back into the AX register with a CPU instruction, use the SAHF instruction to copy them to the CPU flags, and only then do a conditional branch. The Pentium Pro introduced the FCOMI/FCOMIP/FUCOMI/FUCOMIP instructions that short-circuited all that: it's just the FCOMI/... instruction directly followed by the conditional branch. Much better.


Nope. That's why the FWAIT instruction exists.


Somewhere there is a production codebase containing a particular sequence of check-ins that reflect the peak of my similar flailings.

I am not proud of my desperation, but I can acknowledge it now.


"This time for sure!"


those opening "wtf" sequences might be there as filler space; harmless instructions with a known pattern where you can come back later and insert different instructions. Most people use NOPs for that but perhaps they wanted a different signature or needed 3 separate, differentiated patch points at entry. Or maybe they wanted to help sell more 8087 chips.

Anybody recall if there was a notable performance difference between Borland's FP emulation lib and M$'s, back then? My habit at the time was to religiously avoid all floats, to the point of shipping a homemade arbitrary-precision BCD math library. It was no faster than anything else, but it gave the same results for the same inputs, every time, on every machine.


I've inherited a similar bit of code that kicks in right after pivot (of Linux boot) and tries to disassemble and clean up whatever storage was concocted by the previous steps during boot, and then proceeds to assemble it using some user-supplied layout.

The code is awful, but, really, if anyone's to blame, it's the Linux people who never cared to systematize and unify system's understanding and representation of storage.


Got rabbit-holed... I love this ad - https://www.os2museum.com/wp/os2-history/os2-beginnings/1987... - it is a weird mixture of Steve Jobs' Apple smooth talking and desperate street selling at the same time.


I'm not nearly expert enough to judge, but to me it smells like heavy wizardry.


"Desperation" or random iterations until it passed every test. It doesn't seem to have a lot of opcodes. How much time did it take to find the algorithm with the processing speed of their time?


Somewhat off topic, but your network switches don't still come with metal cases? I get the cheapest stuff that's likely to be reasonably good quality and they all have metal cases.


This triggered my PTSD haha


The removal of "This..." from the title here really confuses it.

With "This", it's obvious the title is "(This code) smells of desperation". The submitted title is ambiguous; it could mean "(Code smells) of desperation".


The submitted title might very well have been "This code smells of desperation." HN has some strange rules that edit submitted titles, like stripping "10" or "How to".


It was already annoying that titles can be changed by a moderator without leaving any trace, and now titles are also being mutilated automatically at submission time into something that may or may not make sense. At least in this case the submitter may notice what happened and change it back.


That has certainly made finding something in my history difficult more than a few times.


Well for what it’s worth this certainly isn’t new


Recency is relative.


They would seem less strange if you saw a list of the baity titles that get de-baited that way. Maybe we should publish that.

"How to" doesn't get edited out. Certain other leading hows do.


I feel like it’d definitely help to know what the rules are. Might make it better to editorialize slightly and avoid having it happen automatically


This comment reminds me of those Amazon reviews that give a product 1-star because Amazon had a shipping issue. Yeah, sorry the product wasn't able to get to you but you aren't helping me figure if the product itself is any good or not.


Like the time I ordered some slate coasters but received an envelope of slate chips and fine powder. I was not able to leave a review mentioning that the seller had just wrapped the coasters in a paper towel and tossed them into a sturdy envelope.


I clicked expecting a joke article about Code Smells; I was a little disappointed.


Thank you both for saving me early morning disappointment.


It was a good read nonetheless!


os2museum is a good read always.


Code Smell is an industry accepted term. I am unclear how any other interpretation of the modified title could be expected.


the "smells" in the industry term "code smells" can function as both a verb and a noun.

As a noun, it can describe specific ways that hypothetical code might not follow best practices. For instance, code fragments that have been copy-pasted many times rather than refactored into a function, is a code smell. The use of many global variables is a code smell. Together, these are "code smells".

As a verb, it describes specific code which exhibits these sorts of attributes. A particular source file can smell. The code... smells. The phrase can also be used adjectively, to say that code is smelly.

The title "Code smells of desperation" could imply the noun form, in which an article discusses various code smells which could be a general indication that a hypothetical code base might be in desperate shape. Or that the team maintaining it is. It is an article about smells, in code.

Whereas, "this code smells of desperation" uses the verb form to indicate that the article is about a particular code base which appears to be in desperate shape, because of the smells it specifically gives off. It is an article about code, which smells.


Well, interpreting the title in that way is incorrect, so it seems like GP kind of has a point then.


You just proved OP's point since you misunderstood what the original title was.


Alternative interpretation based on the mangled title: here are things to look for in any code base which indicate the programmer was desperate.


All code smells of desperation.


That's how I interpreted the title.


>> All code smells of desperation.

> That's how I interpreted the title.

That's my point - even that is ambiguous.

Do you mean, "There is no code that doesn't smell of desperation" or "Of all the code that exists, herein lies the complete subset of it that smells of desperation"

eg:

(All code) smells of desperation vs. All (code smells) of desperation.


I was also slightly disappointed not to find a discussion of <code smells>, but the post is interesting, and we can still discuss code smells here.

The post author (Michal Necasek) states, about the WIN87EM.DLL code:

> It bears all the hallmarks of code that was written, rewritten, rewritten again, hacked, tweaked, modified, and eventually beaten into submission even if the author(s) had no real idea why it finally worked.

From what I gather, here are those hallmarks:

- Looping a no-op action, presumably to slow things down.

- Unnecessarily performing actions multiple times. This happens for three things: (a) writing a zero to an I/O port to clear something; (b) executing an instruction to clear exceptions; and (c) repeating the aforementioned no-op loop at different points.

- Saving a status in a separate location, only to reinstate it to its original location after clearing things out.

- Communicating procedure state (an EOI, “end of interrupt”) to one entity (the master interrupt controller) but not another (the controller’s slave). Furthermore, this “end” signal was sent near the beginning of the procedure. (This final point is my own observation and not explicitly called out by the author. Perhaps it’s common and not “smelly” for interrupt handlers to do this up front.)

I’ve tried to reframe the technical terms as actions and signals in a way that could be recognizable to devs of higher-level systems. My familiarity with OS-level systems is minimal so my interpretations could be a little wrong.

But despite my lack of knowledge, and with the author’s help, it does seem clear that there were serious timing and state related bugs here. And as a dev at other levels of the stack, I can relate: it’s very hard to reason about async global state! And this code’s responsibility was handling math errors, not timing errors. It is - or, perhaps, should be - the responsibility of the OS to orchestrate these things appropriately so that math libraries can focus on math stuff.

So my takeaways, for “code smells of desperation”, would be:

- There are violations of module responsibility.

- There are modifications of process timing with no discernible reason.

- There are modifications of status/environment/state with no discernible reason.

- And finally, other experts (in this case, the post author) can’t make sense of the code.



