I've been programming for over a quarter century, and I feel pretty confident that I've never written anything that could confuse 'null' with NULL. I can't even think of a language that would let you easily do this.
If web forms aren't accepting NULL, then somebody probably specifically programmed the word 'null' into a filter of disallowed entries. Probably to stop clerks from entering the word 'null' to mean empty string. This has nothing to do with null being a reserved word in many languages, I'll bet the forms that aren't accepting 'null' aren't accepting 'none' or 'empty' either.
Object a = null;
String s = "foo bar " + a; // s == "foo bar null"
So, it's possible to get "null" as a string downstream, when some variable that should have been non-nullable was null. If you find such a bug and incompetently fix it by checking for "null" downstream instead of checking for null before turning variables into strings, you have the error from the article. Especially if you check ignoring case.
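Here's a minimal sketch of the difference (method names are mine, not from the article):

    // The bug: a null flows into concatenation and becomes the string "null".
    static String greet(String name) {
        return "Hello, " + name;              // name == null -> "Hello, null"
    }

    // The incompetent fix: match the symptom downstream, case-insensitively -
    // which also swallows a real customer named Null.
    static boolean isMissingBad(String rendered) {
        return rendered.endsWith("null") || rendered.endsWith("NULL");
    }

    // The proper fix: reject the null before it ever becomes a string.
    static String greetChecked(String name) {
        if (name == null) throw new IllegalArgumentException("name is required");
        return "Hello, " + name;
    }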
I've known Java to be a bit verbose, but I thought it was mostly a reasonable language aside from that. I might have expected something like that from JavaScript or PHP. Never in my wildest dreams would I have thought that boring old Java would render a null as the literal string "null" when concatenating strings. Even Ruby and Python don't do that - they throw type mismatch errors instead. C# treats it as an empty string.
Ehh, I don't really think this is a huge deal personally. Keep in mind that it is not that `(String)null` gives you the "null" string, or that `((String)null).toString()` gives you the "null" string (that throws an NPE). You have to use actual string concatenation to get it (or, as others have pointed out, `String.valueOf(null)`). Point being, it isn't/shouldn't be messing up your regular comparisons (unless you turn it into a string beforehand), and you really shouldn't be checking for stuff like `null` against a string literal anyway - it's meant for displaying a human-readable string, not an unambiguous string that could be converted back at a later time. If you wanted that, then you should use proper serialization techniques rather than just concatenating a bunch of values.
It's not that it's really that huge of a deal, just that it's so out of character for Java and everything it is seen as.
It's like a guy you've been friends with since college, someone you always knew to be solid and reliable but a little boring. Many years later, he's happily married with kids, watching sports, working in a large hierarchical organization, doing pretty average stuff, nothing unusual or exciting. If you suddenly found out that guy had secretly been a furry the whole time, that would be really shocking.
Not that there's anything wrong with being boring and reliable, or being a furry. But the sudden change in how you saw somebody or something is stunning.
I half expect anything written in Java to be littered with AbstractFactoryFactories and giant frameworks using 20 different design patterns to write Hello World. I never would have expected Java to silently convert an actual null to the string "null". I always thought the one thing you can count on Java for was to be strictly strongly typed, and never silently do any weird random conversions that nobody would have expected. Guess I was wrong.
Though poking around in a few other languages, JavaScript does indeed do that also, though I kinda expect JavaScript to do things like that. In Ruby, nil.to_s gives empty string, and adding a nil to a string gives you a type error. In Python, trying to add None to a string also gives you a type error, but str(None) does give you 'None'. That's a bit disappointing, but not shocking to me.
A lot of languages overload + to mean concatenate. I think that's a mistake, however...
In Python:
"abc" + None
TypeError: cannot concatenate 'str' and 'NoneType' objects
"abc".join(None)
TypeError: can only join an iterable
In Ruby
"abc" + nil
TypeError: no implicit conversion of nil into String
In Lua
= "abc" .. Nil
attempt to concatenate global 'Nil' (a nil value)
But not Java. Java implicitly tries to coerce the provided value to a string, which seems out of place in a language that values type safety. That every reference type might also be null seems out of place in a language with strong, static typing too, but that's been discussed to death. The problem is compounded by the fact that null is actually converted to "null" instead of an empty string.
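For contrast with the snippets above, the equivalent Java compiles and runs without complaint:

    Object a = null;
    String s = "abc" + a;    // no error: s == "abcnull"
    String t = "abc" + null; // even a literal null is fine: "abcnull"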
That's a NameError in Python: undefined variables don't have a value, not even None. It won't be caught until runtime since there's no AOT compiler in Python, but it's the same kind of error it is in Java.
My claim isn't that Python is safer overall than Java. Instead, it's that Java, a language that is mostly type safe, most of the time should not have these two potentially surprising behaviors:
1. The standard string concatenation operator does implicit coercion rather than rejecting an input that isn't a string. There should be a builtin to make a string from any value no matter what for logging and debugging, but that shouldn't be the standard concatenation operator.
2. This is more controversial, but strongly statically typed languages should not allow arbitrary values to be null. That sabotages one of the major strengths of strong static typing. Instead, there should be an option type to make it explicit (a Java-flavored sketch follows below). For something familiar to most programmers, SQL does this.
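For the Java-specific flavor of point 2, here's a minimal sketch using java.util.Optional (added in Java 8). It's only a library convention, not an enforced option type, but it makes absence explicit in the signature:

    import java.util.Optional;

    // The return type says "may be absent"; a plain String wouldn't.
    static Optional<String> findNickname(String user) {
        return Optional.empty(); // stand-in for a real lookup
    }

    // Callers must handle absence instead of discovering it via an NPE:
    String nick = findNickname("bob").orElse("(none)");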
The reason is that the string concatenation operator does a String.valueOf() on the argument, and that delegates to toString(), which promises a human-readable, relevant string.
For some reason someone decided to check for null instead of the stricter option of throwing a NullPointerException. Maybe to help debugging, or to follow the gist of toString().
I guess it was too late to change even in Java 0.9... Autoboxing, iterators and other newer features are more strict, so I suppose it's just one of the few irregularities left from prehistoric times...
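Concretely, String.valueOf(Object) is essentially a two-liner in the JDK, and (pre-Java 9, at least) javac desugars concatenation into StringBuilder calls that bottom out in it:

    // java.lang.String, paraphrased from the JDK source:
    public static String valueOf(Object obj) {
        return (obj == null) ? "null" : obj.toString();
    }

    // So "foo" + a compiles to roughly:
    String s = new StringBuilder().append("foo").append(a).toString();
    // StringBuilder.append(Object) calls String.valueOf(a), hence "null".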
> it's possible to get "null" as a string downstream
I don't see the problem with that. Strings that contain formatted variables should be used for display only. Besides, when someone inputs his name, the input is already a string, and I don't see how you would dereference string contents.
Sure – that's just casting a null type as a string.
So I guess if you have a comparison that somehow casts null to a string before comparing, you could run into this issue, but that's still bad programming.
I used to own null@myundergrad.edu as an email alias (with my real university, not that generic .edu, of course) and I got all sorts of interesting things... but that was very intentional on my part.
Funnily enough - it's not the null -> String conversion, it's the "+" operator. That was at one point the subject of heated debate at my previous job :)
See:
"null".equals((String)null) //false
"null".equals((String)null+"") //true
System.out.print((String)null) // prints "null" as well - PrintStream.print(String) substitutes "null" for a null argument
System.out.println((char[])null) // throws NPE - the char[] overload dereferences the array
Right, but that operator is implicitly casting the null object to a string type – it has to, in a strict sense... some other languages would raise an error (and many would do the same as Java).
It doesn't have to. ((Object)null).toString() throws an NPE, as it should. I would expect "foo" + null to throw an NPE as well.
BTW, there's another "fun" gotcha when you interface Java code with an Oracle database, which considers the empty string and NULL to be the same thing. Depending on how you handle data from the database, you end up with null, "", or "null" :)
String.valueOf((Object)null) returns "null" though, and that's effectively what "foo" + null uses. (Calling String.valueOf(null) with no cast actually picks the more specific char[] overload and throws an NPE.) I can understand why it would be surprising if you thought it used .toString(), but it's clearly documented in the standard.
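A quick snippet showing the overload subtlety (variable names are mine):

    String a = String.valueOf((Object)null); // "null" - the Object overload
    String b = "foo" + null;                 // "foonull" - same code path
    // String c = String.valueOf(null);      // resolves to valueOf(char[]) -> NPE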
One place where that can easily happen is loading comma-separated value files into a database. Some CSV files are formatted with the convention that fields only appear in quotes if they contain special characters. Thus, a data value of NULL is not quoted. This loses the distinction made in SQL databases between NULL, the null value, and "NULL", the string.
(I just received some files like this. One is a file that has names. If someone had a name of NULL, it would go into the database as a null.)
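A sketch of where the loss happens, assuming the quotes-only-when-needed convention described above (the helper is hypothetical):

    // By the time a CSV parser has stripped optional quotes, the SQL null
    // and the literal name NULL both arrive as the same four characters.
    static Object fieldToSqlValue(String rawField) {
        return rawField.equals("NULL") ? null : rawField; // Mr. Null loses
    }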
I just checked one of the projects where I work and a search for "null" returns 32 matches. It's a Java project, and I believe the checks came about because stuff was being converted to strings and then sent back and forth between HTML forms and the backend.
It's one of the reasons I wrote a pared down 'dumb' YAML parser that assumes scalar values are strings unless directed otherwise: https://github.com/crdoconnor/strictyaml
JavaScript lets you do anything wrong if you use == and not === though. I'm pretty sure the machine apocalypse is going to happen because someone types == instead of === at this point.
Blackhat conference talk we might see in the future: "How I achieved remote code execution on the T-1000 and singlehandedly averted the extinction of the human race"
Can you post exactly what you did in PHP that shows "NULL" is equal to NULL? There's quite a few approaches like ==, ===, and is_null(). I can't get any to think "NULL" is NULL, though I imagine I'm missing something.
When using Sequel Pro with a MySQL database, typing NULL into a string field will make that field null rather than 'null', which always seemed like an odd design choice to me.
It's a good compromise, I think. Otherwise, updating a field to null would need a button, or right-clicking and selecting a null option. I would expect the number of people who actually want to enter a 'null' string to be minimal...
Eval used to be used a lot, because web forms send everything as strings and you need to convert them to whatever datatypes on the back end. Eval probably permeates a lot of older legacy systems.
Reminds me of the time we ordered our high school football jerseys. We filled out a form listing the requested size and the spelling of our last name to be printed on the back of our jersey. A couple of weeks later, all the jerseys were delivered and we excitedly opened up the packing boxes to hand them out. Imagine the surprise and ensuing hilarity when our good friend Marshall Blank's jersey came out of the box with no name printed on it whatsoever.
My old manager's last name is Blank. The sad thing, given that 'blank' isn't a reserved word at all, is that he has the same problem. I think it's Eventbrite or Ticketmaster, I forget, but one of those sites won't accept "Blank" as his last name, literally with the message "Last name can not be blank"...
We're Dutch, and the é is part of our language, and even part of the legacy character encoding standard everyone used before Unicode's widespread adoption. This is just a matter of code that works perfectly as long as all characters are part of the ASCII set, but fails on the characters that don't conveniently match between UTF-8 and ISO-8859-15.
I doubt these issues will go away within even, say, twenty years.
It's getting much better. Almost everything new is in UTF-8.
I've been writing code to clean up a 2013 database dump. The database stored everything in LATIN-1 fields. Not because the data is in LATIN-1, but because LATIN-1 will accept any byte value. This makes error messages during input go away. See this bad advice on Stack Overflow.[1]
Some of the data is ASCII. Some is UTF-8. Some is Windows-1252. Some data is none of those, but is mostly ASCII except that there's a 0x9d once in a while. (Still haven't figured out what character set that is. From context, the ™ or ® symbol is intended.) So I have recognizers for these cases, and convert everything to UTF-8, testing every field value individually.
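A minimal sketch of one such recognizer in Java, assuming strict UTF-8 validation with a Windows-1252 fallback (the real heuristics would need more cases than this):

    import java.nio.ByteBuffer;
    import java.nio.charset.*;

    static String decodeField(byte[] raw) {
        try {
            // Strict decode: REPORT makes invalid byte sequences throw.
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8; fall back to Windows-1252.
            return new String(raw, Charset.forName("windows-1252"));
        }
    }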
One column has garbaged non-English names. Someone had tried to "normalize" UTF-8 to lower case by using an ASCII lowercasing function on UTF-8 stored in a LATIN-1 field:
KACMAZLAR MEKANİK -> kacmazlar mekanä°k
Anita Calçados -> anita calã§ados
Felfria Resor för att Koh Lanta -> felfria resor fã¶r att koh lanta
Even if something new is UTF-8, you'd basically have to guarantee that it never interfaces with anything old and/or broken to ensure the data survives intact. I've recently received a package where the ö in my name was written as ̦ which I've never seen before. Things go wrong in the most unexpected places and the sad thing is that even if you use UTF-8 for text storage, you still have to know what you can do and how to do it to not mangle it.
There are a lot of Unicode-hostile environments out there. Java is old enough to always require an explicit encoding declaration for pretty much any tool... compiler, documentation generator, etc. Forget it at any one point and you get garbage. Reading or writing text files should always make the encoding explicit, but rarely does so. C#'s string methods all support, but don't require, a Culture parameter, without which you're practically guaranteed to do things like case conversion or substring searches wrong in the general case. There was an awesome and long answer by tchrist on SO once about what the Perl boilerplate is to properly support Unicode for many or most circumstances (it's complicated and long and I doubt many people are going to those lengths).
Point being, even when using something that supports Unicode well, the programmer still has to care, simply because text and language are messy things and it simply isn't possible to have a magic bullet that does everything right.
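As a small Java example of what "caring" looks like (assuming Java 11+ for Files.readString): read text with an explicit charset instead of the platform default, which is exactly where the garbage sneaks in:

    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;

    static String readNotes() throws java.io.IOException {
        // Bad (pre-Java 18): new FileReader("notes.txt") silently uses the
        // platform default charset and decodes differently per machine.
        // Good: an explicit charset decodes the same bytes the same everywhere.
        return Files.readString(Path.of("notes.txt"), StandardCharsets.UTF_8);
    }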
Dutch programmers probably have little incentive to get it right, because Dutch makes relatively light use of accents. I wouldn't be surprised if Czech programmers, for example, were much more meticulous at converting between UTF-8 and legacy encodings.
Here in CJK territory, using the wrong encoding makes the output so obviously broken [1] that mistakes are almost always caught before hitting production.
I'm French and I see this all the time. Even Outlook, given the right conditions, will give you this in the default folders; Inbox in French is "Boîte de réception".
Don't get me started on Outlook folders. The actual names of the folders are localised, not just the way they're presented to the user. The folders are created when you first start Outlook (and not when your account is created).
If you happen to be using Windows configured in a foreign language the first time you start Outlook, your inbox, sent mail, etc, folders are named according to that language, and will never change, and you'll have to live with non-standard names for the folders.
At least it teaches you how to configure folders manually in most email clients.
That's why I take care that my OS doesn't know I'm French. But it's a luxury that most French people can't afford, and they have to live with bugs. E.g. having both a "Download" and a "Téléchargement" folder, with "Download" sometimes being translated.
This is something Apple got right. The underlying folder has a standard name, and the translated name is just a different presentation. If you change language, the names of the translated folders change.
For anyone who doesn't know OSX, this translation happens on the UI level. Typing ls in a terminal gives you the real directory name.
> I doubt these issues will go away within even, say, twenty years.
Much like the printing press, I'm 100% certain that computing (and the internet specifically) is altering human written language across the world.
It is just so much easier to avoid anything outside ASCII, because you can be certain ASCII will always work - even through some awful MS Access -> CSV -> SQL -> SQL -> Excel -> SQL -> COBOL ETL pipeline. No matter what version of any software is being used.
Technology has always shaped written language and we should fight to do better but at this point it seems inevitable.
(To be clear: I'm not saying this is a good or desirable state of affairs)
I really doubt that there is any transliteration into ASCII that's acceptable to European speakers.
E.g. A, Ą, Å, Æ and Ä are all different letters in most languages, not simply a pronunciation guide. I think most European languages use at least two from that list.
I feel your pain. My wife has a French first name with an acute accent over the "e" as well. Many forms will outright reject the accent, so I've taken to just not typing it most of the time. Being American, it doesn't really bother her, though. I know in some languages missing accents and other similar markings changes the pronunciation and meaning of words and can be very irritating to some.
> accents and other similar markings changes the pronunciation and meaning of words
Yes. In some languages those are actually not "markings" but denote proper letters, like ä, ö, ü and ß in German. But even where they aren't, as in French, they can alter the meaning of words, e.g. la != là. Therefore, for most Europeans and speakers of other languages that depend on more letters than ASCII provides, it is very annoying when that is not supported properly.
However, I have found in a few cases that Americans in particular have a hard time understanding this. The remark about your wife not caring seems to be in this vein, too. Recently, I decided to convert our MySQL DB tables from latin1 to UTF8. (I wasn't even aware that we didn't have some form of unicode, as our DB is only a few years old, and I thought some unicode was the default everywhere nowadays. But then, MySQL...)
Anyway, my CEO (also an American incidentally) was trying to keep me from it because he thought it's not high priority. However, we're about to go live in a French-speaking region, but which also has other indigenous languages (and therefore names), with their own "special" characters (I put "special" in quotes because for those languages, they're not "special" at all -- but I guess you get my gist by now).
Also, in previous jobs I have converted legacy systems to unicode and know what a pain it is down the road. Not to mention all the hard-to-find bugs if you don't do it, because some strings don't compare as they should, or people are just annoyed because their name is not shown correctly.
So I went ahead with the conversion anyway. We may never know for sure, but I'm convinced that I saved us some major customer frustrations, days of bug hunting and weeks of converting everything later, when existing data would need to be migrated.
So please everyone, just use UTF8 or some other unicode variant from the get-go. The few bits you might save otherwise are just not worth it.
It's not always a computer problem.
I have an O' name. I've had to explain to many college-educated adults the difference between a quote ("), an apostrophe ('), and whatever the hell you call the character under the tilde (`). I kid, it's the grave.
If you really want some fun, try explaining to people the difference between modern Greek tonos (e.g. ή, Unicode 03ae) and ancient Greek oxia (e.g. ή, Unicode 1f75). Or in UTF-8: tonos ή is 0xCE 0xAE, while oxia ή is 0xE1 0xBD 0xB5.
It's super fun when you are the only tech person in a Classics grad program and everyone else is turning in papers that look like ransom notes, because every fourth vowel has a different x-height from all the other letters. :-)
You're not saying the classicists confuse the smooth breathing with the acute accent, right? Only they can't figure out how to typeset them?
How do you enter them nowadays by the way? In the early days of the internet before unicode there were special fonts from SIL for example that first of all were using Latin characters (you'd type W and it would look like Ω) and secondly I think had the diacritics as separate characters, so the font would combine them. It was messy but at least you could type it on a normal keyboard. You could even read it in its ASCII form, most of the hacks were pretty reasonable.
Not smooth breathing and acute, but Unicode has two codepoints for acute accents (sort of). One is the modern Greek tonos, which often looks just like an acute accent but sometimes appears as a dot or straight vertical line, and the other codepoint is the ancient Greek oxia (the ancient term for the acute accent), which appears with all the other marks and their combinations (grave, circumflex, smooth/rough breathing, iota subscript).
Personally I type Greek using a vim keymap file, usually when writing LaTeX. I believe my keymap file is influenced by those older non-Unicode fonts, because I type w for ω and ;h for ή and >~h| for ᾖ. But my fellow students would mostly use Word. I don't know how they would type the letters, but commonly they would mix the tonos letters with everything else. I think what was happening is that Word would automatically substitute fonts that offered those codepoints, so it would end up showing Times for the letters with tonos and Palatino for the others (or something like that). Hence the ransom note effect.
My kingdom for a good monospaced font and a supporting editor to handle these and RTL languages. I've found setups that work with one or the other, but none seems to do it all just yet.
"Programmers use the grave accent symbol as a separate character (i.e., not combined with any letter) for a number of tasks. In this role, it is known as a backquote or backtick."
That was very illuminating, thanks. Interesting that the idea was there originally but was later deprecated when they realized video terminals couldn't do it. Today it would be trivial to convert the sequence letter/backspace/accent to its UTF-8 equivalent.
Heck, in some fonts ` even takes the form of ‛ or ‘. Which is probably where typing quotes like ``this'' comes from (which looks absolutely awful nowadays since it mixes an uncombined diacritic with a kludge of a punctuation character).
Manpages aren't written in TeX though (apparently they use something called "roff"), but they also contain things like `read' instead of 'read'... perhaps manpage writers tended to like TeX too?
Probably the other way round: roff/troff/nroff is older than TeX. I guess that on old computer typesetters, ` and ' turned into nicely balanced quotes.
In old fonts, ` often displayed as ‘ and ' displayed as ’. When they wrote ``foo'', that was meant to show up as ‘‘foo’’, which was a workaround for not having characters to properly spell “foo”.
Even more annoying is when you realise that those quotes are a hack because keyboards don't have separate left and right single and double quotation marks. The rabbit hole goes deeper…
I get that with periods in a street name. Like when I write "123 Main St.", and the web site checks against the USPS's database, it will either reject it outright for having "special" characters, or will say, "We didn't find that, but we found this similar address - '123 Main St', which should we use?" It's so fucking dumb.
I had a fun issue signing up for a bank here in Japan.
Japanese people can't have middle names (the citizen registry doesn't allow it), but foreigners can, so many systems will reject spaces in the name field. Meanwhile they're meticulous about making sure your name matches your ID exactly, leading to the situation I had.
The bank's web signup form disallowed spaces in your name, so I wrote my name FIRSTMIDDLE. Then when they processed my application they sent me an email "Your name doesn't match your ID card! Please approve the change to 'FIRST MIDDLE'."
I sympathize, but it is a hard problem to solve. On the seller side, many orders come in with addresses that are just wrong, or missing suite numbers, business names, etc.
UPS and FedEx charge shippers ~$15 for each instance where they have to correct an address.
So, yeah, the period thing is dumb, but automated correction is hard. Even experts, like SmartyStreets, get it wrong often.
Dutch first names confuse systems as well. Gmail doesn't understand the first name of my colleague is "Jan Willem" and not just "Jan", when showing the list of recipients.
Many years ago, computer systems analyzed my last name, which is two separate words, as meaning I was married. I got quite a lot of snail mail calling me Mrs. [2nd part of last name] for some weird reason.
One of my LLCs is named "Null Ventures LLC". I sometimes get things via snail mail that are addressed in interesting ways. The most common is simply "<space>Ventures LLC". Unlike the author, I've never had issues (AFAIK, anyways) receiving e-mail to the domain.
Reminds me of running into this bug [1] at work that caused a few minutes of head scratching. Couldn't make any changes to a user's profile as their surname contained eval.
I don't get what's happening at the low-level for this to be a problem.
It seems like you'd have to do something pretty stupid at the coding level to introduce a problem with "Null" by mistake. I'm sure it happens, but it can't be more than an occasional issue.
My best guess is that there are common old databases that did not have a first class null type where it was common practice to use the string "NULL" for that purpose. And that companies that have these old systems are proactively filtering user input to prevent causing these old system to choke... It sounds like the filter is case-insensitive, though, which would be too aggressive for the case I'm thinking of. Maybe they are (mis)using a bad word filter for this, which would tend to be aggressive.
> My best guess is that there are common old databases that did not have a first class null type where it was common practice to use the string "NULL" for that purpose.
It isn't a DB issue. It's more of a front-end or middle tier issue.
In the SQL standard, NULL is a special marker. NULL and "NULL" are two separate things: one is the null marker and the other is a string value.
Or more specifically, it's an "interface" problem between RDBMSs and the front end, since languages handle null differently. Many languages didn't have nullable types, and null in certain languages means something different than "lack of information".
For example, if a database column was a nullable int column and you wanted to bring it out into Java or .NET space, you would have issues, since int in Java and .NET is a value type, not a reference type, so you couldn't assign null to it. Whereas with a string/text/varchar column you could, since strings in Java and .NET are reference types and can be null.
In some languages, checking for null means you have to convert null into a string and then compare "null" == "null".
It's a legacy of lack of Nullable types in many programming languages. With the introduction of Nullable types many of these problems went away.
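A small sketch of the JDBC edge of this, using the standard ResultSet API: a SQL NULL read into a primitive int silently becomes 0, so wasNull() (or a boxed type) is needed to keep the distinction:

    import java.sql.ResultSet;
    import java.sql.SQLException;

    // Read a nullable INT column without losing NULL.
    static Integer readNullableInt(ResultSet rs, String column) throws SQLException {
        int n = rs.getInt(column);      // SQL NULL comes back as 0 here...
        return rs.wasNull() ? null : n; // ...so wasNull() disambiguates it
    }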
I believe the problem might be more common when exporting data from one system to another. I've written code to migrate (and merge) large datasets from old legacy systems. I needed quite a lot of heuristics and "best-effort" transformations in order to deal with data inconsistencies... mostly string manipulations where it's not hard to imagine bugs like this happening.
I'm trying to think how this would be a problem. It wouldn't directly be an issue created by our company, but it could be an issue with the companies we deal with, passing around large databases of different kinds of data on people in America. We commonly have to deal with large pipe-delimited data sets, and after loading one I'm sure my boss could see a last name of null in MySQL and go "let's delete all the 'null' names".
I don't understand. Are you agreeing with me or suggesting a string with the contents "curl" will somehow cause the server to execute the curl command?
This happened. I was working tech support for a company with a very old billing database. Customer was attempting to update his billing information with a new credit card and was immediately getting an error - Forbidden.
Turned out it was his last name, Curl, that was causing the issue. The system was throwing an exception because it interpreted the customer's entry as attempting to execute the curl command.
We ended up having him put "JR" at the end of his last name to prevent it.
Maybe this was back in the days of CGI? It's perfectly feasible that the right combination of characters might break out of an ill-advised pipe to get to the shell.