Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
My last name makes me invisible to Computers (2015) (wired.com)
166 points by traviswingo on Aug 18, 2017 | hide | past | favorite | 135 comments


I've been programming for over a quarter century, and I feel pretty confident that I've never written anything that could confuse 'null' with NULL. I can't even think of a language that would let you easily do this.

If web forms aren't accepting NULL, then somebody probably specifically programmed the word 'null' into a filter of disallowed entries. Probably to stop clerks from entering the word 'null' to mean empty string. This has nothing to do with null being a reserved word in many languages, I'll bet the forms that aren't accepting 'null' aren't accepting 'none' or 'empty' either.


In java + operator on strings works like this:

    Object a = null;
    String s = "foo bar " + a; // s == "foo bar null"
So, it's possible to get "null" as a string downstream, when some variable that should be non-nulleable - was null. If you find such a bug and incompetently fix it by checking for "null" downstream instead of checking before turning variables into strings - you have the error from the article. Especially if you check ignoring the case.

BTW guess how I know it's working like that :)


That's insane.

I've known Java to be a bit verbose, but I thought it was mostly a reasonable language aside from that. I might have expected something like that from Javascript or PHP. Never in my wildest dreams would I have thought that boring old Java would interpret a null cast to a string as a literal string "null" when combining strings. Even Ruby and Python don't do that - they throw type mismatch errors instead. C# treats it as empty string.


Ehh, I don't really think this is a huge deal personally. Keep in mind that it is not that `(string)null` gives you the "null" string, or `((string)null).ToString()` gives you the null string. You have to use actual string concatenation to get it (Or as others have pointed out, `String.ToValue(null)`). Point being, it isn't/shouldn't be messing up your regular comparisons (Unless you turn it into a string beforehand), and you really shouldn't be checking for stuff like `null` against a string literal anyway - it's meant for displaying a human readable string, not an unambiguous string that could be converted back at a later time. If you wanted that then you should use proper serialization techniques rather then just concatenating a bunch of values.


It's not that it's really that huge of a deal, just that it's so out of character for Java and everything it is seen as.

It's like if a guy you were friends with since college and always knew to be solid and reliable but a little boring, and now many years later, he's happily married with kids, watching sports, working in a large hierarchical organization, doing pretty average stuff but nothing unusual or exciting. If you suddenly found out that guy had secretly been a Furry the whole time, that would be really shocking.

Not that there's anything wrong with being boring and reliable, or being a furry. But the sudden change in how you saw somebody or something is stunning.

I half expect anything written in Java to be littered with AbstractFactoryFactories and giant frameworks using 20 different design patterns to write Hello World. I never would have expected Java to silently convert an actual null to the string "null". I always thought the one thing you can count on Java for was to be strictly strongly typed, and never silently do any weird random conversions that nobody would have expected. Guess I was wrong.

Though poking around in a few other languages, JavaScript does indeed do that also, though I kinda expect JavaScript to do things like that. In Ruby, nil.to_s gives empty string, and adding a nil to a string gives you a type error. In Python, trying to add None to a string also gives you a type error, but str(None) does give you 'None'. That's a bit disappointing, but not shocking to me.


It's insane in that Java pretends to have strong, static typing, then does things like that.


Overloading the plus operator for strings was probably a mistake in the language but that's hardly a condemnation of their type system.


A lot of languages overload + to mean concatenate. I think that's a mistake, however...

In Python:

    "abc" + None
    TypeError: cannot concatenate 'str' and 'NoneType' objects
    
    "abc".join(None)
    TypeError: can only join an iterable
In Ruby

    "abc" + nil
    TypeError: no implicit conversion of nil into String
In Lua

    = "abc" .. Nil
    attempt to concatenate global 'Nil' (a nil value)
But not Java. Java implicitly tries to coerce the provided value to be a string, which seems out of place in a language that values type safety. That all types might also be null also seems out of place in a language with strong, static typing, but that's been discussed to death. The problem is compounded by the fact that null is actually converted to "null" instead of an empty string.


On the other hand, it's easier to get into this situation in Python than Java.

    bar = "bar" 
    foo = "foo"
    foo_bar = foo + baar   # intentional typo (baar will be None)
In Java this would be a compiler error, but python accepts it until the program is being run.


That's a NameError in Python: undefined variables don't have a value, not even None. It won't be caught until runtime since there's no AOT compiler in Python, but it's the same kind of error it is in Java.

My claim isn't that Python is safer overall than Java. Instead, it's that Java, a language that is mostly type safe, most of the time should not have these two potentially surprising behaviors:

1. The standard string concatenation operator does implicit coercion rather than rejecting an input that isn't a string. There should be a builtin to make a string from any value no matter what for logging and debugging, but that shouldn't be the standard concatenation operator.

2. This is more controversial, but strongly statically typed languages should not allow arbitrary values to be null. That sabotages one of the major strengths of strong static typing. Instead, there should be an option type to make it explicit. For something familiar to most programmers, SQL does this.


>I might have expected something like that from Javascript or PHP.

Javascript does it too, but PHP doesn't.


Trying it in the console just now I see javascript does it too

    >"hey "+null
    "hey null"
I wonder why the language designers thought that was a good idea?


The reason is that the string concatenation operation is doing a String.valueOf() on the argument, and that delegates to toString() that promises a human readable string that is relevant.

For some reason someone decided to check for null instead of the stricter option of throwing a NullPointerException. Maybe to help debugging or to follow the gist of toString().

I guess it was too late to change even in Java 0.9... Autoboxing, iterators and other newer features are more strict,so I suppose it's just one of the few irregularities left from prehistoric times...


> it's possible to get "null" as a string downstream

I don't see the problem with that. Strings that contain formatted variables should be used for display only. Besides, when someone inputs his name, the input is already a string, and I don't see how you would dereference string contents.


That's not the problem. Strings saying "null" are not the problem. That's just four lower-case characters. No system chokes on that directly.

The problem is downstream, where some galactic idiot has special-cased the string "null" to turn back into an actual `null`.


Sure – that's just casting a null type as a string.

So I guess if you have a comparison that somehow casts null to a string before comparing, you could run into this issue, but that's still bad programming.

I used to own null@myundergrad.edu as an email alias (with my real university, not that generic .edu, of course) and I got all sorts of interesting things... but that was very intentional on my part.


Funnily enough - it's not the null-> String conversion, it's the "+" operator. That was at one point subject of heated debate in my previous job :)

See:

     "null".equals((String)null)  //false
     "null".equals((String)null+"") //true
     System.out.print((String)null) // throws NPE
     System.out.println((String)null) // prints "null", I guess because it appends "\n" inside


Right, but that operator is implicitly casting the null object to a string type – it has to, in a strict sense... some other languages would raise an error (and many would do the same as Java).


It doesn't have to. null.toString() throws NPE as it should. I would expect "foo" + null to throw NPE as well.

BTW there's another "fun" gotcha, when you interface java code and oracle database which consider empty string and null to be the same thing. Depending on how you handle data from database you end up with null, "", or "null" :)


String.valueOf(null) returns "null" though, and that is what "foo" + null uses. I can understand why it would be surprising if you thought it used .toString, but it's clearly listed in the standard.


Sure, but .toString() can be used to change how your class stringifies, so clearly it is being called at some point (perhaps by String.valueOf()).

And indeed,

    public static String valueOf(Object obj) {
        return (obj == null) ? "null" : obj.toString();
    }
http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/s...


Interesting – I haven't touched Java in years so I'm definitely out of my element with that language, so thank you.


    System.out.print((String)null)
Does not throw NPE.


it does here:

http://www.javarepl.com/term.html

But you're right, in regular java environment it doesn't now that I've checked.


I don't think anyone ever argued that this wasn't because of bad programming.


Often it'll be compatibility with some upstream legacy system which exposes data as CSV or tab delimited, and outputs data like this:

   First,Middle,Last
   Alice,Q,Foo
   Bob,NULL,Bar
In a world like that, some code got written in the wrong tier to output

   value.toUpper() == "NULL" ? "" : value
and you wind up with a web form that can't roundtrip a name that contains the word 'null'.


One place where that can easily happen is loading comma-separated value files into a database. Some CSV files are formatted with the convention that fields only appear in quotes if they contain special characters. Thus, a data value of NULL is not quoted. This loses the distinction made in SQL databases between NULL, the null value, and "NULL", the string.

(I just received some files like this. One is a file that has names. If someone had a name of NULL, it would go into the database as a null.)


Here's an example where it creeps in: https://issues.apache.org/jira/browse/FLEX-33644


Relevant excerpt:

    var nullXML:XML = <root>null</root>;
    if (nullXML == null) { trace("Some XML is null"); }
    
    nullXML == null is true!
That's... impressive.


Bug is still unclosed too.


OK, that's a good one.

Is this ActionScript that's doing the weird xml tag interpolation?

I stand corrected, this is definitely possible. Still, I bet this person's real problems stem from explicit coding.


String interpolation will often end up crossing the barrier between NULL as an empty pointer value and "null" as an ordinary string.


i just checked one of the projects where i work and a search for "null" returns 32 matches. its a java project and i believe the checks came about because stuff being converted to strings and then being sent back and forth between html forms and the backend.

    > git grep -i '"NULL"' src |wc -l
    > 32


It's probably old DB software, the interactions between, and not the language itself.


YAML does this.

It's one of the reasons I wrote a pared down 'dumb' YAML parser that assumes scalar values are strings unless directed otherwise: https://github.com/crdoconnor/strictyaml


Javascript lets you do it if you use the equality (==) instead of the identity (===).. i think.


Javascript lets you do anything wrong if you use == and not === though, I'm pretty sure the machine apocalypse is going to happen because someone types == instead of === at this point.


At least if the Terminators are running JavaScript they'll be pretty easy to thwart.


Blackhat conference talk we might see in the future: "How I achieved remote code execution on the T-1000 and singlehandedly averted the extinction of the human race"


Nope. Not even JavaScript screws this one up. You've got to be doing something stupid with strings to get tripped up by "Null".


Imagine a protocol that uses xml.... there's a few out. There are no quotes so null as a string will look exactly the same as A NULL value.


But that's just the tip of the iceberg that xml is a amateurish protocol language. I'm so glad that it's finally dying.

Unfortunately it's replaced by something even more messed up


Given that the entire protocol of the web (html and http) is string-based, that's not very unlikely.


Just tested (in the node CLI console) and that doesn't seem to be the case.

Edit: It's true in PHP though. :-/


>Edit: It's true in PHP though. :-/

Can you post exactly what you did in PHP that shows "NULL" is equal to NULL? There's quite a few approaches like ==, ===, and is_null(). I can't get any to think "NULL" is NULL, though I imagine I'm missing something.


null == undefined in js.


When using sequel pro with a MySQL database, typing NULL into a string field will make that field null rather than 'null', which always seemed like an odd design choice to me


Its a good compromise i think. OTherwise to update a field to null it would need a button or right clicking and selected a null option. I would expect the amount of people who actually want to enter a 'null' string is minimal...


This reasoning is probably behind some the OP's cases...


Yeah, Null is no problem. But thanks to code my predecessor wrote, >null< would be.


SQL abstraction layer at work in PHP special cases the string NULL to search for 'is null' rather than search for the string. So it happens


I can only see that happen if eval was ever used.


Eval used to be used a lot, because web forms send everything as strings and you need to convert them to whatever datatypes on the back end. Eval probably permeates a lot of older legacy systems.


I imagine there's many special cases where a compiler's type coercion creates an issue.


Reminds me of the time we ordered our high school football jerseys. We filled out a form listing the requested size and spelling of our last name to be printed on the back of our jersey. A couple of weeks later, all the jerseys were delivered and we excitedly opened up the packing boxes to hand them out. Imaging the surprise and ensuing hilarity when our good friend, Marshall Blank's jersey came out of the box with no name printed on it whatsoever.


My old Manager's last name is Blank. The sad thing, given blank isnt a reserved word at all, is he has the same problem. I think its EventBright or Ticketmaster, i forget, but one of those sites wont accept "Blank" as his last name, literally with the message of "Last name can not be blank" ...


I have seen apps that choke on much more common names. Like O'Brian.

This post is a classic on various name issues: http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-b...


Tangentially related (rule 9); my girlfriend's surname contains an 'é'. I have yet to see a year go by without receiving mail having 'é' on the address label where the é should be.

Explanation:

    echo "é" | iconv -t utf8 -f iso8859-15
We're Dutch, and the é is part of our language, and even part of the legacy character encoding standard everyone used before Unicode's widespread adoption. This is just a matter of code that works perfect as long as all characters are part of the ASCII set, but fails on the characters that don't conveniently match between UTF-8 and ISO-8859-15.

I doubt these issues will go away within even, say, twenty years.


It's getting much better. Almost everything new is in UTF-8.

I've been writing code to clean up a 2013 database dump. The database stored everything in LATIN-1 fields. Not because the data is in LATIN-1, but because LATIN-1 will accept any byte value. This makes error messages during input go away. See this bad advice on Stack Overflow.[1]

Some of the data is ASCII. Some is UTF-8. Some is Windows-1252. Some data is none of those, but is mostly ASCII except that there's a 0x9d once in a while. (Still haven't figured out what character set that is. From context, the ™ or ® symbol is intended.) So I have recognizers for these cases, and convert everything to UTF-8, testing every field value individually.

One column has garbaged non-English names. Someone had tried to "normalize" UTF-8 to lower case by using an ASCII lowercasing function on UTF-8 stored in a LATIN-1 field:

    KACMAZLAR MEKANİK  -> kacmazlar mekanä°k
    Anita Calçados -> anita calã§ados
    Felfria Resor för att Koh Lanta -> felfria resor fã¶r att koh lanta
I have the un-"normalized" form and can fix this.

[1] https://stackoverflow.com/questions/44251813/unicodedecodeer...


Even if something new is UTF-8, you'd basically have to guarantee that it never interfaces with anything old and/or broken to ensure the data survives intact. I've recently received a package where the ö in my name was written as ̦ which I've never seen before. Things go wrong in the most unexpected places and the sad thing is that even if you use UTF-8 for text storage, you still have to know what you can do and how to do it to not mangle it.

There are a lot of Unicode-hostile environments out there. Java is old enough to always require explicit encoding declaration for pretty much any tool ... compiler, documentation generator, etc. Forget it at any one point and you get garbage. Reading or writing text files should always make the encoding explicit, but rarely does so. C#'s string methods all support, but don't require, a Culture parameter, without which you're practically guaranteed to do things like case conversion, or substring searches wrong in the general case. There was an awesome and long answer by tchrist on SO once about what the Perl boilerplate is to properly support Unicode for many or most circumstances (it's complicated and long and I doubt many people are going those lengths).

Point being, even when using something that supports Unicode well, the programmer still has to care, simply because text and language are messy things and it simply isn't possible to have a magic bullet that does everything right.


Dutch programmers probably have little incentive to get it right, because Dutch makes relatively light use of accents. I wouldn't be surprised if Czech programmers, for example, were much more meticulous at converting between UTF-8 and legacy encodings.

Here in CJK territory, using the wrong encoding makes the output so obviously broken [1] that mistakes are almost always caught before hitting production.

[1] https://en.wikipedia.org/wiki/Mojibake


I'm French and I see this all the time, even Outlook, given the right conditions, will gives you this in the default folders, Inbox in French is "Boite de réception".


Don't get me started on Outlook folders. The actual names are of the folders are localised, not just the way it's presented to the user. The folders are created when you first start Outlook (and not when your account is created).

If you happen to be using Windows configured in a foreign language the first time you start Outlook, your inbox, sent mail, etc, folders are named according to that language, and will never change, and you'll have to live with non-standard names for the folders.

At least its teaches you how to configure folders manually in most email clients.


You can change the folder names after the fact. It is annoying, though.

https://support.microsoft.com/en-us/help/2826855


That's why I take care that my OS doesn't know I'm French. But it's a luxury that most French people can't afford and they have to live with bugs. Ex: Having both a "Download" and a "Téléchargement" folder, with "Download" being sometimes translated.


This is something Apple got right. The underlying folder has a standard name, and the translated name is just a different presentation. If you change language, the names of the translated folders change.

For anyone who doesn't know OSX, this translation happens on the UI level. Typing ls in a terminal gives you the real directory name.


> I doubt these issues will go away within even, say, twenty years.

Much like the printing press, I'm 100% certain that the computing (and the internet specifically) is altering human written language across the world.

It is just so much easier to avoid anything outside ASCII because you can be certain ASCII will always work - even though some awful MS Access -> CSV -> SQL -> SQL -> Excel -> SQL -> COBOL ETL pipeline. No matter what version of any software is being used.

Technology has always shaped written language and we should fight to do better but at this point it seems inevitable.

(To be clear: I'm not saying this is a good or desirable state of affairs)


I really doubt that there is any transliteration into ASCI that's acceptable to European speakers.

Eg A Ą Å Æ Ä are all different letters in most languages, not simply a pronunciation guide. I think most European languages use at least two from that list.


I wonder what happens if you book an airline ticket as "Miche" instead of "Miché". Will custom/CIA choke on the unmatching name?


I feel your pain. My wife has a French first name with an acute accent over the "e" as well. Many forms will outright reject the accent, so I've taken to just not typing it most of the time. Being American, it doesn't really bother her, though. I know in some languages missing accents and other similar markings changes the pronunciation and meaning of words and can be very irritating to some.


> accents and other similar markings changes the pronunciation and meaning of words

Yes. In some languages those are actually not "markings" but denote proper letters, like in German ä,ö,ü and ß. But even if not, like in French, it can alter the meaning of words. E.g la != là. Therefore, for most Europeans and speakers of other languages that depend on more letters than ASCII provides, it is very annoying when that is not supported properly.

However, I have made the experience in a few cases that particularly Americans have a hard time understanding this. The remark about your wife not caring seems to be in this vein, too. Recently, I decided to convert our MySQL DB tables from latin1 to UTF8. (I wasn't even aware that we didn't have some form unicode, as our DB is only few years old, and I thought some unicode is the default nowadays everywhere. But then MySQL...)

Anyway, my CEO (also an American incidentally) was trying to keep me from it because he thought it's not high priority. However, we're about to go live in a French-speaking region, but which also has other indigenous languages (and therefore names), with their own "special" characters (I put "special" in quotes because for those languages, they're not "special" at all -- but I guess you get my gist by now).

Also, in previous jobs I have converted legacy systems to unicode and know what a pain it is down the road. Not to mention all the hard-to-find bugs if you don't do it, because some strings don't compare as they should, or people are just annoyed because their name is not shown correctly.

So I went ahead with the conversion anyway. We may never know for sure, but I'm convinced that I saved us some major customer frustrations, days of bug hunting and weeks of converting everything later, when existing data would need to be migrated.

So please everyone, just use UTF8 or some other unicode variant from the get-go. The few bits you might save otherwise are just not worth it.


It's not always a computer problem. I have an O' name. There have been many college educated adults I've had to explain the difference between a quote ("), an apostrophe ('), and whatever the hell you call the character under the tilde (`). I kid, it's the grave.


If you really want some fun, try explaining to people the difference between modern Greek tonos (e.g. ή, Unicode 03ae) and ancient Greek oxia (e.g. ή, Unicode 1f75). Or in UTF-8:

    paul@tal:~$ od --format=x1z tmp/tonos-oxia
    0000000 74 6f 6e 6f 73 3a 20 ce ae 0a 6f 78 69 61 3a 20  >tonos: ...oxia: <
    0000020 20 e1 bd b5 0a                                   > ....<
It's super fun when you are the only tech person in a Classics grad program and everyone else is turning in papers that look like ransom notes, because every fourth vowel has a different x-height from all the other letters. :-)

Here are the two Unicode ranges (PDF):

modern Greek: http://unicode.org/charts/PDF/U0370.pdf

ancient Greek: http://unicode.org/charts/PDF/U1F00.pdf

Then again, try explaining to tech people the difference between ή and ἠ and how you can get ἤ or ᾔ. :-)


You're not saying the classicists confuse the smooth breathing with the acute accent, right? Only they can't figure out how to typeset them?

How do you enter them nowadays by the way? In the early days of the internet before unicode there were special fonts from SIL for example that first of all were using Latin characters (you'd type W and it would look like Ω) and secondly I think had the diacritics as separate characters, so the font would combine them. It was messy but at least you could type it on a normal keyboard. You could even read it in its ASCII form, most of the hacks were pretty reasonable.


Not smooth breathing and acute, but Unicode has two codepoints for acute accents (sort of). One is the modern Greek tonos, which often looks just like an acute accent but sometimes appears as a dot or straight vertical line, and the other codepoint is the ancient Greek oxia (the ancient term for the acute accent), which appears with all the other marks and their combinations (grave, circumflex, smooth/rough breathing, iota subscript).

Personally I type Greek using a vim keymap file, usually when writing LaTeX. I believe my keymap file is influenced by those older non-Unicode fonts, because I type w for ω and ;h for ή and >~h| for ᾖ. But my fellow students would mostly use Word. I don't know how they would type the letters, but commonly they would mix the tonos letters with everything else. I think what was happening is that Word would automatically substitute fonts that offered those codepoints, so it would end up showing Times for the letters with tonos and Palatino for the others (or something like that). Hence the ransom note effect.


My kingdom for a good monospaced font and supporting editor to handle these and RTL languages. I've found Setups that work with one or the other, but none seems to do it all just yet.


The combination of Adobe Source Code Pro & emacs works well for me.


That character is a backtick not a grave isn't it? Also some cultures use «» as quotes. It's a big world.


https://en.wikipedia.org/wiki/Grave_accent

"Programmers use the grave accent symbol as a separate character (i.e., not combined with any letter) for a number of tasks. In this role, it is known as a backquote or backtick."


ASCII never really supported the idea of a combining character, so it's not surprising that the original intent was subverted.


ASCII supported combining characters. That’s what backspace was for!

https://news.ycombinator.com/item?id=11654682


That was very illuminating, thanks. Interesting that the idea was there originally but was later deprecated when they realized video terminals couldn't do it. Today it would be trivial to convert the sequence letter/backspace/accent to its UTF-8 equivalent.


Heck, in some fonts ` even takes the form of ‛ or ‘. Which is probably where typing quotes like ``this'' comes from (which looks absolutely awful nowadays since it mixes an uncombined diacritic with a kludge of a punctuation character).


Is there a difference? I think those are both words for the same character.


It's a backtick when by itself, grave when it's an accent over another character.


For an extra challenge, you can explain what an 'okina is.


The backtick and apostrophe are often used to emulate ``curly quotes,'' which makes the confusion even worse.


I think this style might come from TeX, based on this comment: https://english.stackexchange.com/questions/17695/any-refere...

Manpages aren't written in TeX though (apparently they use something called "roff"), but they also contain things like `read' instead of 'read'... perhaps manpage writers tended to like TeX too?


Probably the other way round: roff/troff/nroff is older than TeX. I guess that on old computer typesetters, ` and ' turned into nicely balanced quotes.


roff was the financial back reason that Unix moved from PDP-7 / assembler to PDP-11 / C.

http://www.read.seas.harvard.edu/~kohler/class/aosref/ritchi...


In old fonts, ` often displayed as ‘ and ' displayed as ’. When they wrote ``foo'', that was meant to show up as ‘‘foo’’, which was a workaround for not having characters to properly spell “foo”.


Even more annoying is when you realise that those quotes are a hack because keyboards don't have separate left and right single and double quotation marks. The rabbit hole goes deeper…


Hyphens too.

Source: have a hyphenated last name. "No special characters in this field".


I'll bet it was an airline. The underlying field type for names in a PNR in the TPF os only allows A-Z and space.


I get that with periods in a street name. Like when I write "123 Main St.", and the web site checks against the USPS's database, it will either reject it outright for having "special" characters, or will say, "We didn't find that, but we found this similar address - '123 Main St', which should we use?" It's so fucking dumb.


I had a fun issue signing up for a bank here in Japan.

Japanese people can't have middle names (the citizen registry doesn't allow it), but foreigners can, so many systems will reject spaces in the name field. Meanwhile they're meticulous about making sure your name matches your ID exactly, leading to the situation I had.

The bank's web signup form disallowed spaces in your name, so I wrote my name FIRSTMIDDLE. Then when they processed my application they sent me an email "Your name doesn't match your ID card! Please approve the change to 'FIRST MIDDLE'."


I sympathize, but it is a hard problem to solve. On the seller side, many orders come in with addresses that are just wrong, or missing suite numbers, business names, etc.

Ups and FedEx charge shippers ~$15 for each instance where they have to address correct.

So, yeah, the period thing is dumb, but automated correction is hard. Even experts, like SmartyStreets get it wrong often.


There was a notable incident where Facebook rejected Caterina Fake.


Dutch names (two or three word surnames with spaces) confuse all sorts of systems/people.


Dutch first names confuse systems as well. Gmail doesn't understand the first name of my colleague is "Jan Willem" and not just "Jan", when showing the list of recipients.


Many years ago computer systems analyzed my last name, which are two seperate words, as me being married. I got quite some snail mail calling me Mrs. [2nd part of last name] for some weird reason.


Astonishingly, nest.com does not work for me at all if I use the hyphen in my last name.


Last names can start with a lowercase too, like d'Auriol.


One of my LLCs is named "Null Ventures LLC". I sometimes get things via snail mail that are addressed in interesting ways. The most common is simply "<space>Ventures LLC". Unlike the author, I've never had issues (AFAIK, anyways) receiving e-mail to the domain.


previous discussion: Hello, I’m Mr. Null. My Name Makes Me Invisible to Computers - https://news.ycombinator.com/item?id=12426315


Reminds me the "NO PLATE" license plate story: http://www.snopes.com/autos/law/noplate.asp


Why didn't they just null the field?


Reminds me of running into this bug [1] at work that caused a few minutes of head scratching. Couldn't make any changes to a user's profile as their surname contained eval.

[1] https://quickview.cloudapps.cisco.com/quickview/bug/CSCut083...


I don't get what's happening at the low-level for this to be a problem.

It seems like you'd have to do something pretty stupid at the coding level to introduce a problem with "Null" by mistake. I'm sure it happens, but not more of an occasional issue.

My best guess is that there are common old databases that did not have a first class null type where it was common practice to use the string "NULL" for that purpose. And that companies that have these old systems are proactively filtering user input to prevent causing these old system to choke... It sounds like the filter is case-insensitive, though, which would be too aggressive for the case I'm thinking of. Maybe they are (mis)using a bad word filter for this, which would tend to be aggressive.


> My best guess is that there are common old databases that did not have a first class null type where it was common practice to use the string "NULL" for that purpose.

It isn't a DB issue. It's more of a front-end or middle tier issue.

In SQL standards - NULL is a "marker"/TYPE. NULL and "NULL" are two separate things. One is a null type and the other is a string type.

Or more specifically, it is a "interface" problem between RDBMs and front-end since languages handle null differently. Many languages didn't have null types and null in certain languages mean different things that "lack of information".

For example, if a database column was a nullable int column and you wanted to bring it out to the java or .net space you would have issues since "int" in java and .net are value types and not reference types. So you could assigned null to the values. Where as a string/text/varchar column you could since string in java and .net are reference types and can be null.

In some languages, checking for null means you have to convert null into a string and then compare "null" == "null".

It's a legacy of lack of Nullable types in many programming languages. With the introduction of Nullable types many of these problems went away.


I believe the problem might be more common when exporting data from one system to another. I've written code to migrate (and merge) large datasets from old legacy systems. I needed quite a lot of heuristics and "best-effort" transformations in order to deal with data inconsistencies... mostly string manipulations where it's not hard to imagine bugs like this happening.


Ah, so it's your fault. Got it!

;-)


I think JavaScript is the biggest offender here. So this check would fail:

  if (lastName.toLowerCase() != null)


'null' != null in javascript, I'm pretty sure the only thing it weak equals is undefined.


Well, you all wanted weak and dynamic typing...


I'm trying to think how this would be a problem and it wouldn't directly be a issue created by company but could be a issue in companies we deal with passing around large databases of different types on people in America. It's common we have to deal with a pipe delimited set of large data and after loading it I'm sure my boss could see last name null in MySQL and go "let's delete all "null" names.


TLDR; It doesn't make you invisible only there are some badly coded apps out there that typecast null to strings, wired article cliackbait


Does someone maintain a comprehensive list of names you should validate against?


Not only names per-se, but for testing string handling in general I remember this was posted to HN a while back https://github.com/minimaxir/big-list-of-naughty-strings


what languages don't distinguish between literal null and a string with content "null".

More importantly, who the fuck used such monstrosities?


I don't think it's a specific language. It's typically some protocol or library or developer inventing their own "nullable value" placeholder.

Like this: https://issues.apache.org/jira/browse/FLEX-33644


Old software that makes bad SQL queries with string concatenation.


Oh, let me tell you about the saga of Mr Curl trying to update his billing information at a company I used to work for...


I don't understand. Are you agreeing with me or suggesting a string with the contents "curl" will somehow cause the server to execute the curl command?


This happened. I was working tech support for a company with a very old billing database. Customer was attempting to update his billing information with a new credit card and was immediately getting an error - Forbidden.

Turned out it was his last name, Curl, that was causing the issue. The system was throwing an exception because it interpreted the customer's entry as attempting to execute the curl command.

We ended up having him put "JR" at the end of his last name to prevent it.


Maybe this was back in the days of CGI? It's perfectly feasible that the right combination of characters might break out of an ill-advised pipe to get to the shell.


Yep, it might if it was ever passed to eval()



This has been overshared too many times.



I'm one of the 10000 people seeing this for the first time today, thanks!


NULL


Title could use a (2015) tag.


brb...changing my surname to "null".


Good luck with that. If computers can't take it, how will you register it in the first place?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: