But not e.g. protobuf. So don't write a distributed routing framework that assumes hashCode() is stable - if you do, it will work fine for Strings and HashMaps and all the standard Java objects, and then fail when someone tries to use it with protobuf objects.
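For concreteness, the kind of code being warned against looks something like this (a hypothetical sketch; shardFor and shardCount are illustrative names, not from any real framework):

    // Naive sharding: assumes key.hashCode() yields the same value on every
    // node. That holds for String (its hash algorithm is specified), but not
    // for types that fall back to Object's identity-based hashCode().
    static int shardFor(Object key, int shardCount) {
        return Math.floorMod(key.hashCode(), shardCount);
    }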
Your comment qualifies as "wrong" because of one part: "...it will work fine for Strings and HashMaps and all the standard Java objects...". In fact, hashCode() is stable for only a very small subset of the types that ship with standard Java, and even in some of those cases there are caveats. Arrays come pretty standard with Java, but...
// false in practice: arrays inherit Object's identity-based hashCode()
boolean alwaysFalse(int[] a) { return a.hashCode() == a.clone().hashCode(); }
a.clone().equals(a) is false too; you can't use an array as a thing to route on any more than you can use Object. The point is that in standard Java, objects with value semantics have stable hashCode implementations, and this is not true in general.
(And FWIW arrays are a weird corner of Java as far as I'm concerned; every codebase I've worked on avoids them as much as possible)
a.clone().equals(a) HAS to be false in that case. The equals/hashCode contract requires equal objects to have equal hash codes, so it's more than a bit of a given once a.hashCode() != a.clone().hashCode().
> you can't use Array as a thing to route on any more than you can use Object
Oh, sure you can. You just need to use java.util.Arrays's methods for "equals" and "hashCode" (or potentially deepHashCode).
Also, ArrayList does work.
Yes, that is totally F'd up.
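For concreteness, a sketch of that workaround (class and variable names are just illustrative):

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    public class ContentHashes {
        public static void main(String[] args) {
            int[] a = {1, 2, 3};

            // The static helpers in java.util.Arrays hash/compare by contents:
            System.out.println(Arrays.hashCode(a) == Arrays.hashCode(a.clone())); // true
            System.out.println(Arrays.equals(a, a.clone()));                      // true

            // deepHashCode recurses into nested arrays:
            int[][] n1 = {{1, 2}, {3}};
            int[][] n2 = {{1, 2}, {3}};
            System.out.println(Arrays.deepHashCode(n1) == Arrays.deepHashCode(n2)); // true

            // ArrayList overrides hashCode()/equals() per the List contract,
            // so equal contents yield equal (and specified, hence stable) hashes:
            List<Integer> l1 = new ArrayList<>(Arrays.asList(1, 2, 3));
            List<Integer> l2 = new ArrayList<>(Arrays.asList(1, 2, 3));
            System.out.println(l1.hashCode() == l2.hashCode()); // true
        }
    }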
> The point is that in standard java objects with value semantics have stable hashCode implementations, and this is not true in general
Value semantics are pretty rare in Java. I'm not even sure what you really mean there. Pass-by-value doesn't happen with Java objects. I could see you referring to immutable types, but that doesn't include any of the collection classes. I'm guessing you mean "primitive type wrappers and collections".
It'd make more sense if you said something like "pretty much anything that overrides Object.equals(Object)"... because that's the way it is supposed to work. Such overrides are rare in the standard class library because there is little business logic there. In practice, anything that resembles an identifier, and therefore every key, tends to do the override. Indeed, most of what people call "business objects" do the override too. That's why the convention is there. Most importantly: almost all overrides do so in a fashion that is stable across processes (see the sketch below). That's also why distributed frameworks can and should employ the convention/protocol.
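A sketch of what such an override typically looks like (OrderKey and its fields are hypothetical):

    import java.util.Objects;

    // Hypothetical identifier-like "business object": equals()/hashCode()
    // are derived purely from value fields whose hash algorithms are
    // specified (String, long), so the hash is the same in every JVM.
    final class OrderKey {
        private final String region;
        private final long orderId;

        OrderKey(String region, long orderId) {
            this.region = region;
            this.orderId = orderId;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof OrderKey)) return false;
            OrderKey other = (OrderKey) o;
            return orderId == other.orderId && region.equals(other.region);
        }

        @Override
        public int hashCode() {
            // Objects.hash delegates to Arrays.hashCode, whose algorithm is
            // specified, so the result is stable across processes.
            return Objects.hash(region, orderId);
        }
    }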
The equals/hashCode methods in Object follow the Smalltalk trick of defining a protocol in Object even though you shouldn't really use it without overriding. The Object method isn't a "reference implementation" of the protocol, but rather a placeholder (one they forgot to override in array objects, and then tried to backdoor in with java.util.Arrays).
There is no convention that hashCode() should be stable across processes; the only thing that could reasonably define such a convention is the javadoc of Object#hashCode, which explicitly states otherwise: the value "need not remain consistent from one execution of an application to another execution of the same application". More pragmatically, there is an important set of objects, widely used in distributed systems, whose implementation of hashCode() is not stable across processes (namely Protocol Buffers objects). So distributed frameworks can't and shouldn't assume hashCode() is stable across processes.
Value semantics are a case where identity shouldn't really exist at all, as all that matters is equivalence. There is a lot of room in between, where equivalence has meaning but identity can still play a role.
Arguably, in Java only the primitive types (which aren't part of the Java standard library) really embody this, with their wrappers and String barely getting a nod, and Collections totally don't fit the bill. IIRC part of the language definition refers to the fact that all objects have reference semantics.
Protocol Buffers are meant to be used as a memento [http://en.wikipedia.org/wiki/Memento_pattern]. At most you should have a wrapper object providing behaviours like equivalence (in fact with Hadoop I do this all the time, which is kind of a must anyway, as Protocol Buffers don't implement Writable, let alone WritableComparable, etc.). Protocol Buffers very much don't have behaviour beyond serialization, which is exactly why they shouldn't have equivalence implementations. Consequently, if you are invoking equals(), hashCode(), etc. on them, "you are doing it wrong".
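A sketch of that wrapper idea (MessageKey is a hypothetical name; it assumes byte-level equality is an acceptable notion of equivalence, which requires deterministic serialization and isn't guaranteed by Protocol Buffers in general):

    import com.google.protobuf.Message;
    import java.util.Arrays;

    // Hypothetical wrapper that owns the equivalence behaviour instead of
    // the generated message class. Hashing the serialized bytes with the
    // specified Arrays.hashCode algorithm is stable across processes,
    // provided the same message serializes to the same bytes.
    final class MessageKey {
        private final byte[] bytes;

        MessageKey(Message message) {
            this.bytes = message.toByteArray();
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof MessageKey
                && Arrays.equals(bytes, ((MessageKey) o).bytes);
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(bytes);
        }
    }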
You're right though that there isn't a strict convention about hashCode() being stable across processes. It's more like a corollary that stems from implementing equals(Object) in terms of an object's equivalence. When you define equivalence for an object, you implement equals(Object) to represent that notion. Because of the contract that hashCode() has with equals(Object), the only cases where hashCode() shouldn't be stable across processes are those where the objects actually aren't equivalent across processes.
This should fit quite well with the objectives of a distributed system. In particular, if you are defining something as a "key", you must have a very clear notion of equivalence for it, and in that context it not only should be stable across processes, it needs to be stable across the entire system. If you don't implement that logic in equals(Object), you're violating that contract, which means your hashCode() method needs to reflect that stability across processes. I can come up with some ways of screwing up hashCode() in that context while still avoiding breaking the contracts, but they're all contrived, and in practice I've never seen anyone actually do it.
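For what it's worth, one such contrived case, legal under the contract (which only binds within a single execution) yet unstable across processes; SaltedKey is purely illustrative:

    import java.util.Random;

    // Contrived: every instance in one JVM shares the same random salt, so
    // equal objects get equal hashes within a process (contract satisfied),
    // but the hash differs from one process to the next.
    final class SaltedKey {
        private static final int SALT = new Random().nextInt();

        private final String id;

        SaltedKey(String id) { this.id = id; }

        @Override
        public boolean equals(Object o) {
            return o instanceof SaltedKey && id.equals(((SaltedKey) o).id);
        }

        @Override
        public int hashCode() {
            return id.hashCode() ^ SALT;
        }
    }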
Seriously, if you are running into these issues, you have bigger problems in your design.