Ruby: shallow copy surprise! (thingsaaronmade.com)
10 points by aarongough on May 20, 2010 | 24 comments


This is a lot of text and boilerplate for a very, very simple idea:

    >> x = [1,2,3]
    => [1, 2, 3]
    >> def go(z);   z[1] = 0;   end
    => nil
    >> go(x)
    => 0
    >> x
    => [1, 0, 3]
Mutable objects in Ruby are passed by reference, not value.


It ended up being longer than I had planned. The main reason is that I wanted to demonstrate that Object#clone doesn't work as I would have anticipated, and to show a possible solution.

Greg Brown was kind enough to steer me onto the right track, though: this is mainly a design problem. I should be avoiding copying wherever possible.


The Io language has a similar gotcha - when I cloned one of my prototypes that contained a List member, I was surprised to find that all the clones shared the same List! It actually makes sense - it was the reference to the list that was copied, but all the references pointed to the same list.

The problem with always defaulting to deep copy in a language where all object slots are by reference is "where do you stop?" Do you copy all the objects known to that object? If the object holds a data file or a resource, do you deep copy that too? What about object graphs? What about two objects mutually holding references to each other? What if the object holds a reference to the global application object?

So the most common way to do it is default to only a shallow copy. It's up to the user to define a deep copy if they need it because only the user knows what members are semantically "part of" the parent object and what are "pointers" to unowned objects.
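
A minimal sketch of what "user-defined deep copy" can look like in Ruby, assuming a hypothetical Playlist class that owns its track names but only points at a shared library object. Ruby calls initialize_copy on the new object during dup and clone, which gives the class a place to decide which members get copied:

```ruby
# Hypothetical class, for illustration only: a Playlist owns its list of
# track names but merely points at a shared, unowned library object.
class Playlist
  attr_reader :tracks, :library

  def initialize(library, tracks = [])
    @library = library
    @tracks = tracks
  end

  # Called on the new object by dup and clone. The instance variables have
  # already been shallow-copied at this point, so only the members this
  # class considers its own are copied again.
  def initialize_copy(source)
    super
    @tracks = source.tracks.map(&:dup) # owned: copy the array and each string
    # @library is deliberately left shared
  end
end

library  = Object.new
original = Playlist.new(library, ["intro", "outro"])
copy     = original.dup

copy.tracks << "bonus"
copy.tracks.first.upcase!

p original.tracks                        # ["intro", "outro"]
p copy.tracks                            # ["INTRO", "outro", "bonus"]
p copy.library.equal?(original.library)  # true
```

Which members count as "owned" is a judgment the class author has to make; nothing in the language can infer it.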


I ran into the exact same problem when trying to solve some (not very complicated in retrospect) errors in a body of text. Shallow copying a string that I was applying permutations to and storing in an array meant that every single string in the array was the same, so I could never have more than one permutation without pulling some hacks like this.

I would be very interested in a cleaner variant on the marshal load hack for non-primitives, or even some interesting doc/writeup on how this works.
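
For reference, a rough sketch of how the Marshal round trip works, with marshal_copy as a made-up helper name. Marshal.dump walks the whole object graph and writes it out as a byte string; Marshal.load then builds brand-new objects from those bytes, so nothing is shared with the original. The cost is the serialization itself, and it fails for objects that cannot be dumped, such as procs and IO handles:

```ruby
# Hypothetical helper: deep copy via a Marshal round trip.
def marshal_copy(object)
  Marshal.load(Marshal.dump(object))
end

nested = { "names" => ["alice", "bob"], "tags" => [["x"], ["y"]] }
copy   = marshal_copy(nested)

copy["names"].first.upcase!
copy["tags"].last << "z"

p nested["names"]  # ["alice", "bob"]
p nested["tags"]   # [["x"], ["y"]]
p copy["names"]    # ["ALICE", "bob"]
p copy["tags"]     # [["x"], ["y", "z"]]

# Objects that cannot be serialized make the whole copy fail.
begin
  marshal_copy({ "callback" => proc { 1 } })
rescue TypeError => e
  puts "cannot marshal: #{e.message}"
end
```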


Are you kidding me? The "solution" is to serialize and deserialize? That's an incredible waste.

edit: I thought it was not uncommon for higher level languages to pass arrays and objects by reference, so this post wasn't particularly new or interesting. Unless you're coming from PHP, which is, IMO, a nightmare because everything is passed by value (by default) except objects.


I actually came to Ruby from PHP; maybe that's where I got my incorrect assumptions from. It definitely wouldn't be the first bad habit PHP has left me with.


I'm actually looking for feedback on this. I know there are quite a few Ruby hackers hanging round HN.

Is there something I'm missing here? Some idiom that lets you side-step this problem? I'm questioning it because this problem/solution seems very much at-odds with the elegance and thoroughness of the rest of the language.


You're missing a few details, especially around "primitive" data types. There's not really any such thing in Ruby - everything is an object, but some objects - numbers, booleans, nil - are immutable.

When you copy an object, all you do is copy its set of instance variables, which are just references to other objects. For an array, the instance variables are its set of indexes, which again are just references. Copying an array just means making a new list of references, but the objects they point to remain unmodified and uncopied.

Consider:

    array = ["foo"]
    copy = array.dup

array and copy are independently mutable - modifying the index in one does not affect the indexes in the other - but they still both contain references to the single string "foo". Thus:

    copy.first.gsub! /foo/, "bar"

modifies the string referenced by copy, which is the same string referenced by array. So array becomes ["bar"].

If you want a true deep copy, do something like this:

    def deep_copy(object)
      case object
      when Array
        object.map { |item| deep_copy(item) }
      when Hash
        object.inject({}) do |hash, (key, value)|
          hash[deep_copy(key)] = deep_copy(value)
          hash
        end
      # handle other data structures if need be
      else
        object.respond_to?(:dup) ? object.dup : object
      end
    end


Apologies for the formatting screw-up. The deep_copy function is here:

http://gist.github.com/407741


No worries, thanks for taking the time to write it out. I'm glad that I wrote the post (and that I'm getting hammered a little for my assumptions) because making mistakes is probably the only way I'm going to get a deeper understanding of the language...

One thing that confused the issue a little for me is the fact that some objects in Ruby are really only 'pretend objects', i.e.:

  >> test = 4
  => 4
  >> test2 = 4
  => 4
  >> test.object_id
  => 9
  >> test2.object_id
  => 9
I don't know enough about the deeper parts of the language to know what else there is that's like this though...


This is a performance optimization for common (read: integer) numbers: http://ruby-doc.org/core/classes/Fixnum.html

  >> ((1 << 30) - 1).class
  => Fixnum
  >> ((1 << 30)).class
  => Bignum

  >> ((1 << 30) - 1).object_id
  => 2147483647
  >> ((1 << 30) - 1).object_id
  => 2147483647

  >> (1<<30).object_id
  => 166070
  >> (1<<30).object_id
  => 161200



The thing is that neither of those actually solves the problem. They were the first thing that I tried...


Sorry, I wasn't trying to say they solved the problem. They say exactly what they do and don't do.


He referenced not being alone in his assumptions, but this is basic ruby:

   >> a = b = [1,2]
   => [1, 2]
   >> c = a.dup
   => [1, 2]
   >> a==b
   => true
   >> a==c
   => true
   >> a.equal?(b)
   => true
   >> a.equal?(c)
   => false
Here's a better post on the topic: http://kentreis.wordpress.com/2007/02/08/identity-and-equali...


Well in fairness, I did mention that the realization made me feel stupid :-p

It's not necessarily obvious if you're coming from other languages that don't behave this way. That being said I'm surprised that I had never run into this problem before. I think that most of the time I had the right idea with not copying objects, but in this case I had memoized a method call and the Hash 'cache' was getting corrupted which was what brought it to my attention... A slightly more unusual situation.
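
The memoization case can be reproduced in a few lines. This is only a sketch with a hypothetical Settings class, not the code from the blog post: the memoized method hands out the cached Hash itself, so any caller that modifies its result also modifies the cache.

```ruby
# Hypothetical example of a memoized lookup.
class Settings
  def initialize
    @cache = {}
  end

  # Returns the very same Hash object on every call, so a caller that
  # mutates the result corrupts the cache.
  def defaults_shared(name)
    @cache[name] ||= { "retries" => 3 }
  end

  # One possible fix: hand out a copy. A shallow dup is enough here only
  # because the values are immutable integers.
  def defaults_copied(name)
    defaults_shared(name).dup
  end
end

settings = Settings.new

settings.defaults_shared("http")["retries"] = 0  # caller tweaks its result
p settings.defaults_shared("http")["retries"]    # 0: the cache was changed

settings.defaults_copied("ftp")["retries"] = 0
p settings.defaults_copied("ftp")["retries"]     # 3: the cache is intact
```

Freezing the cached value instead of copying it is another option; that turns silent corruption into an exception at the point of the accidental write.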


This is not a surprise. I have the same issue with Strings too. A String is passed around your app and someone changes it (capitalizes it, chomps it, etc.). The String is then changed throughout the app! You have to dup() it if you want to ensure no one changes it.

This means if you have classes returning Strings, such as first_name, last_name, address etc, your getter should return a dup() if you want to ensure no accidental change to it. That sucks, if you ask me.


capitalize, chomp are non-destructive contrary to their "bang" counterparts: capitalize!, chomp!

Lots of methods are non-destructive and it's cleaner to use them instead of artificially calling dup().
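
A small illustration of the difference, using two made-up helpers (tidy and tidy!). The non-bang methods return a new String and leave the receiver alone; the bang methods change the receiver in place:

```ruby
# Hypothetical helpers, for illustration only.
def tidy(text)
  text.chomp.capitalize  # non-destructive: returns a new String
end

def tidy!(text)
  # Destructive: modifies the caller's String. The bang methods return nil
  # when nothing changed, so they are not chained here.
  text.chomp!
  text.capitalize!
  text
end

name = "ada lovelace\n"

p tidy(name)  # "Ada lovelace"
p name        # "ada lovelace\n"

tidy!(name)
p name        # "Ada lovelace"
```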


I cannot remember the exact cases, but the point is that you have an API on one hand, and the user of an API on the other.

The API returns strings to you, the user at some point needs to (say) perform multiple operations on that String. Say, multiple gsubs. So rather than create a new string with each, he uses a gsub!.

I actually once had a discussion about this on ruby-forum when I faced this issue. We talked about a copy-on-write string, but I did not want to change my entire application.

It is inefficient for the API to keep returning dup()'ed strings. On the other hand, if the user accidentally changes the string (which she can), your API can throw an error or malfunction.
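
One middle ground, sketched here with a hypothetical Person class, is for the API to freeze the strings it hands out. The getter then returns the same object every time with no per-call dup, an accidental gsub! raises immediately instead of silently changing shared state, and callers who really want to mutate take their own copy:

```ruby
# Hypothetical API class, for illustration only.
class Person
  attr_reader :first_name

  def initialize(first_name)
    # Copy once at construction time, then freeze. The getter can return
    # the same object on every call without risk.
    @first_name = first_name.dup.freeze
  end
end

person = Person.new("greg")

# Callers who need in-place edits take their own copy; dup is not frozen.
working = person.first_name.dup
working.gsub!("g", "G")
p working            # "GreG"
p person.first_name  # "greg"

# An accidental in-place edit of the shared string is rejected.
begin
  person.first_name.gsub!("g", "G")
rescue RuntimeError => e  # FrozenError on Ruby 2.5+, a RuntimeError subclass
  puts "rejected: #{e.class}"
end
```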


Just to reinforce the other comments - this is pretty much expected behaviour in high level languages.

Always read the docs.


I actually just had a chat via email with Greg Brown (author of Ruby Best Practices). I updated the blog post to get to the point quicker and I included his take on the matter...


As a side note, this is one of those fundamental language problems that Clojure solves without sacrificing performance.


Great read, although I've never been bitten by this particular "bug".


It's not a bug ... the references are copied by value (don't know what the big deal is ... it's the same thing happening in Java), the cloning done on the basic collections is shallow (again, same thing happening in many other languages) and the basic types like Fixnum are immutable.
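
To make "references are copied by value" concrete, here is a short sketch with two made-up methods. Mutating the object through the parameter is visible to the caller, but rebinding the parameter is not, because the method only received its own copy of the reference:

```ruby
# Hypothetical methods, for illustration only.
def mutate(list)
  list << 4         # changes the object the caller also refers to
end

def reassign(list)
  list = [9, 9, 9]  # rebinds only this method's local variable
  list
end

numbers = [1, 2, 3]

mutate(numbers)
p numbers  # [1, 2, 3, 4]

reassign(numbers)
p numbers  # [1, 2, 3, 4]
```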

When learning a new language, after playing around with code-snippets I then usually read the language's reference.



