Hacker News: JoshMock's comments

Protobuf is a great format with a lot of benefits, but it's missing one that I wish it could support: zero-copy. The ability to transport data between processes, services and languages with effectively zero time spent on serialization and deserialization.

It appears possible in some cases, but not universally. That means similar binary transport formats that do support zero-copy, like Cap'n Proto, offer most or all of the perks described in this post, plus the assurance that serialization and deserialization are not a bottleneck when passing data between processes.


It depends on how you actually use the messages. Zero-copy can slow things down. Copying within L1 cache is ~free, but operating on needlessly dynamic or suboptimal data structures can add overhead everywhere they're used.

To actually literally avoid any copying, you'd have to directly use the messages in their on-the-wire format as your in-memory data representation. If you have to read them many times, the extra cost of dynamic getters can add up (the format may cost you extra pointer chasing, unnecessary dynamic offsets, redundant validation checks and conditional fallbacks for defaults, even if the wire format is relatively static and uncompressed). It can also be limiting, especially if you need to mutate variable-length data (it's easy to serialize when only appending).

In practice, you'll probably copy data once from your preferred in-memory data structures to the messages when constructing them. When you need to read messages multiple times at the receiving end, or merge with some other data, you'll probably copy them into dedicated native data structs too.

If you change the problem from zero-copy to one-copy, it opens up many other possibilities for optimization of (de)serialization, and doesn't keep your program tightly coupled to the serialization framework.
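To make the trade-off concrete, here is a minimal Python sketch. The wire layout, `encode`, `MessageView` and `Message` are all made up for illustration; this is not the Protobuf or Cap'n Proto encoding. `MessageView` reads fields straight out of the buffer on every access (zero-copy style), while `Message.decode` pays for one copy up front and then works with plain native fields.

```python
import struct
from dataclasses import dataclass

# Hypothetical layout (an assumption, not a real wire format):
# little-endian uint32 id, float64 score, uint16 name length, UTF-8 name bytes.
HEADER = struct.Struct("<IdH")


def encode(id_: int, score: float, name: str) -> bytes:
    raw = name.encode("utf-8")
    return HEADER.pack(id_, score, len(raw)) + raw


class MessageView:
    """Zero-copy style: each getter re-reads the wire buffer at a fixed offset."""

    def __init__(self, buf):
        self._buf = memoryview(buf)

    @property
    def id(self) -> int:
        return struct.unpack_from("<I", self._buf, 0)[0]

    @property
    def score(self) -> float:
        return struct.unpack_from("<d", self._buf, 4)[0]

    @property
    def name(self) -> str:
        # Variable-length field: read the length, then slice.
        # Python still has to copy here to build a str.
        (length,) = struct.unpack_from("<H", self._buf, 12)
        start = HEADER.size
        return bytes(self._buf[start:start + length]).decode("utf-8")


@dataclass
class Message:
    """One-copy style: decode once, then use ordinary native fields."""

    id: int
    score: float
    name: str

    @classmethod
    def decode(cls, buf) -> "Message":
        id_, score, length = HEADER.unpack_from(buf, 0)
        start = HEADER.size
        name = bytes(buf[start:start + length]).decode("utf-8")
        return cls(id_, score, name)


wire = encode(7, 1.5, "abc")
view = MessageView(wire)     # no decoding yet; cost is paid on every read
msg = Message.decode(wire)   # cost is paid once; reads are plain attribute access
```

If a field is read many times, the view repeats the offset arithmetic and unpacking on each access, while the decoded struct does not. It also isn't tied to the wire format once it has been built.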


I don’t understand this argument. It seems to originate from capnp’s marketing. Capnp is great, but the fact that protobuf can’t do zero copy should be more an academic issue than a practical one. Applications that want to use a schema always need their own native types that serialize and deserialize from binary formats. For protobuf you either bring your own or use the generated type. For capnp you have to bring your own. So a fair comparison of serialization cost would compare:

native > pb binary > native

vs

native > capnp binary > native

If you benchmark this, the two formats are very close. Exact perf depends on payload. Additionally, one could write their own protobuf serializer with protoc if they really need to.


Is that a format/serialization issue, or library/implementation issue?


Serialization issue. From the Introduction to Cap’n Proto:

"Cap’n Proto is INFINITY TIMES faster than Protocol Buffers. (...) there is no encoding/decoding step. The Cap’n Proto encoding is appropriate both as a data interchange format and an in-memory representation, so once your structure is built, you can simply write the bytes straight out".

I take it as a rationalization of what OLE Compound File Binary - internal Microsoft Office memory structures serialized "raw" as file format - would look like if they paid more attention to being backward and forward compatible and extensible.


Google has a library/format for that too, with FlatBuffers. Different use cases and advantages really, not clearly better/worse.


Kenton Varda also worked on Protobufs at Google before he wrote CapnProto, I think.



Indeed. An excellent alternative to Bluebird's implementation.


I know an update isn't available for OS X yet, but did anyone else get notified of an upgrade of their OS X version? The most recent OS X version of the Flux.app file says it was created Oct 4, 2013. I'm pretty sure this update has caused my MacBook Air to hang momentarily when trying to put it to sleep. Kind of annoying.


ISP: Comcast Location: Nashville, TN


I prefer the /etc/hosts hack, putting in entries like `127.0.0.1 twitter.com`. That way, when I actually want to visit Twitter, I have to `sudo vim /etc/hosts`, enter my password, comment out the host entry, save, go to Twitter, and then uncomment the entry, save and quit when I'm ready to block it again.

The inconvenience of the action is what makes it work. If I only had to type `workmode stop` I'd have that committed to muscle memory by the end of a work day or two.


I was thinking the same thing. Given I use Evernote to save pretty much everything, why would I need a separate service that does the same thing?


I'll be 100% behind this project when it has as much control over a device as Tasker does. Recipes in JavaScript and not having to "program" on the device itself are exactly the kinds of things I wish Tasker could do. The modeOfTransport monitor is pretty attractive too.

Beyond all that, I'm curious about on{X}'s battery usage. That's a huge selling point for always-on processes like this.


Great article. This expresses better than I ever could have at age 20 why I quit my CS degree since I already had a job. At the time, the university was experimenting with a "software engineering" degree, but it was new and not yet accredited so it was too early to know if it was worth it.


Lately I've been re-discovering the value of the ORM and of putting as much business logic on your models as possible. This, after spending way too long writing out lots of query logic in views instead. It's amazing how often you can reduce complexity from 10-20 lines to 2-3 lines, and gain reusability, just by putting business logic where it belonged in the first place.
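A rough sketch of the idea in plain Python, with no particular ORM assumed. `Invoice`, `is_overdue` and `overdue_view` are hypothetical names, and the dataclass only stands in for a real model class:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Invoice:
    """Stand-in for an ORM model; a real one would map to a database table."""

    total: float
    paid: float
    due: date

    @property
    def balance(self) -> float:
        return self.total - self.paid

    def is_overdue(self, today: date) -> bool:
        # The business rule lives here once, instead of being repeated in views.
        return self.balance > 0 and today > self.due


def overdue_view(invoices, today):
    """A 'view' that stays short because the model owns the rule."""
    return [inv for inv in invoices if inv.is_overdue(today)]


invoices = [
    Invoice(total=100.0, paid=100.0, due=date(2024, 1, 1)),  # paid in full
    Invoice(total=100.0, paid=40.0, due=date(2024, 1, 1)),   # unpaid and past due
    Invoice(total=100.0, paid=0.0, due=date(2024, 12, 31)),  # unpaid, not yet due
]
overdue = overdue_view(invoices, today=date(2024, 6, 1))
```

Any other view that needs the same rule calls `is_overdue` instead of re-deriving it.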


    > putting business logic where it belonged in the first place
I've been thinking a lot about where to put business logic, and I think the models are the closest, but not the best place for them.

A significant portion of this logic for me affects more than one model, and while you could solve this by using public interfaces, you still need to put it in one of the models, which seems like a suboptimal approach.

The ORM is already responsible for several layers, and custom business logic is not relevant there imho.

Still puzzled how to organize my code. MVC has started to fall apart for me. Just my two cents.


"What Wordpress did for the web" is a dangerous comparison.


I think if you understand the dangers, you also understand the point they are trying to make.

