
Note that even in CP mode, committed messages can still be dropped when the cluster heals from a partition (one of the pitfalls I assume you've experienced). Cf. the link in my sibling comment.


I suspect what happened there was to do with http://www.rabbitmq.com/ha.html#unsynchronised-slaves

When using mirrored queues, Rabbit does ensure all the active mirrors are written to before confirming a publish:

    "in the case of publisher confirms, a message will only be confirmed to the publisher when it has been accepted by all of the mirrors"

So if my understanding is correct, wiping the contents of a re-joining mirror shouldn't matter, since no new messages should have been accepted since the partition. The exception would be if the "pause" part of pause-minority only happens after other things like re-election or dropping "dead" slaves, in which case yes, pause-minority is useless. This seems doubtful, however.

Hence why I think the problem is unsynchronised slaves.

Basically, when a slave is created (e.g. in response to another slave dying), it only receives NEW messages, not existing messages. So suppose the following sequence of events on a 2-mirror queue:

    Publish A
    Master and slave both contain A
    Slave dies
    New slave created
    Master contains A; Slave contains nothing
    Publish B
    Master contains A,B ; Slave contains B
    Master dies
    Slave promoted and new slave created
    A is lost
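
The sequence above can be replayed with a toy model. This is only a sketch of the scenario as I described it, not RabbitMQ's actual implementation: the `Node` and `MirroredQueue` classes and the `sync_on_join` flag are names I made up for illustration, with the flag standing in for eagerly copying the master's contents into a new slave.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One copy of the queue: the master or a slave (mirror)."""
    messages: list = field(default_factory=list)


class MirroredQueue:
    """Toy 2-mirror queue (hypothetical model, not RabbitMQ code)."""

    def __init__(self, sync_on_join=False):
        self.sync_on_join = sync_on_join
        self.master = Node()
        self.slave = self._new_slave()

    def _new_slave(self):
        # A fresh slave starts empty unless we eagerly copy the master.
        slave = Node()
        if self.sync_on_join:
            slave.messages = list(self.master.messages)
        return slave

    def publish(self, msg):
        # Every live mirror accepts the message before it is confirmed.
        self.master.messages.append(msg)
        self.slave.messages.append(msg)

    def slave_dies(self):
        # The dead slave is replaced by a newly created one.
        self.slave = self._new_slave()

    def master_dies(self):
        # The surviving slave is promoted, then a new slave is created.
        self.master = self.slave
        self.slave = self._new_slave()


def run_scenario(sync_on_join):
    """Replay: publish A, slave dies, publish B, master dies."""
    q = MirroredQueue(sync_on_join=sync_on_join)
    q.publish("A")
    q.slave_dies()
    q.publish("B")
    q.master_dies()
    return q.master.messages


print(run_scenario(sync_on_join=False))  # ['B']: A is lost
print(run_scenario(sync_on_join=True))   # ['A', 'B']
```

With `sync_on_join=False` the promoted slave never saw A, which matches the last step of the list. The model ignores partitions, acks and timing entirely, so it says nothing about the cost of the copy itself.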
The way around this is to set the policy "ha-sync-mode": "automatic", in which case creating a new slave also replicates the current contents of the master. To the best of my knowledge, if the same Call Me Maybe tests were run with that policy in place, no messages should be lost.
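
For reference, here is roughly what such a policy looks like when built for the management plugin's HTTP API (a PUT to `/api/policies/<vhost>/<name>`). Treat it as a sketch under assumptions: the policy name `ha-auto-sync`, the catch-all pattern `^`, the choice of `"ha-mode": "all"` and the helper `build_policy_request` are mine, and the code only builds the request rather than sending it.

```python
import json
from urllib.parse import quote


def build_policy_request(vhost="/", name="ha-auto-sync", pattern="^"):
    """Return (path, json_body) for a PUT against the management API.

    The name and pattern defaults are illustrative assumptions.
    """
    # The vhost must be percent-encoded, so "/" becomes "%2F".
    path = "/api/policies/{}/{}".format(
        quote(vhost, safe=""), quote(name, safe="")
    )
    body = {
        "pattern": pattern,       # regex matched against queue names
        "apply-to": "queues",
        "definition": {
            "ha-mode": "all",     # assumption: mirror to every node
            "ha-sync-mode": "automatic",
        },
    }
    return path, json.dumps(body)


path, body = build_policy_request()
print(path)
print(body)
```

The same definition can be passed as the JSON argument to `rabbitmqctl set_policy`. Check the HA docs for your broker version before relying on the exact keys.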

But yes, this is precisely what I meant by "fraught with pitfalls". The pause while messages replicate can be disastrous on its own if the queue is large, which is another issue that has bitten me in production.

I do love RabbitMQ, but I wish there were a good AMQP broker out there that was planned from the beginning as a clustered CP system. Maybe I'll try to write one.



