Redis has been working here on a high-traffic site without any trouble for more than a year. Excellent software. It may be the only software I'd consider bug-free, thanks to the attitude of @antirez.
This thread will likely get a million "us too" style posts, but Redis is core infrastructure at Shopify. It's been so solid that we recently waived our defensive requirement that the app keep working if Redis goes down. This allowed us to port our inventory reservation system (a huge point of potential lock contention) completely to the new server-side Lua scripts. We have seen a full order-of-magnitude speed increase from this: a reservation for a complicated order is now measured in µs instead of ms.
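For readers unfamiliar with the technique: Redis runs a server-side Lua script as one atomic step, so a check-and-reserve needs no client-side locking at all. Here is a minimal sketch of the idea with redis-py (not Shopify's actual script; the key name and quantities are made up):

    import redis

    r = redis.Redis()

    # Atomically check available stock and decrement it if there is enough.
    # The whole script executes as a single atomic step inside Redis.
    RESERVE = """
    local available = tonumber(redis.call('GET', KEYS[1]) or '0')
    local wanted = tonumber(ARGV[1])
    if available >= wanted then
        redis.call('DECRBY', KEYS[1], wanted)
        return 1
    end
    return 0
    """

    reserve = r.register_script(RESERVE)
    if reserve(keys=['stock:sku-1234'], args=[3]):   # key and quantity are illustrative
        print('reserved')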
I'll put my "us too" here, as we too now assume Redis to be up, just like our relational store(s) (Percona MySQL): ~16 GB at ~500 ops/s averaged over the last 3 months. We've been running Redis for core features for more than 21 months, and as a store it has been the most stable and the easiest to reason about in terms of the performance we can expect and actually receive. Compared with the other "non-SQL" setups we deploy: if we were starting from scratch, we'd look at Redis to replace ActiveMQ, Solr, and a number of the jobs we currently jam through MySQL.
Thanks Antirez, could you share more insight on TILT mode? Were there other approaches you considered? Why use a value of 30 seconds to leave TILT mode? If the time has shifted, is it likely ever to be correct thereafter?
Hello. Basically, 30 seconds is the exit time only if no other time shifts are detected; otherwise we set the exit time to now+30sec again, and so forth.
30 seconds is 3 times the largest period we have in info collection (INFO itself is sampled every 10 seconds, while PING is sent every 1 second). That way, if there was a problem with the timer, within 30 seconds we are sure the new state will get fresh readings for every kind of request and information we collect, so when TILT mode is exited and the function that evaluates the state is called again, it should see clean values.
Note that from the point of view of Sentinel it is ok if the new time is wrong compared to the real time; we never use absolute time. All we need is a computer clock that advances more or less regularly.
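To make the mechanics concrete, here is a rough Python sketch of that logic as described above. This is my own illustration, not Sentinel's actual source; in particular the 2-second trigger threshold and the function name are assumptions:

    import time

    TILT_PERIOD = 30      # seconds: 3x the longest sampling period (INFO, 10s)

    last_tick = time.time()
    tilt_exit = 0         # wall-clock time at which TILT mode may be exited

    def timer_tick():
        """Called periodically; watches for suspicious jumps of the clock."""
        global last_tick, tilt_exit
        now = time.time()
        delta = now - last_tick
        last_tick = now
        # Clock went backwards, or the timer stalled far too long: distrust
        # all time-dependent state. Each new shift pushes the exit forward.
        if delta < 0 or delta > 2.0:          # 2s trigger is illustrative
            tilt_exit = now + TILT_PERIOD
        if now < tilt_exit:
            return "TILT"    # keep collecting data, but don't act on it
        return "NORMAL"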
I imagine there is some overlap in parts of the logic between Sentinel and repmgr (a similar tool for PostgreSQL): for example, checking whether members of the cluster are in service, and choosing a new master in the event of a failover.
I would love to see a generic tool for handling the clustering/failover problem.
It's true there is some overlap, but Sentinel also uses things that are specific to Redis. For instance, two things are crucial for us:
1) The ability to use the master as a message bus to auto-discover things. This is possible because every Redis instance is also a Pub/Sub server.
2) The idea that after every restart of every Redis instance we have a "runid" that changes.
And in general, the logic of the failover itself, and the fact that failure detection is precise (certain reply codes are treated one way, others another way), make a non-specific solution much harder to implement: the "methods" used to perform the service-specific tasks may end up being complex, or the lack of some feature (such as Pub/Sub) may force you to completely change the logic of the system.
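As a concrete illustration of those two primitives (a sketch with redis-py, not Sentinel's actual code; the master address is made up): you can read the run_id that changes across restarts from INFO, and subscribe to the hello channel Sentinels use to announce themselves.

    import redis

    master = redis.Redis(host='10.0.0.1', port=6379)   # address is illustrative

    # (2) run_id changes on every restart, so comparing it with the last
    # value we saw is enough to detect that an instance was restarted.
    run_id = master.info('server')['run_id']

    # (1) The master doubles as a message bus: Sentinels announce themselves
    # on a well-known channel, so subscribing gives auto-discovery for free.
    p = master.pubsub()
    p.subscribe('__sentinel__:hello')
    for msg in p.listen():
        if msg['type'] == 'message':
            print('hello from a Sentinel:', msg['data'])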
"@cscotta Hi, you misunderstood how it works: Pub/Sub is only used for discovery on startup. Sentinel-to-sentinel p2p for critical stuff."
Pub/Sub is used to make configuration simpler when you start a Sentinel cluster from cold, when everything is working and your master is ok.
This allows us to auto-discover the other Sentinels, to check the slaves, and so forth.
Instead, to understand whether a system is down, to decide which Sentinel performs the failover, and for all the other critical stuff, Sentinel-to-Sentinel messages are used, without caring whether the master's Pub/Sub works.
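In other words, the authoritative view lives in the Sentinels themselves, and a client can ask any of them directly, with no dependency on the master's Pub/Sub. A quick sketch with redis-py; 'mymaster' stands in for whatever name the Sentinel was configured with:

    import redis

    sentinel = redis.Redis(host='127.0.0.1', port=26379)   # default Sentinel port

    # Ask the Sentinel for its current view; this works even if the master
    # (and therefore its Pub/Sub) is completely down.
    addr = sentinel.sentinel_get_master_addr_by_name('mymaster')
    peers = sentinel.sentinel_sentinels('mymaster')
    print('current master:', addr, 'known sentinels:', len(peers))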
Sorry, I was not clear. Sentinel itself uses the hiredis C library to talk with other Redis instances. A bug in the C library crashes the library and the process it is running in.
I've written a little experimental Python client that connects to a Sentinel and keeps an image of the state of the monitored cluster as it changes.
http://bit.ly/NNrQdI
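For anyone who wants to roll their own, the core of such a client is quite small. A minimal sketch with redis-py (not the code behind the link above): take a snapshot of the monitored cluster from a Sentinel, then follow its event stream to keep the image fresh.

    import redis

    s = redis.Redis(host='127.0.0.1', port=26379)   # default Sentinel port

    # Initial image: every monitored master, plus its slaves.
    state = s.sentinel_masters()                    # {master_name: {...}}
    for name in state:
        state[name]['slaves'] = s.sentinel_slaves(name)

    # Sentinel publishes events such as +sdown and +switch-master; follow
    # them to keep the image in sync as the cluster changes.
    p = s.pubsub()
    p.psubscribe('*')
    for msg in p.listen():
        if msg['type'] == 'pmessage':
            print(msg['channel'], msg['data'])      # a real client updates `state` here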