
Many here recommend using Kafka or RabbitMQ for real-time notifications. While these tools work well with a relatively stable, limited set of topics, they become costly and inefficient when dealing with a large number of dynamic subscribers, such as in a messaging app where users frequently come and go. In RabbitMQ, queue bindings are resource-intensive, and in Kafka, creating new subscriptions often triggers expensive rebalancing operations. I've seen a messenger app with 100k concurrent subscribers where developers used RabbitMQ with an individual queue per user. It ran at around 60 CPU on the RabbitMQ side in the normal case, and during mass reconnections (caused by a proxy reload in the infrastructure) it took up to several minutes for users to reconnect. I suggested switching to https://github.com/centrifugal/centrifugo with the Redis engine (which combines PUB/SUB with Redis Streams for individual queues) – and it dropped to 0.3 CPU on the Redis side. Now the system serves about 2 million concurrent connections.
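
To make this more concrete, here is a rough sketch of what the publish path can look like on the backend: instead of binding a queue per user, the backend publishes into a per-user channel over Centrifugo's server HTTP API. The channel naming, address and API key below are placeholders, and the exact endpoint and auth header depend on the Centrifugo version – check the docs for the version you run.

    // Hedged sketch: publish a per-user notification through Centrifugo's
    // server HTTP API using only the standard library. Channel name, address
    // and API key are placeholders, not taken from a real deployment.
    package notify

    import (
        "bytes"
        "encoding/json"
        "fmt"
        "net/http"
    )

    func NotifyUser(userID string, payload any) error {
        body, err := json.Marshal(map[string]any{
            "channel": "notifications#" + userID, // hypothetical per-user channel
            "data":    payload,
        })
        if err != nil {
            return err
        }
        req, err := http.NewRequest(http.MethodPost, "http://localhost:8000/api/publish", bytes.NewReader(body))
        if err != nil {
            return err
        }
        req.Header.Set("Content-Type", "application/json")
        req.Header.Set("X-API-Key", "<api_key>") // assumption: API key auth enabled
        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            return err
        }
        defer resp.Body.Close()
        if resp.StatusCode != http.StatusOK {
            return fmt.Errorf("unexpected status: %s", resp.Status)
        }
        return nil
    }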


I wonder who works on centrifugo. Could be anyone.


Client SDKs are often a major challenge in systems like these. In my experience, building SDKs on top of asynchronous protocols is particularly tricky. It's generally much easier to make the server-side part reliable. The complexity arises because SDKs must account for a wide range of usage patterns – and you don't control how they are used.

Asynchronous protocols frequently result in callback-based or generator-style APIs on the client side, which are hard to implement safely and intuitively. For example, consider building a real-time SDK for something like NATS. Once a message arrives, you need to invoke a user-defined callback to handle it. At that point, you're faced with a design decision: either call the callback synchronously (which risks blocking the socket reading loop), or do it asynchronously (which raises issues like backpressure handling).
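
A tiny Go sketch of that trade-off (generic code, not the actual NATS SDK internals):

    package sdk

    // onMessage is the user-provided callback; frames come from the socket read loop.

    // Option 1: call the callback synchronously. Simple and ordered, but a slow
    // callback stalls the read loop and therefore the whole connection.
    func readLoopSync(frames <-chan []byte, onMessage func([]byte)) {
        for frame := range frames {
            onMessage(frame) // user code runs on the reader goroutine
        }
    }

    // Option 2: hand messages to the callback asynchronously via a buffered channel.
    // The read loop never blocks on user code, but now there must be a policy for
    // a full buffer: drop, block, grow, or close the connection (backpressure).
    func readLoopAsync(frames <-chan []byte, onMessage func([]byte)) {
        pending := make(chan []byte, 1024) // buffer size is an arbitrary assumption
        go func() {
            for frame := range pending {
                onMessage(frame)
            }
        }()
        for frame := range frames {
            select {
            case pending <- frame:
            default:
                // Backpressure decision point: this sketch drops the message,
                // a real SDK might block, buffer more, or disconnect the client.
            }
        }
    }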

Also, SDKs are often developed by different people, each with their own design philosophy and coding style, leading to inconsistency and subtle bugs.

So this isn't only about NATS. Just last week, we ran into two critical bugs in two separate Kafka SDKs at work.


For real-time notifications, I believe NATS (https://nats.io) or Centrifugo (https://centrifugal.dev) are worth checking out these days. Messages can be delivered to those systems from PostgreSQL over the replication protocol, with Kafka as an intermediate buffer. Reliable real-time messaging comes with lots of complexities though, like late or duplicate message delivery. If the system can be built around at-most-once guarantees, that can simplify the design dramatically. It depends on the use case of course; often at-least-once and at-most-once delivery need to co-exist in one app.
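
On the at-least-once side, a common building block is consumer-side deduplication. A minimal sketch, assuming every event carries a unique ID (the unbounded in-memory map is only for illustration – a real system would bound it or persist processed IDs/offsets):

    package pipeline

    import "sync"

    // Event is a hypothetical message flowing through the pipeline
    // (e.g. produced from a PostgreSQL change and buffered in Kafka).
    type Event struct {
        ID   string
        Data []byte
    }

    // Deduper drops events that were already delivered at least once.
    type Deduper struct {
        mu   sync.Mutex
        seen map[string]struct{}
    }

    func NewDeduper() *Deduper {
        return &Deduper{seen: make(map[string]struct{})}
    }

    // Handle returns true if the event was delivered, false if it was a duplicate.
    func (d *Deduper) Handle(e Event, deliver func(Event)) bool {
        d.mu.Lock()
        if _, dup := d.seen[e.ID]; dup {
            d.mu.Unlock()
            return false
        }
        d.seen[e.ID] = struct{}{}
        d.mu.Unlock()
        deliver(e)
        return true
    }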


And Debezium.


Hi everyone!

I'd like to share that we've just released Centrifugo v6 – a major update of our scalable WebSocket server. The release addresses some usability pain points and adds nice features and more observability.

Centrifugo is an open-source standalone server written in Go – https://github.com/centrifugal/centrifugo. Centrifugo can instantly deliver messages to online application users connected over supported transports (WebSocket, HTTP-streaming, Server-Sent Events (EventSource), gRPC, WebTransport). Centrifugo has the concept of a channel – so it's a user-facing PUB/SUB server. Everything is implemented in a language-agnostic way – so Centrifugo can be used in combination with any frontend or backend stack.

These days we also provide a Centrifugo PRO version – and we are trying to find a balance to keep the project sustainable.

The server is based on the open-source Centrifuge library – https://github.com/centrifugal/centrifuge – so many improvements mentioned in the Centrifugo v6 release blog post (even some of those for Centrifugo PRO) can be used just as a library in a Go application.
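
For those curious what embedding the library looks like, here is a rough sketch based on the shape of the examples in the Centrifuge README – exact API names may differ between library versions, and authentication/credentials setup is omitted:

    package main

    import (
        "log"
        "net/http"

        "github.com/centrifugal/centrifuge"
    )

    func main() {
        // Create a Node - the core object managing connections and channels.
        node, err := centrifuge.New(centrifuge.Config{})
        if err != nil {
            log.Fatal(err)
        }

        node.OnConnect(func(client *centrifuge.Client) {
            log.Printf("client connected")
        })

        if err := node.Run(); err != nil {
            log.Fatal(err)
        }

        // Serve the bidirectional WebSocket endpoint.
        http.Handle("/connection/websocket", centrifuge.NewWebsocketHandler(node, centrifuge.WebsocketConfig{}))
        if err := http.ListenAndServe(":8000", nil); err != nil {
            log.Fatal(err)
        }
    }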

We provide real-time SDKs for popular client environments – for browser and mobile development – and they connect to both Centrifuge-library-based servers and the Centrifugo server.

Generally, the Centrifugal ecosystem provides a good alternative to Socket.IO and to cloud services like Pusher.com and Ably.com.

Will be happy to answer any questions!


Yep, and in addition to that, the ephemeral port problem will arise at some scale with long-lived connections and a balancer/reverse proxy chain in the infrastructure. So tuning is still required.


Wow, it's fascinating how a single HN comment can drive meaningful traffic to a project! I'm the author of Centrifugo, and I appreciate you mentioning it here.

Let me share a bit more about Centrifugo transport choices. It’s not just about supporting multiple transports — developers can also choose between bidirectional and unidirectional communication models, depending on their needs.

For scenarios where stable subscriptions are required without sending data from the client to the server, Centrifugo seamlessly supports unidirectional transports like SSE, HTTP-streaming, unidirectional gRPC streams, and even unidirectional WebSockets (this may sound kinda funny for many I guess). This means integration is possible without relying on client-side SDKs.

However, Centrifugo truly shines in its bidirectional communication capabilities. Its primary transport is WebSocket – with JSON or Protobuf protocols – with SSE/HTTP-streaming fallbacks that are also bidirectional, an approach reminiscent of SockJS but with a more efficient implementation and no mandatory sticky sessions. Sticky sessions are an optimization in Centrifugo, not a requirement. It's worth noting that SSE only supports the JSON format, since binary is not possible with it. This is where HTTP-streaming in conjunction with the ReadableStream browser API can make much more sense!

I believe Centrifugo gives developers the flexibility to choose the transport and communication style that best fits their application's needs. And it scales well out of the box to many nodes – with the help of Redis or NATS brokers. Of course, all this comes with the limitations every abstraction brings.


Hello, I am the author of https://github.com/centrifugal/centrifugo. Our users can choose from WebSocket, EventSource, and WebTransport (experimental for now, but it will definitely stabilize in the future). WebRTC is out of scope, as the main purpose is central-server-based real-time JSON/binary messaging, and WebRTC makes things much more complex since it shines for peer-to-peer and rich media communications.

What I'd like to add is that Centrifugo also supports HTTP-streaming – not mentioned by the OP – a transport which has advantages over EventSource: the possibility to send a POST body on the initial request from a web browser (with SSE you cannot), binary support, and, together with the ReadableStream browser API, wide support in modern browsers.
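
To show what HTTP-streaming means on the wire, here is a minimal standalone Go handler (just an illustration, not Centrifugo's actual implementation): the client sends an initial POST with a body, and the server keeps the response open, flushing frames as they appear.

    package main

    import (
        "fmt"
        "io"
        "log"
        "net/http"
        "time"
    )

    func streamHandler(w http.ResponseWriter, r *http.Request) {
        // Unlike SSE, the initial request may carry a POST body (e.g. a connect command).
        connectCmd, _ := io.ReadAll(r.Body)
        _ = connectCmd

        flusher, ok := w.(http.Flusher)
        if !ok {
            http.Error(w, "streaming unsupported", http.StatusInternalServerError)
            return
        }
        w.Header().Set("Content-Type", "application/octet-stream")

        for i := 0; ; i++ {
            select {
            case <-r.Context().Done():
                return // client went away
            case <-time.After(time.Second):
                fmt.Fprintf(w, "frame %d\n", i) // a real server writes protocol frames here
                flusher.Flush()
            }
        }
    }

    func main() {
        http.HandleFunc("/connection/http_stream", streamHandler)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }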

Another thing I'd like to mention about Centrifugo – it supports bidirectional WebSocket fallbacks with EventSource and HTTP-streaming, and does this without a sticky session requirement in distributed scenarios. I guess nobody else has this at this point. See https://centrifugal.dev/blog/2022/07/19/centrifugo-v4-releas.... This solves one more practical concern: sticky sessions are an optimization in Centrifugo's case, not a requirement.

If you are interested in the topic, we also have a post about WebSocket scalability – https://centrifugal.dev/blog/2020/11/12/scaling-websocket – it covers some design decisions made in Centrifugo.


Wondering whether coroutines may be a step towards async event-based style APIs without allocating read buffers for the entire connection lifetime, i.e. a solution to the problems discussed in https://github.com/golang/go/issues/15735. Goroutines provide a great way to have non-blocking IO with synchronous code – but when it comes to efficient memory management with many connections, the Go community tends to invent raw epoll implementations: https://www.freecodecamp.org/news/million-websockets-and-go-.... So my question here: can coroutines somehow bring new possibilities in terms of working with network connections?
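
For context, this is the pattern in question – a goroutine per connection, each holding its own read buffer for the connection's whole lifetime (a simplified sketch). With very many mostly idle connections, these buffers and goroutine stacks dominate memory, which is what pushes people towards raw epoll designs where a buffer is only borrowed once a socket becomes readable.

    package main

    import (
        "log"
        "net"
    )

    func main() {
        ln, err := net.Listen("tcp", ":9000")
        if err != nil {
            log.Fatal(err)
        }
        for {
            conn, err := ln.Accept()
            if err != nil {
                log.Fatal(err)
            }
            go func(c net.Conn) {
                defer c.Close()
                buf := make([]byte, 4096) // held even while the connection is idle
                for {
                    n, err := c.Read(buf)
                    if err != nil {
                        return
                    }
                    _ = buf[:n] // handle the frame
                }
            }(conn)
        }
    }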


Every time I read criticism of WebSockets it reminds me about WebSuckets (https://speakerdeck.com/3rdeden/websuckets) presentation :)

I am the author of the Centrifugo server (https://github.com/centrifugal/centrifugo) – where the main protocol is WebSocket. I agree with many points in the post – and if there is a chance to build something without replacing stateless HTTP with a persistent WebSocket (or EventSource, HTTP-streaming, raw TCP, etc.), then it's definitely better to go without persistent connections.

But there are many tasks where WebSockets simply shine – providing a better UX, more interactive content, instant information and feedback. This is important to keep – even if the underlying stack is complicated enough. Not every system needs to scale to many machines (e.g. multiplayer games with a limited number of players), corporate apps don't really struggle with massive reconnect scenarios (since the number of concurrent users is pretty small), and so on. So WebSockets are definitely fine for certain scenarios IMO.

I described some problems with WebSockets that Centrifugo solves in this blog post – https://centrifugal.dev/blog/2020/11/12/scaling-websocket. I don't want to say there are no problems; I want to say that WebSockets are fine in general and there are ways to deal with the issues mentioned in the OP's post.


> Not every system needs to scale to many machines (e.g. multiplayer games with a limited number of players)

The author writes a WebSocket board game server. Most, if not all, of these complaints read like the author isn't partitioning the connections by game.


The question is at what level you partition the connections. I could set up a server and then vend an IP to the clients. The problem with that strategy is how you do recovery, particularly in an environment where you treat machines like cattle.

If you don't vend an IP, then you need to build a load balancer of sorts to sit between the client and the game instance server. But then how do you find that game instance server? You need a direct mapping, a sticky header, or consistent routing. As long as you care about that server's state, it is the same as vending an IP to the client, except you can now absorb DoS attacks and offload a bit of compute (like auth) to the load balancer fleet.
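
As a hypothetical illustration of the consistent-routing option: the balancer can derive the target game instance from the game ID alone (rendezvous hashing in this sketch, with made-up server addresses), so it needs no per-session table, and only games on a removed server move when membership changes.

    package main

    import (
        "fmt"
        "hash/fnv"
    )

    // routeGame picks the server with the highest hash score for this game ID.
    func routeGame(gameID string, servers []string) string {
        var best string
        var bestScore uint64
        for _, s := range servers {
            h := fnv.New64a()
            h.Write([]byte(gameID + "|" + s))
            if score := h.Sum64(); best == "" || score > bestScore {
                best, bestScore = s, score
            }
        }
        return best
    }

    func main() {
        servers := []string{"game-1:7000", "game-2:7000", "game-3:7000"} // hypothetical addresses
        fmt.Println(routeGame("match-42", servers))
    }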

The hard problem is how much you care about that server's lifetime. Well, we shouldn't, because individual servers are cattle, and we can solve some of the cattle problems by having a graceful shutdown that migrates state and traffic away. That helps with operator-induced events. Machine failures, kernel upgrades, and other such things that affect the host may have other opinions.


Hi! Could you please expand on the meaning of "global-scale" a bit? Does this only mean that users will connect to the nearest server, or are there more tricks on the backend to scale PUB/SUB? You write that "messages are optimized for speed across our global network" – could you also elaborate on what you mean by this?


Great question and something we can certainly make more clear.

Scaling pub/sub usually requires some time and maintenance, especially if you're going to have multiple pub/sub instances running in different datacenters. Normally this setup involves a parent-child relationship where one master server communicates with all others. Scaling this and making it fast across the world can be challenging, especially for development teams with limited resources who want to build real-time products outside their country of origin.

Essentially socket.ly has all of this preconfigured behind the scenes. Optimized for speed means that we automatically take care of the nitty-gritty, like compressing and choosing the right servers to send events to, depending on who's connected to which servers and which information is being shared. It's a bit of a balancing act, which is why we've made it into a service that any team using socket.io can utilize.

Thanks for the question, I hope that helps. Always happy to answer more.

