How to route messages from one node to another? - /g/ (#106127445) [Archived: 431 hours ago]

Anonymous
8/3/2025, 5:44:59 PM No.106127445
So I have a server, to which users establish a persistent connection to using WebSocket. Occasionally, some user needs to be notified of an action taken by another user, so if the server sees that a user A sent some message on that WS connection that represents such action, it immediately checks for the users that need to be notified and sends the info to them on their WS connections, if they're connected at the time. So now I want to add the ability for the server to run in several instances. The problem is that, if a user A is connected to node A and the user B is connected to node B, the node A wouldn't know that user B is online at all because it can't see a WS connection from user B in its own memory. Clearly, some kind of coordination is necessary. The issue is this: I can't figure out how to do it. OK, say we have a centralized data store that stored user-to-node mappings. Now what? I could put an Apache Kafka and create a topic for each node, but what if some node X crashes and some other node Y checks the data store for the user-to-node mapping before the TTL for the mapping expires, then sends the activity notification into the topic for the already-dead node, and then the user that was on the crashed node reconnects to another node? At this point the notification is pretty much lost. How does one design this properly?
Replies: >>106127716 >>106127730 >>106130227
Anonymous
8/3/2025, 6:17:51 PM No.106127716
>>106127445 (OP)
the complexity will depend on your scale.

small scale just use redis pubsub or something similar.
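the pubsub pattern in a nutshell, as a minimal in-process sketch (the Broker class, channel names, and node ids here are made up for illustration; with real redis you'd call subscribe/publish on a client like redis-py, but the semantics are the same):

```python
from collections import defaultdict

class Broker:
    """In-process stand-in for redis pubsub: named channels, fan-out to subscribers."""
    def __init__(self):
        self.channels = defaultdict(list)  # channel name -> list of callbacks

    def subscribe(self, channel, callback):
        self.channels[channel].append(callback)

    def publish(self, channel, message):
        # fire-and-forget: only subscribers connected *right now* get the message;
        # anyone offline misses it (same at-most-once semantics as redis pubsub)
        for cb in self.channels[channel]:
            cb(message)

# each server node subscribes to one channel per locally connected user
broker = Broker()
delivered = []
broker.subscribe("user:B", lambda msg: delivered.append(("node-2", msg)))

# node-1 receives user A's action over its WS connection and publishes it;
# whichever node holds user B's socket gets it, no user-to-node map needed
broker.publish("user:B", "A liked your post")
```

note the fire-and-forget part: a publish to a channel nobody is subscribed to just disappears, which is why this is only fine at small scale.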

bigger scale: you could use something like this:
https://github.com/heroiclabs/nakama

if you want to build it yourself and really scale then you'll have to implement either some p2p where users talk to each other without going through you, or some gossip protocol between your servers.

you could also do bucketing.


ie you have your user id, you hash it, based on the hash you choose which node will handle it.

so let's say user A wants to send to user B.
because he's the sender, user A can talk to any node.

that node will see the message is addressed to user B and relay it to node B (the bucket owner), which can talk to user B.

you could also send the data through push notifications and whatnot.

in the case you go with the bucket technique you may want to separate it in ranges so that it's easier to dynamically add nodes.


ie you hash your user id, get a number between 1 and 10k, and you give each of your nodes a range of size 10k / number of nodes.

this way if you add a new node, existing nodes keep most of their range and only lose around 10% of it.
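a sketch of that bucketing (node names are made up; note this naive even split moves range boundaries when you add a node, so you'd want consistent hashing or a fixed bucket-to-node table to actually get the "only lose ~10%" property):

```python
import hashlib

NUM_BUCKETS = 10_000

def bucket(user_id: str) -> int:
    # stable hash -> a number between 0 and 9999 (hashlib is stable
    # across processes, unlike Python's built-in hash())
    digest = hashlib.sha1(user_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_BUCKETS

def build_ranges(node_ids):
    # split the bucket space into contiguous ranges, one per node;
    # the last node absorbs the rounding remainder
    size = NUM_BUCKETS // len(node_ids)
    ranges = {}
    for i, node in enumerate(node_ids):
        hi = NUM_BUCKETS if i == len(node_ids) - 1 else (i + 1) * size
        ranges[node] = range(i * size, hi)
    return ranges

def owner(user_id, ranges):
    # which node is responsible for this user
    b = bucket(user_id)
    for node, r in ranges.items():
        if b in r:
            return node

ranges = build_ranges(["node-a", "node-b", "node-c"])
```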
Replies: >>106127730 >>106127754 >>106127767 >>106127786
Anonymous
8/3/2025, 6:18:49 PM No.106127730
>>106127445 (OP)
>>106127716
also this may be relevant to you:
https://medevel.com/26-os-chat-servers/
Replies: >>106127754
Anonymous
8/3/2025, 6:21:25 PM No.106127754
>>106127716
>>106127730
also, gossip will be more resilient in case a node crashes, but also more costly.

you are kind of playing with the CAP theorem here.

with gossip, both clients connect to whatever server.

when one user sends a message, your servers use the protocol to send it to each other in an efficient way.

ie node A sends to node B and C

B sends to D and E
C sends to F and G

etc...

then once a node receives the message it checks if it has a websocket connection for that user, and if yes relays the message.
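that fan-out, simulated (node names follow the example above; real gossip uses a partial random peer view rather than the full mesh assumed here):

```python
from collections import deque

def gossip(origin, peers_of, fanout=2):
    """Breadth-first fan-out: each node forwards to `fanout` peers that
    haven't been sent to yet; returns the set of nodes that saw the message."""
    seen = {origin}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        targets = [p for p in peers_of[node] if p not in seen][:fanout]
        for t in targets:
            seen.add(t)
            queue.append(t)
    return seen

nodes = ["A", "B", "C", "D", "E", "F", "G"]
# fully meshed for the sketch; A sends to B and C, B to D and E, C to F and G
peers = {n: [m for m in nodes if m != n] for n in nodes}
reached = gossip("A", peers)

# on receipt, each node checks its own WS table and relays if it holds the socket
local_sockets = {"F": ["user-B"]}
delivered = ["user-B"] if "F" in reached and "user-B" in local_sockets.get("F", []) else []
```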
Replies: >>106127767
Anonymous
8/3/2025, 6:23:06 PM No.106127767
>>106127754
>>106127716
lastly you can do a hybrid method

where you first try bucketing, which generates less network traffic.

and if it fails because the target node is down or whatever then you use gossip.
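a sketch of that fallback logic (node names and the alive map are made up; a real implementation would detect failure via timeouts or heartbeats rather than a flag):

```python
def send_via_bucket(msg, target_node, alive):
    # fast path: one direct hop to the bucket owner
    if not alive.get(target_node, False):
        raise ConnectionError(f"{target_node} is down")
    return [target_node]

def send_hybrid(msg, target_node, alive):
    try:
        return send_via_bucket(msg, target_node, alive)
    except ConnectionError:
        # slow path: flood/gossip to every live node; whichever one holds
        # the user's socket delivers, the rest drop the message
        return [n for n, up in alive.items() if up]

alive = {"node-a": True, "node-b": False, "node-c": True}
fast = send_hybrid("hi", "node-a", alive)   # owner is up: single direct hop
slow = send_hybrid("hi", "node-b", alive)   # owner down: gossip fallback
```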

anyway, my bet is you won't need that kind of architecture for a WHILE and pubsub or similar solutions will serve you just fine.

especially since there are many systems that offer pubsub like functionality and scale horizontally.

they'll internally do what i described but it's abstracted away and you don't have to bother with it.
Anonymous
8/3/2025, 6:24:46 PM No.106127786
>>106127716
>small scale just use redis pubsub or something similar
Does that prevent activity notification loss?
>ie you have your user id, you hash it, based on the hash you choose which node will handle it
I thought about that, but I can't really control which node each user connects to, can I? The load balancers just assign connections to random nodes
>gossip protocol
Does it really scale? Every node would need to read every notification/message, even if it doesn't and never had a WS connection with the recipient user, no?
Replies: >>106127851 >>106127883 >>106127901
Anonymous
8/3/2025, 6:30:06 PM No.106127851
>>106127786
>Does that prevent activity notification loss?
redis pubsub does not guarantee delivery (if there's a crash, or the subscriber isn't connected at that moment, the message is simply gone); it's at-most-once, fire-and-forget.

either you work around it with your database (you will want to store messages anyway).

or you use something else like rabbitmq.
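the database workaround is basically an outbox table: persist the notification before attempting delivery, mark it delivered only on ack, and replay anything unacked when the user reconnects. a sketch with sqlite (table and function names are made up):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE outbox (
    id INTEGER PRIMARY KEY,
    recipient TEXT, payload TEXT,
    delivered INTEGER DEFAULT 0)""")

def enqueue(recipient, payload):
    # persist first, then attempt delivery over pubsub/WS
    db.execute("INSERT INTO outbox (recipient, payload) VALUES (?, ?)",
               (recipient, payload))
    db.commit()

def ack(notification_id):
    # called only after the recipient's node confirms the WS write succeeded
    db.execute("UPDATE outbox SET delivered = 1 WHERE id = ?", (notification_id,))
    db.commit()

def pending(recipient):
    # on reconnect, replay anything that was never acked
    rows = db.execute("SELECT id, payload FROM outbox "
                      "WHERE recipient = ? AND delivered = 0", (recipient,))
    return rows.fetchall()

enqueue("user-B", "A liked your post")   # node crashes before delivery...
missed = pending("user-B")               # ...so user B replays it on reconnect
ack(missed[0][0])
```

this is what makes a crashed node survivable: the notification outlives the node that was supposed to deliver it.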
Anonymous
8/3/2025, 6:33:19 PM No.106127883
>>106127786
> I thought about that, but I can't really control which node each user connects to, can I? The load balancers just assign connections to random nodes

you can manually control which node each user connects to, but it'll increase complexity in the application logic; that's why most don't do it, it's not worth it.

> Does it really scale?

yes and no. gossip can be made more efficient if you can target a specific node instead of flooding everyone.
maybe you should use something like kademlia to know which node to target:
ie at all times you have a distributed hash table that links each userid to the node it is connected to.
this way, with your dht, you know which node to relay the message to directly.
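a toy version of that dht idea (node names are made up, and real kademlia does iterative lookups over a routing table instead of the global NODES list assumed here): the trick is that the node storing the mapping for a given user is itself chosen by hash, so any node can find the entry without a central store:

```python
import hashlib

NODES = ["node-1", "node-2", "node-3"]

def directory_node(user_id):
    # which node stores the user -> connection mapping is itself decided
    # by hashing the user id, so the directory is sharded across the cluster
    h = int(hashlib.sha1(user_id.encode()).hexdigest(), 16)
    return NODES[h % len(NODES)]

# each node keeps the directory shard it owns: user_id -> node holding the WS
directory = {n: {} for n in NODES}

def register(user_id, connected_node):
    # called when a user's WS connection is accepted on some node
    directory[directory_node(user_id)][user_id] = connected_node

def locate(user_id):
    # one lookup hop to the directory shard, then relay directly
    return directory[directory_node(user_id)].get(user_id)

register("user-B", "node-3")   # B's socket happens to land on node-3
```

so instead of every node reading every message, delivery costs one directory lookup plus one direct relay.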
Replies: >>106127901 >>106129190
Anonymous
8/3/2025, 6:34:52 PM No.106127901
>>106127883
>>106127786
anyway, i don't think you'll need that kind of scale any time soon; if you want to ship something, be effective with your time.

i'd just use something that already does it for me if i want to build something.

if your thing ends up growing big you'll have plenty of time and money to think about scaling things up.
Replies: >>106128088
Anonymous
8/3/2025, 6:54:37 PM No.106128088
>>106127901
Is it really not solvable with just some clever design? I feel like it's almost solved with the mappings + queues approach, it's just that there's a possibility of notification loss and of out of order delivery on node crash that I can't figure out how to fix
Replies: >>106128124
Anonymous
8/3/2025, 6:57:52 PM No.106128124
>>106128088
yes it is; i explained several ways, and there are plenty of others.

that's why you have to store notifications and check that they have been successfully delivered.

but anyway, you should look at the CAP theorem, you are not gonna beat physics.

you can have an AP system or a CP system; in theory a CA system too, though in practice network partitions do happen, so you're really choosing between CP and AP.

but you can't have all three of CAP; you'll have to make trade-offs, and it really depends on what you want.
Replies: >>106128164
Anonymous
8/3/2025, 7:03:09 PM No.106128164
>>106128124
I don't understand what CAP theorem has to do with this, the mappings data store is CP, but I don't think I have reads or writes in my server
Replies: >>106128480
Anonymous
8/3/2025, 7:38:21 PM No.106128480
>>106128164
a notification also counts as a read/write even if you do not store it long term; it is still "stored" in transit.

you are trying to process data in a distributed way.

you want consistency (ie messages are delivered in a consistent order)

you want availability, and you want partition tolerance.

even if your goal isn't to store it you are still limited by the cap theorem.
Anonymous
8/3/2025, 8:49:18 PM No.106129190
>>106127883
Would it work to just store the user ID -> node ID map in a distributed data store, then just use gRPC to communicate between nodes directly or is there some pitfall that I'm not seeing?
Replies: >>106130156 >>106130184
Anonymous
8/3/2025, 10:20:13 PM No.106130156
>>106129190
you can do that; you now have a CA system (it works as long as nodes and the network don't fail).
Anonymous
8/3/2025, 10:21:36 PM No.106130184
>>106129190
you can do that, but now if the target node fails, delivery breaks for every user connected to it until the mapping is updated.
Anonymous
8/3/2025, 10:26:18 PM No.106130227
>>106127445 (OP)
actually, can you tell us what you really want to build? all solutions will have trade-offs and the best one really depends on what you want.