NodeJS Cluster

wzrdtales

@baris That someone is probably me
My PR actually was a bit more, this includes stuff like interpreting the original IP Address in reverse proxy constellations (which nearly any modern constellation consists of, just putting nginx in front of it is just this).

So to everyone:

I hardly suggest you to not try to use cluster on nodebb. You will run into trouble, b/c of the mentioned sticky session problem and some more stuff. And you probably won't have all your nodes that you scale across set as record on your domain, but a loadbalancer in front of it. Actually one of the biggest problems with NodeBB is currently still how you manage to keep it scalable.

So what have I gone for? Actually I go the same way for node, that I do go for all what I do which needs huge scale. I have abandoned cluster from node.js completely and am not using it anymore though. I specifically design my applications to be scalable without the cluster module, which is just often a design approach.

So how do I scale today?

The answer is Docker + Rancher and sometimes I also use dokku, but just for single node applications or dev environments. The problem with NodeBB is that it does not support Docker, and that is a bit sad. I hope they put some effort in it, to give the users a seamless experience, which also boosts your ability to easily scale nodebb.

And where is the sticky session handled? Actually at the load balancer level, unfortunately you need sticky sessions for NodeBB, things are easier if you don't need to rely on such stuff though. The load balancer might be HAProxy, NGINX or Traefik and you need to share the data about your sticky sessions between your load balancers if you have more than just one. But that is not that big of a problem though.

How does my Dockerfile look like?
Well I do have two versions: A filesystem based and an environment config based.

What is the difference and why?

So first of all the filesystem based, this one is actually just to have a version that is easy to setup, you need however to do a bit of a workaround to let NodeBB open the setup for you. Finally the entrypoint looks like this:

CMD npm install && node app --setup --config=hostConfig/config.json && node app.js --config=hostConfig/config.json && cp config.json hostConfig/config.json

Do I suggest to use this?
Nope, definitely not, you should rely on environment variables instead.

And as suggested this is the second method: Configure via environment variables instead. See https://community.nodebb.org/topic/4325/set-up-nodebb-via-environment-variables

Maybe the NodeBB Team puts some work in a good docker production setup that is recommended for users, that is going to make many things easier, the rest is up to the design of NodeBB itself, in how easy it scale or which culprits they introduce. Currently the only two I know of are actually the sticky session stuff and the other one is about plugins. As soon as you scale across independent nodes, this gets a some kind of a problem though. I'm actually not sure if independent node_modules folders do work as of now, what I can say is that my nodes do share the node_modules folder across all other nodes and that I tell rancher to restart the container, after a plugin upgrade or install, one by one.

And how to upgrade?

Well, just do it... Actually you need a new version of the docker container, currently you need to do this all by yourself, NodeBB does not build any containers that are really usable for production right now. Next you initiate the upgrade, as soon as you have your containers you need to tell one container to execute nodebb upgrade, after that tell rancher the upgrade is finished and it will switch over to the new ones. I did managed to have zero downtime deployments through this, also I'm not sure how safe nodebb handles their releases for zero downtime deployments, if they introduce something new e.g. in the datastructure of the data- base/store that is not backwards compatible (if you do zero downtime you need to stretch stuff like this over at least two iterations though), that would actually naturally crash the old application.

That are however my experiences about that topic

wzrdtales

Btw. what @julian mentioned is actually the problem you need to fix with layer 4 information, that was actually the reason why I made that PR by that time, as a POC of how one would need to use the cluster module to actually be able to always send the socket to the right target. To give a bit of light into this: I do have a module for socket-io and cluster in general that does exactly this. But currently there are some bugs in it b/c there are several problems that lay in the node core itself, which for example makes it impossible to let this plugin currently work properly when the request gets to big, without creating a new socket and passing data around two sockets.

All in all, if you see the cluster module of node.js it is only really useful for some edge cases. Those edge cases are very rare though, what is actually missing to make the cluster module really useful would be a native SHM provided from the core of node.js. For everyone that does not know what SHM means, just search for shared memory.

zoharm

Yes edge cases are rare! Thank you for all the input!! And yes of course the NodeBB team can't spend their time on every little thing. I'm just glad to have this discussion!

Florian Müller

Good morning.

We stumbled across this problem as well some time ago, and found a good solution for that.

Our system consists of an nginx routing server as a gatekeeper, a mesos environment for running multiple containers and an ha-proxy to access these containers easily. So basically we're enforced to run nodebb in single instances, and we can't use sticky sessions because the containers change all the time.

Our solution: All nodebb-instances share the same redis session store, and we disabled long-polling transport for socket.io -> works like a charm. No need to hassle around with any kind of clustering or request routing.

I don't see any problems by disabling long-polling requests (the even messed up our statistics), because every modern browser supports websockets. And they are working very well. Maybe I've overseen something here?

Best,
Flo

wzrdtales

@Florian-Müller said in NodeJS Cluster:

I don't see any problems by disabling long-polling requests (the even messed up our statistics), because every modern browser supports websockets. And they are working very well. Maybe I've overseen something here?

No you're not that wrong. Long Polling is as of today something that you don't necessarily need to just give the best support of your service to your users. But you need it for environments that do not support websockets yet. This includes many DDoS solutions and any other reverse proxy constellation that is not yet up to date. A perfect example was cloudflare. Cloudflare today has support for websockets now. But you naturally need to get on a paid plan for actually being able to use them. Unlike if you use long polling, websockets are not really usable on the free plan. The free plan only includes support that is that limited that you actually just can test around with it. So this is a point where it makes sense to have support for long polling in a software like this. Not for the end users, but for the forum owners.

Actually the problem with socket.io here is the handshake and I wonder how you resolved this? For the socket.io handshake you need to stick to the same container for 2 requests. So how did you resolved this? I have not looked a long time into socket.io, as I use as of today only modules that are build for performance like the ws or uws module, do they finally have shared state support?

Florian Müller

You're right, websockets don't work through akamai (we're using akamai) yet. So we're using a separate subdomain for the websocket-connections pointing to the datacenter directly. The page itself is protected, and the websockets are accepted by an nginx where we can use stuff like rate limiting.

I'm not sure wheres the difference in the handshake between long polling and websockets, but websockets simply work with one single request, as long as the session is available on all instances - via redis in our case. I guess the handshake (auth) and the connection upgrade happen in the same request.

wzrdtales

@Florian-Müller said in NodeJS Cluster:

You're right, websockets don't work through akamai (we're using akamai) yet. So we're using a separate subdomain for the websocket-connections pointing to the datacenter directly. The page itself is protected, and the websockets are accepted by an nginx where we can use stuff like rate limiting.

I hope you're pointing to an IP that is not any near in the range of the IP of your services, or even the same which would be worse, otherwise if you're using AKAMAI as DDoS solution, you just build a information disclosure vulnerability by design. Unless you're hosting at level 3 or OVH or any other provider smilarily capable like them, it is pretty unlikely that your hoster has protection for this by itself.

I'm not sure wheres the difference in the handshake between long polling and websockets, but websockets simply work with one single request, as long as the session is available on all instances - via redis in our case. I guess the handshake (auth) and the connection upgrade happen in the same request.

It has nothing to do with long polling. You can do long polling without a handshake. socket.io does make the handshake, not any of the protocols that socket.io can use

wzrdtales

And about AKAMAI, yes they're pretty outdated... I needed to work with them in the past (just a few month ago though) and I really wondered about how slow they are in development. They are far behind their competitors. Especially cloudflare though, they even don't have h/2 push yet...

Florian Müller

We're using google cloud as our hoster, so we should be safe here
Even the routing nginx for the websockets is separated from the other routing systems.

I'm not a developer, so I don't have a real clue how it works. When we used long-polling and websockets in the beginning, nodebb tried long-polling first, ending up in endless loops because the handshakes failed. This problem disappeared immediately when we removed long-polling from the transports. In my web console I can see a single websocket-request in pending state, with frames being sent over.

wzrdtales

@Florian-Müller said in NodeJS Cluster:

We're using google cloud as out hoster, so we should be safe here
Even the routing nginx for the websockets is separated from the other routing systems.

Ok, can't tell if they include DDoS protection automatically, may be you should ask them for that or wait for the first attack to happen .

I'm not a developer, so I don't have a real clue how it works. When we used long-polling and websockets in the beginning, nodebb tried long-polling first, ending up in endless loops because the handshakes failed. This problem disappeared immediately when we removed long-polling from the transports. In my web console I can see a single websocket-request in pending state, with frames being sent over.

Good to know

julian

@Florian-Müller Yes, typically we recommend using sticky cookies or ip_hash in nginx to segment users into different buckets, one per NodeBB process.

If you go with websockets only, then of course, no problems... except for those users running IE8 :shipit:

Florian Müller

@julian Company decided that we are not going to support IE8 anymore, hurray!

So, Websocket connections are not bound to a specific process, als long as the session data is available everywhere? One single request as I assumed above?

zoharm

afaik, and please correct me if I'm wrong, socket.io does rely on a longer handshake process (vs for example handling handshaking yourself using ws.) But the socket.io-redis module takes care of centralizing the handshake metadata? (besides centralizing the pub-sub feature of socket.io which is what it does for sure)

wzrdtales

@zoharm said in NodeJS Cluster:

afaik, and please correct me if I'm wrong, socket.io does rely on a longer handshake process (vs for example handling handshaking yourself using ws.) But the socket.io-redis module takes care of centralizing the handshake metadata? (besides centralizing the pub-sub feature of socket.io which is what it does for sure)

No, it is just an adapter that enables you to send messages between different processes running socket.io all using this same adapter. It does not centralize the handshake.

zoharm

@wzrdtales said in NodeJS Cluster:

No, it is just an adapter that enables you to send messages between different processes running socket.io all using this same adapter. It does not centralize the handshake.

Just one more question if possible, wouldn't the adapter also take care of telling all other processes subscribed to the same adapter to emit a message (and thus to all clients connected to those processes?) Ala:

io.to('room').emit('event', data):

Thank you!

wzrdtales

@zoharm said in NodeJS Cluster:

Just one more question if possible, wouldn't the adapter also take care of telling all other processes subscribed to the same adapter to emit a message (and thus to all clients connected to those processes?) Ala:

To citate: "enables you to send messages between different processes running socket.io all using this same adapter.". So yes, this is basically the whole functionality of that adapter. They will fetch messages from the central storage and send them to the target, if the target is connected to them. Or in case of groups ("rooms"), send it to all in that group connected to them.