Infinite loop
-
On what.thedailywtf.com, our NodeJS instances keep hanging: they stop handling requests and sit at nearly 100% CPU until they are restarted. We run MongoDB for the main database, Redis for clustering support, and 2 clustered NodeBB instances (down from 4 due to this problem).
At this point, I have no idea what could be causing the problem, other than that it started about a week ago. Is there some way to get a stack trace from a running NodeJS instance?
There's no log output during the time the problem occurs. servercooties.com tracks our instance's uptime, so @accalia can probably pinpoint the times when the problem occurred.
/cc @boomzilla @PJH
-
@baris said in Infinite loop:
Are you using nginx to direct traffic to nodebb?
yes
@baris said in Infinite loop:
Are there any errors in the nginx logs?
$ wc -l /var/log/nginx/error.log
180933 /var/log/nginx/error.log
Anything specific I should look for?
-
Look for any socket.io errors. If you have a lot of those, you can change the upstream block for nodebb to the one below.
upstream nodebb {
    ip_hash;
    server 127.0.0.1:4567 max_fails=0 fail_timeout=10s;
    server 127.0.0.1:4568 max_fails=0 fail_timeout=10s;
    keepalive 512;
}
This helps under high load if socket.io is throwing a lot of errors.
-
OK, I see that there is a way to get a stack trace from NodeJS, but it requires starting it with --debug-brk so a debugger can be attached (and then unpaused and allowed to run as normal until it hangs). Should I edit loader.js for that, or is there some other way to start one of the instances of a cluster with an extra parameter?
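One possible approach, sketched under assumptions: if the loader starts its workers with `child_process.fork` (I have not checked how loader.js actually does this), the `execArgv` option of `fork` lets a single worker get an extra Node flag while the others start normally. `execArgvFor` and `forkWorker` are hypothetical helper names, not part of NodeBB.

```javascript
const { fork } = require('child_process');

// Hypothetical helper: build the Node flags for worker number `index`.
// Only the worker whose index equals `debugIndex` receives the debugger flag.
function execArgvFor(index, debugIndex, debugFlag = '--debug-brk') {
  // Drop any debugger flags inherited from the parent process so that
  // the workers do not all try to open a debug port.
  const base = process.execArgv.filter(
    (arg) => !arg.startsWith('--debug') && !arg.startsWith('--inspect')
  );
  return index === debugIndex ? [...base, debugFlag] : base;
}

// Hypothetical helper: fork one worker with the flags computed above.
// `modulePath` is whatever script the loader would normally fork.
function forkWorker(modulePath, index, debugIndex) {
  return fork(modulePath, [], { execArgv: execArgvFor(index, debugIndex) });
}
```

Newer Node.js releases replaced `--debug-brk` with `--inspect-brk`, which can be passed as the third argument to `execArgvFor`.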
-
I would look at anything that is different from a default install, i.e. custom plugins, core modifications, and changes to default settings such as increasing the unread cutoff time.
We are not experiencing the hangs on this site or on the busy sites that we host, so it could be a configuration change that's causing it.
-
Okay, strace showed both instances hanging after this:
connect(85, {sa_family=AF_INET, sin_port=htons("[iframely port]"), sin_addr=inet_addr("[iframely IP]")}, 16) = -1 EINPROGRESS (Operation now in progress)
I've blacklisted the forum's hostname from iframely. We'll see if that fixes it.
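For anyone wanting a similar guard on the calling side, here is a minimal sketch of a check that skips embed lookups for the forum's own links. This is not how the iframely blacklist itself is configured; `shouldEmbed` and the `blockedHosts` list are hypothetical names used only for illustration.

```javascript
const { URL } = require('url');

// Hypothetical helper: return true only if `link` is an absolute URL
// whose hostname is not on the block list (or a subdomain of an entry).
function shouldEmbed(link, blockedHosts) {
  let host;
  try {
    host = new URL(link).hostname.toLowerCase();
  } catch (err) {
    // Not a parseable absolute URL, so there is nothing to look up.
    return false;
  }
  return !blockedHosts.some(
    (blocked) => host === blocked || host.endsWith('.' + blocked)
  );
}

// Assumed block list containing the forum's own hostname.
const blockedHosts = ['what.thedailywtf.com'];
const forumLinkAllowed = shouldEmbed('https://what.thedailywtf.com/topic/1', blockedHosts);
const externalLinkAllowed = shouldEmbed('https://example.com/video', blockedHosts);
```

With this list, `forumLinkAllowed` is false and `externalLinkAllowed` is true, so links back to the forum never reach the embed service.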
-
Ok, time for an update. The iframely problem was definitely one of the causes, but we're still getting occasional lock-ups. Nowhere near as bad as before, but the site still goes down a few times a day. We have a topic where we track manual restarts, but the data doesn't really show any pattern other than that the infinite loop is happening during times when a lot of pages are loaded.