Infinite loop


  • Gamers

    On what.thedailywtf.com, we've been having trouble with our NodeJS instances: they stop handling requests and run at near 100% CPU usage until they are restarted. We run MongoDB for the main database, Redis for clustering support, and two clustered NodeBB instances (down from four because of this problem).

    At this point, I have no idea what could be causing the problem, other than that it started about a week ago. Is there some way to get a stack trace from a running NodeJS instance?
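One possible approach, sketched under the assumption of a current Node.js release on Linux: sending SIGUSR1 to a running Node.js process makes it start listening for a debugger (by default on 127.0.0.1:9229) without a restart. `enableInspector` is a hypothetical helper, and its signal sender is injectable only so it can be exercised without touching a real process.

```typescript
// Hypothetical helper: ask a running Node.js process to open its inspector
// (debugger) port by sending it SIGUSR1. SIGUSR1 is not available on Windows.
type SignalSender = (pid: number, signal: string) => unknown;

function enableInspector(
  pid: number,
  send: SignalSender = (p, s) => process.kill(p, s),
): void {
  // pid 0 or a negative pid would signal a whole process group, so refuse it.
  if (!Number.isInteger(pid) || pid <= 0) {
    throw new RangeError(`invalid pid: ${pid}`);
  }
  // Node.js reacts to SIGUSR1 by starting the inspector; the process keeps
  // running normally until a debugger client attaches and pauses it.
  send(pid, "SIGUSR1");
}
```

After calling, for example, `enableInspector(<worker pid>)`, a debugger client such as Chrome DevTools (via chrome://inspect) can attach and pause the worker to show its current stack. If the process is spinning inside native code rather than JavaScript, the pause request may never be serviced.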

    There's no log output during the time the problem occurs. servercooties.com tracks our instance's uptime, so @accalia can probably pinpoint the times when the problem occurred.

    /cc @boomzilla @PJH


  • Admin

    Are you using nginx to direct traffic to nodebb? Are there any errors in the nginx logs?


  • Gamers

    @baris said in Infinite loop:

    Are you using nginx to direct traffic to nodebb?

    yes

    @baris said in Infinite loop:

    Are there any errors in the nginx logs?

    $ wc -l /var/log/nginx/error.log
    180933 /var/log/nginx/error.log
    

    Anything specific I should look for?


  • Admin

    Look for any socket.io errors. If you have a lot of those, you can change the upstream block for nodebb to the one below.

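    upstream nodebb {
        # Pin each client IP to one instance so its socket.io requests stick to it
        ip_hash;
        # max_fails=0 disables failure counting, so nginx never marks a backend down
        server 127.0.0.1:4567 max_fails=0 fail_timeout=10s;
        server 127.0.0.1:4568 max_fails=0 fail_timeout=10s;
        # Cache up to 512 idle connections to the backends in each worker process
        keepalive 512;
    }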
    

    This helps under high load if socket.io is throwing a lot of errors.
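Before changing the upstream block, it can help to count how many of the error-log entries actually involve socket.io. A minimal sketch, assuming (hypothetically) that the relevant entries mention socket.io's default `/socket.io/` request path; `mentionsSocketIo` and `countSocketIoLines` are illustrative names, not part of NodeBB or nginx.

```typescript
import { createReadStream } from "fs";
import { createInterface } from "readline";

// Assumption: socket.io runs on its default "/socket.io/" path, so nginx
// error-log entries for that traffic contain this substring.
const SOCKET_IO_MARKER = "/socket.io/";

// Pure check for a single log line.
function mentionsSocketIo(line: string): boolean {
  return line.includes(SOCKET_IO_MARKER);
}

// Streams the (possibly very large) log line by line instead of loading it
// all into memory, and counts the socket.io-related entries.
async function countSocketIoLines(path: string): Promise<number> {
  const lines = createInterface({ input: createReadStream(path), crlfDelay: Infinity });
  let count = 0;
  for await (const line of lines) {
    if (mentionsSocketIo(line)) count++;
  }
  return count;
}
```

For example, `countSocketIoLines("/var/log/nginx/error.log").then(console.log)` prints the count; comparing it with the total from `wc -l` shows whether socket.io errors dominate the log.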


  • Gamers

    OK, I see that there is a way to get a stack trace from NodeJS, but it requires starting it with --debug-brk so a debugger can be attached (and then unpaused and allowed to run as normal until it hangs).

    Should I edit loader.js for that, or is there some other way to start one of the instances of a cluster with an extra parameter?
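One way to add a flag to only one instance is the `execArgv` option of Node's `child_process.fork`, applied to a single worker index. A minimal sketch: `workerExecArgv`, `startWorkers`, the entry-point path and the `PORT` environment variable are hypothetical and not how NodeBB's loader.js actually works. On Node.js 8 and later, `--debug-brk` has been replaced by `--inspect-brk`, which the sketch uses.

```typescript
import { fork, ChildProcess } from "child_process";

// Flag for the single instance under investigation (--debug-brk on old Node.js).
const DEBUG_FLAG = "--inspect-brk";

// Node.js flags for worker `index`: only the worker at `debugIndex` gets the
// debugger flag, so the other instance keeps serving requests normally.
function workerExecArgv(
  index: number,
  debugIndex: number,
  base: readonly string[] = process.execArgv,
  debugFlag: string = DEBUG_FLAG,
): string[] {
  return index === debugIndex ? [...base, debugFlag] : [...base];
}

// Hypothetical loader: fork one child per port. Passing the port through a
// PORT environment variable is an assumption made for this sketch.
function startWorkers(entry: string, ports: number[], debugIndex: number): ChildProcess[] {
  return ports.map((port, index) =>
    fork(entry, [], {
      env: { ...process.env, PORT: String(port) },
      execArgv: workerExecArgv(index, debugIndex),
    }),
  );
}
```

With the two ports from the nginx upstream block, `startWorkers("app.js", [4567, 4568], 1)` would start the second instance paused on its first line, waiting for a debugger to attach and resume it.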


  • Admin

    @Ben-Lubar Your best bet would be to edit loader.js. I remember trying to pause a running instance to get a stack trace long ago, but I didn't succeed. Let me know if you have better luck.


  • Gamers

    @baris I was able to pause it and get a backtrace up until it actually encountered the hang, and then the debugger never finished whatever command I threw at it.

    I'm stumped.


  • Admin

    I would look at anything that is different from a default install, i.e., custom plugins, core modifications, or changes to default settings such as increasing the unread cutoff time.

    We are not experiencing these hangs on this site or on the busy sites that we host, so it could be a configuration change that's causing it.


  • Gamers

    Okay, strace showed both instances hanging after this:

    connect(85, {sa_family=AF_INET, sin_port=htons("[iframely port]"), sin_addr=inet_addr("[iframely IP]")}, 16) = -1 EINPROGRESS (Operation now in progress)
    

    I've blacklisted the forum's hostname from iframely. We'll see if that fixes it.
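The idea behind that fix, never handing the forum's own URLs to the embedding service, can be sketched as a hostname check made before any outbound request. This is a minimal illustration; `shouldEmbed` and `EMBED_BLACKLIST` are hypothetical names, not the iframely plugin's actual configuration option.

```typescript
// Hypothetical blacklist: hosts whose links are never sent to iframely.
const EMBED_BLACKLIST: ReadonlySet<string> = new Set(["what.thedailywtf.com"]);

// Returns true only for well-formed http(s) URLs whose host is not blacklisted.
function shouldEmbed(rawUrl: string, blacklist: ReadonlySet<string> = EMBED_BLACKLIST): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not an absolute URL, so there is nothing to embed
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  // The URL parser lower-cases the hostname; also drop a trailing dot so
  // "host." cannot slip past the check.
  const host = url.hostname.replace(/\.$/, "");
  return !blacklist.has(host);
}
```

Checking the parsed hostname, rather than matching on the raw string, keeps variations in case, path or query string from bypassing the blacklist.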


  • Gamers

    OK, time for an update. The iframely problem was definitely one of the causes, but we're still getting occasional lock-ups. They're nowhere near as bad as before, but the site still goes down a few times a day. We have a topic where we track manual restarts, but the data doesn't really show any pattern other than that the infinite loop happens during times when a lot of pages are being loaded.

