Redis memory crashes



  • I seem to have a memory crash on Redis at least once a day.

    I've little experience of Redis (none until NodeBB, in fact). Are there any good resources online for troubleshooting problems with it?


  • Admin

    How much memory does your system have? Check the output of free -m and make sure you have enough memory/swap enabled.
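
    For a quick scripted version of that check, here is a rough sketch. The helper name swap_total_mb is made up for illustration, and the sample numbers are placeholders rather than real output from this server. It only pulls the "total" column from the Swap row of free -m.

```shell
# Sketch only: swap_total_mb is a hypothetical helper, not part of any tool.
# It reads `free -m` style output on stdin and prints the Swap "total" column (MB).
swap_total_mb() {
    awk '/^Swap:/ { print $2 }'
}

# Canned sample shaped like `free -m` output (numbers are illustrative)
sample='              total        used        free      shared  buff/cache   available
Mem:           2000         950        1050           0           0        1050
Swap:             0           0           0'

printf '%s\n' "$sample" | swap_total_mb   # prints 0 for this sample

# On a live system: free -m | swap_total_mb
```

    A result of 0 means no swap is configured at all.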



  • When I restarted Redis server this morning there was 1050 free. A few minutes later, 1039.
    Swap is 0, though - I'm guessing this is the problem?

    image.jpeg


  • Admin

    Even though you seem to have a lot of free memory it's a good idea to have some swap just in case.
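
    If there is no swap at all, the usual route on Linux is a swap file. A minimal sketch follows, assuming a 1G file at /swapfile (both values are assumptions, not taken from this thread). The helper swap_steps is hypothetical and only prints the commands so they can be reviewed before anything is run as root.

```shell
# Sketch only: swap_steps is a hypothetical helper. It prints the commands
# rather than running them, so they can be reviewed first.
# The default path and size are assumptions, not values from this thread.
swap_steps() {
    swapfile=${1:-/swapfile}
    size=${2:-1G}
    printf 'fallocate -l %s %s\n' "$size" "$swapfile"   # reserve the space
    printf 'chmod 600 %s\n' "$swapfile"                 # readable by root only
    printf 'mkswap %s\n' "$swapfile"                    # format it as swap
    printf 'swapon %s\n' "$swapfile"                    # enable it now
}

swap_steps /swapfile 1G
```

    Run the printed commands as root once they look right, then check free -m again. To keep the swap file after a reboot it also needs an entry in /etc/fstab.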

    Also check your redis logs and see if there is anything useful there.



  • Getting these in the log:

    14907:M 26 Sep 01:53:11.592 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
    14907:M 26 Sep 01:53:11.592 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.

    So should I just take the steps it says to fix it?



  • Tried both the vm.overcommit_memory=1 and THP steps that the Redis log mentioned, but it hasn't helped.

    Now, after about half of the crashes, the server won't even let me access it via the console when it goes down. A hard reset on Digital Ocean is needed, then I restart everything from there.



  • Okay, so I think I've got somewhere. Time will tell if it works, but the errors no longer appear in my Redis log file when restarting it.

    Should anyone encounter the same issues, here's how I fixed the errors in the redis log as posted above. I'm using Ubuntu.

    • I've added a 1GB swap file - this is pretty simple to do, and the link is to Digital Ocean. I've gone for a smaller swap file than my memory for now (which is 2GB).
    • Fixed the vm.overcommit_memory problem - the Redis log (see above) gave clear advice: add 'vm.overcommit_memory = 1' to /etc/sysctl.conf, then reboot.
    • THP (Transparent Huge Pages) fix:

    Entered these commands into the terminal:

    echo never > /sys/kernel/mm/transparent_hugepage/enabled 
    echo never > /sys/kernel/mm/transparent_hugepage/defrag
    

    Then edited /etc/rc.local to include the following, which disables THP on reboot:

    if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
      echo never > /sys/kernel/mm/transparent_hugepage/enabled
    fi
    if test -f /sys/kernel/mm/transparent_hugepage/defrag; then
      echo never > /sys/kernel/mm/transparent_hugepage/defrag
    fi
    

    Note - I added these before the line which said:

    exit 0 
    

    Now, normally there's a crash through the night, so I'll update tomorrow if there are any further problems. But thanks @baris for the advice on a swap file, and hopefully if it all works someone else will find it useful.
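
    To confirm the THP and overcommit settings described above actually took effect (for instance after a reboot), a rough check like this can help. The helper thp_active is hypothetical. It relies on the kernel marking the active mode with square brackets, as in "always madvise [never]".

```shell
# Sketch only: thp_active is a hypothetical helper.
# The kernel marks the active THP mode with square brackets,
# e.g. "always madvise [never]"; this prints just the bracketed word.
thp_active() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

echo 'always madvise [never]' | thp_active   # prints never

# On a live system (paths are the ones from the Redis warning):
for f in /sys/kernel/mm/transparent_hugepage/enabled \
         /sys/kernel/mm/transparent_hugepage/defrag; do
    if [ -r "$f" ]; then
        printf '%s: %s\n' "$f" "$(thp_active < "$f")"
    fi
done
if [ -r /proc/sys/vm/overcommit_memory ]; then
    printf 'vm.overcommit_memory = %s\n' "$(cat /proc/sys/vm/overcommit_memory)"
fi
```

    Both THP files should report never and vm.overcommit_memory should be 1 if the changes stuck.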



  • Furthermore, one issue I hadn't dealt with was this on server startup:

    Current maximum open files is 4096. maxclients has been reduced to 4064 to compensate for low ulimit. If you need higher maxclients increase 'ulimit -n'.
    

    I increased the limit by running this command:

     ulimit -SHn 65535 
    

    Then, I edited the /etc/security/limits.conf file to add these lines:

    ubuntu soft nofile 65535
    ubuntu hard nofile 65535
    root soft nofile 65535
    root hard nofile 65535
    

    Now when I restart Redis, there are no errors.
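
    One caveat worth hedging: as far as I know, /etc/security/limits.conf is applied through PAM at login, so a service started at boot may not pick it up. The surest check is to read the limit from the running process. A minimal sketch, with open_files_limits as a hypothetical helper and the process name redis-server as an assumption:

```shell
# Sketch only: open_files_limits is a hypothetical helper.
# It reads /proc/<pid>/limits style text on stdin and prints "soft hard"
# for the "Max open files" row.
open_files_limits() {
    awk '/^Max open files/ { print $4, $5 }'
}

# Canned sample row in the /proc/<pid>/limits layout
printf 'Max open files            4096                 4096                 files\n' \
    | open_files_limits   # prints 4096 4096

# On a live system, assuming the process is named redis-server:
#   open_files_limits < "/proc/$(pidof redis-server)/limits"
```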

    Now, time for a well-earned coffee and fingers crossed that's the problem dealt with.

    P.S., getting an invalid refresh token message when trying to upload an image.



  • OK, so that hasn't helped. The forum is on the brink of crashing, outputting invalid CSRF token errors on /login/

    I took a peek at the memory usage, which has skyrocketed.

    Memory used seems to increase constantly until it crashes, by the looks of things. Restarting Redis and then NodeBB makes no difference to the memory used.
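
    One way to tell whether Redis itself is the part that is growing is to read its own figure from redis-cli info memory and compare it over time with the system total. A rough sketch, where redis_used_memory is a hypothetical helper and the sample values are illustrative only:

```shell
# Sketch only: redis_used_memory is a hypothetical helper.
# It reads `redis-cli info memory` output on stdin and prints the
# used_memory value in bytes. INFO lines end in CR LF, so strip the CR.
redis_used_memory() {
    tr -d '\r' | awk -F: '$1 == "used_memory" { print $2 }'
}

# Canned sample (values are illustrative, not from this server)
printf '# Memory\r\nused_memory:1048576\r\nused_memory_human:1.00M\r\n' \
    | redis_used_memory   # prints 1048576

# On a live system: redis-cli info memory | redis_used_memory
```

    If that number stays flat while free memory keeps dropping, the growth is coming from some other process.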

    Can anyone help?

    spfll.JPG


  • Admin

    Can you post the output of top as well?

    What plugins do you have installed/active? It could be a memory leak.


  • Admin

    A memory leak would be cleared up when the NodeBB process is killed.

    Inspect top, sort by memory, and see which process is gobbling up all that memory. Hopefully it's not NodeBB 😄
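
    When there are many processes with the same name, it can be easier to total memory per command than to read top line by line. A minimal sketch, with rss_by_command as a hypothetical helper and made-up sample numbers. Note that summing RSS counts shared pages once per process, so totals for forked workers will be overstated.

```shell
# Sketch only: rss_by_command is a hypothetical helper.
# It reads "RSS_KB COMMAND" pairs on stdin, sums RSS per command name,
# and prints "total_kb process_count command", largest total first.
rss_by_command() {
    awk '{ sum[$2] += $1; n[$2]++ }
         END { for (c in sum) printf "%d %d %s\n", sum[c], n[c], c }' \
        | sort -rn
}

# Canned sample (numbers are illustrative)
printf '%s\n' '30000 apache2' '28000 apache2' '90000 node' '12000 redis-server' \
    | rss_by_command

# On a live system: ps -e -o rss= -o comm= | rss_by_command | head
```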



  • Doing this on my iPhone so managed to get top, but my power button's a pain to screenshot.
    image.png

    As for plugins, I disabled blog comments & Twitter (embed tweets) as they were failing, and pushbullet as I wasn't using it anyway - I'll get a full list when home.

    Topping it off, a DNS issue has raised its head, so I'll have to see if something's happened with Apache. I'll jump on it within the hour and see what's what.



  • Okay, here's top ordered by memory.

    top-mem.jpg

    I've looked at Apache and there are no changes to its configuration. Even after a restart (Apache, Redis, NodeBB) the browser's showing nothing and the server is timing out; perhaps this is just the memory issue?

    I'm about to reset all plugins, so I'll post how that goes soon, and give a list of what was used.



  • On Apache, getting this error:

    AH00549: Failed to resolve server name for xx.xx.xx.xx (check DNS) -- or specify an explicit ServerName

    As I said, no settings have been changed with regard to that.

    Edit - after a hard reset on the Digital Ocean site, the forum reappeared for a second or two, then vanished like a fart in the wind, so Apache/DNS doesn't seem to be the problem. Also, here's the memory usage after that.

    mem.jpg

    All plugins have been deactivated.


  • Admin

    My first reaction: why are there so many Apache processes?



  • There's a WordPress install running on the server too. It's routed to onlyanexcuse.com, while the forum (NodeBB) is directed to spflforum.com. It's never caused any issues before. Still, I'm assuming that there are far too many regardless.



  • I've turned Apache off on the server and installed nginx to use that instead. Intended to do that for some time anyway.

    http://www.spflforum.com seems to be working, but I've not yet configured it for the other domain.

    One curious side effect is the font size in Persona has increased?



  • Well, strange font sizes aside, the forum is up and running again, and this looks far healthier. Granted, WordPress isn't running, but that's a job for tomorrow.

    I guess we'll soon see if Nginx experiences the same issues.

    hhhh.jpg

    Here's the plugin list I couldn't get earlier. I've reactivated a few that we use.

    nodebb-plugin-blog-comments
    nodebb-plugin-composer-redactor
    nodebb-plugin-dbsearch
    nodebb-plugin-emailer-mandrill
    nodebb-plugin-google-analytics
    nodebb-plugin-recent-cards
    nodebb-plugin-spam-be-gone
    nodebb-plugin-sso-facebook
    nodebb-plugin-sso-twitter
    nodebb-widget-essentials

    However...

    Memory use is still increasing incrementally.

    memrise.jpg



  • And this is where we're at today with it. Output of free -m and top (ordered by memory) shown.

    today.png

    (Bear in mind this is using Nginx and not Apache. Nginx has not yet been configured to point the other domain to the WordPress install, and php5 has also been turned off.)

