Redis memory crashes

Danny McWilliams

I tried both the vm.overcommit_memory=1 and THP steps that the Redis log mentioned, but neither has helped.

Now, after about half of the crashes, I can't even reach the server via the console when it goes down - a hard reset on Digital Ocean is needed, and then I restart everything from there.

Danny McWilliams

Okay, so I think I've got somewhere. Time will tell if it works, but the errors no longer appear in my Redis log file when I restart it.

Should anyone encounter the same issues, here's how I fixed the errors in the redis log as posted above. I'm using Ubuntu.

Added a 1GB swap file - this is pretty simple to do, and Digital Ocean has a guide for it. I've gone for a swap file smaller than my memory (which is 2GB) for now.
Fixed the vm.overcommit_memory problem - the error log (see above) gave perfect advice on which file to edit: I set vm.overcommit_memory=1 in /etc/sysctl.conf and rebooted.
THP (Transparent Huge Pages) fix:

Entered these commands into the terminal:

echo never > /sys/kernel/mm/transparent_hugepage/enabled 
echo never > /sys/kernel/mm/transparent_hugepage/defrag

Then edited /etc/rc.local to include the following, which disable THP on reboot:

if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
    echo never > /sys/kernel/mm/transparent_hugepage/enabled
fi
if test -f /sys/kernel/mm/transparent_hugepage/defrag; then
    echo never > /sys/kernel/mm/transparent_hugepage/defrag
fi

Note - I added these before the line which said:

exit 0
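
To confirm after a reboot that the three fixes above actually took effect, a read-only check along these lines can help. This is only a sketch: the helper name show_setting is made up for illustration, the paths are the usual Linux locations for these settings, and nothing is modified.

```shell
#!/bin/sh
# Read-only check of the settings changed above. Nothing is modified.

# show_setting is a hypothetical helper: print "label: contents" for a file,
# or say that the file is not available (e.g. a kernel without THP).
show_setting() {
    label=$1
    file=$2
    if [ -r "$file" ]; then
        printf '%s: %s\n' "$label" "$(cat "$file")"
    else
        printf '%s: not available (%s)\n' "$label" "$file"
    fi
}

# Should read 1 once the /etc/sysctl.conf change is active.
show_setting "vm.overcommit_memory" /proc/sys/vm/overcommit_memory

# The active THP mode is the one shown in [brackets]; [never] means disabled.
show_setting "THP enabled" /sys/kernel/mm/transparent_hugepage/enabled
show_setting "THP defrag" /sys/kernel/mm/transparent_hugepage/defrag

# /proc/swaps lists active swap areas under a header line;
# the new swap file should appear there.
show_setting "swap" /proc/swaps
```

If THP still shows [always] after a reboot, the /etc/rc.local lines probably did not run.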

Now, normally there's a crash through the night, so I'll update tomorrow if there are any further problems. But thanks @baris for the advice on a swap file, and hopefully if it all works someone else will find it useful.

Danny McWilliams

Furthermore, one issue I hadn't dealt with was this on server startup:

Current maximum open files is 4096. maxclients has been reduced to 4064 to compensate for low ulimit. If you need higher maxclients increase 'ulimit -n'.

I increased the limit by running this command:

 ulimit -SHn 65535

Then, I edited the /etc/security/limits.conf file to add these lines:

ubuntu soft nofile 65535
ubuntu hard nofile 65535
root soft nofile 65535
root hard nofile 65535
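
One caveat, as far as I understand it: ulimit only changes the current shell session, and limits.conf is applied by PAM at login, so a Redis started by an init script may not pick up either. A sketch for checking the limit a running process really has, read from /proc. The helper name nofile_limits is made up for this example, and it assumes the server process is named redis-server.

```shell
#!/bin/sh
# nofile_limits is a hypothetical helper: print the soft and hard
# "Max open files" limits of a running process, read from /proc.
nofile_limits() {
    pid=$1
    awk '/^Max open files/ { print "soft=" $4, "hard=" $5 }' "/proc/$pid/limits"
}

# Limits of the current shell (the soft value is what `ulimit -n` reports).
nofile_limits $$

# Limits of the running Redis server, if one is found.
# Assumes the process is named redis-server; reading another user's
# /proc entry may need root on some kernels.
redis_pid=$(pidof redis-server 2>/dev/null | awk '{ print $1 }')
if [ -n "$redis_pid" ]; then
    nofile_limits "$redis_pid"
else
    echo "redis-server is not running"
fi
```

If the Redis line still shows the old value after a restart, the limit has to be raised wherever the service itself is started, not only in limits.conf.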

Now when I restart Redis, there are no errors.

Now, time for a well-earned coffee and fingers crossed that's the problem dealt with.

P.S. I'm getting an "invalid refresh token" message when trying to upload an image.

Danny McWilliams

Ok, so that's not helped. The forum is on the brink of crashing, outputting invalid CSRF tokens on /login/.

I took a peek at the memory usage, which has skyrocketed.

Memory used seems to increase constantly until it crashes, by the looks of things. Restarting Redis and then NodeBB makes no difference to the memory used.

Anyone help?

baris

Can you post the output of top as well?

What plugins do you have installed/active? It could be a memory leak.

julian

A memory leak would be cleared up when the NodeBB process is killed.

Inspect top, sort by memory, and see which process is gobbling up all that memory. Hopefully it's not NodeBB.
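
For anyone who can't easily screenshot top, a non-interactive equivalent can be pasted as text instead. A minimal sketch, assuming the procps version of ps that Ubuntu ships; the helper name top_mem is made up for this example.

```shell
#!/bin/sh
# top_mem is a hypothetical helper: list the N processes using the most
# resident memory (RSS, in KiB), largest first, below a header line.
top_mem() {
    n=${1:-10}
    ps -eo pid,user,rss,pmem,comm --sort=-rss | head -n "$((n + 1))"
}

top_mem 10
```

Inside top itself, pressing Shift+M sorts the process list by memory.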

Danny McWilliams

Doing this on my iPhone so managed to get top, but my power button's a pain to screenshot.

As for plugins, I disabled blog comments & Twitter (embed tweets) as they were failing, and pushbullet as I wasn't using it anyway - I'll get a full list when home.

Topping it off, a DNS issue has raised its head, so I'll have to see if something's happened with Apache. I'll jump on it within the hour and see what's what.

Danny McWilliams

Okay, here's top ordered by memory.

I've looked at Apache and there are no changes to its configuration. Even after a restart (Apache, Redis, NodeBB) the browser shows nothing and the server times out; perhaps this is just the memory issue?

I'm about to reset all plugins, so I'll post how that goes soon, and give a list of what was used.

Danny McWilliams

On Apache, getting this error:

AH00549: Failed to resolve server name for xx.xx.xx.xx (check DNS) -- or specify an explicit ServerName

As I said, no settings have been changed with regard to that.

Edit - after a hard reset on the Digital Ocean site, the forum reappeared for a second or two, then vanished like a fart in the wind, so Apache/DNS doesn't seem to be the problem. Also, here's the memory usage after that.

All plugins have been deactivated.

baris

My first reaction: why are there so many Apache processes?

Danny McWilliams

There's a WordPress install running on the server too. It's routed to onlyanexcuse.com - while the forum (NodeBB) is directed to spflforum.com. It's never caused any issues before. Still, I'm assuming that there are far too many regardless.
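
To put a rough number on how much those processes take between them, one option is to total resident memory per command name. A sketch only: the helper name rss_by_command is made up for this example, and it reads ps output on stdin.

```shell
#!/bin/sh
# rss_by_command is a hypothetical helper: read "RSS_KiB COMMAND" pairs on
# stdin and print "total_MiB process_count command" per command name,
# largest total first. Only the first word of a command name is used.
rss_by_command() {
    awk '{ count[$2]++; total[$2] += $1 }
         END { for (c in total) printf "%d %d %s\n", total[c] / 1024, count[c], c }' |
        sort -rn
}

# "rss=" and "comm=" suppress the header line so only data rows reach awk.
ps -eo rss=,comm= | rss_by_command | head -n 10
```

Treat the totals as an upper bound: RSS counts pages shared between worker processes once per process, so many Apache workers look heavier in this sum than they really are.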

Danny McWilliams

I've turned Apache off on the server and installed nginx to use that instead. Intended to do that for some time anyway.

http://www.spflforum.com seems to be working, but I've not yet configured it for the other domain.

One curious side effect is the font size in Persona has increased?

Danny McWilliams

Well, strange font sizes aside, the forum is up and running again, and this looks far healthier. Granted, WordPress isn't running, but that's a job for tomorrow.

I guess we'll soon see if Nginx experiences the same issues.

Here's the plugin list I couldn't get earlier - I've reactivated a few that we use.

nodebb-plugin-blog-comments
nodebb-plugin-composer-redactor
nodebb-plugin-dbsearch
nodebb-plugin-emailer-mandrill
nodebb-plugin-google-analytics
nodebb-plugin-recent-cards
nodebb-plugin-spam-be-gone
nodebb-plugin-sso-facebook
nodebb-plugin-sso-twitter
nodebb-widget-essentials

However...

Memory use is still increasing incrementally.
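
Since the growth is gradual, logging a timestamped snapshot at intervals may show when it climbs and how fast. A sketch under my own assumptions: the helper name mem_snapshot and the log path are made up, and "used" is computed here as total minus free, buffers and cache.

```shell
#!/bin/sh
# mem_snapshot is a hypothetical helper: print one timestamped line with
# total and used memory in MiB, read from a meminfo-style file.
# "used" here means MemTotal - MemFree - Buffers - Cached.
mem_snapshot() {
    file=${1:-/proc/meminfo}
    awk -v ts="$(date '+%Y-%m-%d %H:%M:%S')" '
        /^MemTotal:/ { total = $2 }
        /^MemFree:/  { free = $2 }
        /^Buffers:/  { buffers = $2 }
        /^Cached:/   { cached = $2 }
        END {
            used = total - free - buffers - cached
            printf "%s total=%dMiB used=%dMiB\n", ts, total / 1024, used / 1024
        }
    ' "$file"
}

mem_snapshot

# To sample once a minute in the background (the log path is only an example):
#   while true; do mem_snapshot >> /tmp/mem.log; sleep 60; done &
# To see how much of it Redis accounts for (values in bytes):
#   redis-cli info memory | grep -E '^used_memory(_rss)?:'
```

Comparing the log against the Redis figures should indicate whether the growth is in Redis's data set or somewhere else.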

Danny McWilliams

And this is where we're at today with it. Output of free -m and top (ordered by memory) shown.

(Bear in mind this is using Nginx and not Apache. Nginx hasn't yet been configured to point the other domain to the WordPress install, and php5 has also been turned off.)