Redis memory crashes
Getting these in the log:
14907:M 26 Sep 01:53:11.592 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
26 Sep 01:53:11.592 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
So should I just take the steps it says to fix it?
Tried both the vm.overcommit_memory=1 and THP steps that the Redis log mentioned, but it hasn't helped.
Now, after 50% of crashes, the server won't even let me access it via console when it goes down - hard reset on Digital Ocean is needed, then restart everything from there.
Okay, so I think I've got somewhere. Time will tell if it works, but the errors have been cleared up in my Redis log file when restarting it.
Should anyone encounter the same issues, here's how I fixed the errors in the redis log as posted above. I'm using Ubuntu.
- I've added a 1GB swap file - this is pretty simple to do, and the link is to Digital Ocean. I've gone for a smaller swap file than my memory for now (which is 2GB).
- Fixed the vm.overcommit_memory problem - the Redis log (see above) gave perfect advice: add 'vm.overcommit_memory = 1' to /etc/sysctl.conf, then reboot.
- THP (Transparent Huge Pages) fix:
Entered these commands into the terminal:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
Then edited /etc/rc.local to include the following, which disables THP on reboot:
if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
   echo never > /sys/kernel/mm/transparent_hugepage/enabled
fi
if test -f /sys/kernel/mm/transparent_hugepage/defrag; then
   echo never > /sys/kernel/mm/transparent_hugepage/defrag
fi
Note - I added these before the line which said:
Now, normally there's a crash through the night, so I'll update tomorrow if there are any further problems. But thanks @baris for the advice on a swap file, and hopefully if it all works someone else will find it useful.
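For anyone following along, one way to confirm the settings above actually stuck after a reboot is to read them back from the kernel. This is only a rough sketch: the paths are the ones quoted in the Redis warnings, and thp_active_mode is a helper name made up for this post, not anything Redis or Ubuntu ships.

```shell
#!/bin/sh
# Hypothetical helper: print the active THP mode from a line such as
# "always madvise [never]" (the kernel brackets the mode currently in use).
thp_active_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\(.*\)\].*/\1/p'
}

thp_file=/sys/kernel/mm/transparent_hugepage/enabled
if [ -r "$thp_file" ]; then
    echo "THP mode: $(thp_active_mode "$(cat "$thp_file")")"
fi

# Should print 1 once the sysctl.conf change has been applied.
if [ -r /proc/sys/vm/overcommit_memory ]; then
    echo "overcommit_memory: $(cat /proc/sys/vm/overcommit_memory)"
fi

# SwapTotal should be non-zero once the swap file is active.
if [ -r /proc/meminfo ]; then
    grep '^SwapTotal:' /proc/meminfo
fi
```

If the THP mode still shows "always" after a reboot, the rc.local lines probably didn't run.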
Furthermore, one issue I hadn't dealt with was this on server startup:
Current maximum open files is 4096. maxclients has been reduced to 4064 to compensate for low ulimit. If you need higher maxclients increase 'ulimit -n'.
I increased the limit by running this command:
ulimit -SHn 65535
Then, I edited the /etc/security/limits.conf file to add these lines:
ubuntu soft nofile 65535
ubuntu hard nofile 65535
root soft nofile 65535
root hard nofile 65535
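To check whether the new limit really applies, it may help to look at what a process actually got rather than what limits.conf says. The sketch below reads the "Max open files" row from a /proc limits file; max_open_files is a made-up helper name, and it assumes the Redis process is called redis-server. One caveat: limits.conf is applied at login through PAM, so a Redis started by an init script may not pick it up.

```shell
#!/bin/sh
# Hypothetical helper: print the soft "Max open files" limit from a
# /proc/<pid>/limits style file passed as $1.
max_open_files() {
    awk '/^Max open files/ {print $4}' "$1"
}

# Limit for the current shell.
if [ -r /proc/self/limits ]; then
    echo "this shell: $(max_open_files /proc/self/limits)"
fi

# Limit for the running Redis server, if one is found
# (assumes the process name is redis-server).
if pid=$(pgrep -o redis-server 2>/dev/null); then
    echo "redis-server ($pid): $(max_open_files "/proc/$pid/limits")"
fi
```

If the Redis line still shows 4096, the limit needs raising wherever Redis is started from, not just for login sessions.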
Now when I restart Redis, there's no errors:
Now, time for a well-earned coffee and fingers crossed that's the problem dealt with.
P.S., getting an invalid refresh token message when trying to upload an image.
Ok, so that's not helped. The forum is on the brink of crashing, outputting invalid csrf tokens on /login/
I took a peek at the memory usage, which has skyrocketed.
Memory used seems to increase constantly until it crashes by the looks of things. Restarting redis then nodebb is making no difference to the memory used.
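To tell whether it's Redis itself that's growing, the memory figures can be pulled straight from Redis with redis-cli. A minimal sketch, assuming redis-cli is installed and Redis is on the default host and port; redis_mem_field is a helper name invented here, while used_memory and used_memory_rss are fields in the output of INFO memory.

```shell
#!/bin/sh
# Hypothetical helper: read "INFO memory" output on stdin and print the
# value of the field named in $1. Redis ends lines with \r\n, so the
# trailing carriage return is stripped.
redis_mem_field() {
    awk -F: -v key="$1" '$1 == key { sub(/\r$/, "", $2); print $2 }'
}

if command -v redis-cli >/dev/null 2>&1; then
    info=$(redis-cli info memory 2>/dev/null)
    echo "used_memory:     $(printf '%s\n' "$info" | redis_mem_field used_memory)"
    echo "used_memory_rss: $(printf '%s\n' "$info" | redis_mem_field used_memory_rss)"
fi
```

If these numbers stay flat while the server's overall memory keeps climbing, the growth is likely coming from something other than Redis.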
Doing this on my iPhone so managed to get top, but my power button's a pain to screenshot.
As for plugins, I disabled blog comments & Twitter (embed tweets) as they were failing, and pushbullet as I wasn't using it anyway - I'll get a full list when home.
Topping it off, a DNS issue has raised its head so I'll have to see if something's happened with Apache. I'll jump on it within the hour and see what's what.
Okay, here's top ordered by memory.
I've looked at Apache and there's no changes to the configuration for that - even after a restart (Apache, Redis, NodeBB) the browser's showing nothing, and the server is timing out; perhaps this is just the memory issue?
I'm about to reset all plugins, so I'll post how that goes soon, and give a list of what was used.
On Apache, getting this error:
AH00549: Failed to resolve server name for xx.xx.xx.xx (check DNS) -- or specify an explicit ServerName
As I said, no settings have been changed in regards to that.
Edit - after a hard reset on the Digital Ocean site, the forum reappeared for a second or two, then vanished like a fart in the wind, so Apache/DNS doesn't seem to be the problem. Also, here's the memory usage after that.
All plugins have been deactivated.
I've turned Apache off on the server and installed nginx to use that instead. Intended to do that for some time anyway.
http://www.spflforum.com seems to be working, but I've not yet configured it for the other domain.
One curious side effect is the font size in Persona has increased?
Well, strange font sizes aside, the forum is up and running again, and this looks far healthier. Granted, Wordpress isn't running, but that's a job for tomorrow.
I guess we'll soon see if Nginx experiences the same issues.
Here's the plugin list I couldn't get earlier - I've activated a few we use.
Memory use is still increasing incrementally.
And this is where we're at today with it. Free -m and top ordered by mem shown.
(Bear in mind this is using Nginx and not Apache. Nginx hasn't yet been configured to point the other domain to the Wordpress install, and php5 has also been turned off.)
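Since the memory creeps up slowly, logging it at intervals might show the trend better than one-off screenshots of free -m. Below is a rough sketch that works out "used" memory excluding buffers and cache from /proc/meminfo; mem_used_mb is an invented helper name, and running it from cron every few minutes is only a suggestion.

```shell
#!/bin/sh
# Hypothetical helper: print used memory in MB (total minus free, buffers
# and page cache) from a /proc/meminfo style file passed as $1.
mem_used_mb() {
    awk '/^MemTotal:/ {t = $2}
         /^MemFree:/  {f = $2}
         /^Buffers:/  {b = $2}
         /^Cached:/   {c = $2}
         END {print int((t - f - b - c) / 1024)}' "$1"
}

# One timestamped sample; redirect to a log file when running from cron.
if [ -r /proc/meminfo ]; then
    echo "$(date '+%F %T') used_mb=$(mem_used_mb /proc/meminfo)"
fi
```

Comparing a few hours of these samples against the time of the next crash should show whether the box is genuinely running out of memory.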