Can you offer any feedback to the post above from our tech person?
Thanks in Advance,
Mark
We have one special thread on our forum that, anytime someone posts to it, the CPU spikes to 100% for maybe a little less than five minutes and the site is completely unavailable. Nothing goes wrong in the logs and after the time is up, everything is normal. We locked the thread and rarely see an issue. If we unlock it, one post to it and the issue comes right back.
Here is the link to the specific thread:
I should note, if we stop the NodeBB process and restart it, it spikes right back up again. We just have to wait out the spike, unfortunately.
I grabbed an strace of the process during the spike in case that helps. It just looks like this over and over again...
gettimeofday({1408323867, 541586}, NULL) = 0
gettimeofday({1408323867, 541638}, NULL) = 0
gettimeofday({1408323867, 541743}, NULL) = 0
futex(0x7f4fc00008c8, FUTEX_WAKE_PRIVATE, 1) = 1
mmap(0x2568ba73f000, 9588736, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x2568ba73f000
munmap(0x2568ba73f000, 790528) = 0
munmap(0x2568bb025000, 258048) = 0
mmap(0x2568ba800000, 8538824, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x2568ba800000
mmap(0x3600d0bc9000, 9588736, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x3600d0bc9000
munmap(0x3600d0bc9000, 225280) = 0
munmap(0x3600d1425000, 823296) = 0
mmap(0x3600d0c00000, 8538792, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3600d0c00000
mmap(0x14316f988000, 9588736, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x14316f988000
munmap(0x14316f988000, 491520) = 0
munmap(0x143170225000, 557056) = 0
mmap(0x14316fa00000, 8538752, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x14316fa00000
mmap(0x4aff0cb4000, 9588736, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x4aff0cb4000
munmap(0x4aff0cb4000, 311296) = 0
munmap(0x4aff1525000, 737280) = 0
mmap(0x4aff0d00000, 8538720, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x4aff0d00000
mmap(0x29f36d128000, 9588736, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x29f36d128000
munmap(0x29f36d128000, 884736) = 0
munmap(0x29f36da25000, 163840) = 0
mmap(0x29f36d200000, 8538688, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x29f36d200000
gettimeofday({1408323867, 562398}, NULL) = 0
gettimeofday({1408323867, 590361}, NULL) = 0
gettimeofday({1408323867, 591337}, NULL) = 0
gettimeofday({1408323867, 592171}, NULL) = 0
gettimeofday({1408323867, 592268}, NULL) = 0
gettimeofday({1408323867, 592316}, NULL) = 0
gettimeofday({1408323867, 592357}, NULL) = 0
gettimeofday({1408323867, 592399}, NULL) = 0
gettimeofday({1408323867, 592440}, NULL) = 0
gettimeofday({1408323867, 592481}, NULL) = 0
gettimeofday({1408323867, 592577}, NULL) = 0
gettimeofday({1408323867, 592621}, NULL) = 0
gettimeofday({1408323867, 592689}, NULL) = 0
gettimeofday({1408323867, 592732}, NULL) = 0
gettimeofday({1408323867, 592773}, NULL) = 0
gettimeofday({1408323867, 592870}, NULL) = 0
gettimeofday({1408323867, 592916}, NULL) = 0
gettimeofday({1408323867, 592956}, NULL) = 0
gettimeofday({1408323867, 593601}, NULL) = 0
munmap(0x4aff0d00000, 8540160) = 0
munmap(0x14316fa00000, 8540160) = 0
munmap(0x3600d0c00000, 8540160) = 0
munmap(0x2568ba800000, 8540160) = 0
munmap(0x21beb2f00000, 8540160) = 0
gettimeofday({1408323867, 594822}, NULL) = 0
gettimeofday({1408323867, 594890}, NULL) = 0
gettimeofday({1408323867, 594986}, NULL) = 0
futex(0x7f4fc00008c8, FUTEX_WAKE_PRIVATE, 1) = 1
mmap(0x2e5e2dba1000, 9588736, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x2e5e2dba1000
munmap(0x2e5e2dba1000, 389120) = 0
munmap(0x2e5e2e425000, 659456) = 0
mmap(0x2e5e2dc00000, 8538640, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x2e5e2dc00000
mmap(0xc7259f7c000, 9588736, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0xc7259f7c000
munmap(0xc7259f7c000, 540672) = 0
munmap(0xc725a825000, 507904) = 0
mmap(0xc725a000000, 8538680, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xc725a000000
mmap(0x3cfa03435000, 9588736, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x3cfa03435000
munmap(0x3cfa03435000, 831488) = 0
munmap(0x3cfa03d25000, 217088) = 0
mmap(0x3cfa03500000, 8538664, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3cfa03500000
mmap(0x1b58bac54000, 9588736, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x1b58bac54000
munmap(0x1b58bac54000, 704512) = 0
munmap(0x1b58bb525000, 344064) = 0
mmap(0x1b58bad00000, 8538640, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x1b58bad00000
mmap(0x5870c1ca000, 9588736, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x5870c1ca000
munmap(0x5870c1ca000, 221184) = 0
munmap(0x5870ca25000, 827392) = 0
mmap(0x5870c200000, 8538624, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x5870c200000
gettimeofday({1408323867, 616858}, NULL) = 0
gettimeofday({1408323867, 644188}, NULL) = 0
gettimeofday({1408323867, 645073}, NULL) = 0
gettimeofday({1408323867, 645907}, NULL) = 0
gettimeofday({1408323867, 645990}, NULL) = 0
gettimeofday({1408323867, 646034}, NULL) = 0
gettimeofday({1408323867, 646075}, NULL) = 0
gettimeofday({1408323867, 646114}, NULL) = 0
gettimeofday({1408323867, 646156}, NULL) = 0
gettimeofday({1408323867, 646196}, NULL) = 0
gettimeofday({1408323867, 646291}, NULL) = 0
gettimeofday({1408323867, 646334}, NULL) = 0
gettimeofday({1408323867, 646411}, NULL) = 0
gettimeofday({1408323867, 646455}, NULL) = 0
gettimeofday({1408323867, 646498}, NULL) = 0
gettimeofday({1408323867, 646538}, NULL) = 0
gettimeofday({1408323867, 646579}, NULL) = 0
gettimeofday({1408323867, 646619}, NULL) = 0
gettimeofday({1408323867, 647215}, NULL) = 0
munmap(0x1b58bad00000, 8540160) = 0
munmap(0x3cfa03500000, 8540160) = 0
munmap(0xc725a000000, 8540160) = 0
munmap(0x2e5e2dc00000, 8540160) = 0
munmap(0x29f36d200000, 8540160) = 0
gettimeofday({1408323867, 648436}, NULL) = 0
gettimeofday({1408323867, 648500}, NULL) = 0
gettimeofday({1408323867, 648596}, NULL) = 0
futex(0x7f4fc00008c8, FUTEX_WAKE_PRIVATE, 1) = 1
mmap(0x1e2678b8b000, 9588736, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x1e2678b8b000
munmap(0x1e2678b8b000, 479232) = 0
munmap(0x1e2679425000, 569344) = 0
mmap(0x1e2678c00000, 8538600, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x1e2678c00000
mmap(0x266b76515000, 9588736, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x266b76515000
munmap(0x266b76515000, 962560) = 0
munmap(0x266b76e25000, 86016) = 0
mmap(0x266b76600000, 8538584, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x266b76600000
mmap(0x22128e50c000, 9588736, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x22128e50c000
munmap(0x22128e50c000, 999424) = 0
munmap(0x22128ee25000, 49152) = 0
mmap(0x22128e600000, 8538560, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x22128e600000
mmap(0x24ec62b8b000, 9588736, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x24ec62b8b000
munmap(0x24ec62b8b000, 479232) = 0
munmap(0x24ec63425000, 569344) = 0
mmap(0x24ec62c00000, 8538544, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x24ec62c00000
mmap(0x7fdf4db2000, 9588736, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7fdf4db2000
munmap(0x7fdf4db2000, 319488) = 0
munmap(0x7fdf5625000, 729088) = 0
mmap(0x7fdf4e00000, 8538520, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fdf4e00000
gettimeofday({1408323867, 669966}, NULL) = 0
gettimeofday({1408323867, 698507}, NULL) = 0
gettimeofday({1408323867, 699587}, NULL) = 0
gettimeofday({1408323867, 700691}, NULL) = 0
gettimeofday({1408323867, 700787}, NULL) = 0
gettimeofday({1408323867, 700833}, NULL) = 0
gettimeofday({1408323867, 700873}, NULL) = 0
gettimeofday({1408323867, 700916}, NULL) = 0
gettimeofday({1408323867, 700962}, NULL) = 0
gettimeofday({1408323867, 701007}, NULL) = 0
gettimeofday({1408323867, 701165}, NULL) = 0
gettimeofday({1408323867, 701208}, NULL) = 0
gettimeofday({1408323867, 701269}, NULL) = 0
gettimeofday({1408323867, 701310}, NULL) = 0
gettimeofday({1408323867, 701350}, NULL) = 0
gettimeofday({1408323867, 701389}, NULL) = 0
gettimeofday({1408323867, 701430}, NULL) = 0
gettimeofday({1408323867, 701470}, NULL) = 0
gettimeofday({1408323867, 702268}, NULL) = 0
munmap(0x24ec62c00000, 8540160) = 0
munmap(0x22128e600000, 8540160) = 0
munmap(0x266b76600000, 8540160) = 0
munmap(0x1e2678c00000, 8540160) = 0
munmap(0x5870c200000, 8540160) = 0
gettimeofday({1408323867, 703120}, NULL) = 0
gettimeofday({1408323867, 703192}, NULL) = 0
gettimeofday({1408323867, 703318}, NULL) = 0
futex(0x7f4fc00008c8, FUTEX_WAKE_PRIVATE, 1) = 1
mmap(0xd83d282d000, 9588736, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0xd83d282d000
munmap(0xd83d282d000, 864256) = 0
munmap(0xd83d3125000, 184320) = 0
mmap(0xd83d2900000, 8538504, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xd83d2900000
mmap(0x10ebf46cb000, 9588736, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x10ebf46cb000
munmap(0x10ebf46cb000, 217088) = 0
munmap(0x10ebf4f25000, 831488) = 0
mmap(0x10ebf4700000, 8538480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x10ebf4700000
Weird, does it happen in this thread as well? http://mangolassi.it/topic/1022/what-are-you-doing-right-now
You can also try these http://docs.mongodb.org/manual/administration/monitoring/ to see what the database is doing during the spike.
I see you are on 0.5.0-2 now is it still happening on that version?
Yes on 0.5.0-2
No, that other thread is fine. I've been watching it carefully assuming that it would freak out too but it doesn't. There is something unique about that one thread.
I will wait until we have a lull (site is active around the clock) and see what the DB is doing.
I suspect one of the posts contains something causing a problem. But who knows what
We have noticed that even reading the thread causing the CPU to spike for a few minutes.
@scottalanmiller Strange, I've got a topic with just under 600 posts on my forum, and it loads and I can post to it without issue (I do use Pagination though). Does it happen when you view a certain portion of posts? Maybe one of the posts is causing the issue?
I am seeing this in one of the posts and a matching error in the browser console.
<p><a href="uploading 10%" target="_blank"><img src="uploading 10%" alt="Awesome recovery.gif" class="img-responsive"></a> </p>
GET http://mangolassi.it/topic/389/random-thread-anything-goes/uploading%2010% 504 (Gateway Time-out) nodebb.min.js?eb0c6f65:3
Maybe thats related.
It causes this on the server side
TypeError: Failed to decode param 'uploading%2010%'
at decode_param (/home/baris/node-forum/node_modules/express/lib/router/layer.js:147:15)
at Layer.match (/home/baris/node-forum/node_modules/express/lib/router/layer.js:119:11)
at Function.match_layer (/home/baris/node-forum/node_modules/express/lib/router/index.js:281:16)
at next (/home/baris/node-forum/node_modules/express/lib/router/index.js:182:10)
at /home/baris/node-forum/node_modules/express/lib/router/index.js:184:16
at Function.match_layer (/home/baris/node-forum/node_modules/express/lib/router/index.js:288:3)
at next (/home/baris/node-forum/node_modules/express/lib/router/index.js:182:10)
at /home/baris/node-forum/node_modules/express/lib/router/index.js:184:16
at Function.match_layer (/home/baris/node-forum/node_modules/express/lib/router/index.js:288:3)
at next (/home/baris/node-forum/node_modules/express/lib/router/index.js:182:10)
@a_5mith said:
@scottalanmiller Strange, I've got a topic with just under 600 posts on my forum, and it loads and I can post to it without issue (I do use Pagination though). Does it happen when you view a certain portion of posts? Maybe one of the posts is causing the issue?
Yes, I think that it is a single post, we've got others nearly as long. It didn't deteriorate, it failed suddenly.
@baris said:
I am seeing this in one of the posts and a matching error in the browser console.
So maybe we have something being treated as non-text causing an issue? That would make sense.
@baris said:
Awesome recovery.gif
So how do I edit that entry so that it doesn't crash? Deleting it won't make it completely go aware. But I can't delete or edit anyway because just by going to the page makes the server hang. Hmm...
Hmm once you are on that page you can inspect that element to get its post id. Once you have that you can update it in the database itself it will have a _key = post:<post id>.
db.objects.find({_key:"post:<post_id>");
Then something like
db.objects.update({_key:"post:<post_id>"}, {$set:{"content": "testing edit"}});
Let me know if that works.
This issue has returned for us big time. We don't know what thread is doing it now but it is happening about once an hour. It is getting really bad all of a sudden - of course coinciding with a major increase of traffic over the last two days. We are getting a little desperate now.
@baris said:
Any updates on this?
I am seeing a similar issue on a pretty busy mongo instance, when it happens the nodebb nodejs process hits 100% cpu but the mongo process is idle. Usually it takes a restart to fix it.
that's exactly what we see. We now just hop in and do an immediate "kill -9" on the offending process and restart right away. It is horrible.
Oh sure, it happened while I was posting this! That's twice in ten minutes now!
I'll wait for @baris reply but since he replied to you a month ago he's done a ton of improvements to the mongodb driver (our biggest client is actually using mongo right now) so hang tight, hopefully we can help. Are you on the latest code at the moment (0.5.1)?
No, we are on 0.5.0-4. When should we update?