Is there a simple way to analyze long running queries?



  • In NodeBB 1.10.2 I had many 503 errors and thus decided to just increase the event loop lag threshold to 200ms and also set the check interval to 1000ms. This made most 503 errors disappear. After upgrading to NodeBB 1.11.2, there is suddenly a 503 error every 1-5 seconds again.

    CPU load is not an issue. There is also enough memory available. But I notice that the NodeBB/Node.js event loop lag spikes at times, and I would like to find out why that is, or which queries created those spikes. Is there a way to get this information from NodeBB? Are there suitable hooks that could be used for such monitoring?
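
    A rough sketch of what I mean by measuring the lag myself (arbitrary interval and threshold; it only shows when the loop stalls, not why):

    ```js
    // Rough event loop lag sampler: schedule a timer and measure how late it fires.
    // Anything consistently tens of milliseconds late points at synchronous work on the loop.
    const SAMPLE_INTERVAL_MS = 500;

    let last = Date.now();
    setInterval(() => {
        const now = Date.now();
        const lag = now - last - SAMPLE_INTERVAL_MS;
        if (lag > 50) {
            console.warn(`[lag] event loop was ${lag}ms late at ${new Date(now).toISOString()}`);
        }
        last = now;
    }, SAMPLE_INTERVAL_MS);
    ```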


  • Staff Admin

    If you are running mongod, you can try running db.currentOp() to see the running queries during the 503.
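
    For example, to only list operations that have been running for a while (adjust the threshold as needed):

    ```js
    // In the mongo shell: show active operations running longer than 1 second.
    db.currentOp({
        active: true,
        secs_running: { $gt: 1 }
    });
    ```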



  • @baris said in Is there a simple way to analyze long running queries?:

    If you are running mongod, you can try running db.currentOp() to see the running queries during the 503.

    Oh, sorry. I might have formulated this a bit misleadingly. I was thinking more of the HTTP requests: basically, finding long-running NodeBB requests. I'd like to find out what causes the spike. A long-running DB query shouldn't cause a spike, since that is I/O and shouldn't hurt the Node.js event loop lag, right? If there is additional lag, there has to be a longer-running calculation in NodeBB. Or am I wrong with this assumption?
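
    What I'm imagining is something like a middleware that logs slow requests. A rough sketch of the idea in plain Express (how to attach it to NodeBB, e.g. from a plugin, is a separate question, and the threshold is arbitrary):

    ```js
    // Sketch: log requests whose total handling time exceeds a threshold.
    // Note: a slow response can still be caused by slow I/O, so this narrows
    // things down but doesn't by itself prove the event loop was blocked.
    function slowRequestLogger(thresholdMs = 500) {
        return function (req, res, next) {
            const start = process.hrtime();
            res.on('finish', () => {
                const [s, ns] = process.hrtime(start);
                const ms = s * 1000 + ns / 1e6;
                if (ms > thresholdMs) {
                    console.warn(`[slow] ${req.method} ${req.originalUrl} took ${ms.toFixed(1)}ms`);
                }
            });
            next();
        };
    }

    // Usage, assuming access to the Express app (e.g. from a plugin):
    // app.use(slowRequestLogger(500));
    ```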


  • GNU/Linux

    Same here... I checked my NodeBB log and nothing shows up while the 503 Excessive Load Warning error is happening.

    UPDATE:
    Never mind, it works fine on Firefox! I think I just need to clear the cache and cookies in Chrome and it should be fine.



  • I found something new today. First, it seems to have gotten quite bad at times, with lag spikes of up to a second. The second interesting thing is that it happens on a schedule, every 25 minutes. Is there anything in NodeBB that happens every 25 minutes?


  • Staff Admin

    @Dravere I think notifications are pruned every 30 minutes.



  • @baris Yes, they are. But further analysis seems to reveal that it is happening much more often, building up in spikes before dropping again. These are the measurements for an hour:

    screenshot-2019-02-21-15-44-09.png

    Those spikes are about 1.5-2 minutes apart. I missed them previously because I wasn't zoomed in enough. The measurement is taken every 15 seconds. I'm pretty sure it has something to do with the upgrade to NodeBB 1.11.x, since I didn't have such problems with NodeBB 1.10.x. But I still have absolutely no clue what it is, and a very hard time finding ways to analyse this better.
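
    One idea to avoid missing short spikes between the 15-second scrapes is to export the worst lag seen since the last scrape instead of an instantaneous value. A rough sketch, assuming the prom-client package (the metric name and port are made up):

    ```js
    // Sketch: sample the event loop lag frequently, but only export the
    // maximum observed value since the last Prometheus scrape.
    const client = require('prom-client');
    const http = require('http');

    const lagGauge = new client.Gauge({
        name: 'app_eventloop_lag_max_ms',
        help: 'Maximum observed event loop lag since the last scrape (ms)',
    });

    const SAMPLE_MS = 100;
    let maxLag = 0;
    let last = Date.now();
    setInterval(() => {
        const now = Date.now();
        maxLag = Math.max(maxLag, now - last - SAMPLE_MS);
        last = now;
    }, SAMPLE_MS);

    http.createServer(async (req, res) => {
        lagGauge.set(maxLag);
        maxLag = 0; // reset after each scrape
        res.setHeader('Content-Type', client.register.contentType);
        res.end(await client.register.metrics());
    }).listen(9301);
    ```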


  • GNU/Linux Admin

    @Dravere Is it tied to load? If you copy the production database to a new server and run NodeBB there, do you see the same spikes?

    If so, then you can start git bisecting and see where things start to go wrong 🤔


  • Staff Admin

    @Dravere How many nodebb procs are running?



  • @baris One nodebb process is running.

    @julian So far I haven't noticed it outside of production, but my development machine is quite different. Also, I usually don't have Prometheus attached to my development environment, so it's possible I just don't notice it there. I'll have to look into setting up a second environment without any load but otherwise similar to the production environment. Might take me some time 🙂


  • Staff Admin

    You can temporarily disable all cron jobs and see if the issue goes away; then we can pinpoint which one causes it. If it is still happening, then it's not the cron jobs.



  • @baris said in Is there a simple way to analyze long running queries?:

    You can temporarily disable all cron jobs and see if the issue goes away; then we can pinpoint which one causes it. If it is still happening, then it's not the cron jobs.

    Do you mean in code? Or is there actually a setting to disable them all?


  • Staff Admin

    @Dravere Yeah, I meant in code; you would need to comment out the lines that match `new cron`.
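
    For anyone searching later, the pattern to look for is roughly this (illustrative only; the schedule and callback below are made up, and the real jobs live in the NodeBB source):

    ```js
    // Illustrative only: the schedule and callback here are made up, but NodeBB's
    // scheduled jobs use the `cron` package in this general shape.
    const cron = require('cron').CronJob;

    // To disable a job while testing, comment out the whole `new cron(...)` call:
    // new cron('*/30 * * * *', function () {
    //     pruneNotifications();   // hypothetical callback
    // }, null, true);
    ```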

