@crazycells nice and succinct, appreciate you helping out on the forums 😁
Upgrade "Refresh post-upload associations" in 1.9.x causes error
I tried to upgrade a forum from 1.8.2 to 1.9.2 and I am receiving an error while doing so:
Updating NodeBB... 1. Updating NodeBB data store schema... Parsing upgrade scripts... OK | 1 script(s) found, 52 skipped → [2018/4/16] Refresh post-upload associations... [ ] (279900/2255187) 12% Error occurred Error occurred during upgrade: MongoError: cursor id 23718515875 not found at /data/node_modules/mongodb/node_modules/mongodb-core/lib/connection/pool.js:598:61 at authenticateStragglers (/data/node_modules/mongodb/node_modules/mongodb-core/lib/connection/pool.js:516:16) at Connection.messageHandler (/data/node_modules/mongodb/node_modules/mongodb-core/lib/connection/pool.js:552:5) at emitMessageHandler (/data/node_modules/mongodb/node_modules/mongodb-core/lib/connection/connection.js:309:10) at Socket.<anonymous> (/data/node_modules/mongodb/node_modules/mongodb-core/lib/connection/connection.js:452:17) at emitOne (events.js:116:13) at Socket.emit (events.js:211:7) at addChunk (_stream_readable.js:263:12) at readableAddChunk (_stream_readable.js:250:11) at Socket.Readable.push (_stream_readable.js:208:10) /data/node_modules/mongodb/lib/utils.js:132 throw err; ^ MongoError: cursor id 23718515875 not found at /data/node_modules/mongodb/node_modules/mongodb-core/lib/connection/pool.js:598:61 at authenticateStragglers (/data/node_modules/mongodb/node_modules/mongodb-core/lib/connection/pool.js:516:16) at Connection.messageHandler (/data/node_modules/mongodb/node_modules/mongodb-core/lib/connection/pool.js:552:5) at emitMessageHandler (/data/node_modules/mongodb/node_modules/mongodb-core/lib/connection/connection.js:309:10) at Socket.<anonymous> (/data/node_modules/mongodb/node_modules/mongodb-core/lib/connection/connection.js:452:17) at emitOne (events.js:116:13) at Socket.emit (events.js:211:7) at addChunk (_stream_readable.js:263:12) at readableAddChunk (_stream_readable.js:250:11) at Socket.Readable.push (_stream_readable.js:208:10)
As you can see there are a lot of posts (2,255,187) and this upgrade would probably take 40-50 minutes. It crashes with this error message after about 5 minutes. It will later restart from the beginning and crash at this point again. Do you have any ideas why this could be happening?
Yes... the post upload associations upgrade is a long-running one, but is also optional. For now you can just add
schemaLogsorted set to get you back up and running again.
Ok, that is fine to fix it temporarily. Also it doesn't have a very high priority. This is only a test install the actual one is already on 1.9.2. I just was wondering what was going on here to get a better understanding of NodeBB and especially MongoDB.
If you search for that error on the internet you find a lot of people saying that this is usually a timeout issue. A large query was executed without batching and thus the cursor timed out while being still used. This alone feels really strange to me as this seems like questionable design by MongoDB. But watching at the upgrade code this is actually batched in units of 100. So I don't see why this would cause any problems.
I also checked MongoDB memory status and it is not running out of it. Also NodeBB still has plenty of memory available. Sure the server is under heavy load because of this process but that shouldn't cause any problems? So what would be the most probable cause for this? Do I have a misconfiguration in the MongoDB?
@dravere Can you try to change the upgrade script from
async.eachSerieslet me know if that fixes the issue. You can also try appling this commit as well https://github.com/NodeBB/NodeBB/commit/cba5aa975ea20b509c1766f3166d048103001afe
Not sure why you are getting that error the upgrade script looks OK to me.
Dravere last edited by Dravere
@baris Sure, sadly it didn't fix the problem.
Is much slower, but threw the same error after about 2 minutes at around 4%.
Still slower but faster than previous one. Threw the same error after about 2 minutes at around 5%.
Huge performance improvement. Would probably finish in 25 instead of 50 minutes. But still threw the error after about 3 minutes at 14%
Edit: Will have to move this test environment tomorrow. I'll report back if I still can reproduce the error in the new test environment.
@baris Was able to try it out on my new test environment. This new environment has extremely better hardware specs. But it still crashed with the same error after about 2 minutes at 14%. This was with the default 1.9.2 code.
Could you try adding another option to the script here:
@Dravere Yes, I echo that... especially now that your migration went so smoothly
If you are worried we can provide our pub keys so the server remains under your control... but don't let us near the actual production box.
I can't send you the data. I think I would break several laws if I would do that at this point
Also I don't know how important that work is. I mean my production database is already on 1.9.2 since I migrated directly to that version from the old forum (phpbb2). So this really doesn't affect me at all. I just wanted to report it, perhaps try to understand it and learn something from it. Especially since my knowledge of MongoDB is still quite limited.
This error is happening on an earlier test migration to 1.8.2 that I have still around. So in theory I could just throw it away. Especially if you have no other reports with this problem.
I'll gladly try it with smaller batch sizes. But I kind of doubt that this will make a lot of difference. I have a batch process in my modified solr plugin but with a batch size of 1000 and it is working fine there (Yes, I'll work on the pull requests for solr, I know they are still open ).
@dravere can you make a change for me and run the upgrade script again?
console.log('uploads', pid, uploads);here
The error is caused by a timeout and the only place I can think of halting execution is that while loop. The timeout seems to be 10minutes and if
cursor.next()is not called for 10minutes you get that cursor not found error.
Commenting out the fast path (these three lines) seems to be working so far (although extremely slowly).
My instance would previously get killed at around 15% (there were a lot of posts from two forum softwares ago when pasting files into a forum wasn't a thing), but now it's at 90%. It's been about 4 hours since I started this attempt.
@Dravere ping, as this may be of interest to you
It is not a problem in NodeBB. It seems to be a problem in the mongodb package. Downgrading it to version 3.0.4 and then only executing
./nodebb upgrade -sso that the mongodb package isn't upgraded back to 3.0.8 makes the migration work. And it makes it work really fast. It took like 5 minutes.
Found it via this issue in mongoose: https://github.com/Automattic/mongoose/issues/6504
And the issue going along in JIRA: https://jira.mongodb.org/browse/NODE-1482
If someone has an account in this JIRA tracker, we perhaps should report there, that we are able to reproduce that bug.
I commented on the issue in jira and posted a link to this topic.
Well done, that's some thorough investigating @Dravere !