Upgrade "Refresh post-upload associations" in 1.9.x causes error


  • I tried to upgrade a forum from 1.8.2 to 1.9.2 and I am receiving an error while doing so:

    Updating NodeBB...
    
    1. Updating NodeBB data store schema...
    Parsing upgrade scripts... 
    OK | 1 script(s) found, 52 skipped
      → [2018/4/16] Refresh post-upload associations...
        [             ] (279900/2255187) 12% Error occurred
    Error occurred during upgrade: MongoError: cursor id 23718515875 not found
        at /data/node_modules/mongodb/node_modules/mongodb-core/lib/connection/pool.js:598:61
        at authenticateStragglers (/data/node_modules/mongodb/node_modules/mongodb-core/lib/connection/pool.js:516:16)
        at Connection.messageHandler (/data/node_modules/mongodb/node_modules/mongodb-core/lib/connection/pool.js:552:5)
        at emitMessageHandler (/data/node_modules/mongodb/node_modules/mongodb-core/lib/connection/connection.js:309:10)
        at Socket.<anonymous> (/data/node_modules/mongodb/node_modules/mongodb-core/lib/connection/connection.js:452:17)
        at emitOne (events.js:116:13)
        at Socket.emit (events.js:211:7)
        at addChunk (_stream_readable.js:263:12)
        at readableAddChunk (_stream_readable.js:250:11)
        at Socket.Readable.push (_stream_readable.js:208:10)
    
    /data/node_modules/mongodb/lib/utils.js:132
          throw err;
          ^
    MongoError: cursor id 23718515875 not found
        at /data/node_modules/mongodb/node_modules/mongodb-core/lib/connection/pool.js:598:61
        at authenticateStragglers (/data/node_modules/mongodb/node_modules/mongodb-core/lib/connection/pool.js:516:16)
        at Connection.messageHandler (/data/node_modules/mongodb/node_modules/mongodb-core/lib/connection/pool.js:552:5)
        at emitMessageHandler (/data/node_modules/mongodb/node_modules/mongodb-core/lib/connection/connection.js:309:10)
        at Socket.<anonymous> (/data/node_modules/mongodb/node_modules/mongodb-core/lib/connection/connection.js:452:17)
        at emitOne (events.js:116:13)
        at Socket.emit (events.js:211:7)
        at addChunk (_stream_readable.js:263:12)
        at readableAddChunk (_stream_readable.js:250:11)
        at Socket.Readable.push (_stream_readable.js:208:10)
    

    As you can see there are a lot of posts (2,255,187) and this upgrade would probably take 40-50 minutes. It crashes with this error message after about 5 minutes. It will later restart from the beginning and crash at this point again. Do you have any ideas why this could be happening?

  • GNU/Linux Admin

    Yes... the post upload associations upgrade is a long-running one, but is also optional. For now you can just add refresh_post_upload_associations to the schemaLog sorted set to get you back up and running again.
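
    A rough sketch of that workaround for MongoDB-backed installs. The `objects` collection name, the `{ _key, value, score }` document shape, and the use of a timestamp as the score are assumptions about how NodeBB stores sorted sets, and `markUpgradeAsDone` is a hypothetical helper name, so check all of them against your own database first:

```javascript
// Sketch only: the collection name ("objects") and the document shape
// ({ _key, value, score }) are assumptions about how NodeBB stores sorted
// sets in MongoDB. Verify them against your own database before running.
async function markUpgradeAsDone(objects, scriptName, timestamp = Date.now()) {
  // Upsert, so running this twice does not create a duplicate entry.
  return objects.updateOne(
    { _key: 'schemaLog', value: scriptName },
    { $set: { score: timestamp } },
    { upsert: true }
  );
}
```

    With a connected driver `Db` handle this would be called as `await markUpgradeAsDone(db.collection('objects'), 'refresh_post_upload_associations')`.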


  • Ok, that is fine as a temporary fix. It also doesn't have a very high priority: this is only a test install; the actual one is already on 1.9.2. I was just wondering what was going on here, to get a better understanding of NodeBB and especially MongoDB.

    If you search for that error on the internet, you find a lot of people saying that this is usually a timeout issue: a large query was executed without batching, and the cursor timed out while it was still in use. This alone feels really strange to me, as it seems like questionable design by MongoDB. But looking at the upgrade code, this is actually batched in units of 100, so I don't see why this would cause any problems.

    I also checked MongoDB's memory status and it is not running out of memory. NodeBB also still has plenty of memory available. Sure, the server is under heavy load because of this process, but that shouldn't cause any problems, should it? So what would be the most probable cause for this? Do I have a misconfiguration in MongoDB?
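
    For readers unfamiliar with the batching mentioned above: working "in units of 100" means handing fixed-size slices to a handler and waiting for each slice to finish before taking the next. A minimal sketch, assuming a hypothetical `processInBatches` helper; this is not NodeBB's actual batch code:

```javascript
// Hypothetical helper, for illustration only.
// Calls handleBatch with consecutive slices of at most batchSize items and
// waits for each call to finish before starting the next one.
async function processInBatches(items, batchSize, handleBatch) {
  for (let start = 0; start < items.length; start += batchSize) {
    await handleBatch(items.slice(start, start + batchSize));
  }
}
```

    The point of the pattern is that no single step has to hold or process the whole result set at once.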

  • Admin NodeBB

    @dravere Can you try changing the upgrade script from async.each to async.eachSeries and let me know if that fixes the issue? You can also try applying this commit: https://github.com/NodeBB/NodeBB/commit/cba5aa975ea20b509c1766f3166d048103001afe

    Not sure why you are getting that error; the upgrade script looks OK to me.
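
    To illustrate the difference between the two calls: async.each starts every item at once, while async.eachSeries finishes one item before starting the next. The sketch below mimics both with native promises rather than the async library itself; `eachParallel`, `eachSeries` and `makeTracker` are hypothetical names used only for this illustration:

```javascript
// Hypothetical stand-in for per-post work; records how many calls overlap.
function makeTracker() {
  let active = 0;
  let peak = 0;
  return {
    async work() {
      active += 1;
      peak = Math.max(peak, active);
      await new Promise((resolve) => setTimeout(resolve, 5));
      active -= 1;
    },
    peak: () => peak,
  };
}

// Comparable to async.each: start every item at once, wait for all of them.
async function eachParallel(items, fn) {
  await Promise.all(items.map((item) => fn(item)));
}

// Comparable to async.eachSeries: finish one item before starting the next.
async function eachSeries(items, fn) {
  for (const item of items) {
    await fn(item);
  }
}
```

    The series variant puts far less concurrent load on the database, which is why it is a reasonable thing to try for a timeout, and also why it is slower.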


  • @baris Sure, sadly it didn't fix the problem.

    Change                   | Results
    async.eachSeries         | Is much slower, but threw the same error after about 2 minutes at around 4%.
    async.eachSeries & patch | Still slower but faster than previous one. Threw the same error after about 2 minutes at around 5%.
    async.each & patch       | Huge performance improvement. Would probably finish in 25 instead of 50 minutes. But still threw the error after about 3 minutes at 14%.

    Edit: Will have to move this test environment tomorrow. I'll report back if I still can reproduce the error in the new test environment.


  • @baris I was able to try it out on my new test environment. This new environment has much better hardware specs, but it still crashed with the same error after about 2 minutes at 14%. This was with the default 1.9.2 code.

  • GNU/Linux Admin

    Could you try adding another option to the script here: batch: 50?

    25?

  • Admin NodeBB

    @dravere If you don't mind you can send us a copy of your db as well so we can test it, maybe it has something to do with the data. You can send it to support@nodebb.org.

  • GNU/Linux Admin

    @Dravere Yes, I echo that... especially now that your migration went so smoothly 😉

    If you are worried we can provide our pub keys so the server remains under your control... but don't let us near the actual production box. 😛


  • I can't send you the data. I think I would break several laws if I did that at this point 😉

    Also I don't know how important that work is. I mean my production database is already on 1.9.2 since I migrated directly to that version from the old forum (phpbb2). So this really doesn't affect me at all. I just wanted to report it, perhaps try to understand it and learn something from it. Especially since my knowledge of MongoDB is still quite limited.

    This error is happening on an earlier test migration to 1.8.2 that I still have around. So in theory I could just throw it away, especially if you have no other reports of this problem.

    I'll gladly try it with smaller batch sizes. But I kind of doubt that this will make a lot of difference. I have a batch process in my modified solr plugin but with a batch size of 1000 and it is working fine there (Yes, I'll work on the pull requests for solr, I know they are still open 😅).

  • Admin NodeBB

    @dravere can you make a change for me and run the upgrade script again?

    Add console.log('uploads', pid, uploads); here

    The error is caused by a timeout, and the only place I can think of that could halt execution is that while loop. The timeout seems to be 10 minutes, and if cursor.next() is not called for 10 minutes you get that cursor not found error.
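
    The timeout hypothesis in this post can be pictured with a toy model. This is not driver or server code: `ToyCursor` is a hypothetical class, and the clock is injected so nothing actually waits. The 10-minute limit is the figure quoted above.

```javascript
// Toy model only, not MongoDB driver code: a cursor that is discarded once
// it has been idle for longer than idleLimitMs.
class ToyCursor {
  constructor(docs, idleLimitMs, now) {
    this.docs = docs;
    this.idleLimitMs = idleLimitMs;
    this.now = now; // injectable clock, so no real waiting is needed
    this.lastUsed = now();
    this.position = 0;
  }

  next() {
    // Too long since the previous call: the cursor no longer exists.
    if (this.now() - this.lastUsed > this.idleLimitMs) {
      throw new Error('cursor not found');
    }
    this.lastUsed = this.now();
    if (this.position >= this.docs.length) {
      return null; // exhausted
    }
    const doc = this.docs[this.position];
    this.position += 1;
    return doc;
  }
}
```

    In this model, any work between two next() calls that takes longer than the limit makes the following call fail, which is why a blocking loop would be the first suspect.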

  • Gamers

    Commenting out the fast path (these three lines) seems to be working so far (although extremely slowly).

    My instance would previously get killed at around 15% (there were a lot of posts from two forum softwares ago when pasting files into a forum wasn't a thing), but now it's at 90%. It's been about 4 hours since I started this attempt.

  • GNU/Linux Admin

    @Dravere ping, as this may be of interest to you


  • @baris @Ben-Lubar @julian Found the problem. The solution Ben describes works, but it is painfully slow, and it also made no sense to me. So I continued searching and found it.

    It is not a problem in NodeBB. It seems to be a problem in the mongodb package. Downgrading it to version 3.0.4 and then only executing ./nodebb upgrade -s so that the mongodb package isn't upgraded back to 3.0.8 makes the migration work. And it makes it work really fast. It took like 5 minutes.

    Found it via this issue in mongoose: https://github.com/Automattic/mongoose/issues/6504
    And the issue going along in JIRA: https://jira.mongodb.org/browse/NODE-1482

    If someone has an account in this JIRA tracker, we should perhaps report there that we are able to reproduce that bug.

  • Admin NodeBB

    I commented on the issue in jira and posted a link to this topic.

  • GNU/Linux Admin

    👏 Well done, that's some thorough investigating @Dravere !
