Backing Up MongoDB-Backed NodeBB


  • Community Rep

    Just noticed this bit from NodeBB's upgrading docs:

    Backing up MongoDB
    
    To run a backup of your complete MongoDB you can simply run
    
        mongodump
    
    which will create a directory structure that can be restored with the mongorestore command.
    
    It is recommended that you first shut down your database.
    

IIRC (and I've admittedly not read up on mongodump recently), the primary advantage of mongodump is live backups that do not require shutting down mongod itself. Otherwise a quick tar -czf /bleh/blah/mongo datadir would get the job done, no?
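For what it's worth, the tar route is only consistent if mongod is stopped first. A sketch, assuming the Debian/Ubuntu default dbPath of /var/lib/mongodb and a hypothetical /var/backups target:

```shell
# Cold backup: consistent only because mongod is stopped first.
# /var/lib/mongodb is the Debian/Ubuntu default dbPath; adjust to your install.
sudo systemctl stop mongod
sudo tar -czf "/var/backups/mongodb-$(date +%F).tar.gz" -C /var/lib mongodb
sudo systemctl start mongod
```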

    Onto my questions:

    1. Is shutting down mongod prior to mongodump still the case or might the docs quoted above be dated and referencing older mongodb versions where things might have been a bit more sketchy?

    2. Something else I am missing here? Clue bats welcome.

    TIA- ✌

    Edit: Tweaked post title to more accurately reflect intent. Hope that doesn't blow the SEO optimizations. 😰


  • Global Moderator

    IIRC live backups are only possible under certain conditions, which are not the case under a default install.

    Mongodump is easier than moving the entire mongo directory, and allows for moving across possibly different versions of mongod as well.
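For anyone following along, the round trip is short. A sketch (the localhost URI and the --drop flag are just illustrative choices):

```shell
# Dump every database from a running mongod into ./dump/ (mongodump's default output directory).
mongodump --uri="mongodb://127.0.0.1:27017"

# Restore on the target, which may be a newer mongod version.
# --drop replaces any existing collections with the dumped versions first.
mongorestore --uri="mongodb://127.0.0.1:27017" --drop dump/
```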


  • Community Rep

    @PitaJ Thx.

It subsequently occurred to me that maybe the shutdown recommendation had to do with "flushing" session tables. Let us assume that Redis is handling sessions. If so, is shutting down Mongo prior to dumping still necessary? Kind of a drag for those of us who run our dumps from cron. Well, I guess I could also run the shutdown whilst at it, but you know those pesky users - they cannot stand 30s of downtime, and no matter when you run your dumps, rest assured somebody will notice and whinge about it. 😜


  • Community Rep

    @PitaJ said in Updating/grading Mongodb Backed NodeBB:

    Mongodump is easier than moving the entire mongo directory, and allows for moving across possibly different versions of mongod as well.

    Definitely way easier and faster.


  • Community Rep

    @gotwf said in Updating/grading Mongodb Backed NodeBB:

    @PitaJ Thx.

It subsequently occurred to me that maybe the shutdown recommendation had to do with "flushing" session tables. Let us assume that Redis is handling sessions. If so, is shutting down Mongo prior to dumping still necessary? Kind of a drag for those of us who run our dumps from cron. Well, I guess I could also run the shutdown whilst at it, but you know those pesky users - they cannot stand 30s of downtime, and no matter when you run your dumps, rest assured somebody will notice and whinge about it. 😜

    I've never shut down to use it. If it isn't 100% stable, it sure seems stable.


  • Community Rep

Both processes should work, but the official mongodump feels universally better. And if you can shut down to do it, even better, but I'd not be too concerned, especially in a migration where you can do it again if necessary.


  • Community Rep

    @scottalanmiller Thanks, Scott. I reviewed the mongo docs yesterday and was unable to find any recommendations pro/con shutting down. In your estimation, are cron'd mongodumps "safe" or should "best practice" use a shutdown, dump, restart sequence? Kind of pointless to have regular dumps from cron if they're sketchy. My testing was not exhaustive but looked good, no glitches or issues so I cron'd them and never looked back.


  • Community Rep

    @gotwf said in Updating/grading Mongodb Backed NodeBB:

    @scottalanmiller Thanks, Scott. I reviewed the mongo docs yesterday and was unable to find any recommendations pro/con shutting down. In your estimation, are cron'd mongodumps "safe" or should "best practice" use a shutdown, dump, restart sequence? Kind of pointless to have regular dumps from cron if they're sketchy. My testing was not exhaustive but looked good, no glitches or issues so I cron'd them and never looked back.

    Much of the point of it is not needing to shut down. I would feel confident using a cron job to schedule it.
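A sketch of such a cron job, assuming hypothetical paths and a 14-day retention window:

```shell
# /etc/cron.d/nodebb-mongodump (illustrative schedule and paths)
# Nightly gzip'd dump at 03:15 into a dated directory, then prune dump
# directories older than 14 days. Note % must be escaped as \% in crontab entries.
15 3 * * * root mongodump --gzip --out /var/backups/mongodb/$(date +\%F) && find /var/backups/mongodb -mindepth 1 -maxdepth 1 -type d -mtime +14 -exec rm -rf {} +
```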


  • Community Rep

    @scottalanmiller

Cool. That was my understanding as well but I do not have your MongoDB expertise so thank you much for the confirmation. 👍

Edit: To clarify, my question was specific to regular dumps rather than backing up in preparation for an update/upgrade. If you're tackling the latter then it definitely makes sense to shut down the stack first, since you're going to be doing that as part of the update anyways. Apologies if my initial post was unclear in this regard.


  • GNU/Linux Admin

    Oh, I did not see this thread immediately.

    The official NodeBB recommendation is that you shut down the database before doing a mongodump, but as you've discovered, Mongo itself does not mention anything for or against it.

    The sentence in the docs reads like something I would have written, and I think I added the recommendation to shut down MongoDB just as an extra (ultimately unnecessary) safety measure.

    Kind of like how you should probably flick off the light switch before changing a light bulb, but practically speaking you most certainly won't die if you leave it on (you might get blinded a little, though 😎 )


  • Community Rep

@julian said in Backing Up MongoDB-Backed NodeBB:

    The official NodeBB recommendation is that you shut down the database before doing a mongodump, but as you've discovered, Mongo itself does not mention anything for or against it.

    The purpose of the utility is so that you don't have to shut down the database. It locks the database and is 100% safe to use. It's the only safe way to do it without shutting down the DB completely, and a shutdown doesn't buy you anything that the lock doesn't provide.

    The NodeBB docs should be updated. It's fine to not mention this, but saying you should shut down before using the dump utility is misleading as this would imply that MongoDB is borked and can't be used in production as it can't properly lock.

Mongo doesn't say anything against it, because it would be nonsensical to have built a backup utility that can't take backups 😉 That it is to be used while still running is implied.


  • Community Rep

The bigger fear here is that this might encourage people to do actually dangerous things like trying to use snapshots or something else that isn't reliable, because it makes them feel that the database's backup mechanism is broken, when it isn't. So while it might be "extra safe", sort of, it heavily risks people not using the completely reliable backup mechanism and resorting to something unsafe like snapshots, crashplan, etc.


  • Global Moderator

    mongodump and mongorestore cannot be part of a backup strategy for 4.2+ sharded clusters that have sharded transactions in progress, as backups created with mongodump do not maintain the atomicity guarantees of transactions across shards.

It's fine for small deployments, though.


  • Community Rep

@PitaJ said in Backing Up MongoDB-Backed NodeBB:

    mongodump and mongorestore cannot be part of a backup strategy for 4.2+ sharded clusters that have sharded transactions in progress, as backups created with mongodump do not maintain the atomicity guarantees of transactions across shards.

That's true. But really, there you probably need to avoid backing up while sharded transactions are in progress, as they can't be locked while in progress. Nothing can back up a database in that state.

But that doesn't mean that dump is the issue; that's a specific timing window in a specific case. It can still safely back up a standard DB, and a sharded one, just not one doing a transaction across shards during the dump. That has to be locked.


  • Community Rep

    @scottalanmiller

Except... Per the mongo docs, snapshots are a viable method:

    So it is a bit more perplexing than initially imagined for the mongo backup neophyte to discern the optimal happy path here, eh?


  • Community Rep

@gotwf said in Backing Up MongoDB-Backed NodeBB:

Except... Per the mongo docs, snapshots are a viable method:

Not exactly, read it carefully. It only works IF you do certain things in your setup which allow it to work. Most people do not do that. Can you? Absolutely. But you have to design the system around corruption protection, must do it on the same volume, and when you restore you risk a rollback from the journal.


  • Community Rep

@scottalanmiller To be clear: I am all for using mongodump/restore. Mongo seems to really, really want to push the Atlas offerings, so maybe there is some incentive for their docs to be less than clear w.r.t. best practice alternatives.

If I were going to shut down NodeBB to take a dump, as Julian suggested above, then I may well be better off grabbing a zfs snapshot - a more comprehensive total VM backup, and the deltas might require less storage space over the long run?

    But I am not... I am living large and running mongodump on a hot mongo. 😜


  • Community Rep

    @gotwf

The bottom line here is that there are computing basics that always apply, and information from other sources is irrelevant. The concept of database backups is always the same. No information from NodeBB or MongoDB can alter how a database interacts with a filesystem. So the universal rules always apply.

A database uses a live file on the filesystem and/or has data in RAM. Anything that has its data file open and/or has data in RAM cannot be fully backed up via a snapshot mechanism or backup software at the filesystem level - full stop, no exceptions. This is universal, and any "but I asked X vendor" just means you risk getting a wrong answer. This is basic computing physics and applies to all databases, and many other things. It's a computing pattern.

    You can stop a database from being a live database by powering it down or otherwise forcing it to write everything to the storage subsystem and locking it to prevent further transactions and then use a reliable component of the storage subsystem to take a snapshot or file copy - but a snapshot can never do something that a file copy cannot. It's both or neither.
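A sketch of that flush-and-lock sequence using MongoDB's own fsyncLock helper (the mongosh shell and the ZFS dataset name tank/mongodb are assumptions):

```shell
# Flush pending writes to disk and block new ones.
mongosh --eval 'db.fsyncLock()'

# With writes quiesced, a filesystem snapshot is now consistent.
sudo zfs snapshot tank/mongodb@backup-"$(date +%F)"

# Unlock promptly: the instance blocks writes while locked.
mongosh --eval 'db.fsyncUnlock()'
```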

    Or you can have a database that is set up to have a non-live on disk storage in addition to the live, like a journal, which takes longer and uses more resources but allows you to roll forward or back to "fix" the corruption. That a journal is required is MongoDB making it clear that the snapshot of the DB itself isn't safe and an additional copy of the data must exist. But that means that the entire functionality is impacted to make this possible (which is fine) and that the last transaction is always at risk but the database beyond the last transaction is at least safe because it can be recreated from the journal.

    The mechanism to make the journal is not unlike the dump mechanism. And it might be the same code under the hood. Both are using the database's application logic to determine what "should be" and present it safely when storage safety cannot be determined. Making a journal is a little like making a dump locally for every transaction so that one is always present. You have to trust the dump in order to trust the journal.

    As with all databases or any similar application that keeps live data open or works from RAM - the only possible safe mechanism to ensure data integrity - short of powering down the system entirely - is a built in locking and backup mechanism that has access to every bit of the data in flight and ensures that it is in a non-corrupted, consistent state when flushed to disk. You can't make a simpler, lighter, more reliable method no matter what tools you use.

    The thing that makes this seem confusing is when you start looking at it from a NodeBB or MongoDB level, it feels natural that one or the other might have some special insight into their unique situation, but they do not, they cannot. What determines how NodeBB backups work is the universal laws of computing and how they apply to databases. Trying to look at it from any other level will lead to confusion or risks as the more you ask, the more chances for someone along the chain to be misunderstood.

Attempting to look for ways around the physical constraints of computing can only lead to dead ends if there are no errors, or worse, if mistakes are made, to accidentally getting a bad answer.

    Beyond that, snapshots are heavy and slow, dumps are fast and light. There should never be a desire to work around them as they really carry no caveats, just pros. Fast, simple, reliable, and the smallest resulting backup set size.


  • Community Rep

@gotwf said in Backing Up MongoDB-Backed NodeBB:

If I were going to shut down NodeBB to take a dump, as Julian suggested above, then I may well be better off grabbing a zfs snapshot - a more comprehensive total VM backup, and the deltas might require less storage space over the long run?

    Snapshots are big and slow. Ideally a restore operation would not involve putting a snap back in place, but a restore of only the data.

In the DevOps and post-DevOps backup world, the idea of snapshots or any full volume / full system backup is considered a failure of design. It's heavy to backup, heavy to restore, heavy to store. Modern system design allows us to quickly restore base systems sans data. This is what makes cloud efficient. I have a single command that builds my NodeBB instances, for example. It takes maybe a minute, doesn't require my time, is repeatable (and testable), and is needed for more than restores: for updates, moves, relocations, growth, etc.

Since that tool is already ideal and in place, the ideal restore is to use that and simply replace the data, and nothing more. Snapshots are unnecessarily large and slow to restore (and more prone to corruption). The straight data is the fastest thing to restore. So that's what we want in a restore situation. Faster to move over the network, faster to put onto disk.

    So even if snapshots are available to us, we should never want them. Using snapshots is necessary for situations where we are stuck with legacy systems that cannot be automated in a modern way and we have to brute force past bad designs, software, or politics. But not something we should ever "want" if we have our druthers.


  • Community Rep

    I just happen to have a video of me presenting this topic at a conference, lol.

