nodeBB and "serverless" databases

razibal

I've been exploring the possibility of using a cloud based "serverless" DB as a drop-in replacement for mongodb. This is the only remaining component in the nodeBB architecture that is currently relatively difficult to scale without over-provisioning for peak capacity. There are a couple of candidates that appear to qualify on paper. The first being Cosmos DB

Serverless deployments: Cosmos DB for MongoDB offers a serverless capacity mode. With Serverless, you're only charged per operation, and don't pay for the database when you don't use it.

Free Tier: With Azure Cosmos DB free tier, you get the first 1000 RU/s and 25 GB of storage in your account for free forever, applied at the account level. Free tier accounts are automatically sandboxed so you never pay for usage.

I was able to dump and restore to Cosmos relatively easily

screenshot-portal.azure.com-2023.05.18-12_13_07.png

of course nothing ever works the first time out as nodeBB refused to start

2023-05-18T16:40:46.781Z [4564/21343] - error: MongoServerError: Error=2, Details='Response status code does not indicate success: BadRequest (400); Substatus: 0; ActivityId: 1a6747b1-c46c-4fb1-9847-f9a788f35376; Reason: (Response status code does not indicate success: BadRequest (400); Substatus: 0; ActivityId: 1a6747b1-c46c-4fb1-9847-f9a788f35376; Reason: (Response status code does not indicate success: BadRequest (400); Substatus: 0; ActivityId: 1a6747b1-c46c-4fb1-9847-f9a788f35376; Reason: (Message: {"Errors":["The index path corresponding to the specified order-by item is excluded."]}

I haven't had a chance to dig deeper, but will do so at some time.

The next candidate is Tigris, which is promising because it promises
MongoDB compatibilty as well as enhanced full text search.

Cloud-native architecture: Tigris follows a modern composable architecture with loosely coupled components. Furthermore, compute is separated from storage, allowing for independent scaling and a more resilient system.

Automatic database sharding: Tigris provides automatic database sharding, and shard keys are unnecessary as the data distribution is automatically handled. This includes both sharding and optimizing the number of shards. MongoDB Atlas requires you to specify and manage sharding to horizontally scale. With Tigris you get more time to focus on developing applications for your users.
Cost-efficient and unlimited scalability: Unlike MongoDB Atlas, Tigris can scale to millions of records read and written per second and petabytes of data storage at a fraction of the cost

josef

@razibal I did not understand. What is the disadvantage of MongoDB Atlas?

razibal

@josef MongoDB Atlas has the same disadvantage as any other flavor of mongodb. It just wasn't designed for horizontal scaling. It can obviously handle very large workloads through vertical scaling along with sharding for distributing the workload but it does require provisioning resources for peak capacity. You can scale those back down, but that's not the same as a true serverless database like DynamoDB or Spanner, both of which are built for capacity at a regional level with the individual tenants paying for actual utilization of read/write units. Think of it like any other commodity, if you know you are going to consume large quantities of produce, you buy in bulk and store it yourself. For the ordinary consumer, this is not practical as you would end up throwing out the produce on days you don't need it. Hence you want the best price possible for individual units so you can buy 'just in time' when you need it . Thats what the DynamoDBs and Spanners provide as they essentially provision their infrastructure to handle the needs for all the anticpated demand within a region. MongoDB Atlas is attempting something similar, but you still have to buy in units of a dozen (to use an eggs analogy). They do have a 'serverless' option, but its an afterthought as it has low performance and is really intended for non-production workloads.

razibal

@julian, @baris as I read more about Tigris it occurs to me that it might be worth exploring the possibiltiy of supporting it natively rather than through its MongoDB compatibility layer. It looks quite attractive as its Apache 2.0 licensed, can be self hosted, supports clustering and has a SaaS serverless offering. Any thoughts?

screenshot-www.tigrisdata.com-2023.05.19-08_54_59.png

razibal

We've started preliminary work on building a Tigris interface for NodeBB. One of the potential issues I can see is related to the use of the 'objects' collection to store documents of very different schemas. In many cases, these documents contain the same fields with different types. We can work around the differing schemas by building a custom map that allows us to force a type based on document type, but it would probably be preferable to store these documents in different collections for posts, topics, categories and so on. We would then have a custom map to use the specific collection depending on the object in the query.
Any suggestions on the best path forward?

julian

We certainly wouldn't be opposed to a Tigris interface, although it's not necessarily something we have a lot of experience (read: none) with, and it would be a community-maintained endeavour for that reason.

@baris would know better, but our MongoDB adapter stores everything in objects because Redis (the original data store we used) did not have a concept of separate collections.

I definitely think there's value in breaking out the main objects collection to smaller collections, but we haven't run into any major issues currently.

razibal

@julian I think we've got it working albeit without the splitting out of objects to multiple collections. I think we'll probably tackle that as well, fortunately it should be relatively straight forward since the keys are already prefixed with a logical collection name e.g "post:1380" or "user:100".
In the meantime we have some testing ahead of us to ensure that nothing breaks.
We're happy to maintain it ourselves, but I think you guys should consider adding it as a core database. It gets NodeBB one step closer to being the first true "cloud native" forum platform. The only thing left would be splitting the UI from the API services and you would be all the way there.