My SMF to NodeBB migration thread
-
Hey,
I'll post some stuff here regarding my migration from my SMF 2.x forum to NodeBB. I've had a forum for ~4 years and it's about time I revamped it. I'm a huge Node.js and Redis fan, so having a forum written in PHP is blasphemous.
So far, I've been working on migrating the data from my SMF forum to NodeBB. Everything is going well: I've been able to get everything into NodeBB with the correct timestamps, etc. So that's cool.
NodeBB is crazy fast, and can be even faster.
One issue I've noticed so far is that once I loaded a huge amount of data into NodeBB, I started seeing a serious drop in performance. The good news is that it's fixable.
Here's a small screencast of what my load times look like when visiting the "Progress Journals" subforum. This subforum isn't "that big" (71 posts), but some of its threads are enormous, so NodeBB is doing roughly 25k-30k Redis operations just to load it, and then a ton of server-side HTML/DOM processing, which seems to be the biggest issue.
http://screencast.com/t/d1ooHZtEvf
OK, in the order seen in the video, here are the "benchmarks":
- NodeBB unmodified code: 17.x seconds
- NodeBB modified code, using a redis.multi pipeline to send all of the HGETALLs to Redis in one request: 15.x seconds
- NodeBB modified code, using the same pipeline AND getting rid of posts.toHTML(): 3.x seconds
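Here is a minimal sketch of the pipelining change from the second benchmark. It assumes a node_redis-style client, where `client.multi()` returns a batch that queues commands locally and `exec(callback)` sends them in one go. The `getPostsPipelined` helper and the `post:<pid>` key format are hypothetical and used only for illustration.

```javascript
// Hypothetical helper: fetch many post hashes in one round trip to Redis
// instead of issuing a separate HGETALL (and waiting for its reply) per post.
// Assumes a node_redis-style client: multi() queues commands, exec(cb) sends them.
function getPostsPipelined(client, pids, callback) {
  const batch = client.multi();
  pids.forEach(function (pid) {
    batch.hgetall('post:' + pid); // queued locally, nothing sent yet
  });
  // One request carrying every HGETALL; replies arrive in the same order as pids.
  batch.exec(function (err, replies) {
    if (err) return callback(err);
    callback(null, replies);
  });
}
```

The same pattern should carry over to batches of HMGETs. As the numbers above show, though, cutting round trips only saved a couple of seconds here. Most of the remaining time was the parsing step.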
**CORRECTION: Also, all of those load times would actually be even longer, but I've changed the Redis code to use a UNIX domain socket (/tmp/redis.sock), which speeds it up a bit. I've also installed the native C hiredis package, which actually seems pretty significant as well.**
I'm not so great with the DOM/"front end stuff", so I'm not sure how feasible it would be to move that posts.toHTML()/cheerio work to the browser, client side. If data were sanitized on the way in (when a user submits a post), you could simply return it to the browser and have it do the DOM crunching client side. Wouldn't that free up NodeBB server side significantly?
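Here is a minimal sketch of the "do the work on the way in" idea: each post is rendered once when it's saved, and the result is stored, so loading a topic only has to return stored HTML. The `parse` function is a trivial stand-in for the real markdown/plugin pipeline. The in-memory `Map` stands in for Redis. `savePost` and `getPostHtml` are hypothetical names.

```javascript
// Hypothetical stand-in for the real markdown + plugin parsing step:
// it only escapes HTML special characters and wraps the text in <p>.
function parse(raw) {
  const escaped = raw
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
  return '<p>' + escaped + '</p>';
}

const posts = new Map(); // stands in for per-post hashes in Redis

// Parse once, at write time; keep the raw text (for editing) and the rendered HTML.
function savePost(pid, raw) {
  posts.set(pid, { content: raw, html: parse(raw) });
}

// Reads do no parsing at all; they just return the stored HTML.
function getPostHtml(pid) {
  const post = posts.get(pid);
  return post ? post.html : null;
}
```

The catch is that stored HTML goes stale whenever the parsing rules change (for example, a plugin is added or updated). Existing posts would then need to be re-rendered, presumably with a one-off script.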
So far, that seems to be the only issue I've come across.
OK, so to start, I'll look for a few more spots that could be sped up. Right now I'm thinking of the initial dashboard: it uses a bunch of HMGETs, so maybe 'multi' will have an impact.
peace!
-
Hey @adarqui,
Thanks for all of the hard work you're putting into this! We haven't had a large dataset to test with, so it's really great to see how NodeBB holds up under a huge load.
17 seconds is definitely not scalable, and if the majority of it is taken up by `postTools.toHTML`, then that's a good place to start.

As for whether it is feasible to switch it over to the client side, that's a difficult question to answer immediately, as it hooks into our plugin system. NodeBB saves the raw markup (or markdown, rather, heh) as typed into the post composer, and parses it using markdown (or any other plugins). That particular subsystem might be difficult (if not impossible) to move to the client side. If the bottleneck is `cheerio`, however, then I think we may be able to do something about that...

As a point of reference... how long did it take to load the "Progress Journals" subforum in SMF?
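To show why parsing is tied to the plugin system, here's a minimal sketch of a filter hook with the same shape as the `filter:post.parse` hook in postTools.js. Each registered filter receives the previous filter's output, and the final value goes to the callback. `registerFilter` and the synchronous filters are simplifying assumptions, not NodeBB's actual plugin API.

```javascript
// Hypothetical filter-hook registry: hook name -> list of filter functions.
// (Not NodeBB's actual plugin loader; filters here are synchronous for simplicity.)
const filters = {};

function registerFilter(hook, fn) {
  if (!filters[hook]) filters[hook] = [];
  filters[hook].push(fn);
}

// Run `value` through every filter registered for `hook`, in registration order,
// then pass the final result to `callback`.
function fireHook(hook, value, callback) {
  const chain = filters[hook] || [];
  const result = chain.reduce(function (acc, fn) { return fn(acc); }, value);
  callback(result);
}

// Two toy "plugins": bold text, then paragraph wrapping.
registerFilter('filter:post.parse', function (s) { return s.replace(/\*\*(.+?)\*\*/g, '<strong>$1</strong>'); });
registerFilter('filter:post.parse', function (s) { return '<p>' + s + '</p>'; });

fireHook('filter:post.parse', 'hello **world**', function (parsed) {
  console.log(parsed); // <p>hello <strong>world</strong></p>
});
```

Any server-side plugin can join a chain like this. Moving the step to the browser would therefore presumably mean shipping every parsing plugin to the client as well.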
-
I'm also interested in seeing if `cheerio` is really the bottleneck (which it very well may be). Just add the following line after the `filter:post.parse` hook is fired (line 195 of postTools.js):

```js
plugins.fireHook('filter:post.parse', raw, function(parsed) {
	return callback(null, parsed); // <-- new line: returns early, skipping the cheerio processing below
```
Fingers crossed: if this takes care of the majority of the slowdown, then we can try to recreate the skipped code without using `cheerio`.
-
hey!
http://www.adarq.org/progress-journals-experimental-routines/
It takes ~1.5 seconds.
It's only doing a few queries to achieve that, since everything is packed into a few MySQL tables.
I cloned the latest NodeBB repo with no modifications; here's what I get when I load the Progress Journals subforum:
Two separate tests:

```
redis-cli MONITOR > /tmp/journals.monitor
^C
root@serv:~/admin/git/NodeBB/src# wc -l /tmp/journals.monitor
58137 /tmp/journals.monitor
root@serv:~/admin/git/NodeBB/src# grep hgetall /tmp/journals.monitor |wc -l
42954
```

```
redis-cli MONITOR > /tmp/journals.monitor
^C
root@serv:~/admin/git/NodeBB/src# wc -l /tmp/journals.monitor
58137 /tmp/journals.monitor
root@serv:~/admin/git/NodeBB/src# grep hgetall /tmp/journals.monitor |wc -l
42954
```

Someone possibly added more queries: last time I checked (a few days ago), it was 25k. This is probably just a small tuning issue. I'll send you the redis-cli MONITOR log.
I'm going to re-migrate the data into NodeBB tonight, so if you want, I can even give your team access to my testing server to play around with.
cya!
-
That speeds it up significantly. Fresh clone/test:
- Without the 'cheerio bypass' (lol): 18.x seconds
- With the 'cheerio bypass': 10-12 seconds
BTW, I ran each of these tests multiple times, and the results are very consistent.
So bypassing cheerio shaves 6-8 seconds off the load time. It's also doing a massive amount of redis queries, though, so if that were tuned, maybe all of this would become a non-issue.
I'll msg you the log.
-
> It's also doing a massive amount of redis queries though too
For a while there, I thought you were saying cheerio was doing a lot of redis queries, which made me do a double take.
Anyway, given a large number of posts, relying on Cheerio for what I do is definitely a bad idea. A slim regular expression should do the trick much faster (one would hope).
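The thread doesn't spell out exactly what the cheerio pass in toHTML does. As an illustrative stand-in, suppose the job is annotating external links in the rendered post. A regex-based version of that might look like this sketch. `markExternalLinks` and its `siteHost` parameter are hypothetical, and a regex like this only copes with simple, predictable markup (such as a markdown renderer's `<a href="...">` output), not arbitrary HTML.

```javascript
// Hypothetical cheerio-free link pass: annotate external <a> tags with one regex.
// Only matches tags that start exactly with <a href="http..."> (typical markdown output).
function markExternalLinks(html, siteHost) {
  return html.replace(/<a\s+href="(https?:\/\/[^"]+)"/gi, function (tag, href) {
    let host;
    try {
      host = new URL(href).hostname;
    } catch (e) {
      return tag; // unparseable URL: leave the tag untouched
    }
    const internal = host === siteHost || host.endsWith('.' + siteHost);
    // Append attributes right after the href; the rest of the tag (">") follows as before.
    return internal ? tag : tag + ' target="_blank" rel="nofollow"';
  });
}
```

The trade-off is robustness for speed. A single `String.replace` pass avoids building a DOM per post, but tags with other attributes before `href` (e.g. `<a class="x" href="...">`) won't be matched.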
My other concern is why NodeBB is rendering every single post's content when you're going to the topic listing. That seems a bit preemptive!
If I have some time later today, I'll push a new branch with some optimizations re: phasing out cheerio, etc.
-
> Please refer to GitHub issues #320 and #321 for discussion regarding the issues raised here.
> These optimizations will be implemented and acid-tested on a new branch before being merged into `master`.

Cool! Yeah, I was unsure why there were so many queries on category load.
Regarding the external page click 'interception': is it possible to make that customizable as well? On my forum, people post tons of external links, and they always expect a new tab if the link points outside the adarq.org domain. I do some other things such as lightbox, etc., but that's easy to customize on my end.
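Here is one way that behavior could be made configurable on the client side, sketched under assumptions. The forum's own host is supplied (here `adarq.org`), and a single delegated click listener sends off-site http(s) links to a new tab before the forum's own navigation handling sees them. `opensInNewTab` is a hypothetical helper, not an existing NodeBB option.

```javascript
// Hypothetical predicate: should this link open in a new tab?
// True only for http(s) links whose host is outside `siteHost` (subdomains count as internal).
function opensInNewTab(href, siteHost, pageUrl) {
  let url;
  try {
    url = new URL(href, pageUrl); // resolves relative links against the current page
  } catch (e) {
    return false;
  }
  if (url.protocol !== 'http:' && url.protocol !== 'https:') return false;
  const host = url.hostname;
  return !(host === siteHost || host.endsWith('.' + siteHost));
}

// Browser only: one delegated listener in the capture phase. stopPropagation() there
// keeps the forum's own click handlers from also acting on these links.
if (typeof document !== 'undefined') {
  document.addEventListener('click', function (event) {
    const link = event.target.closest && event.target.closest('a[href]');
    if (!link || !opensInNewTab(link.getAttribute('href'), 'adarq.org', location.href)) return;
    event.preventDefault();
    event.stopPropagation();
    window.open(link.href, '_blank', 'noopener');
  }, true);
}
```

In a real setup, the host (or the whole predicate) would come from a forum setting rather than being hard-coded.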
If you guys ever want to experiment live with the 93k-post dataset, I can easily give you access for testing.
cya!
-
Thanks for all the effort you've put into this! :)