Default search, Solr, importance of full-text search?

  • Hi,

    I have some questions about full-text search and how it's intended to work with NodeBB.

    Default search: I have had NodeBB running with the default search plugin (db-search-plugin), but search results were of very bad quality. I really tried to use it, but I can't tell users to "use the forum search" to find the answer for a question, because it just doesn't work. It shows a huge bunch of unrelated postings etc.

    Solr: So I decided to set up a Solr instance and install "the" Solr plugin. However, it was not easy to find a working Solr plugin. There were several, some basically working, some not. The only working Solr plugin reported that the Solr collection works; re-indexing and searching also worked (with high-quality search results :)). However, new postings were not indexed, although the option is on. So using Solr is also very cumbersome and not really usable currently.

    Redirecting to a search engine: On the public forum, I have just disabled forum search and used a search bar which redirects to a search engine (with site:forum.domain appended). However, I'd like to start a private forum, too (with sensitive information), and that's not an option there. 😕
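
A minimal sketch of the redirect approach described above, assuming a search engine that accepts the query in a `q` parameter and understands the `site:` operator. The engine URL, the host name `forum.example.org`, and the function name are placeholders for illustration, not part of any NodeBB plugin.

```python
from urllib.parse import urlencode


def build_site_search_url(query, forum_host, engine_url="https://duckduckgo.com/"):
    """Return a search-engine URL whose results are restricted to one forum host.

    engine_url and forum_host are assumptions; substitute your own values.
    """
    # Append the site: operator so only pages from the forum are returned.
    restricted = f"{query} site:{forum_host}"
    # urlencode escapes spaces and the colon so the URL stays valid.
    return f"{engine_url}?{urlencode({'q': restricted})}"


print(build_site_search_url("http 403", "forum.example.org"))
```

This only works when the forum is publicly crawlable, which is exactly why it is not an option for a private forum.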

    Your opinion: importance of search? My personal opinion is that good full-text search is a very important thing for a forum. What's your opinion on that? Isn't it one of the most important features of a forum? Most people recommend using the forum search before asking the same question again and again. In my opinion, good full-text search makes the difference between a volatile social network discussion and a knowledge database.

    Suggestions: I know that good full-text search is a very complex topic and can't be done as a "side project". So I wonder whether the default NodeBB installation could be improved by providing

    • a working and officially supported Solr plugin for those who need self-hosted search (for instance because the forum is private), and
    • a working and officially supported "redirect to $search-engine" plugin (when the forum is public and the human/computing resources for self-hosting Solr are not available)

    instead of activating the db-search-plugin by default.

    What to do? Can you suggest what I can do to get good full-text search working on a private forum? Shall I use the Solr plugin? Which one? Is there an officially supported plugin that's actually being maintained? I'd have to dig into the problem that new postings are not indexed automatically first.

    What search plugins do hosted NodeBB installations use?

    PS: Thanks for providing NodeBB as free and open-source software! I really love it and I'd also be motivated to contribute to NodeBB in this matter. I just don't know where to start, so I decided to start it by asking and discussing 🙂

  • All the cool kids seem to be using ElasticSearch nowadays. Did you try something like:
    https://github.com/q8888620002/nodebb-plugin-elasticsearch

    But I agree with you on the importance of a properly functioning forum/site search.

  • We had tried Solr and didn't find it any better than the default. They were both quite bad. So we settled on the default because it doesn't break from time to time like Solr did.

  • I maintain the Solr plugin, and I use "maintain" loosely, because it's been working so far, but for all intents and purposes we do prefer dbsearch over solr.

    Can you share any specific complaints about dbsearch? If there are obvious deficiencies, then we can address them, but if your complaint is "it is not as good as Google", then I'm afraid I have some bad news for you...

  • @julian said in Default search, Solr, importance of full-text search?:

    I maintain the Solr plugin, and I use "maintain" loosely, because it's been working so far, but for all intents and purposes we do prefer dbsearch over solr.

    Can you share any specific complaints about dbsearch?

    If there are obvious deficiencies, then we can address, but if your complaint is "it is not as good as Google", then I'm afraid I have some bad news for you...

    I intentionally kept away from complaining about the forum search itself, because I know enough about full-text search to be sure it's not a side project that you do in passing. Just have a look at Solr and how much work has been put into it; in my experience, it provides high-quality search results. No need for Google.

    But I have now enabled dbsearch again for https://forums.bitfire.at/category/4/davdroid. Let's look at some actual search queries (taken from the logs):

    (Update: We have now switched to Solr again, the search results are now Solr results)

    etc. etc. …

    As said above, this is not a complaint about how bad dbsearch is, but I'm just asking whether it wouldn't be better to outsource the very complex matter of full-text search to a specialized engine (like Solr, which in my experience is able to provide much better results) and support that officially.

  • @rfc2822 are you on redis or mongodb? nodebb-plugin-dbsearch doesn't really work that well on redis, on mongodb it uses the full-text search feature of mongodb.

  • @baris we're using mongodb:

    {
        "db": "nodebb",
        "collections": 4,
        "objects": 398367,
        "avgObjSize": 198.98472263013753,
        "dataSize": 79268947,
        "storageSize": 51212288,
        "numExtents": 0,
        "indexes": 12,
        "indexSize": 35684352,
        "ok": 1,
        "mem": {
            "bits": 64,
            "resident": "0.485",
            "virtual": "0.753",
            "supported": true,
            "mapped": "0.000",
            "mappedWithJournal": 0
        },
        "collectionData": [
            {
                "name": "nodebb.sessions",
                "count": 63393,
                "size": 15730504,
                "avgObjSize": 248,
                "storageSize": 13611008,
                "totalIndexSize": 4464640,
                "indexSizes": {
                    "_id_": 3674112,
                    "expires_1": 790528
                }
            },
            {
                "name": "nodebb.searchpost",
                "count": 8553,
                "size": 6685401,
                "avgObjSize": 781,
                "storageSize": 6701056,
                "totalIndexSize": 14737408,
                "indexSizes": {
                    "_id_": 176128,
                    "content_text_uid_1_cid_1": 14401536,
                    "id_1": 159744
                }
            },
            {
                "name": "nodebb.objects",
                "count": 324940,
                "size": 56687082,
                "avgObjSize": 174,
                "storageSize": 30662656,
                "totalIndexSize": 16007168,
                "indexSizes": {
                    "_id_": 4759552,
                    "_key_1_value_-1": 5832704,
                    "expireAt_1": 1290240,
                    "_key_1_score_-1": 4124672
                }
            },
            {
                "name": "nodebb.searchtopic",
                "count": 1481,
                "size": 165960,
                "avgObjSize": 112,
                "storageSize": 237568,
                "totalIndexSize": 475136,
                "indexSizes": {
                    "_id_": 53248,
                    "id_1": 53248,
                    "content_text_uid_1_cid_1": 368640
                }
            }
        ],
        "network": {
            "bytesIn": 12537857266,
            "bytesOut": 31283903688,
            "numRequests": 50159150
        }
    }
    
  • @rfc2822 The fact that Solr itself is subjectively (maybe objectively) better at search is actually the primary motivator for me creating the plugin in the first place.

    However, the big downside is that Solr is a beast... it requires more resources than NodeBB itself does, and runs on Java+Tomcat, which I have next to no experience debugging 😬

    Are you sure you are sorting results by relevancy, and not by, say... post time?

  • @julian said in Default search, Solr, importance of full-text search?:

    @rfc2822 The fact that Solr itself is subjectively (maybe objectively) better at search is actually the primary motivator for me creating the plugin in the first place.

    However, the big downside is that Solr is a beast... it requires more resources than NodeBB itself does, and runs on Java+Tomcat, which I have next to no experience debugging 😬

    This is the main reason why I really tried dbsearch for a long time before I decided to give Solr a try. I didn't want it, had to set up an extra VM, learn the config file syntax etc… but in the end, I had search results which were good and I could recommend users to use the search function. Then I noticed that new postings were not indexed, and then I began to wonder how other people manage search, because it couldn't be such a rare requirement to have working full-text search?

    Are you sure you are sorting results by relevancy, and not by, say... post time?

    I have just clicked on the default search icon on top of the page and entered the queries, as most people do. I have linked the queries in my previous posting for reference, just have a look. The (default) URL parameters seem to be "in=titlesposts&sortBy=relevance&sortDirection=desc&showAs=posts", so yes, these results should be sorted by descending relevance.
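
To check what a given search URL actually requests, the query string quoted above can be decoded with the standard library. This is only a sketch; the parameter names come from the post, while the function name is a placeholder.

```python
from urllib.parse import parse_qs

# Query string as quoted in the post above.
DEFAULT_PARAMS = "in=titlesposts&sortBy=relevance&sortDirection=desc&showAs=posts"


def describe_search(query_string):
    """Return the search parameters as a flat dict (first value per key)."""
    # parse_qs returns a list of values per key; each key appears once here.
    return {key: values[0] for key, values in parse_qs(query_string).items()}


params = describe_search(DEFAULT_PARAMS)
print(params["sortBy"], params["sortDirection"])
```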

  • Yeah, I think part of the problem is that if you search for http 403, it shows matches for http or 403; searching for "http 403" (in quotes) only returns one result.

    Also, searching in titles and posts puts topic matches at the top, so searching in just posts or just titles might lead to better results.

  • But a normal user would expect a post with both http and 403 to rank much higher than a post with only one of those terms. Actually I would only expect results with both terms.

    It would also be nice if age was taken into account when calculating relevancy. I'm currently only searching with sort set to 'last reply time'. The state of the world in 2014 isn't that relevant to me if there are newer search results.
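
One possible way to fold age into relevance is to multiply the text score by an exponential decay. The sketch below is an illustration only: the half-life of one year, the field names `score` and `age_days`, and the function names are assumptions, not how dbsearch or Solr rank results.

```python
def age_weighted_score(text_score, age_days, half_life_days=365.0):
    """Scale a text-match score so it halves every half_life_days."""
    decay = 0.5 ** (age_days / half_life_days)
    return text_score * decay


def rank(results, half_life_days=365.0):
    """Sort results by age-weighted score, best first."""
    return sorted(
        results,
        key=lambda r: age_weighted_score(r["score"], r["age_days"], half_life_days),
        reverse=True,
    )


# Hypothetical results: an old post with a strong match, a new post with a weaker one.
results = [
    {"id": "old", "score": 2.0, "age_days": 730},
    {"id": "new", "score": 1.0, "age_days": 0},
]
print([r["id"] for r in rank(results)])
```

With these numbers the two-year-old post drops to a quarter of its score and ranks below the new one. Whether that is desirable depends on the forum, so the half-life would need to be configurable.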

  • Hah well that's a whole other conversation as to how you prefer to define "relevancy" 😆

    In some contexts, time and date are significant... in other contexts, maybe not.

    That said, if Solr was working fine for you, and it wasn't indexing, then there was something wrong with the configuration... it should automatically index new posts, just like dbsearch does.

  • @julian said in Default search, Solr, importance of full-text search?:

    That said, if Solr was working fine for you, and it wasn't indexing, then there was something wrong with the configuration... it should automatically index new posts, just like dbsearch does.

    Well it didn't, although it said the opposite (I even tried to turn off the setting to be sure)

    So… is there any chance that full-text search will be rethought in NodeBB and my suggestions will be taken into consideration? Shall I open this topic elsewhere (issues, mailing list, …)?

  • @bartvb I have just tried Elasticsearch. The default Elasticsearch plugin included by NodeBB doesn't even allow (re-)indexing all posts … which means manual importing. No …

    The question is: shall I try to improve the Elasticsearch plugin, creating the 10th unmaintained fork which then works on a single instance (mine) until the next major NodeBB update, while overall search is still horrible for 99% of NodeBB users? That's what I wanted to avoid. But it seems like interest in working full-text search is quite low 😞

  • @rfc2822 My guess is that the NodeBB project gains the most when nodebb-plugin-dbsearch is improved. I did a quick check of the MongoDB docs and it seems like there is quite a bit of room for improvement when it comes to the implementation of MongoDB fulltext search in NodeBB.

    Performance is not really the issue here; I'm guessing that normal MongoDB operations will come crashing down before MongoDB gives up on full-text search on really big boards. Which leaves search quality.

    As far as I can tell, things would already be quite a bit better if by default all searches used an AND operator between all terms and if a term like 'site.com' weren't broken up into 'site' and 'com'. And while you're at it, it would be nice if relevance took age into account 🙂
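
A sketch of the implicit-AND idea, under an assumption worth verifying against the MongoDB documentation for your version: the `$text` operator treats bare terms as alternatives but requires each double-quoted phrase to be present, so quoting every term is a commonly used way to get AND behaviour. The function names are placeholders, and this is not how nodebb-plugin-dbsearch builds its queries.

```python
def to_and_search_string(user_query):
    """Wrap every whitespace-separated term in double quotes."""
    terms = []
    for term in user_query.split():
        # Drop embedded double quotes so a term cannot break out of its phrase.
        cleaned = term.replace('"', "")
        if cleaned:
            terms.append(f'"{cleaned}"')
    return " ".join(terms)


def build_text_filter(user_query):
    """Return a filter document for a collection that has a text index."""
    return {"$text": {"$search": to_and_search_string(user_query)}}


print(build_text_filter("http 403"))
```

Quoting also asks for 'site.com' to be matched as one phrase instead of two independent terms, although how the tokenizer handles the dot has not been checked here.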

  • @bartvb said in Default search, Solr, importance of full-text search?:

    @rfc2822 My guess is that the NodeBB project gains the most when nodebb-plugin-dbsearch is improved. I did a quick check of the MongoDB docs and it seems like there is quite a bit of room for improvement when it comes to the implementation of MongoDB fulltext search in NodeBB.

    My experience with database full-text indices is bad. MySQL has a full-text index, too, and I have used it for several projects. It's OK, but the results are not as good as those from specialized search engines such as Solr. What the default NodeBB Mongo-based full-text search produces confirms my prejudices. I also remember what happens when other projects (CMS like Typo3) try to build their own full-text search. It's just horrible and in the end, you still have to use a public search engine with site:.... to find what you're looking for.

    Good full-text searching is an extremely complicated task. Why not leave it to specialized projects? I just don't understand why every project needs its own unusable low-quality full-text search instead of

    • making use of public search engines (in the end, this is what most people do – redirect to Google), and
    • making use of already available, specialized open-source search engines like Solr and Elastic Search if this is not an option?

    Of course, the MongoDB full-text index could still be available, but not as the only officially supported and advertised solution.

    In my opinion, a side project of a database ("oh yes, full-text index would be cool, let's add it quickly") can never provide acceptable results because the topic is too complex. I also guess that the people who have developed Solr are not idiots who like to waste their time for nothing. (Ok, this applies to MongoDB people, too, but maybe the MongoDB full-text index is for use cases where high-quality full-text search is not as important as for a forum.)

  • The implementation here is a "good enough" solution for small boards, where dbsearch is more than adequate.

    While I admire your quest for a "good" search engine for NodeBB, one simply won't exist natively here; a third-party solution must be utilised. We will try our best to tweak the dbsearch results (e.g. implicit AND being one change we ought to adopt), but I personally am hoping for someone to develop an amazing search engine for Node.js so I can use the library as a module 😄

    However that may well be unfeasible for a variety of reasons, Node.js not being the right tool being just one of those 😉

  • @rfc2822 said in Default search, Solr, importance of full-text search?:

    Good full-text searching is an extremely complicated task.

    This is so true. I do believe you can approach Google-quality search results with a specialized search engine (because within a forum, you have a lot more context on what is important to your users), but it takes some science, metadata, and lots of tweaks to get good results. Good full text search isn't really just full text, it's a combination of a ton of other data, which is why Google is so good at it. The whitepaper on Google search is a taste of the complexity that goes into how they produce such good results.

  • @bri Searching a manageable number of postings (which should contain real text content, mostly without markup and all about a certain topic) should still be easier than indexing the whole Internet (and rank an endless number of pages which are highly "optimized" with SEO to be shown first etc.) Also, things like PageRank are not really applicable to a forum because a posting doesn't have to be linked often to be valuable.

    Just had an idea: with Solr, even attachments (of postings) could be indexed 🙂

