Default search, Solr, importance of full-text search?
-
@bartvb I have just tried Elasticsearch. The default Elasticsearch plugin included with NodeBB doesn't even allow (re-)indexing all posts … which means manual importing. No …
The question is: shall I try to improve the Elasticsearch plugin, creating the 10th unmaintained fork, which then works on a single instance (mine) until the next major NodeBB update, while overall search is still horrible for 99% of NodeBB users? That's what I wanted to avoid. But it seems like interest in working full-text search is quite low.
-
@rfc2822 My guess is that the NodeBB project gains the most when nodebb-plugin-dbsearch is improved. I did a quick check of the MongoDB docs and it seems like there is quite a bit of room for improvement when it comes to the implementation of MongoDB fulltext search in NodeBB.
Performance is not really the issue here; I'm guessing that normal MongoDB operations will come crashing down before MongoDB gives up on full-text search on really big boards. Which leaves search quality.
As far as I can tell, things would already be quite a bit better if, by default, all searches used an AND operator between all terms and if a term like 'site.com' weren't broken up into 'site' and 'com'. And while you're at it, it would be nice if relevance took age into account.
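To make those three suggestions concrete, here is a minimal sketch. Both function names are hypothetical and none of this is taken from nodebb-plugin-dbsearch. It assumes the search backend treats a double-quoted term as a required phrase that is not split at punctuation; whether several quoted terms are really combined with AND depends on the engine and version, so that should be checked against the MongoDB docs. The one-year half-life is an arbitrary illustrative default.

```typescript
// Hypothetical helpers for illustration; not part of nodebb-plugin-dbsearch.

// Wrap every whitespace-separated term in double quotes. The assumption is
// that a phrase-aware engine then requires each term and keeps something
// like site.com as one unit instead of splitting it into "site" and "com".
function toImplicitAndQuery(input: string): string {
  return input
    .split(/\s+/)
    .map((term) => term.replace(/"/g, "")) // drop quotes typed by the user
    .filter((term) => term.length > 0)
    .map((term) => `"${term}"`)
    .join(" ");
}

// Scale a text-match score by an exponential decay on post age, so a post
// loses half of its weight every `halfLifeDays` days.
function ageAdjustedScore(
  textScore: number,
  ageDays: number,
  halfLifeDays: number = 365
): number {
  return textScore * Math.pow(0.5, ageDays / halfLifeDays);
}
```

With this, a search for `site.com login error` would be sent to the backend as three quoted terms, and a two-year-old post would need four times the raw text score of a brand-new one to rank equally.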
-
For reference: https://github.com/julianlam/nodebb-plugin-solr/issues/34
-
@bartvb said in Default search, Solr, importance of full-text search?:
@rfc2822 My guess is that the NodeBB project gains the most when nodebb-plugin-dbsearch is improved. I did a quick check of the MongoDB docs and it seems like there is quite a bit of room for improvement when it comes to the implementation of MongoDB fulltext search in NodeBB.
My experience with database full-text indices is bad. MySQL has a full-text index, too, and I have used it for several projects. It's OK, but the results are not as good as those from specialized search engines such as Solr. What the default NodeBB Mongo-based full-text search produces confirms my prejudices. I also remember what happens when other projects (CMSs like Typo3) try to build their own full-text search. It's just horrible, and in the end you still have to use a public search engine with site:.... to find what you're looking for.
Good full-text searching is an extremely complicated task. Why not leave it to specialized projects? I just don't understand why every project needs its own unusable low-quality full-text search instead of
- making use of public search engines (in the end, this is what most people do – redirect to Google), and
- making use of already available, specialized open-source search engines like Solr and Elasticsearch if this is not an option?
Of course, the MongoDB full-text index could still be available, but not as the only officially supported and advertised solution.
In my opinion, a side project of a database ("oh yes, full-text index would be cool, let's add it quickly") can never provide acceptable results because the topic is too complex. I also guess that the people who have developed Solr are not idiots who like to waste their time for nothing. (Ok, this applies to MongoDB people, too, but maybe the MongoDB full-text index is for use cases where high-quality full-text search is not as important as for a forum.)
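The "redirect to a public search engine" option is simple enough to sketch. The helper name and the forum host below are made up for illustration; the only thing relied on is the common `site:` query operator and Google's `/search?q=` URL form. It obviously only helps when the forum is publicly crawlable.

```typescript
// Hypothetical helper: build a public search engine URL restricted to one
// forum via the site: operator. Not part of NodeBB or any plugin.
function siteSearchUrl(forumHost: string, terms: string): string {
  const query = `site:${forumHost} ${terms.trim()}`;
  // encodeURIComponent escapes the colon and spaces for use in a query string.
  return `https://www.google.com/search?q=${encodeURIComponent(query)}`;
}

// Example with a placeholder host name:
const exampleUrl = siteSearchUrl("forum.example.org", "full-text search");
```

For a private forum nothing is indexed publicly, which is where a self-hosted engine like Solr or Elasticsearch comes in.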
-
The implementation here is meant as a "good enough" solution for small boards, where dbsearch is more than adequate.
While I admire your quest for a "good" search engine for NodeBB, one simply won't exist natively here; a third-party solution must be utilised. We will try our best to tweak the dbsearch results (e.g. implicit AND being one change we ought to adopt), but I personally am hoping for someone to develop an amazing search engine for Node.js so I can use the library as a module. However, that may well be unfeasible for a variety of reasons, Node.js not being the right tool being just one of those.
-
@rfc2822 said in Default search, Solr, importance of full-text search?:
Good full-text searching is an extremely complicated task.
This is so true. I do believe you can approach Google-quality search results with a specialized search engine (because within a forum, you have a lot more context on what is important to your users), but it takes some science, metadata, and lots of tweaks to get good results. Good full text search isn't really just full text, it's a combination of a ton of other data, which is why Google is so good at it. The whitepaper on Google search is a taste of the complexity that goes into how they produce such good results.
-
@bri Searching a manageable number of postings (which should contain real text content, mostly without markup and all about a certain topic) should still be easier than indexing the whole Internet (and ranking an endless number of pages which are highly "optimized" with SEO to be shown first etc.). Also, things like PageRank are not really applicable to a forum because a posting doesn't have to be linked often to be valuable.
Just had an idea: with Solr, even attachments (of postings) could be indexed.
-
@rfc2822 said in Default search, Solr, importance of full-text search?:
a posting doesn't have to be linked often to be valuable
Well, yes, I was using that as an example. I didn't mean you need to literally use page rank in a forum search engine, that doesn't make sense.
-
Wish that I still had info on what it did. It has been quite a while.
-
@rfc2822 said in Default search, Solr, importance of full-text search?:
What to do? Can you suggest what I can do to get good full-text search working on a private forum? Shall I use the Solr plugin? Which one? Is there an officially supported plugin that's actually being maintained? I'd have to dig into the problem that new postings are not indexed automatically first.
Something I'd like to know the answer to. Was considering AWS and Elasticsearch, but plugin dev around here is lacking any real effort. Forks all over the place and very few comments other than not working on 1.5, 1.6, 1.7, 1.72 etc.
Members constantly complain a) that people don't search for topics which already cover their questions and b) that when they do search, the results are near useless.
Now I understand how challenging search is, especially when the person searching can't even remember the key words to look for, but without changing the options in db-search I can often fail to find my own content when searching for a single word within the title (main trick is to limit the section of the forum, results to title only and order date descending, but I can't ask everyone to learn these).
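For anyone wondering what that trick amounts to, it is roughly the following, shown here against a plain in-memory array. The `Topic` shape and the function name are hypothetical and are not NodeBB's schema or db-search's code; this only illustrates the three restrictions (one section, title only, newest first).

```typescript
// Hypothetical topic shape for illustration only.
interface Topic {
  title: string;
  categoryId: number;
  timestamp: number; // milliseconds since epoch
}

// Restrict to one category, match the word in the title only
// (case-insensitive), and return the newest topics first.
function searchTitles(topics: Topic[], word: string, categoryId: number): Topic[] {
  const needle = word.toLowerCase();
  return topics
    .filter((t) => t.categoryId === categoryId)
    .filter((t) => t.title.toLowerCase().includes(needle))
    .sort((a, b) => b.timestamp - a.timestamp);
}
```

Narrowing the candidate set this way sidesteps relevance ranking entirely, which is probably why it works better than the default options.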
-
@artesea said in Default search, Solr, importance of full-text search?:
plugin dev around here is lacking any real effort.
Please stop making comments like this.
- The code is free and open source so you’re free to fix it.
- You sound entitled. If you’re paying for the plugin you can complain like this.
- The lack of effort is really on your part if you’re not willing to contribute.
Free software != free support