How to hace nice URLs?
-
What I am looking for is a config within nodebb that converts all chars into nice characters.
This happens nicely in WordPress for example, you write the title with any character, but only the nice ones are used to create the "slug", so the slug is always nice, and that prevent the need to any extensions.From your reply, it seems nodebb does not have anything like that?
-
@darkpollo @baris NodeBB just runs
slugify()
which is fairly restrictive... I believe it only allows for the latin alphabet for broadest compatibility even though URL paths can contain a broader set of UTF-8 chars.The problem here is we "slugify" strings for two reasons:
- URL Safety and Readability, so invalid characters are removed and a "human-readable" slug is produced
- Compatibility, so when data is sent across to other sites, the chance of the same data coming out the other side is increased.
I think the solution here is to refactor
slugify
so that there are two types, one for url safety and readability, and another one for compatibility. -
Slugify doesn't remove the special characters @darkpollo mentioned.
const s = await app.require('slugify'); console.log(s('a title with speacial chars á é í ó in it')) a-title-with-speacial-chars-á-é-í-ó-in-it
The issue I saw was they were encoded if the entire url from the address bar is copied.
-
@baris
Yes, I understand, but we need them to be removed so the urls are nice.
What @julian suggest of having an option for this seems great.For what I see on slugify, they have an option for this.
https://www.npmjs.com/package/slugifystrict: false, // strip special characters except replacement, defaults to
false
This is how is made in WordPress (sorry for the reference but this is where I am coming from... ) https://developer.wordpress.org/reference/functions/remove_accents/
And from what I can see in the code:
https://github.com/NodeBB/NodeBB/blob/45eabbf5ba2100201fe887a9d6fc81d88379d414/src/slugify.js#L2It is supposed to strip those as well.
https://github.com/NodeBB/NodeBB/blob/45eabbf5ba2100201fe887a9d6fc81d88379d414/public/src/modules/slugify.js#L23function string_to_slug(str) { str = str.replace(/^\s+|\s+$/g, ''); // trim str = str.toLowerCase(); // remove accents, swap ñ for n, etc var from = "àáäâèéëêìíïîòóöôùúüûñç·/_,:;"; var to = "aaaaeeeeiiiioooouuuunc------"; for (var i=0, l=from.length ; i<l ; i++) { str = str.replace(new RegExp(from.charAt(i), 'g'), to.charAt(i)); } str = str.replace(/[^a-z0-9 -]/g, '') // remove invalid chars .replace(/\s+/g, '-') // collapse whitespace and replace by - .replace(/-+/g, '-'); // collapse dashes return str; }
So maybe this is a bug?
I think to make nice urls and slugs, all should be replaced...
-
@darkpollo looks like we used replace accented characters with regular ones but it was changed to support unicode slugs long time ago, https://github.com/NodeBB/NodeBB/commit/53caa5e422b986aa6e4a6e7350267777ad5741a5, not sure if reverting back would break stuff @julian
-
@baris Thanks.
It seems that when that was changed, we lost that ability. Maybe just updating slugify could work?
We are using a 11 years old slugify version...Or just add back this:
// remove accents, swap ñ for n, etc var from = "àáäâèéëêìíïîıòóöôùúüûñçşğ·/_,:;"; var to = "aaaaeeeeiiiiioooouuuuncsg------"; for (var i = 0, l = from.length; i < l; i++) { str = str.replace(new RegExp(from.charAt(i), 'g'), to.charAt(i)); }
Also I have been reading the forum and I found many post about this, most are from non-English languages.
I think this is something should be addressed and fixed but of course we do not want to break anything. -
You can open an issue on our github, it is likely a breaking change because it would alter old slugs, for example if a user had a userslug of
àáäâ
i would becomeaaaa
after this change, same for categories, groups. So it isn't a simple change of just updating the slugify function.