Import/Display Issue With &#codes;
-
I'm running a data import using @bentael's importer.
It looks like posts that contain escape sequences render correctly in the post's content, but the post title displays the literal escape sequence.
And looking at the data via redis on the backend:
127.0.0.1:6379> hget topic:3369 title
"Исчезли деньги ?!"
127.0.0.1:6379> hget post:3369 content
"Всем привет, буквально 5.06 поменял реквизиты с помощью своего мененжера, сегодня захожу в dashboard, а на баллансе 2 доллара, было до этого почти 50 долларов, в разделе Royalties нету выплат, где тогда деньги?"
127.0.0.1:6379>
It looks like the content and title are being encoded the same way, but for some reason it renders properly in the post content but not the title.
I also tried copying and pasting some of the properly rendered characters as the title of a new post. This appears to work but produces a different encoding within the database:
127.0.0.1:6379> hget topic:3472 title
"\xd0\x92\xd1\x81\xd0\xb5\xd0\xbc \xd0\xbf\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82"
127.0.0.1:6379>
Any suggestions? I suppose I could try to convert between the encodings during the import... any idea what the specific names of these two encodings are?
-
Hmm, could it be one of the plugins?
Which plugins do you have activated? List them all, please.
-
Never mind, it's not a plugin. I think it's this line:
https://github.com/designcreateplay/NodeBB/blob/master/src/topics.js#L282
I tried hacking the source to take out the .escape() call on that line. It turns out sanitize() returns a validator object, which I think is why you can chain escape() onto it, but the string result is in there as {str: '... result ...'}, so the new line would look like:
// topics.js:282'ish
// FROM
data.title = validator.sanitize(data.title).escape();
// TO
data.title = (validator.sanitize(data.title) || {}).str || '';
Then, for some reason, I was getting a TypeError in webserver.js, so I added a safety check there:
// webserver.js:121'ish
// FROM
tag.content = tag.content.replace(/[&<>'"]/g, function(tag) {
// TO
tag.content = (tag.content || '').replace(/[&<>'"]/g, function(tag) {
That worked for me, but it's a hack. If it works for you, file an issue on the GitHub repo; there may be a good explanation for why it behaves this way. If NodeBB expects us to escape the title beforehand, the importer can do that, but I'm a little skeptical.
-
As usual, we only sanitize stuff on the way out, not on the way in, so I'd look to the importer to see why they are being saved into Redis as ... html entities?
For reference, Redis stores the raw UTF-8 bytes, which redis-cli displays as \x escapes:
127.0.0.1:6379[2]> set foo тест
OK
127.0.0.1:6379[2]> get foo
"\xd1\x82\xd0\xb5\xd1\x81\xd1\x82"
127.0.0.1:6379[2]>
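The \x sequences in the redis-cli output above are just the UTF-8 bytes of the string. A minimal Node.js sketch that reproduces them follows; toRedisCliEscapes is a hypothetical helper name, and it only approximates redis-cli's display rules (it does not handle quotes, backslashes, or the \n style escapes):

```javascript
// Sketch only: print the UTF-8 bytes of a string roughly the way redis-cli shows them.
// toRedisCliEscapes is a hypothetical helper, not part of NodeBB or Redis.
function toRedisCliEscapes(text) {
  return Array.from(Buffer.from(text, 'utf8'))
    .map((byte) => (byte >= 0x20 && byte < 0x7f
      ? String.fromCharCode(byte)                        // printable ASCII stays as-is
      : '\\x' + byte.toString(16).padStart(2, '0')))     // everything else becomes \xNN
    .join('');
}

console.log(toRedisCliEscapes('тест')); // prints \xd1\x82\xd0\xb5\xd1\x81\xd1\x82
```

If a stored value shows up as readable &#NNNN; sequences instead of \x bytes, it was HTML-encoded before it reached Redis.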
-
@medwards the only place where the import plugin touches the content is the convert function, and only if there is a valid convert config. You mentioned in a previous conversation that you're using a custom-built bbcode-to-md convert function, or that you modified the one in there. Can you find an example topic and map it to its content in the vb database, then paste the two here so I can test the conversion functions?
Btw @julian, I noticed that the validator package is way out of date. It looks like the repo has moved and the API changed a bit, i.e. sanitize() was removed.
-
Sorry for the late reply.
So it sounds like it's not really intended for the HTML Entities to ever make it into the database. I was able to pull a node module "html-entities" and feed the titles through that on the way in through the importer. The post contents were rendering okay so I just left those as is, didn't want to mess around with what order to apply conversions and whether they'd screw each other up.
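For anyone who wants to see the idea without the extra dependency, here is a rough sketch of the kind of decoding involved. It is not the html-entities module and not my actual importer code; decodeNumericEntities is a hypothetical name, the sample input is made up for illustration, and it only handles numeric references like &#1042; or &#x412;, leaving named ones such as &amp; untouched:

```javascript
// Sketch only: decode numeric character references (&#NNNN; and &#xHHHH;) to real characters.
// Named entities (&amp;, &quot;, ...) are left as-is; a full library covers those.
function decodeNumericEntities(text) {
  return text.replace(/&#(x[0-9a-f]+|\d+);/gi, (match, code) => {
    const isHex = code[0] === 'x' || code[0] === 'X';
    const codePoint = parseInt(isHex ? code.slice(1) : code, isHex ? 16 : 10);
    // Leave out-of-range references untouched instead of throwing.
    return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
  });
}

// Hypothetical sample title, as it might arrive from the source forum.
const rawTitle = '&#1042;&#1089;&#1077;&#1084; &#1087;&#1088;&#1080;&#1074;&#1077;&#1090;';
console.log(decodeNumericEntities(rawTitle)); // prints Всем привет
```

Running titles through something like this before they are saved means Redis ends up holding plain UTF-8, the same as a title typed into a new post.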
One of these days I'll get around to contributing back some changes to nodebb-plugin-import. Things have just been crazy around here lately.
Thanks for all your help.