Import/Display Issue With &#codes;

General Discussion
  • #1

    I'm running a data import using @bentael's importer.

    It looks like posts that have escape sequences render correctly as part of a post's content, but display the literal escape sequence in the post title. Screenshot:

    Screen Shot 2014-02-18 at 3.37.28 PM.png

    And looking at the data via redis on the backend:
    > hget topic:3369 title
    "Исчезли деньги ?!"
    > hget post:3369 content
    "Всем привет, буквально 5.06 поменял реквизиты с помощью своего мененжера, сегодня захожу в dashboard, а на баллансе 2 доллара, было до этого почти 50 долларов, в разделе Royalties нету выплат, где тогда деньги?"

    It looks like the content and title are being encoded the same way, but for some reason it renders properly in the post content but not the title.

    I also tried copying and pasting some of the properly rendered characters as the title of a new post. This appears to work but produces a different encoding within the database:
    > hget topic:3472 title
    "\xd0\x92\xd1\x81\xd0\xb5\xd0\xbc \xd0\xbf\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82"

    Any suggestions? I suppose I could try to convert between the encodings during the import... any idea what these two encodings are actually called?
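
    For what it's worth, the two forms can be named: `&#NNNN;` codes are HTML numeric character references, and the `\xNN` runs are how redis-cli prints raw UTF-8 bytes. Below is a minimal sketch in Node using only built-ins. The helper names `toNumericEntities` and `toRedisCliEscapes` are made up for illustration and are not part of NodeBB or the importer.

```javascript
// Hypothetical helpers, for illustration only.

// Encode every non-ASCII character as a decimal HTML numeric character reference.
function toNumericEntities(str) {
  return Array.from(str, function (ch) {
    const cp = ch.codePointAt(0);
    return cp > 127 ? '&#' + cp + ';' : ch;
  }).join('');
}

// Approximate how redis-cli prints a UTF-8 string: printable ASCII as-is,
// every other byte as \xNN. (Quote and backslash escaping is ignored here.)
function toRedisCliEscapes(str) {
  return Array.from(Buffer.from(str, 'utf8'), function (byte) {
    return byte < 0x20 || byte > 0x7e
      ? '\\x' + byte.toString(16).padStart(2, '0')
      : String.fromCharCode(byte);
  }).join('');
}

console.log(toNumericEntities('Всем'));
console.log(toRedisCliEscapes('Всем привет'));
```

    The second call should reproduce the `\xd0\x92...` string from topic:3472, which suggests that title is plain UTF-8 while the imported one holds entities.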

  • Plugin & Theme Dev

    hmm, could it be one of the plugins?
    which plugins do you have activated? list them all please.

  • Plugin & Theme Dev


    never mind it's not a plugin, it's this line, i think

    I tried hacking the source to take out the .escape() call on that line. It turns out sanitize() returns a validator object (that's why you can chain .escape(), I think), but the string result is in there as {str: '... result ...'}, so the new line would look like:

    // topic.js:282'ish
    // FROM
    data.title = validator.sanitize(data.title).escape();
    // TO
    data.title = (validator.sanitize(data.title) || {}).str || '';

    then for some reason, i was getting a TypeError in webserver.js, so i added a safe check there

    // webserver.js:121'ish
    // FROM
    tag.content = tag.content.replace(/[&<>'"]/g, function(tag) {
    // TO
    tag.content = (tag.content || '').replace(/[&<>'"]/g, function(tag) {

    That worked for me, but it's a hack. If it works for you, file an issue on the GitHub repo; there may be a good explanation for why. If NodeBB expects us to escape the title beforehand, the importer can do that, but I'm a little skeptical.
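
    One possible reading of why dropping .escape() helps: if the stored title already contains entities, escaping it again on output turns each `&` into `&amp;`, so the browser shows the codes literally. A minimal sketch, assuming a generic escape function (`escapeHtml` here is a stand-in, not NodeBB's or validator's actual code):

```javascript
// Minimal stand-in for an output-time HTML escape (assumption: not NodeBB's real code).
function escapeHtml(str) {
  const map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return String(str || '').replace(/[&<>'"]/g, function (ch) {
    return map[ch];
  });
}

// A title stored as raw UTF-8 text passes through unchanged.
const rawTitle = escapeHtml('Всем привет');

// A title stored as entities gets its ampersands escaped a second time.
const entityTitle = escapeHtml('&#1042;&#1089;&#1077;&#1084;');

console.log(rawTitle);
console.log(entityTitle);
```

    The `|| ''` guard mirrors the safe check added in webserver.js above, so a missing title escapes to an empty string instead of throwing.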

  • GNU/Linux

    As usual, we only sanitize stuff on the way out, not on the way in, so I'd look to the importer to see why they are being saved into Redis as ... html entities?

    For reference, redis stores the raw bytes, and redis-cli shows them as \x-escaped UTF-8:
    [2]> set foo тест
    OK
    [2]> get foo

  • Plugin & Theme Dev

    @medwards the only place where the import plugin touches the content is the convert function, and only if there is a valid convert config. You mentioned in a previous conversation that you're using a custom-built bbcode-to-md convert function, or that you modified the one in there. Can you find an example topic, map it to its content in the vb database, and paste the two here so I can test the conversion functions?

    btw @julian, I noticed that the validator package is way out of date. It looks like the repo has moved and the API has changed a bit, e.g. sanitize() was removed.

  • GNU/Linux

    "validator": "~1.5.1",

    Latest version seems to be 3.2.1. Wow, what a pain. Thanks @bentael

  • #7

    Sorry for the late reply.

    So it sounds like it's not really intended for the HTML Entities to ever make it into the database. I was able to pull a node module "html-entities" and feed the titles through that on the way in through the importer. The post contents were rendering okay so I just left those as is, didn't want to mess around with what order to apply conversions and whether they'd screw each other up.
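
    Roughly, the decode-on-import step described above could look like the sketch below. It is a built-ins-only stand-in that handles numeric references only; the actual html-entities module also covers named entities, and its API is not reproduced here. `decodeNumericEntities` and `prepareTopic` are hypothetical names, not part of nodebb-plugin-import.

```javascript
// Hypothetical stand-in for the decode step (numeric references only).
function decodeNumericEntities(str) {
  return String(str).replace(/&#(?:x([0-9a-fA-F]+)|(\d+));/g, function (match, hex, dec) {
    const cp = hex ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave out-of-range references untouched rather than throwing.
    return cp <= 0x10ffff ? String.fromCodePoint(cp) : match;
  });
}

// Hypothetical importer hook: decode the title only and leave the content as-is,
// matching the choice described in the post.
function prepareTopic(topic) {
  return Object.assign({}, topic, {
    title: decodeNumericEntities(topic.title || '')
  });
}

console.log(prepareTopic({ title: '&#1042;&#1089;&#1077;&#1084;', content: '&#1042;' }));
```

    Decoding only the title keeps the content path untouched, so there is no question of which order the conversions run in.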

    One of these days I'll get around to contributing back some changes to nodebb-plugin-import. Things have just been crazy around here lately.

    Thanks for all your help.

  • GNU/Linux

    Validator is now v3.2.1, but it's a transparent upgrade, so it shouldn't affect anything here.
