An idea of new plugin html-preparser
-
Hi, I'm back with a new idea again
After I made MagicBlock and got an issue of insecure images, I thought to make a plugin ( tentatively named long
nodebb-plugin-secure-image-advance
) which include white list ( managed by human in first phase ) checking of images.But immediately I realized that it needs to handle
<code>
blocks again and this could be same for many other plugin which parse contents and filter it.So I bring new idea of
pre html parsing
plugin. This should be placed after a composer related plugin ( markdown ), plugin is better then a core's. And it can fire own filter hooks, and other plugin can use it.I also have thought some specs. Basic list of preparsing will be
- Replace
<code>..</code>
block with an unique string , and fire filter hooks. After all hooos, recover code blocks. I do same in Magicblock. - Fire filter hooks with list of image tags.
- Fire filter hooks with list of link tags.
This can reduce amount of resources for multiple parsings of whole contents for same purposes by plugins.
Cons are also quite clear.
- Plugin dependency ( I think this is huge. Has NodeBB any solution for this? Or am I only one who doesn't know an already existing feature? )
- whole filtering structure of contents will become much more complicate.
- If many plugin doesn't use this, then maybe useless
So, personally, I want to have it, and I may do because I'll have own similar plugins more than two. But I want to hear comments from a community before.
- Replace
-
Well, the main thing is you have raw markdown saved into the database (which is ok), then converted to html by the markdown plugin, and after that, going back to a tokenized form is quite difficult.
In fact, because of this reason (that it is not a 1:1 relationship between md, html, and back to md) that we can't natively support a wysiwyg and markdown solution.
It is possible to listen to a plugin hook fired by markdown if you'd like to hook into the parser used, though I will admit the logic is needlessly complex
-
@julian Hi,
Actually, what I meant about this plugin idea is manipulates of compiled html by
nodbb-markdown-plugin
like many other plugin does.Even, I don't want to touch guts of markdown syntax or stored text, it can easily ruin things against future NodeBB updates. Hooking from
nodebb-markdown-plugin
is also good idea but with fully compiled html :). ( but then what is different fromfilter:parse.post
? )Wait, that's good point, I don't think it's so secure way but if we convert addresses of images before saving or while writing, then it will be quite efficient. But I'm not sure that it's really better way. Right now, I take my way
-
I actually think the idea is great.
However I wouldn't do it via custom filters but via method attaching like so:
- Attach a method
[String] getWithoutCode()
that caches the resulting value so it does not calculate twice. - Attach a method
setWithoutCode(String)
to apply changes to the cache. - Bind a listener to get of
postData
(or whatever) to parse and apply the cached value (cache a variable like[Boolean] anyChanges
to not parse when the content hasn't changed). - Bind a listener to set of
postData
(or whatever) to clean the cache.
This way the calculations would only happen when any plugin requires it and it would not happen more than once during a series of plugins that don't manipulate the original content.
I guess it would even be neater not to use get/set functions but put the logic in getter/setter ofdata.postDataWithoutCode
(or a better name).Same stuff could be added for various data that could be pre-processable (although setting
postDataWithoutCode
would require the cached list of tags to be updated, etc.).@qgp9 If you wanna give it a try take a look at
Object.defineProperty()
(link) for the javascript way of getter/setter definition.This would work until any plugin returns a new object instead of a modified version of the passed one thought. But I've not seen any plugin that does this and I guess it's negligible due to no valid reason to do so.
- Attach a method
-
@frissdiegurke said:
However I wouldn't do it via custom filters but via method attaching like so:
If I understood correctly, this means we need to manage two branches at the same time, and always merge those two branches after every method call or filter hook. otherwise result of
getWithoutCode
andpostData
will contain different version.
A Merging means re-parsing for next plugin(filter). This way could provide good interface and save lines of code but not resources. -
@frissdiegurke I think I misunderstood. I'm re-reading
-
@qgp9 It's not merging after every method call, but whenever the non-modified branch should be get.
So if
n
plugins in series work with theget/setWithoutCode
only and don't touchpostData
, thepostData
will not be recalculated between those plugins. This is highly likely to be due to the NodeBB hookspriority
which would be needed anyways to ensure it's running after html parsing. -
@frissdiegurke My original understood might be quite close
So if n plugins in series work with the get/setWithoutCode only and don't touch postData, the postData will not be recalculated between those plugins
I don't understand this part. Let's call this plugin as "pre-parser", and if we have 4 plugins C,D,F which call get/setWithoutCode, and E which use just full
raw data
- C will read cached contents(w/o code) by getWithoutCode.
- C will update cached contents by SetWithoutCode.
- D will read and update cached contents
- E will read
data
and edit or not return. - F will read cached contents.
So here problem is that.
At 1. When
setWithoutCode
is called,pre-parser
doesn't know D will use what.
At 2. WhensetWithoutCode
is called,pre-parser
doesn't know E will use what.
At 3. If D doesn't say 'updated' or 'nochanged" topre-parser
explicitly,pre-parser
need to compare raw data and cached data from 2) while next calling ofgetWithoutCode
because of no information. Of course I assume thatpre-praser
can catch whole list of plugin in a pipe and also catch which plugin is callingget/setWithoutCode
.- and 2) can be solved
pre-parser
requirepre-registraion
or similar to plugins which will use cached data.
but what about 3). If any plugin doesn't know aboutpre-parser
thenpre-parser
need to take care of it and so on, in the end, it will need resource again. and complicate
-
I guess your basic problem is: you don't know that javascript getter/setter can be defined (see my link to
Object.defineProperty()
).I'll do some schematic code here (not tested):
function prepareData(data) { let postData = data.postData; let cache = null; let hasChanged = false; Object.defineProperty(data, "postDataWithoutCode", { set: function (value) { cache = value; hasChanged = true; }, get: function () { if (cache == null) { cache = calculateCache(postData); } return cache; } }); Object.defineProperty(data, "postData", { set: function (value) { postData = value; cache = null; hasChanged = false; }, get: function () { if (hasChanged) { postData = calculatePostData(cache); hasChanged = false; } return postData; } }); return data; }
-
@frissdiegurke I need to read a document what you gave more carefully , but before, is that so different from an usual OOP object which has 3 data-members and and 4 get/set accessors?
-
This way it calculates/merges at most as often as it's get/set (thus in worst case no performance decrease).
In best case (and likely case due topriority
within hooks that'd be needed by those plugins) all plugins that know about the pre-parser are called in series thus it's only one calculation and one merge in total.Overall: For
m
series of plugins that only usepostDataWithoutCode
there arem
total calculations and merges.@qgp9 It's pretty much like a private attribute and two public ones with get/set accessors except it's not needed to call those accessors explicitly. And we overwrite the get/set accessors of the original
postData
to keep in mind. -
@frissdiegurke Sorry, I read the documents, I don't understand some points. I may be missing some key point/feature of javascript.
My question is simply that,
How does
pre-parser
deal with a plugin which doesn't call any function/method ofpre-parser
?In long, If plugin
E
knows nothing ofpre-parser
, Howpre-prarse
decide to merge cache to postdata whenD
calledsetWithoutCode
. Any other chance afterD
'sset
beforeE
take data in usual way? -
When
E
does anything to thepostData
, it meansE
calls at least once the getter ofpostData
; Thus the cache gets merged to the postData first (postData = calculatePostData(cache);
). SoE
gets all changes of previous plugins.When
E
does modify thepostData
the cache gets cleared to be re-generated whenD
getspostDataWithoutCode
.It's not about knowing which plugins comes next, but about providing up-to-date information when it is required.
We don't have to merge the cache back before the original data is required.
We don't have to calculate the cache before it's required. -
OK, now I understand. Those getter and setter doesn't mean literal 'getpostData' but they are invoked whenever postData is accessed and modified. right?
I was totally stupid while reading. I have to go home. too tired
I see, let me think more based on that. I understand your story now, but not sure yet that's really better way or not.
Thank you!
-
@qgp9 said:
OK, now I understand. Those getter and setter doesn't mean literal 'getpostData' but they are invoked whenever postData is accessed and modified. right?
Yes
I was totally stupid while reading. I have to go home. too tired
I see, let me think more based on that. I understand your story now, but not sure yet that's really better way or not.
If you find any cons let me know.
Thank you!
You're welcome
-
@frissdiegurke said:
If you find any cons let me know.
Pros
- Your method is very flexible. One can cooperate with other plugins just by just control of priorities. This is really un-replaceable advantage.
- Also I can imagine that we could do this without activated NodeBB plugin called
pre-parser
but just with helper module per each plugin. I'm not sure of update and version control with npm, but I believe there are certain solutions.
Cons
- If we want to provide "image list" or "link list" above "contents without code block", a situation will become too complicate. Without accurate and elaborate priorities configurations, we will loose an advantage in resources. In this situation, separated filtering pipe will be better.
- Your method actually will be touching - let's say -
core object
. I think this is what we have to be very careful. In the future when NodeBB core development team wants to useget/set
ofpostData
, it can be a big problem. Actually it may be possible we check if there are alreadyget/set
er and make newget/set
er which include old function, but depend on type of original functions, it's not easy to guarantee a proper working. I think a better way could be a pre-registration of plugins which usepre-parser
andpre-parser
access whole filtering pipe information and decide what to do in each step. An disadvantage of this is that afteruninformed plugin
before nextget
call of cached value,pre-parser
should check what is changed.
Now what I want to say here is that your method is really brilliant and has un-replaceable advantage but too deep and touching core ( maybe not practically yet but conceptually ). Therefore it needs to be consulted by a core development team.
In other point, your method is quite independent from my custom filter hooks and we can even take both advantages ( of course it will make another problem like Too much or Too complicate :), but just as an idea ).
-
So now, I have questions to the NodeBB development team. @julian
-
About @frissdiegurke 's idea.
- What do you think? Is it secure or do you have any better ideas, policies , plans?
-
To manage dependencies between plugin,
- Can we enforce to install and activate plugins by dependencies?
- Or can we give warnings of wrong configurations, dependencies and so on by plugin on admin UI ( dash board? )?
-
Do we have secure way to add/remove/modify custom data by plugin in a filtering pipe?
- For example, on filtering hook,
data.pluginData.<plugin name>
can be always for plugins (means some sure for core will not use it for different reason!! ), whether they are removed or not after filtering. I know if I select good name( random or highly uniq), then it will be quite safe but it will be better if we have an official space.
- For example, on filtering hook,
-
If I want to make a npm module which is dedicated to NodeBB while it's not a
plugin
, then what is good name for. Maybenodebb-helper-any-name
. It may be for individual plugins.
I had one more, but I forgot what was it while writing....
-