How hard is it to process untrusted SVG data to strip out any potentially harmful tags or attributes (like stuff that might execute JavaScript)?

Simon Willison

I feel like this is well-trodden ground for HTML these days. Are there robust solutions for the SVG version of this problem?

Simon Willison

I'm wondering if I can give untrusted authors the ability to go wild with custom SVG in a framed-off fixed size area of a web page, without breaching the security of the wider page or application

Simon Willison

This is great! This Cloudflare Rust library includes a detailed test suite that tells me everything I wanted to know https://mastodon.theorangeone.net/@jake/113370469717181352

Marco Rogers

@simon sounds like you want a sandboxed iframe?

Terence Eden

@simon the JS in an SVG cannot interact with anything outside of itself.
So while an SVG can do all sorts of crazy things, it can't escape its sandbox.

Simon Willison

@polotek yeah probably! I'm still trying to work up my confidence in those; detailed and comprehensive documentation on exactly what the sandbox attribute does has been hard to come by

Simon Willison

Even better... it looks like I can point to them from a regular img tag and the SVG spec has me covered: https://www.w3.org/TR/SVG2/conform.html#secure-static-mode

zellyn

I *believe* if you use an SVG as the `src` of an image, it turns off all the JavaScript, onclick handlers, etc. Of course, you might *want* some of the JavaScript.

João S. O. Bueno

@simon based on the experience of people who tried to create a Python sandbox over the decades, I'd say it is pretty much impossible (save for something the browser separates as another page box, i.e. a "frame").

Simon Willison

... and it looks like that means I can do an img tag with an src that points to a base64 encoded SVG object and any nasty JavaScript etc will be disabled for me - here's an example which seems to demonstrate that working https://gistpreview.github.io/?03f0076446027b9b12e1ea14315db52b
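A minimal sketch of how such an img tag could be generated server-side, assuming the untrusted SVG arrives as a string. The function name `svg_to_img_tag` and its parameters are hypothetical, not taken from the linked example. The fixed `width` and `height` keep the image inside a known area of the page.

```python
import base64
import html


def svg_to_img_tag(svg_source: str, width: int, height: int) -> str:
    """Wrap untrusted SVG text in an <img> tag using a base64 data: URI.

    Hypothetical helper for illustration. The SVG is not sanitized here;
    the sketch relies on browsers not running scripts in SVG loaded via <img>.
    """
    encoded = base64.b64encode(svg_source.encode("utf-8")).decode("ascii")
    src = f"data:image/svg+xml;base64,{encoded}"
    # Base64 text has no quotes or angle brackets, but escape defensively.
    safe_src = html.escape(src, quote=True)
    return (
        f'<img src="{safe_src}" width="{int(width)}" '
        f'height="{int(height)}" alt="User-supplied SVG">'
    )


if __name__ == "__main__":
    hostile = (
        '<svg xmlns="http://www.w3.org/2000/svg">'
        "<script>alert(1)</script></svg>"
    )
    print(svg_to_img_tag(hostile, 200, 100))
```

Because the markup is base64-encoded, none of the untrusted SVG text appears literally in the surrounding HTML, so it cannot break out of the `src` attribute.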

Simon Willison

@gwidion I think JavaScript sandboxes are a whole lot easier than Python, because browsers are already the most widely-deployed sandboxes in the world

João S. O. Bueno

@simon I agree that a "document" in a tab or a frame is a good sandbox. But I doubt very much one can achieve further segregation within a document. There are way too many ways of linking back to JavaScript from HTML or SVG tags, for example. And JS, on its side, has no segregation or protection whatsoever: one is free to manipulate all the DOM and beyond.

Simon Willison

@gwidion it looks to me like https://claude.ai has a robust solution to this, using a combination of iframes with the sandbox attribute and CSP headers, plus web workers with CSP headers and careful application of postMessage

I'm still trying to reverse engineer how their solutions work though
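One piece of that combination can be sketched in isolation: a restrictive Content-Security-Policy header for a response that serves untrusted markup. This is a generic illustration, not a description of how claude.ai is configured, and the directive choices and the `restrictive_csp` name are assumptions.

```python
def restrictive_csp() -> str:
    """Build a CSP header value for a response containing untrusted markup.

    Assumed policy for illustration: block everything by default, then
    allow only inline styles and data: images.
    """
    directives = {
        "default-src": ["'none'"],        # no scripts, fetches, frames, etc.
        "img-src": ["data:"],             # inline images only
        "style-src": ["'unsafe-inline'"], # permit <style> and style attributes
        "sandbox": [],                    # treat the response as a sandboxed document
    }
    return "; ".join(
        " ".join([name, *values]) for name, values in directives.items()
    )


if __name__ == "__main__":
    headers = {"Content-Security-Policy": restrictive_csp()}
    print(headers)
```

The bare `sandbox` directive applies the same restrictions as an iframe `sandbox` attribute with no allow tokens, which covers the case where the response is opened directly rather than inside a frame.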

Jake Archibald

@simon <iframe sandbox> is useful here. You can even allow JavaScript but have it run in an opaque origin.
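A minimal sketch of generating such a frame, assuming the untrusted markup is embedded through the `srcdoc` attribute. The helper name `sandboxed_iframe` and its parameters are hypothetical. Omitting `allow-same-origin` is what keeps the content in an opaque origin, even when `allow-scripts` is present.

```python
import html


def sandboxed_iframe(
    untrusted_markup: str,
    width: int,
    height: int,
    allow_scripts: bool = False,
) -> str:
    """Embed untrusted markup in a fixed-size sandboxed iframe.

    Hypothetical helper for illustration. An empty sandbox attribute
    applies every restriction; allow-scripts re-enables JavaScript only.
    allow-same-origin is never added, so the frame keeps an opaque origin.
    """
    sandbox = "allow-scripts" if allow_scripts else ""
    # Escape so the markup cannot terminate the srcdoc attribute early;
    # the browser decodes the entities back into the original markup.
    srcdoc = html.escape(untrusted_markup, quote=True)
    return (
        f'<iframe sandbox="{sandbox}" srcdoc="{srcdoc}" '
        f'width="{int(width)}" height="{int(height)}"></iframe>'
    )


if __name__ == "__main__":
    print(sandboxed_iframe('<svg onload="alert(1)"></svg>', 300, 200, True))
```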

Simon Willison

@jaffathecake I'm desperately keen on learning the true ins and outs of that, but I've found detailed documentation (including browser support) on all of the options you can stuff in that sandbox attribute frustratingly difficult to locate

Ben Lings

@simon something to check if you do this: users can right-click on the image and open it in a new tab. If they do this, scripts will then run. Check that the URL doesn’t share an origin with your site. I know that blob: URLs do…

Jake Archibald

@simon the table at the bottom of https://developer.mozilla.org/en-US/docs/Web/HTML/Element/iframe is decent

Simon Willison

@jaffathecake it's the best I've seen but it still leaves me with so many questions... how good is browser support for each of those allowX things? What do browser security experts advise in terms of using them?

I'm really paranoid

Jake Archibald

@simon the browser support for the various allow features is in the table at the end of the page

Simon Willison

@jaffathecake wow I missed that! Thank you, this helps a LOT

Simon Willison

@ben_lings that's a good call - I checked and as far as I can tell the base64 URL when opened in a new page has no relationship at all to the page it was originally hosted on