Re: [w3c/clipboard-apis] Make async clipboard APIs (read/write) to sanitize interoperably with setData/getData for text/html (#150)

Thanks for sharing.  Some initial reactions below for you to digest in advance of our next meetup.

> As discussed in the TF meeting, when sanitizing markup, WebKit loads the markup in a separate (offscreen) page and browsing context, and then serializes the loaded result into markup. This page that we use for sanitization is special, in that we forbid any script execution, but still allow script tags to be parsed as if they were going to be executed. This discrepancy is necessary in order to ensure that the page cannot craft a payload that is deemed "safe" when loaded in a browser that disables script, but is unsafe when loaded in a browser that enables script. I'm not sure this behavior is something that can or should be specified.

What does the resulting document look like?  I may be missing something subtle about what you mean by "as if they were going to be executed".  Are there any special characteristics that would make that document different from one produced by `new DOMParser().parseFromString(htmlToSanitize, "text/html")`?
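
To make the comparison concrete, here is a minimal sketch of the `DOMParser` baseline. The helper names `parseInert` and `noscriptChildTags` are hypothetical; only `DOMParser` and `parseFromString(string, type)` are standard. As I understand the HTML Standard, a document created this way has no browsing context, so scripting is disabled: `<script>` elements are parsed into the tree but never run, and `<noscript>` content is parsed as markup rather than as text. That second point may be one place where it differs from a parse that treats script as enabled.

```javascript
// Hypothetical helper: parse untrusted markup into an inert document.
// ParserCtor is injectable so the sketch can run outside a browser.
function parseInert(htmlToSanitize, ParserCtor = globalThis.DOMParser) {
  if (typeof ParserCtor !== "function") {
    throw new Error("DOMParser is not available in this environment");
  }
  // The resulting document has no browsing context, so scripting is
  // disabled: <script> elements appear in the tree but are not executed.
  return new ParserCtor().parseFromString(htmlToSanitize, "text/html");
}

// Hypothetical helper: list the element children of each <noscript>.
// With scripting disabled the parser treats <noscript> content as markup,
// so child elements can show up here; a script-enabled parse would see
// the same content as plain text instead.
function noscriptChildTags(doc) {
  return Array.from(doc.querySelectorAll("noscript"), (el) =>
    Array.from(el.children, (child) => child.tagName.toLowerCase())
  );
}
```

In a browser, something like `noscriptChildTags(parseInert("<div><noscript><p>hi</p></noscript></div>"))` should report a `p` child under the `noscript` element, which is the kind of script-disabled vs. script-enabled discrepancy I would like to understand better.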

> For compatibility with older versions of Microsoft Office, we may preserve attributes on the html element in a narrow case where it contains the text xmlns:o="urn:schemas-microsoft-com:office:office". Would the specification allow for user agents to selectively preserve content like this?

I suppose we could allow a decision like that to be user agent specific, but my preference would be for us to identify which of the [Sanitizer API](https://wicg.github.io/sanitizer-api)'s choices (what it allows and what it strips) we would consider harmful, and then address those cases so that the attribute list can be part of the spec.
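
As a rough illustration of what a spec-level rule could look like, here is a sketch. Everything in it is an assumption for discussion: the function name, the shape of the input (name/value pairs from the `html` element), and the allowlist contents are hypothetical and are not taken from WebKit or from the Sanitizer API. Only the Office marker string comes from the quoted description.

```javascript
// Marker quoted above; its presence is the "narrow case" being discussed.
const OFFICE_NAMESPACE_MARKER =
  'xmlns:o="urn:schemas-microsoft-com:office:office"';

// Hypothetical allowlist. The real list would be whatever the spec
// discussion settles on; it is kept to a single entry here on purpose.
const ALLOWED_HTML_ATTRIBUTES = new Set(["xmlns:o"]);

// Hypothetical rule: drop every attribute on the root <html> element unless
// the markup carries the Office marker, and even then keep only the
// attributes named in the allowlist.
function filterHtmlElementAttributes(markup, attributes) {
  if (!markup.includes(OFFICE_NAMESPACE_MARKER)) {
    return [];
  }
  return attributes.filter(([name]) =>
    ALLOWED_HTML_ATTRIBUTES.has(name.toLowerCase())
  );
}
```

Writing the rule as a fixed list, rather than "preserve whatever is on the element", is what would let every user agent produce the same output for the same clipboard payload.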

> The process of serializing "visible content" in the page we use for sanitization is also pretty difficult to (exactly) specify, since we rely on editing code in WebKit that determines which DOM positions are "visible" to the user (and, importantly, visually distinct from other such DOM positions) to figure out the range in the sanitized page that we should include in the final sanitized markup. For instance, if we're sanitizing <div><div>Hello</div></div>, we won't attempt to preserve the fact that there are nested div elements, since the first user-visible position is right before the "H" in the inner text node.

You mentioned in our last working group meeting that Safari effectively does a "Select All" operation on the offscreen document and serializes the resulting range. Did I get that right?  I agree that editing heuristics relating to normalized selection positions would be hard to specify without some other foundational work coming first.  My preference would be to understand what threat is being mitigated and see if we can propose a simpler step that still mitigates the same threat.

Reply to this email directly or view it on GitHub:
https://github.com/w3c/clipboard-apis/issues/150#issuecomment-922170510

Received on Saturday, 18 September 2021 03:22:21 UTC