Re: [w3c/clipboard-apis] Make async clipboard APIs (read/write) to sanitize interoperably with setData/getData for text/html (#150)

> @snianu thanks for tackling this! Could you detail step 1 a bit more?

Sure. Step 1 creates the [document](https://html.spec.whatwg.org/multipage/dom.html#document) object from the HTML string provided by the web author. The input could be just a [document fragment](https://dom.spec.whatwg.org/#documentfragment) or a full HTML [document](https://html.spec.whatwg.org/multipage/dom.html#document). We use the [DOMParser](https://html.spec.whatwg.org/multipage/dynamic-markup-insertion.html#domparser) to create a well-formed HTML document from the input, and then use the Sanitizer API to strip harmful content from the markup. After this step, we insert platform-specific header info into the serialized document and then write it to the clipboard.

Here is a more detailed proposal for the HTML sanitization:
1. First, we [create a DOMParser object](https://html.spec.whatwg.org/multipage/dynamic-markup-insertion.html#dom-domparser-constructor) to parse the string provided by the web author and get a [document](https://html.spec.whatwg.org/multipage/dom.html#document) object. We use the algorithm defined in [parseFromString](https://html.spec.whatwg.org/multipage/dynamic-markup-insertion.html#dom-domparser-parsefromstring) for the [`text/html`](https://html.spec.whatwg.org/multipage/dynamic-markup-insertion.html#dom-domparsersupportedtype-texthtml) type to parse the HTML string.
2. Then we use the Sanitizer API to strip out harmful content such as `script` tags. Currently there is no way to parse and get a full HTML [document](https://html.spec.whatwg.org/multipage/dom.html#document) using the [sanitize](https://wicg.github.io/sanitizer-api/#dom-sanitizer-sanitize) method, but we want to do something similar to how [sanitizeFor](https://wicg.github.io/sanitizer-api/#dom-sanitizer-sanitizefor) would parse an HTML document element, using `html` as the `element` and `doc.documentElement.innerHTML` (where `doc` is the `document` from step 1) as the `input`.
3. In the last step, we create the platform-specific HTML header and add it to the markup string, which is the HTML `document` from step 2 serialized into a string. Examples of platform-specific headers are given below:
On Windows, we have:
```
Version:0.9
StartHTML:<start offset of the start html tag>
EndHTML:<start offset of the end html tag>
StartFragment:<start offset of the start fragment comment tag>
EndFragment:<start offset of the end fragment comment tag>
<html>
<body>
<!--StartFragment-->
<body content goes here>
<!--EndFragment-->
</body>
</html>
```
On Linux we have:
```
<meta http-equiv="content-type" content="text/html; charset=utf-8">
```
etc...
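
To make step 3 more concrete, here is a rough sketch of how the headers above could be generated. It is not taken from any browser implementation. The function names (`buildWindowsHtmlPayload`, `buildLinuxHtmlPayload`), the 10-digit zero padding, the CRLF line endings, and the choice to count offsets as UTF-8 bytes with the fragment starting just after `<!--StartFragment-->` are all assumptions made for illustration.

```javascript
// Sketch only: wraps already-sanitized markup in platform-specific headers.
const encoder = new TextEncoder();
const byteLength = (s) => encoder.encode(s).length;

// Assumption: offsets are zero-padded to a fixed width so the header length
// does not change once the real values are filled in.
const pad = (n) => String(n).padStart(10, "0");

// `fragment` is assumed to be the sanitized body markup from step 2.
function buildWindowsHtmlPayload(fragment) {
  const header = (startHtml, endHtml, startFragment, endFragment) =>
    "Version:0.9\r\n" +
    `StartHTML:${pad(startHtml)}\r\n` +
    `EndHTML:${pad(endHtml)}\r\n` +
    `StartFragment:${pad(startFragment)}\r\n` +
    `EndFragment:${pad(endFragment)}\r\n`;

  const prefix = "<html>\r\n<body>\r\n<!--StartFragment-->";
  const suffix = "<!--EndFragment-->\r\n</body>\r\n</html>";

  // All offsets are byte counts from the start of the payload.
  const startHtml = byteLength(header(0, 0, 0, 0));
  const startFragment = startHtml + byteLength(prefix);
  const endFragment = startFragment + byteLength(fragment);
  const endHtml = endFragment + byteLength(suffix);

  return (
    header(startHtml, endHtml, startFragment, endFragment) +
    prefix + fragment + suffix
  );
}

// On Linux the example header is just a charset declaration placed
// in front of the serialized markup.
function buildLinuxHtmlPayload(markup) {
  return (
    '<meta http-equiv="content-type" content="text/html; charset=utf-8">' +
    markup
  );
}
```

Counting bytes rather than JavaScript string length matters as soon as the markup contains non-ASCII characters, which is why the sketch measures everything through `TextEncoder`.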
Tagging @mkruisselbrink @a-sully to see if this proposal looks good to them.

https://github.com/w3c/clipboard-apis/issues/150#issuecomment-915692288

Received on Thursday, 9 September 2021 01:41:11 UTC