Re: [w3c/editing] Remove write and add more details on read. (PR #456)

@sanketj requested changes on this pull request.



>  
 
 ## Author:
 *   ansollan@microsoft.com
 *   snianu@microsoft.com
 
 ## Introduction
-Using DataTransfer object's setData and async clipboard write method, there are interop differences in how the HTML content is sanitized and written to the clipboard. It'd be beneficial for the web authors if async clipboard and setData APIs provide similar level of fidelity of HTML content during copy & paste operations so round tripping is possible without any interop differences such as losing formats, meta tags etc.
-If we use the built-in sanitizer that produces an HTML fragment, the styles that get inlined bloat the payload and [strip out the custom styles](https://drive.google.com/file/d/1Nsyp1rUKc_NF4l0n-O05snAKabHAKeiG/view) inserted by sites like Excel online that are used to preserve excel specific semantics.
+HTML content is essential for supporting copy/paste operation of high fidelity content from native apps to web sites and vice versa, especially in sites supporting document editing. DataTransfer object's `getData` and async clipboard `read()` methods have interop differences in how the HTML content is sanitized during a paste operation. The `getData` method returns unsanitized HTML content, but the `read()` method uses the browser's sanitizer to strip out content (ex. global `<style>`s, `<script>`s, `<meta>` tags) from the HTML markup which results in format loss, bloating of payload due to inlining of styles etc.

What does "etc." refer to here?

Can we also add an example for each problem that we are calling out? A bulleted list can be used if that provides better formatting.
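
To make the request concrete, here is a minimal sketch of how an example could contrast the two read paths. It uses only the existing `getData`, `read()`, and `ClipboardItem.getType()` calls; the helper `missingTags` and the `TAGS_OF_INTEREST` list are hypothetical names I made up for illustration, and the tag check is a rough regex rather than a real HTML parse.

```javascript
// Tags the Introduction names as being stripped by the sanitizer.
const TAGS_OF_INTEREST = ["style", "script", "meta"];

// DataTransfer path: synchronous read inside a paste event handler.
function readHtmlFromPasteEvent(event) {
  return event.clipboardData.getData("text/html");
}

// Async clipboard path: read() resolves to ClipboardItems, and each
// representation is fetched as a Blob. `clipboard` is passed in so the
// function can be exercised with navigator.clipboard or with a stub.
async function readHtmlFromAsyncClipboard(clipboard) {
  const items = await clipboard.read();
  for (const item of items) {
    if (item.types.includes("text/html")) {
      const blob = await item.getType("text/html");
      return await blob.text();
    }
  }
  return "";
}

// Hypothetical helper: list the tags of interest that are present in the
// getData markup but absent from the read() markup.
function missingTags(getDataHtml, readHtml) {
  const hasTag = (html, tag) => new RegExp(`<${tag}[\\s>]`, "i").test(html);
  return TAGS_OF_INTEREST.filter(
    (tag) => hasTag(getDataHtml, tag) && !hasTag(readHtml, tag)
  );
}
```

An example in the explainer could paste the same clipboard content through both paths and show the result of `missingTags` next to the two payload sizes, which would cover both the format loss and the payload bloat.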

>  
 
+
+Currently sites that are using the DataTransfer object's `getData` method to read unsanitized HTML content, do not want to regress HTML paste operation by migrating to async clipboard `read()` method. It'd be beneficial for the web authors if async clipboard `read()` method and `getData` methods provide similar level of fidelity of HTML content during paste operations. This would also allow browsers that write unsanitized HTML content into the clipboard to roundtrip HTML content better.

```suggestion
These problems mean that web developers may not get the same HTML paste quality and performance with the async clipboard `read()` API as they do with the DataTransfer object's `getData` method. This proposal aims to solve these problems so that `read()` works just as well as `getData` when pasting HTML content.
```

>  
 
+
+Currently sites that are using the DataTransfer object's `getData` method to read unsanitized HTML content, do not want to regress HTML paste operation by migrating to async clipboard `read()` method. It'd be beneficial for the web authors if async clipboard `read()` method and `getData` methods provide similar level of fidelity of HTML content during paste operations. This would also allow browsers that write unsanitized HTML content into the clipboard to roundtrip HTML content better.

For:
"This would also allow browsers that write unsanitized HTML content into the clipboard to roundtrip HTML content better."
Should we list this in the goals section below instead?

>  
 
+
+Web custom formats can be used to exchange unsanitized HTML if both source and target apps have support for it, but there are many native apps that don't have support for web custom formats, so contents copied from these apps in the HTML format would have to go through the Browser sanitizer in `read()` that would result in loss of fidelity.

It might be better to move this into an alternatives considered section. Thoughts?

>  
 ## Proposal
 
-With this new proposal, we will be introducing a new `unsanitized` parameter in the [read()](https://w3c.github.io/clipboard-apis/#dom-clipboard-read) method so the content is round trippable i.e. `read()` would return the content without any sanitization. On [write](https://w3c.github.io/clipboard-apis/#dom-clipboard-write) method call, we will always write a well-formed HTML document if `text/html` is provided in the [ClipboardItem](https://w3c.github.io/clipboard-apis/#clipboard-item-interface).
+With this new proposal, we will be introducing a new `unsanitized` parameter in the [read()](https://w3c.github.io/clipboard-apis/#dom-clipboard-read) method so the HTML content can be read without any loss of information i.e. `read()` would return the content without any sanitization.

Should we update `read()` here to pass the `unsanitized` parameter?
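
For illustration only, this is roughly what I have in mind for a call site. The option shape is an assumption on my part: the paragraph above only says "a new `unsanitized` parameter", so the dictionary form with a list of MIME types, and the helper names `buildReadOptions` and `readHtml`, are hypothetical and should be replaced with whatever the explainer actually specifies.

```javascript
// ASSUMPTION: `unsanitized` is passed as a dictionary member that lists the
// MIME types to read without sanitization. This shape is illustrative only.
function buildReadOptions(wantUnsanitizedHtml) {
  return wantUnsanitizedHtml ? { unsanitized: ["text/html"] } : {};
}

// Reads the text/html representation from the first ClipboardItem that has one.
// `clipboard` is passed in (e.g. navigator.clipboard) so a stub can be used.
async function readHtml(clipboard, wantUnsanitizedHtml) {
  const items = await clipboard.read(buildReadOptions(wantUnsanitizedHtml));
  for (const item of items) {
    if (item.types.includes("text/html")) {
      const blob = await item.getType("text/html");
      return await blob.text();
    }
  }
  return "";
}
```

Showing the parameter in the call would make it clear that the default `read()` behavior stays sanitized and that unsanitized HTML is opt-in.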

>  
 For more details see the [security-privacy](https://github.com/MicrosoftEdge/MSEdgeExplainers/blob/main/ClipboardAPI/tag-security-privacy-clipboard-unsanitized-read.md) doc.
 
+[Here](https://docs.google.com/document/d/1QLt50Q8UnlQksVltZ2PNkDZVdk9N58Pq7P0lzGTKh-U/edit?usp=sharing) is a threat model document for this feature.

This document was written for Chromium T&S reviews, so I don't think we should link to it from the explainer without sanitizing it first (pun intended :) ). A link isn't strictly necessary here anyway, since the security considerations are already described elsewhere, so I would suggest just leaving it out.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/w3c/editing/pull/456#pullrequestreview-1705475521

Message ID: <w3c/editing/pull/456/review/1705475521@github.com>

Received on Tuesday, 31 October 2023 05:54:51 UTC