Re: [w3c/editing] Remove write and add more details on read. (PR #456)

@sanketj requested changes on this pull request.



>  
 
 ## Author:
 *   ansollan@microsoft.com
 *   snianu@microsoft.com
 
 ## Introduction
-Using DataTransfer object's setData and async clipboard write method, there are interop differences in how the HTML content is sanitized and written to the clipboard. It'd be beneficial for the web authors if async clipboard and setData APIs provide similar level of fidelity of HTML content during copy & paste operations so round tripping is possible without any interop differences such as losing formats, meta tags etc.
-If we use the built-in sanitizer that produces an HTML fragment, the styles that get inlined bloat the payload and [strip out the custom styles](https://drive.google.com/file/d/1Nsyp1rUKc_NF4l0n-O05snAKabHAKeiG/view) inserted by sites like Excel online that are used to preserve excel specific semantics.
+DataTransfer object's `getData` and async clipboard `read()` methods have interop differences in how the HTML content is sanitized during a paste operation. The `getData` method returns unsanitized HTML content, but the `read()` method uses the browser's sanitizer to strip out content (ex. global styles, scripts, meta tags) from the HTML markup.
+
+If we use the built-in sanitizer that produces an HTML fragment, the styles that get inlined and bloat the payload and [strip out the custom styles](https://drive.google.com/file/d/1Nsyp1rUKc_NF4l0n-O05snAKabHAKeiG/view) inserted by sites like Excel online that are used to preserve excel specific semantics. It'd be beneficial for the web authors if async clipboard `read()` method and `getData` methods provide similar level of fidelity of HTML content during paste operations.

When introducing the problem, it is better to talk about the general issue, not the specific customer that is impacted. We can mention Excel Online (and others) in a customers/web developers section.

>  
 ## Goals
-*   Preserve fidelity of the HTML format just like the legacy DataTransfer API used to read/write HTML format.
+*   Preserve fidelity of the HTML format.

```suggestion
*   Preserve copy/paste fidelity when reading/writing the HTML format on the clipboard.
```

> @@ -21,40 +23,54 @@ If we use the built-in sanitizer that produces an HTML fragment, the styles that
 *   Drag-and-Drop APIs.
 
 ## Additional Background

This section sounds somewhat redundant with the introduction. Can they be merged?

>  
-### DataTransfer object's setData
-DataTransfer object can be accessed via the copy/paste event handler. It can then be used to set the clipboard data and preventDefault the browser's default copy operation. That way authors have some control over the HTML content that they want the browser to write to the native clipboard. E.g.
+### DataTransfer object's getData
+DataTransfer object can be accessed via the paste event handler. It can then be used to get the clipboard data and preventDefault the browser's default paste operation. That way authors can read the unsanitized HTML content and process the HTML markup in their document model during paste. E.g.

```suggestion
The `DataTransfer` object can be accessed via the paste event handler and `getData` can be used to get the clipboard data in a specific format. Authors can call `preventDefault` to prevent the browser's default paste action and create their own app-specific paste implementation. The `getData` API does not perform sanitization and always returns unsanitized HTML to the caller.
```
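
To make the flow in this suggestion concrete, here is a minimal sketch of an app-specific paste handler. `handlePaste` and the `insertIntoDocumentModel` callback are hypothetical names used only for illustration; `clipboardData.getData("text/html")` and `preventDefault` are the standard calls. It assumes, as stated above, that `getData` hands back the markup without sanitization.

```js
// Sketch only: handlePaste and insertIntoDocumentModel are hypothetical names.
function handlePaste(event, insertIntoDocumentModel) {
  // Paste events expose the clipboard payload as a DataTransfer object.
  const html = event.clipboardData.getData("text/html");
  // Prevent the browser's default paste so the app can apply its own logic.
  event.preventDefault();
  // The markup is passed through unchanged; any cleanup is up to the app.
  insertIntoDocumentModel(html);
  return html;
}

// Browser wiring (skipped when there is no DOM available).
if (typeof document !== "undefined") {
  document.addEventListener("paste", (e) =>
    handlePaste(e, (html) => console.log(html))
  );
}
```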

>  ```
 
 ### Copy/paste execCommand
 `execCommand` is used to invoke the copy/paste command which uses the browser's default logic to read/write the clipboard content.
 
-### Async HTML read/write APIs
-This API is called via navigator.clipboard object and is used to read/write HTML to the clipboard asynchronously without depending on clipboard event or execCommand implementation. This provides more flexibility to the web authors in terms of the type of the HTML content and when the data needs to be read/written to the clipboard. E.g.
+```js
+pasteExecCommandBtn.addEventListener("click", function(e) {
+  var pasteTarget = document.createElement("textarea");
+  pasteTarget.contentEditable = true;
+  document.body.appendChild(pasteTarget);
+  pasteTarget.focus();
+  const result = document.execCommand("paste");
+});
+
+```
+
+### Async HTML read APIs
+This API is called via `navigator.clipboard` object and is used to read HTML to the clipboard asynchronously without depending on clipboard event or execCommand implementation. This provides more flexibility to the web authors as it doesn't need a synchronous event to access clipboard. E.g.

```suggestion
This API is called via the `navigator.clipboard` object and is used to read HTML from the clipboard asynchronously without listening for a clipboard event or calling `execCommand`. This provides more flexibility and better performance to web authors than the other APIs.
```
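
For comparison with the event-based path, a rough sketch of reading HTML through the async API could look like the following. `readHtmlFromClipboard` is a hypothetical helper name; the clipboard object is passed in as a parameter so the function can be exercised outside a page, and in a browser it would be `navigator.clipboard`. Browsers typically gate `read()` behind a permission or user gesture, which this sketch does not handle.

```js
// Sketch only: readHtmlFromClipboard is a hypothetical helper name.
async function readHtmlFromClipboard(clipboard) {
  // read() resolves to a list of ClipboardItem objects.
  const items = await clipboard.read();
  for (const item of items) {
    if (item.types.includes("text/html")) {
      // getType() resolves to a Blob holding the payload for that format.
      const blob = await item.getType("text/html");
      return blob.text();
    }
  }
  // No item on the clipboard offered an HTML representation.
  return null;
}

// In a page this would be called as:
//   const html = await readHtmlFromClipboard(navigator.clipboard);
```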

>  ```
-Using any of the above mentioned APIs, web authors should be able to round trip HTML content and also be compatible with other browsers.
+Using any of the above mentioned APIs, web authors should be able to read same fidelity of HTML content.

```suggestion
All of the above-mentioned APIs should allow web authors to read HTML content with equally high fidelity.
```

>  
-## Copy HTML text using setData
+## Paste HTML text using getData

"HTML text" is a bit confusing. Consider the following:

```suggestion
## Paste HTML content using getData
```

> @@ -88,61 +104,28 @@ EndFragment:00000463
 In standard html format, Safari inserts both sanitized & unsanitized version of html content. It inserts the html content provided in the setData API into the clipboard using a custom webkit format type(`com.apple.Webkit.custom-pasteboard-data`). When `getData` is called, the HTML content in the custom webkit format type is returned (makes round tripping possible).

Does Safari always return data from `com.apple.Webkit.custom-pasteboard-data` when HTML is pasted? My understanding is that they only do that for same-origin sites.

> @@ -88,61 +104,28 @@ EndFragment:00000463
 In standard html format, Safari inserts both sanitized & unsanitized version of html content. It inserts the html content provided in the setData API into the clipboard using a custom webkit format type(`com.apple.Webkit.custom-pasteboard-data`). When `getData` is called, the HTML content in the custom webkit format type is returned (makes round tripping possible).
 
 ### In Chromium & FF:
-During `setData` call, the HTML string is written without sanitization i.e. we don't remove tags such as `<meta>, <script>, <style>` etc from the HTML markup provided in the `setData`.
-In Chromium, the header of the HTML is hardcoded([`ui::clipboard_util::HtmlToCFHtml`](https://source.chromium.org/chromium/chromium/src/+/main:ui/base/clipboard/clipboard_util_win.cc;drc=9cc9ba08c27cb1172fb4a876ceb432f72bebfe72;l=845)) and then written to the clipboard.
+During `getData` call, the HTML string is read without sanitization i.e. we don't remove tags such as `<meta>, <script>, <style>` etc from the HTML markup provided in the `getData`.

```suggestion
When `getData` is called, the HTML string is read without sanitization, i.e. global styles, script tags, and meta tags are not removed from the markup.
```

>  
-## Copy HTML text using async clipboard write
-```
-Version:0.9
-StartHTML:0000000105
-EndHTML:0000000252
-StartFragment:0000000141
-EndFragment:0000000216
-<html>
-<body>
-<!--StartFragment--><p style="color: red; font-style: oblique;">This text was copied </p><!--EndFragment-->
-</body>
-</html>
+In Chromium, the header of the HTML is hardcoded([`ui::clipboard_util::HtmlToCFHtml`](https://source.chromium.org/chromium/chromium/src/+/main:ui/base/clipboard/clipboard_util_win.cc;drc=9cc9ba08c27cb1172fb4a876ceb432f72bebfe72;l=845)) during copy and then written to the clipboard.

Is this relevant for paste?

>  ```
-Version:0.9
-StartHTML:0000000170
-EndHTML:0000000770
-StartFragment:0000000206
-EndFragment:0000000734
-SourceURL:file:///C:/Users/snianu.REDMOND/Downloads/index0.html
-<html>
-<body>
-<!--StartFragment--><span style="color: rgb(0, 0, 0); font-family: &quot;Times New Roman&quot;; font-size: medium; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial; display: inline !important; float: none;">Some text</span><!--EndFragment-->
-</body>
-</html>
+Async clipboard `read()` method uses sanitizers to strip out content such as `<meta>, <style>, <script>` tags etc  from the HTML. This creates issues as it's not interop with DataTransfer's `getData` API so web authors that use `getData` (to read HTML) and async clipboard api (to read other formats) don't get the same content compared to using just the async clipboard read APIs for both HTML and other formats.

I don't understand this statement. Are we suggesting that using `getData` along with the async clipboard API is important for authors? I think the main point we want to get across is that the read method only allows reading of sanitized content today.
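
If it helps to show the difference in the explainer, one option is a small comparison of the markup returned by each path. This is only an illustrative sketch: `droppedTags` is a hypothetical helper, the tag list is taken from the examples in the text, and the regex check is a rough heuristic, not how any browser sanitizer works.

```js
// Sketch only: droppedTags is a hypothetical helper, not a browser API.
// Reports which of the listed tags appear in the unsanitized markup
// but are absent from the sanitized markup.
function droppedTags(unsanitizedHtml, sanitizedHtml, tags = ["meta", "style", "script"]) {
  const hasTag = (html, tag) => new RegExp(`<${tag}[\\s>/]`, "i").test(html);
  return tags.filter(
    (tag) => hasTag(unsanitizedHtml, tag) && !hasTag(sanitizedHtml, tag)
  );
}
```

An author could pass the string from `getData("text/html")` as the first argument and the string read through `read()` as the second to see what was stripped.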

>  ```
-Async clipboard writer API uses sanitizers to strip out content such as `<meta>, <style>, <script>` tags etc  from the HTML. This creates issues as it's not interop with DataTransfer's `setData` API so web authors that use `setData` (to write HTML) and async clipboard api (to write other formats) don't get the same content compared to using just the async clipboard write APIs for both HTML and other formats.
+<p style="color: red; font-style: oblique;">This text was copied </p>

Is this example supposed to be deleted?

>  
 ```
 
+Here the clipboard content is sanitized and tags such as `<meta>, <script>, <style>` etc are not included while pasting contents from the clipboard.

A question that might come up from reading this is whether we should also update execCommand to allow unsanitized content. Should we address this as an open question?

-- 
Reply to this email directly or view it on GitHub:
https://github.com/w3c/editing/pull/456#pullrequestreview-1704784024

Message ID: <w3c/editing/pull/456/review/1704784024@github.com>

Received on Monday, 30 October 2023 18:50:06 UTC