Re: [w3c/editing] Remove write and add more details on read. (PR #456) from Sanket Joshi on 2023-10-27 (public-webapps-github@w3.org from October 2023)

From: Sanket Joshi <notifications@github.com>
Date: Fri, 27 Oct 2023 12:20:56 -0700
To: w3c/editing <editing@noreply.github.com>
Cc: Subscribed <subscribed@noreply.github.com>
Message-ID: <w3c/editing/pull/456/review/1702390451@github.com>

@sanketj requested changes on this pull request.

While I like that the explainer now focuses on read, which is the primary API change, it also feels important to talk about write. Copy/paste roundtripping within the browser (or hybrid apps) is an important use case as well, and that is not possible if sanitization happens on write. How should we document this?

## Author:
* ansollan@microsoft.com
* snianu@microsoft.com

## Introduction
-Using DataTransfer object's setData and async clipboard write method, there are interop differences in how the HTML content is sanitized and written to the clipboard. It'd be beneficial for the web authors if async clipboard and setData APIs provide similar level of fidelity of HTML content during copy & paste operations so round tripping is possible without any interop differences such as losing formats, meta tags etc.
-If we use the built-in sanitizer that produces an HTML fragment, the styles that get inlined bloat the payload and [strip out the custom styles](https://drive.google.com/file/d/1Nsyp1rUKc_NF4l0n-O05snAKabHAKeiG/view) inserted by sites like Excel online that are used to preserve excel specific semantics.
+DataTransfer object's `getData` and async clipboard `read()` methods have interop differences in how the HTML content is sanitized during paste operation. `getData` method returns unsanitized HTML content, but `read()` method uses the Browser sanitizer to strip out content from the HTML markup. It'd be beneficial for the web authors if async clipboard `read()` method and `getData` methods provide similar level of fidelity of HTML content during paste operations so web apps can read the HTML content written by the native apps without any interop differences such as losing formats, meta tags etc.

```suggestion
DataTransfer object's `getData` and async clipboard `read()` methods have interop differences in how the HTML content is sanitized during a paste operation. The `getData` method returns unsanitized HTML content, but the `read()` method uses the browser's sanitizer to strip out content (ex. global styles, scripts, meta tags) from the HTML markup.
```
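
For illustration only, here is a minimal TypeScript sketch of the two read paths the paragraph above contrasts. The interface names (`HtmlDataTransfer`, `HtmlClipboardItem`, `AsyncClipboard`) and both helper functions are hypothetical, introduced just so the sketch type-checks outside a browser; they mirror only the parts of `DataTransfer.getData`, `Clipboard.read`, and `ClipboardItem.getType` used here. The sanitization behavior noted in the comments is what the explainer describes, not something this code demonstrates.

```typescript
// Hypothetical structural types covering only the members this sketch uses.
interface HtmlDataTransfer {
  getData(format: string): string;
}
interface HtmlClipboardItem {
  types: readonly string[];
  getType(type: string): Promise<{ text(): Promise<string> }>;
}
interface AsyncClipboard {
  read(): Promise<HtmlClipboardItem[]>;
}

// Legacy path: intended to be called from a `paste` event handler with
// `event.clipboardData`. Per the explainer, this returns unsanitized HTML.
function readHtmlFromPasteEvent(data: HtmlDataTransfer): string {
  return data.getData("text/html");
}

// Async path: intended to be called with `navigator.clipboard`. Per the
// explainer, the HTML returned here has gone through the browser's sanitizer.
async function readHtmlFromAsyncClipboard(
  clipboard: AsyncClipboard
): Promise<string | null> {
  const items = await clipboard.read();
  for (const item of items) {
    if (item.types.indexOf("text/html") !== -1) {
      const blob = await item.getType("text/html");
      return blob.text();
    }
  }
  // No item on the clipboard offered an HTML representation.
  return null;
}
```

In a page, the first helper would receive `event.clipboardData` (which can be `null`, so a guard is needed) and the second would receive `navigator.clipboard`. Comparing the two returned strings for the same native-app copy is one way to see the fidelity difference the explainer is describing.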

## Author:
* ansollan@microsoft.com
* snianu@microsoft.com

Instead of starting with "it'd be beneficial for the web authors if...", it would be good to start with what the problem is for web developers.

>
## Goals
-* Preserve fidelity of the HTML format just like the legacy DataTransfer API used to read/write HTML format.
+* Preserve fidelity of the HTML format just like the legacy DataTransfer API.

Should we list parity with the DataTransfer API as a separate goal? I think preserving copy/paste fidelity deserves to be a goal on its own.

> @@ -21,40 +22,52 @@ If we use the built-in sanitizer that produces an HTML fragment, the styles that
* Drag-and-Drop APIs.

## Additional Background
-HTML content is essential for supporting copy/paste operation of high fidelity content from native apps to web sites and vice versa, especially in sites supporting document editing. Currently it is being supported by three APIs:
+HTML content is essential for supporting copy/paste operation of high fidelity content from native apps to web sites and vice versa, especially in sites supporting document editing. Web custom formats can be used to exchange unsanitized HTML, but there are many native apps that don't have support for web custom formats, so contents copied from these apps in the HTML format would have to go through the Browser sanitizer in `read()`. This makes the `read()` method less useful as the Browser sanitizer strips out content from the HTML markup which results in format loss, bloating of payload due to inlining of style etc. Currently sites are using the DataTransfer object's `getData` method to read unsanitized HTML content, so sites do not want to regress HTML paste operation by migrating to async clipboard `read()` method.

Consider moving the line about web custom formats into a separate paragraph. It is good to explain that, but it also detracts a bit from the main issue with lack of sanitization for the native HTML format.

--
Reply to this email directly or view it on GitHub:
https://github.com/w3c/editing/pull/456#pullrequestreview-1702390451

Received on Friday, 27 October 2023 19:21:02 UTC