[clipboard] Dilemma: getData('text/html') and useful CF_HTML quirks

We're exploring text/html paste behaviours in Mozilla bug 586587 [1] and
running into some tricky questions I'd like to discuss here.

Basically, on Windows IE and other apps that write HTML to the clipboard
use the CF_HTML format. This format is simply described as

> headers (name:value meta data)
>
> <html><head></head>
> <body>
> <!--StartFragment-->HTML<!--EndFragment-->
> </body>
> </html>

where the StartFragment / EndFragment comment tags are inserted by
implementations writing HTML to the clipboard to show where the actually
selected content starts and ends. Several very common implementations
(including I believe Microsoft Word's) will add tags like STYLE outside of
the StartFragment/EndFragment tags and add rules that may be significant
for rendering the content of the fragment correctly. Also noteworthy is
that the meta data may include a SourceURL property showing the URL of the
page you copied from.

So, because of the significance of the STYLE information and other stuff
outside Start/EndFragment, certain browsers return the full document
including the Start/EndFragment comment tags when a script does
getData('text/html'). This is obviously very useful when there's important
stuff outside these tags. It still means scripts have to do extra work to
find those comments and extract the content inside them to know what data a
user actually intended to paste. This also adds a risk that scripts will be
tested only on Windows and authored to require those comments and fail if
they aren't there on other platforms.

Should we, then, standardise returning the full document including
Start/EndFragment comments (basically requiring or encouraging other
platform implementations to start using those comments when serializing
HTML for the OS clipboard) - or should getData() return only what's inside
the Start/EndFragment tags? Are any other important platforms already using
CF_HTML conventions, or would their developers balk at being encouraged to
do so?

On a related topic, I see SourceURL as useful (could be used to properly
attribute citations automatically and such) - it would be nice to
standardise DataTransfer.sourceURL or something like that, to be set when
available.
-Hallvord
(editor of https://w3c.github.io/clipboard-apis/ )
[1] https://bugzilla.mozilla.org/show_bug.cgi?id=586587

Received on Thursday, 23 April 2015 08:13:57 UTC