Re: [clipboard] Dilemma: getData('text/html') and useful CF_HTML quirks

On Thu, Apr 23, 2015 at 1:16 AM Hallvord Reiar Michaelsen Steen <
hsteen@mozilla.com> wrote:

> We're exploring text/html paste behaviours in Mozilla bug 586587 [1] and
> running into some tricky questions I'd like to discuss here.
>
> Basically, on Windows, IE and other apps that write HTML to the clipboard
> use the CF_HTML format. This format is simply described as
>
> > headers (name:value meta data)
> >
> > <html><head></head>
> > <body>
> > <!--StartFragment-->HTML<!--EndFragment-->
> > </body>
> > </html>
>
> where the StartFragment / EndFragment comment tags are inserted by
> implementations writing HTML to the clipboard to show where the actually
> selected content starts and ends. Several very common implementations
> (including I believe Microsoft Word's) will add tags like STYLE outside of
> the StartFragment/EndFragment tags and add rules that may be significant
> for rendering the content of the fragment correctly. Also noteworthy is
> that the meta data may include a SourceURL property showing the URL of the
> page you copied from.
>
> So, because of the significance of the STYLE information and other stuff
> outside Start/EndFragment, certain browsers return the full document
> including the Start/EndFragment comment tags when a script does
> getData('text/html'). This is obviously very useful when there's important
> stuff outside these tags. It still means scripts have to do extra work to
> find those comments and extract the content inside them to know what data a
> user actually intended to paste. This also adds a risk that scripts will be
> tested only on Windows and authored to require those comments and fail if
> they aren't there on other platforms.
>
>
When a page calls getData('text/html'), Chrome returns the literal HTML
data, but without the metadata header. When Chrome is executing the default
action of paste, however, we attempt to parse out the fragment and paste
only the fragment (though we incorrectly don't include styles).
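
To make the two behaviours concrete, here is a minimal sketch (not
Chrome's actual code) of splitting a CF_HTML payload into its header and
its markup, and then pulling out the fragment. The function names are
hypothetical, and the sketch assumes the header ends at the first '<' and
that the payload has already been decoded to a string; a real parser
would more likely use the byte offsets that the header carries.

```python
import re

# Matches the content between the fragment comment markers.
FRAGMENT_RE = re.compile(
    r"<!--StartFragment-->(.*?)<!--EndFragment-->", re.DOTALL
)


def split_cf_html(raw):
    """Split a decoded CF_HTML string into (headers, html).

    Assumption: the name:value header block ends at the first '<'.
    """
    start = raw.find("<")
    if start == -1:
        start = len(raw)
    headers = {}
    for line in raw[:start].splitlines():
        # Split on the first colon only, so URL values stay intact.
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip()] = value.strip()
    return headers, raw[start:]


def extract_fragment(html):
    """Return the text between the fragment markers.

    Falls back to the whole input when the markers are absent, as they
    would be for HTML that did not come from a CF_HTML source.
    """
    match = FRAGMENT_RE.search(html)
    return match.group(1) if match else html
```

In these terms, getData('text/html') corresponds to the second value
returned by split_cf_html(), and the default paste action corresponds to
calling extract_fragment() on that value.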


> Should we, then, standardise returning the full document including
> Start/EndFragment comments (basically requiring or encouraging other
> platform implementations to start using those comments when serializing
> HTML for the OS clipboard) - or should getData() return only what's inside
> the Start/EndFragment tags? Are any other important platforms already using
> CF_HTML conventions, or would their developers balk at being encouraged to
> do so?
>

CF_HTML is not a format that any other app on any other platform would be
expecting, so you wouldn't be able to just start writing it to the
clipboard on Mac/Linux in place of the original HTML. So there's a bit of a
chicken and egg problem here.

I also can't say I love the CF_HTML format: the markup is a lot easier to
work with when the styles are inlined. Plus, pasting <style> blocks means
the pasted rules might collide with the style rules of the destination page.
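
As a rough illustration of what a consumer has to deal with, the sketch
below collects the CSS from <style> blocks that sit outside the fragment
markers. The function name is hypothetical, and the regex-based matching
is an assumption made for brevity; a real implementation would use an
HTML parser.

```python
import re

STYLE_RE = re.compile(
    r"<style\b[^>]*>(.*?)</style\s*>", re.IGNORECASE | re.DOTALL
)
START = "<!--StartFragment-->"
END = "<!--EndFragment-->"


def styles_outside_fragment(html):
    """Return the CSS text of <style> blocks outside the fragment markers."""
    begin = html.find(START)
    finish = html.find(END)
    if begin == -1 or finish == -1 or finish < begin:
        # No usable markers: treat the whole document as "outside".
        outside = html
    else:
        outside = html[:begin] + html[finish + len(END):]
    return [css.strip() for css in STYLE_RE.findall(outside)]
```

Any rule returned here applies document-wide once pasted, so a selector
as broad as "p" would also restyle content that was already in the
destination page.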


>
> On a related topic, I see SourceURL as useful (could be used to properly
> attribute citations automatically and such) - it would be nice to
> standardise DataTransfer.sourceURL or something like that, to be set when
> available.
> -Hallvord
> (editor of https://w3c.github.io/clipboard-apis/ )
> [1] https://bugzilla.mozilla.org/show_bug.cgi?id=586587
>

You'd have to get all UAs to agree on a data property for transferring
this, since I don't think using CF_HTML on other platforms is currently
workable.

Received on Thursday, 23 April 2015 18:34:57 UTC