Re: paste events and HTML support - interest in exposing a DOM tree? from Hallvord R. M. Steen on 2011-05-17 (public-webapps@w3.org from April to June 2011)

From: Hallvord R. M. Steen <hallvord@opera.com>
Date: Tue, 17 May 2011 13:11:03 +0900
To: Johan Sörlin <spocke@moxiecode.com>
Cc: public-webapps@w3.org, "Frederico Caldeira Knabben" <f.knabben@cksource.com>
Message-ID: <op.vvlxwpqga3v5gv@hr-opera.oslo.opera.com>

On Mon, 09 May 2011 21:02:21 +0900, Johan Sörlin <spocke@moxiecode.com>  
wrote:

> Hi Hallvord,
>
> This is wonderful news since getting the html from the clipboard right  
> now is a really ugly hack and very browser dependent.

Sure it is, we hope to fix that in a nice way :-)

> Getting the clipboard data as either a string or a fragment would sure  
> make it easy for developers to handle clipboard contents.

I take that as support for my event.clipboardData.getDocumentFragment()  
suggestion. I'll need to talk to Ian Hickson about it since the actual  
DataTransfer definition is in HTML5, but I guess it might be interesting  
also for DnD?
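
To make the suggestion concrete, here is a rough sketch of how a page might consume it. Note that getDocumentFragment() is only the proposal discussed in this thread and is not an existing method; readPastedHtml is a hypothetical helper name, and the only established call used is getData("text/html"). The function feature-tests for the proposed method and falls back to the string form.

```javascript
// Sketch only: getDocumentFragment() is the PROPOSED method from this thread,
// not an existing DataTransfer API. readPastedHtml is a hypothetical helper.
function readPastedHtml(clipboardData) {
  // Prefer the proposed fragment accessor when the browser offers it.
  if (typeof clipboardData.getDocumentFragment === "function") {
    return { kind: "fragment", value: clipboardData.getDocumentFragment() };
  }
  // Otherwise fall back to the existing string-based accessor.
  return { kind: "string", value: clipboardData.getData("text/html") };
}

// In a page this would be wired up roughly as:
//   document.addEventListener("paste", function (event) {
//     var pasted = readPastedHtml(event.clipboardData);
//   });
```

The same feature test would apply to a drop event's dataTransfer object if the method were exposed for DnD as well.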

> Regarding HTML sanitation:
> The Mozilla folks recently decided to clean up pasted HTML but it's a  
> bit too aggressive, removing all non-standard attributes. To detect MS  
> Word HTML, for example, a lot of this odd content needs to be retained  
> to check for list-like structures.

Yes, the getData() algorithm is not going to clean up non-standard  
attributes/class names/elements.

> I think the sanitation outlined in the document might also be a bit too  
> aggressive, such as removing the HTML comments and data attributes. For  
> example, both TinyMCE and CKEditor use the data- attributes for internal  
> usage. So if a user is pasting from one editor to another across domains,  
> the attribute would be lost and therefore break that item.

I didn't know that you use data- attributes. This gets somewhat tricky -  
data- is certainly a type of hidden data that should be removed under the  
"to the greatest extent possible, a user should know what s/he is really  
pasting" principle. I see your use case, but I also assume that many sites  
would use data- attributes for information that the site doesn't expect  
anybody but its own JS to get access to.
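
As an illustration of what removing data- attributes would mean in practice, here is a minimal sketch. It works on a plain name/value object rather than real DOM attributes, and both the function name stripDataAttributes and the attribute name data-editor-id are made up for the example; it is not the algorithm text from the draft.

```javascript
// Sketch only: operates on a plain { name: value } object, not a DOM element.
// stripDataAttributes and "data-editor-id" are hypothetical names.
function stripDataAttributes(attributes) {
  var kept = {};
  Object.keys(attributes).forEach(function (name) {
    // HTML attribute names are case-insensitive, so compare in lower case.
    if (name.toLowerCase().indexOf("data-") !== 0) {
      kept[name] = attributes[name];
    }
  });
  return kept;
}

// Example: the class survives, the editor's internal marker does not.
var cleaned = stripDataAttributes({ "class": "note", "data-editor-id": "42" });
```

This is exactly the case where an editor's internal marker would be lost on a cross-domain paste.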

> Also removing all style properties that are computed to 'none' would  
> remove browser-specific CSS rules and mso- styles that we use to detect  
> Word-specific items.

What the algorithm actually intends is to remove all elements with  
display:none (and visibility:hidden) - again based on the principle that  
we should try to make sure the user knows what s/he is pasting.
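
Roughly, the intent can be sketched like this. The sketch is a simplification under stated assumptions: it walks a plain object tree with a "style" object per node instead of real DOM nodes and computed style, and it ignores the fact that a descendant can override visibility:hidden. The names isHidden and stripHidden are made up for the example. The point is only that whole elements are dropped, while unknown style properties (mso- ones included) on visible elements are left alone.

```javascript
// Sketch only: nodes are plain objects { tag, style, children }, standing in
// for DOM elements and their computed style. Names here are hypothetical.
function isHidden(node) {
  var style = node.style || {};
  return style.display === "none" || style.visibility === "hidden";
}

// Returns a copy of the tree with hidden elements (and their subtrees)
// removed. The input tree is not modified.
function stripHidden(node) {
  var children = (node.children || [])
    .filter(function (child) { return !isHidden(child); })
    .map(stripHidden);
  return Object.assign({}, node, { children: children });
}
```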

I've reworked the stuff on pasting HTML (potentially with multiple parts)  
today - please review this section at your leisure, particularly the  
screenful known as sections 8.3 and 8.3.1:
http://dev.w3.org/2006/webapi/clipops/clipops.html#pasting-html

-- 
Hallvord R. M. Steen, Core Tester, Opera Software
http://www.opera.com http://my.opera.com/hallvors/

Received on Tuesday, 17 May 2011 04:11:46 UTC