Re: Concerns regarding cross-origin copy/paste security from Hallvord R. M. Steen on 2011-05-17 (public-webapps@w3.org from April to June 2011)

From: Hallvord R. M. Steen <hallvord@opera.com>
Date: Tue, 17 May 2011 12:41:30 +0900
To: public-webapps <public-webapps@w3.org>, "Daniel Cheng" <dcheng@chromium.org>
Message-ID: <op.vvlwjgaqa3v5gv@hr-opera.oslo.opera.com>
On Thu, 05 May 2011 06:46:55 +0900, Daniel Cheng <dcheng@chromium.org> wrote:

> There was a recent discussion involving directly exposing the HTML fragment
> in a paste to a page, since we're doing the parsing anyway for security
> reasons. I have some concerns regarding
> http://www.w3.org/TR/clipboard-apis/#cross-origin-copy-paste-of-source-code
> though.
>
> From my understanding, we are trying to protect against [1] hidden data
> being copied without a user's knowledge and [2] XSS via pasting hostile
> HTML. In my opinion, the algorithm as written is either going to remove too
> much information or not enough. If it removes too much, the HTML paste is
> effectively useless to a client app. If it doesn't remove enough, then the
> client app is going to have to sanitize the HTML itself anyway.

FWIW, my main concern was the hidden data aspect, because it can be abused
for cross-site request forgery: a malicious site that gets the user to
copy and paste could gain access to anti-CSRF form tokens and the like.
I *intend* to leave some processing of the HTML to the client application,
for example the removal of third-party application-specific or
browser-specific CSS properties.

I see that Chrome applies different security policies depending on whether
the content is read by JavaScript (getData('text/html') style) or
inserted directly. You do some extra work to avoid XSS, such as removing
on* event listener attributes and href=javascript:, when content is
inserted directly (you also remove some browser-specific elements and
class names). This sort of clean-up and processing on direct data
insertion by the user-agent is not really in scope for the events spec IMO.
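
As a rough illustration of the kind of attribute clean-up described above
(this is not Chrome's actual code, only a sketch), the two checks named
here could look like the following. The function names are hypothetical,
and the "on" prefix test is deliberately crude: it would also drop any
non-event attribute whose name happens to start with "on".

```python
def is_script_bearing(name, value):
    """Return True for on* attributes and for href values using javascript:."""
    name = name.lower()
    if name.startswith("on"):
        return True  # event listener attribute such as onclick or onload
    if name == "href" and value:
        # Drop whitespace and control characters before testing the scheme,
        # since browsers tolerate them inside "javascript:".
        compact = "".join(ch for ch in value if ch > " ").lower()
        return compact.startswith("javascript:")
    return False


def drop_script_bearing(attrs):
    """Filter a list of (name, value) attribute pairs."""
    return [(n, v) for n, v in attrs if not is_script_bearing(n, v)]
```

A real implementation would also have to consider other URL-carrying
attributes (src, action and so on), which this sketch ignores.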

However, for getData('text/html') it seems you do no clean-up at all, not  
for cross-origin paste either. Implementing the current spec would thus  
require that you tighten your existing security policy. Will you consider  
doing so, or would you rather argue for removal of any spec-mandated  
clean-up of cross-origin source code?

> I would argue that we should primarily be trying to prevent [1] and leave it
> up to web pages to prevent [2].

Chrome currently does neither for the getData() case - as far as I can  
tell.

> [2] is no different than using data from any
> other untrusted source, like dragging HTML or data from an XHR. It doesn't
> make sense to special-case HTML pastes.

"Using data" is not the only threat model - limiting the damage potential  
when the page you paste into is malicious is harder. However, there is
some overlap in the strategies we might use. For example, event attributes
are certainly hidden data, might contain secrets, and might cause XSS
attacks, so you could argue for their removal based on both abuse
scenarios, though I think [2] is the more relevant threat.

> In order to achieve [1], the algorithm merely needs to be:
> - Remove HTML comments, script, input type=hidden, and all other elements
> that have no effect on layout (display: none). Possibly remove applet as
> well.
> - Remove event handlers, data- and form action attributes.
> - Blanking input type=password elements.

So you still suggest removing event handlers even though this is primarily  
about your case [2]?
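
For concreteness, the list quoted above could be sketched roughly as
follows, using Python's standard html.parser. This is only an
illustration under stated assumptions, not the spec algorithm: the class
and function names are made up, "no effect on layout" is approximated by
an inline style containing display:none (no stylesheet resolution), and
unbalanced markup inside a dropped element is not handled.

```python
from html import escape
from html.parser import HTMLParser

VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
DROPPED_ELEMENTS = {"script", "applet"}


class HiddenDataSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def _is_hidden(self, tag, attrs):
        d = dict(attrs)
        if tag in DROPPED_ELEMENTS:
            return True
        if tag == "input" and (d.get("type") or "").lower() == "hidden":
            return True
        style = (d.get("style") or "").lower().replace(" ", "")
        return "display:none" in style  # inline style only (assumption)

    def _render(self, tag, attrs, closing=""):
        is_password = (tag == "input"
                       and (dict(attrs).get("type") or "").lower() == "password")
        parts = [tag]
        for name, value in attrs:
            if name.startswith("on") or name.startswith("data-"):
                continue  # event handlers and data- attributes
            if tag == "form" and name == "action":
                continue
            if is_password and name == "value":
                value = ""  # blank the password rather than drop the field
            parts.append(name if value is None else f'{name}="{escape(value)}"')
        return "<" + " ".join(parts) + closing + ">"

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            if tag not in VOID:
                self.skip_depth += 1
        elif self._is_hidden(tag, attrs):
            if tag not in VOID:
                self.skip_depth = 1
        else:
            self.out.append(self._render(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        if not self.skip_depth and not self._is_hidden(tag, attrs):
            self.out.append(self._render(tag, attrs, " /"))

    def handle_endtag(self, tag):
        if self.skip_depth:
            if tag not in VOID:
                self.skip_depth -= 1
        else:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")

    def handle_comment(self, data):
        pass  # HTML comments are always removed


def sanitize_hidden_data(html):
    parser = HiddenDataSanitizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Note that this sketch keeps FORM, INPUT, TEXTAREA, BUTTON, SELECT and
OBJECT, along with their name attributes, so it does not settle any of
the per-element questions discussed below.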

> To me, it doesn't make sense to remove the other elements:
> - OBJECT: Could be used for SVG as I understand.

OBJECT is considered a form element, so it might have hidden data  
associated with it. It can also contain plugin content that could inject  
scripts and be used for XSS attacks. It may be too far-fetched or  
draconian to remove it though. (SVG is rich enough to be its own can of  
worms by the way..)

> - FORM: Essentially harmless once the action attribute is cleared.

Agree. I've changed the spec to allow FORM but remove @action.

> - INPUT (non-hidden, non-password): Content is already available via
> text/plain.

An input's @name attribute is basically hidden data the user will not be  
aware of pasting. I'm not sure how much of a threat this is, but we should  
give it some thought.

> - TEXTAREA: See above.

Ditto :)

> - BUTTON, INPUT buttons: Most of the content is already available via
> text/plain. We can scrub the value attribute if there is concern about that.

The same point about @name and the principle of hidden data applies here.
However, I can easily be convinced that violating user expectations as
little as possible is more important than taking this principle to its
extreme consequences ;-) Perhaps other people would like to chime in here?

> - SELECT/OPTION/OPTGROUP: See above.
>
> The draft also does not mention how EMBED elements should be handled.

Any thoughts on this?

>> Finally:
>> If a script calls getData('text/html'), the implementation supports pasting
>> HTML, and the data available on the clipboard is from a different origin,
>> the implementation must sanitize the content by following these steps:
> Should this sanitization be done during a copy as well, to prevent a
> paste in a non-conforming browser from pasting unexpected things?

No, I don't think so. If the content will be pasted into an application  
that doesn't support scripting and/or isn't from an untrusted origin, for  
example a typical desktop word processing app, the threats we are trying  
to handle don't really apply.
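
For reference, the trigger condition in the draft text quoted above
(a getData('text/html') call, HTML paste supported, clipboard data from
a different origin) could be sketched as below. This assumes the
implementation records the URL of the page the data was copied from,
which the draft does not spell out here; all names are hypothetical, and
origins are compared as (scheme, host, port) tuples.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url):
    """Return a (scheme, host, port) tuple, filling in default ports."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, parts.hostname, port)


def needs_sanitizing(requested_type, source_url, target_url,
                     supports_html_paste=True):
    """True when a paste into target_url should go through sanitization."""
    return (requested_type == "text/html"
            and supports_html_paste
            and origin_of(source_url) != origin_of(target_url))
```

Under this reading, a paste at copy time or into a desktop application
never reaches the check at all, which matches the answer above.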

-- 
Hallvord R. M. Steen, Core Tester, Opera Software
http://www.opera.com http://my.opera.com/hallvors/
Received on Tuesday, 17 May 2011 03:42:09 UTC