[whatwg] data: URI origin from Adam Barth on 2011-03-14 (public-whatwg-archive@w3.org from March 2011)

From: Adam Barth <w3c@adambarth.com>
Date: Mon, 14 Mar 2011 14:20:42 -0700
Message-ID: <AANLkTikXyPAvkvUzXVrBvy1h1A8Rm5EhsewESp--7rmC@mail.gmail.com>

On Mon, Mar 14, 2011 at 1:27 PM, Luis Marsnao <l0mars01 at yahoo.com> wrote:
> Can data: URIs be used insecurely?

Yes, but everything can be used insecurely, even a butter knife.

> I'm attempting to write a client-side script that processes a user-selected file through an input element. Since the input element interface conceals the file: URI, the best solution I can think of is to access the file through the input element's interface, get its data: URI through readAsDataURL in the File API's FileReader interface, and process the data: URI. However, I get not-same-origin errors when I try to use this URI. Specifically, this happens when I try to use XMLHttpRequest to retrieve an XML resource with the data URI.
>
> Is this correct?
> http://www.w3.org/TR/html5/origin-0.html#origin-0 appears to suggest it: "If url does not use a server-based naming authority, or if parsing url failed, or if url is not an absolute URL, then return a new globally unique identifier." data URIs do not use a server-based naming authority, and such opaque identifiers are same-origin only with themselves.

Are you using WebKit?  There are long-standing bugs in WebKit where
WebKit is more conservative about the security context for data URLs
than what's in the spec.  I'd like to fix them, but I've got a bunch
of other things to do first.
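As a rough illustration of the draft rule quoted above, here is a minimal TypeScript sketch. It assumes, for illustration only, that http:, https: and ftp: are the schemes with a server-based naming authority; `originOf`, `sameOrigin` and the counter-based identifiers are hypothetical names, not spec or browser APIs. It models only that one quoted rule in isolation; the reply above suggests WebKit is stricter with data: URLs than the spec as a whole, so this is not a model of what XMLHttpRequest should do.

```typescript
// Sketch of the quoted origin rule only (not any browser's real logic).

type TupleOrigin = { kind: "tuple"; scheme: string; host: string; port: number };
type OpaqueOrigin = { kind: "opaque"; id: number };
type Origin = TupleOrigin | OpaqueOrigin;

// Assumption for illustration: these schemes have a server-based naming
// authority; the values are their default ports.
const DEFAULT_PORTS: Record<string, number> = { "http:": 80, "https:": 443, "ftp:": 21 };

// Counter used to mint a fresh, unique identifier for each opaque origin.
let nextOpaqueId = 0;

function newOpaqueOrigin(): OpaqueOrigin {
  return { kind: "opaque", id: nextOpaqueId++ };
}

function originOf(input: string): Origin {
  let url: URL;
  try {
    url = new URL(input); // throws for unparseable or relative URLs
  } catch {
    return newOpaqueOrigin();
  }
  const defaultPort = DEFAULT_PORTS[url.protocol];
  if (defaultPort === undefined || url.hostname === "") {
    // e.g. data: URLs: no server-based naming authority.
    return newOpaqueOrigin();
  }
  // URL.port is "" when the port is omitted or equals the scheme's default.
  const port = url.port === "" ? defaultPort : Number(url.port);
  return { kind: "tuple", scheme: url.protocol, host: url.hostname, port };
}

function sameOrigin(a: Origin, b: Origin): boolean {
  if (a.kind === "opaque" || b.kind === "opaque") {
    // An opaque origin is the same only as itself.
    return a.kind === "opaque" && b.kind === "opaque" && a.id === b.id;
  }
  return a.scheme === b.scheme && a.host === b.host && a.port === b.port;
}
```

Because `originOf` mints a fresh identifier on every call, even two origins computed from the same data: URL never compare equal under this rule, and none of them can match a page's http: origin.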

> Is there a better way to process files in a client-side script? I considered using blob: URIs, but the support is not yet there.

Blob is a much better way to interact with files.  With Blob, you can
interact with much larger files and you don't need to access the disk
synchronously (which can be arbitrarily slow).
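To make the Blob suggestion concrete, here is a minimal TypeScript sketch, assuming an environment where `Blob.prototype.slice` and `Blob.prototype.arrayBuffer()` are available (a current browser, or Node 18+). `arrayBuffer()` postdates this thread; at the time, each slice would have been read with a `FileReader` instead. `chunkRanges`, `processInChunks` and the chunk size are hypothetical. The file is read one slice at a time through asynchronous reads, instead of being converted wholesale into a data: URL.

```typescript
// Hypothetical helper: [start, end) byte ranges that cover `size` bytes.
function chunkRanges(size: number, chunkSize: number): Array<[number, number]> {
  if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");
  const ranges: Array<[number, number]> = [];
  for (let start = 0; start < size; start += chunkSize) {
    ranges.push([start, Math.min(start + chunkSize, size)]);
  }
  return ranges;
}

// Hypothetical helper: hand a Blob (e.g. a File from <input type=file>) to
// `onChunk` one slice at a time. Each read is asynchronous, and this function
// holds only one slice's bytes at any moment.
async function processInChunks(
  blob: Blob,
  chunkSize: number,
  onChunk: (bytes: Uint8Array, offset: number) => void,
): Promise<void> {
  for (const [start, end] of chunkRanges(blob.size, chunkSize)) {
    const buffer = await blob.slice(start, end).arrayBuffer();
    onChunk(new Uint8Array(buffer), start);
  }
}
```

In a page, the `File` from `input.files[0]` is itself a `Blob`, so it can be passed directly, e.g. `processInChunks(file, 64 * 1024, handler)`. Memory use is then bounded by the chunk size rather than the file size, and no read blocks the page.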

> Can data: URIs be abused with the other same-origin policies in effect? I'm trying to imagine a situation where the data: URI origin policy is necessary for security. But I'm under the impression data: URIs literally are the resources they denote, and current policies allow input only from same-origin resources or the user, so scripts get input only from those sources. If that input literally is a resource, then that resource /should/ be treated as same-origin or from the user. Am I wrong?

The security context of data URLs is a subtle issue.  Life is more
complex than you state above.

Adam

Received on Monday, 14 March 2011 14:20:42 UTC