Re: Interaction of CSP and IRIs from Adam Barth on 2012-09-06 (public-webappsec@w3.org from September 2012)

From: Adam Barth <w3c@adambarth.com>
Date: Thu, 6 Sep 2012 15:57:50 -0700
To: Boris Zbarsky <bzbarsky@mit.edu>
Cc: public-webappsec@w3.org
Message-ID: <CAJE5ia9iGKwGw9f6iPNUvsxaFMZG2rQwJx1tLuSzidcnECDP1A@mail.gmail.com>

CSP 1.0 operates in terms of URIs, not IRIs, so these issues don't
really occur.  For example,

"To check whether a URI matches a source expression, ..."

More detailed comments below.

On Thu, Sep 6, 2012 at 2:39 PM, Boris Zbarsky <bzbarsky@mit.edu> wrote:
> Dear all,
>
> I was just reading through the CSP draft, and I'm very concerned by the
> handling of non-ASCII characters in CSP.  Specifically, I'm concerned about
> four things:
>
> A)  Lack of description for how one goes from an IRI or partial IRI to a
>     host-source expression.

That never occurs.

> B)  Lack of description for how one compares a source expression to an
>     IRI.

That never occurs.

> C)  Lack of description for how one goes from a Unicode string to
>     policy.

That never occurs.

> D)  The fact that the current setup is likely to cause interop problems.

There are general interop problems with IRIs, but we're unlikely to
resolve them in this working group.  If you're interested in resolving
interop problems with IRIs, that's a much larger problem.

> As far as I can tell, the current setup is as follows:
>
> 1)  All CSP policies are made up of bytes in the ASCII range (and in
> particular, a subset of that range).

Correct.

> Non-ASCII hostnames are expected to be
> encoded as punycode, I guess (though this is not actually stated anywhere;
> see concern A above).

We're operating in terms of URIs, so the notion of "punycode" doesn't occur.

> Non-ASCII characters in paths are presumably expected to
> be %-encoded, but the specification doesn't say what encoding should be used
> for this (concern A again).

Are your comments about CSP 1.0 or 1.1?  We don't do anything with
paths in CSP 1.0.

> In practice, by the way, at least one
> implementation allows non-ASCII bytes in paths, though I think the spec is
> pretty clear that as things stand this is not allowed.

Would you be willing to contribute a test case to that effect so we
can catch these sorts of bugs?

> 2)  When comparing a source expression to an IRI, the IRI needs to first be
> converted to a URI, presumably per RFC 3987.

We never compare a source expression to an IRI.  We only compare
source expressions to URIs.
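
As a rough illustration of what "compare a source expression to a URI"
means when both sides are already ASCII, here is a simplified Python
sketch.  It is not the matching algorithm from the spec: the function
name matches_host_source and the DEFAULT_PORTS table are hypothetical,
the source expression is assumed to carry an explicit scheme, and
keywords such as 'self', a bare "*", and port wildcards are not handled.

```python
from urllib.parse import urlsplit

# Assumption for this sketch: only http and https have default ports.
DEFAULT_PORTS = {"http": 80, "https": 443}


def matches_host_source(uri: str, source: str) -> bool:
    """Simplified check of an ASCII URI against scheme://host[:port]."""
    # Both inputs must already be URIs / ASCII policy text, never IRIs.
    if not (uri.isascii() and source.isascii()):
        raise ValueError("expected ASCII input (a URI, not an IRI)")

    u = urlsplit(uri)
    s = urlsplit(source)

    if u.scheme != s.scheme:
        return False

    uri_host = u.hostname or ""
    src_host = s.hostname or ""
    if src_host.startswith("*."):
        # "*.example.com" matches subdomains only, not "example.com".
        if not uri_host.endswith(src_host[1:]):
            return False
    elif uri_host != src_host:
        return False

    uri_port = u.port or DEFAULT_PORTS.get(u.scheme)
    src_port = s.port or DEFAULT_PORTS.get(s.scheme)
    return uri_port == src_port
```

With this sketch, "https://cdn.example.com/app.js" matches the source
"https://*.example.com", and a non-ASCII hostname is rejected up front
rather than converted.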

> If the presumption is correct,
> this should probably be explicitly called out (concern B above).

Defining how a user agent ought to translate an IRI to a URI is
outside the scope of this document.  That's a can of worms that we'll
have no hope of resolving.  Let's leave that mess to the HTML spec.
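
To make concrete what is being left out of scope, here is a rough
sketch of an RFC 3987-style IRI-to-URI mapping in Python.  The helper
name iri_to_uri is hypothetical, and the choices it makes are
assumptions: UTF-8 for percent-encoding, Python's built-in "idna" codec
(IDNA 2003) for the host, and no handling of userinfo or IPv6 literals.
User agents differ in exactly these details.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """Map an absolute IRI to an ASCII URI (illustrative sketch only)."""
    parts = urlsplit(iri)
    if parts.hostname is None:
        raise ValueError("sketch only handles IRIs with a host")

    # Host: non-ASCII labels become "xn--" (punycode) labels.
    host = parts.hostname.encode("idna").decode("ascii")
    netloc = host if parts.port is None else f"{host}:{parts.port}"

    # Other components: percent-encode as UTF-8 (quote's default),
    # keeping "%" safe so existing escapes are not double-encoded.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")

    return urlunsplit((parts.scheme, netloc, path, query, fragment))


print(iri_to_uri("http://b\u00fccher.example/stra\u00dfe?q=\u00e9"))
# prints http://xn--bcher-kva.example/stra%C3%9Fe?q=%C3%A9
```

Whatever conversion the user agent actually performs, CSP 1.0 only
sees the ASCII result.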

> 3)  When converting a Unicode string to a policy, presumably one does it by
> taking the numeric value of each codepoint and treating it as an ASCII
> character index?  If so, this should be explicitly called out (concern C
> above).

Are your comments about CSP 1.0 or 1.1?  We don't ever convert a
Unicode string to a policy in CSP 1.0.  We do that in CSP 1.1, and I
agree that we should add some further explanation of how to do that to
1.1.

> In practice, I expect people to just call their favorite escape() method on
> their strings if they have to shoehorn them into an ASCII format, which
> means that we'll get a mix of %-encoding as ISO-8859-1 and UTF-8 at the
> very least, and very possibly others.  The result will be lack of interop
> (concern D).

I'm not sure how to respond to this statement.  Presumably they'll do
what the CSP 1.1 specification says to do.
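
For reference, the ambiguity being described looks like this.  The
snippet below is only an illustration using Python's urllib.parse, with
a made-up path; it shows the same string producing two different
%-encoded forms depending on the character encoding chosen.

```python
from urllib.parse import quote, unquote

path = "/caf\u00e9"  # "/café", a hypothetical path for illustration

utf8_form = quote(path, encoding="utf-8")
latin1_form = quote(path, encoding="latin-1")

print(utf8_form)    # prints /caf%C3%A9
print(latin1_form)  # prints /caf%E9

# The two forms are different byte sequences, so a byte-wise comparison
# treats them as different paths.
print(utf8_form == latin1_form)  # prints False

# Decoding with the intended encoding recovers the original string.
print(unquote(latin1_form, encoding="latin-1") == path)  # prints True
```

A policy written with one encoding would not match a URI produced with
the other, which is the interop concern raised above.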

> It seems to me that a lot of these problems would be alleviated if CSP policies
> were defined as sequences of Unicode codepoints, with a comparison function
> to IRIs.

I disagree.  IRIs are an interop can of worms.  We can paper over the
problem in various ways.  The way we're currently papering over the
problem is to ignore IRIs entirely and work only with URIs.

> The spec would also need to define how to construct such a
> sequence of Unicode codepoints from a Content-Security-Policy HTTP header or
> a Content-Security-Policy-Report-Only HTTP header,

We might want to do that in CSP 1.1 when we have to deal with input
policies in Unicode.  In CSP 1.0, policies come only from HTTP, which
is not defined in terms of Unicode, which means that we don't need to
tackle this issue in 1.0.

> but the result would be
> to allow authors to use strings that actually make sense to them in CSP
> policies instead of shoehorning them into an ASCII-only format in
> likely-broken ways.

In CSP 1.0, the only way to author policies is via HTTP headers, which
are not defined in terms of Unicode, so we'll need to wait for CSP 1.1
to realize the dream of authoring policies in Unicode.

> Thank you for taking the time to read all that,

Thanks for your feedback!  This is definitely stuff we'll need to
worry about more for 1.1, but I think we've effectively dodged this
can of worms in 1.0.

Adam
Received on Thursday, 6 September 2012 22:58:50 UTC