[SRI] integrity + login looks broken from Manger, James on 2015-05-07 (public-webappsec@w3.org from May 2015)

From: Manger, James <James.H.Manger@team.telstra.com>
Date: Thu, 7 May 2015 13:07:22 +1000
To: "public-webappsec@w3.org" <public-webappsec@w3.org>
Message-ID: <255B9BB34FB7D647A506DC292726F6E1285B99FBD6@WSMSG3153V.srv.dir.telstra.com>

Comments on http://w3c.github.io/webappsec/specs/subresourceintegrity/ editor's draft 05 May 2015:

Great concept.

1. Section 3.3.2 "Is resource eligible for integrity validation" looks quite broken with respect to login-related headers.

"Authorization" is a request header, so it doesn't make sense to check "if resource contains ... Authorization" in step 2. Perhaps the intention was to check if the *request* contains Authorization. However, we shouldn't do that either, as it catches some logins but misses most (e.g. it misses login via a web form tied to a session cookie).

Perhaps the intention was to cope with the situation where a request triggers a login flow, before the request is later repeated to get the actual resource content. A better way to do this would be to say a resource is not eligible for integrity validation if it is returned with a 401 (Unauthorized) or 407 (Proxy authentication required) status.

In fact, perhaps any response without a 2xx status should be ineligible, or eligibility could be restricted to 200 (OK) responses only.
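As a rough illustration of the status-based rule suggested above, here is a minimal sketch. The function name and the `strict` flag are hypothetical and not taken from the draft; it only encodes the two alternatives mentioned (any 2xx, or 200 only).

```python
def is_eligible_for_integrity(status: int, strict: bool = False) -> bool:
    """Hypothetical eligibility check based only on the HTTP status code.

    strict=False: any 2xx response is eligible.
    strict=True:  only a 200 response is eligible.
    """
    if strict:
        return status == 200
    return 200 <= status <= 299


# 401 and 407 are never eligible under either variant.
for code in (200, 204, 302, 401, 407):
    print(code, is_eligible_for_integrity(code), is_eligible_for_integrity(code, strict=True))
```

Either variant excludes 401 and 407 without having to inspect any login-related header.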

2. Section 3.4 "Modifications to Fetch" step 4 (handling 401) seems disastrous. It breaks a browser's normal handling of 401s. Is the idea to prevent sub-resources triggering the browser's (HTTP BASIC) password prompt that the user is likely to mistake as coming from the top page and, hence, inadvertently reveal their password for the top site to the sub-resource? That is a nice idea, but really needs a dedicated signal, instead of piggy-backing on the top page stating the hash of the sub-resource. Login prompts from sub-resources should be safe regardless of the top page knowing the sub-resource content precisely.

3. A web server should be able to specify an integrity value when responding with a 3xx redirect. Hopefully this could be specified in a future revision, just as section 3.5 says future revisions are likely to add integrity to other HTML elements.

4. Example 1 and example 2 should include a full hash, not have part elided with "...".

5. Example 1 and example 2 should have different hash values as they are for different resources.

6. I'm a little surprised to see potentially real domains in examples 1 & 2 (site53.cdn.net and analytics-r-us.com) instead of, say "cdn.example.net" and "analytics-r-us.example.com".

7. Section 3.1 "Integrity metadata" says the format is the same as hash-source from CSP2. This is not quite right. hash-source is defined to start and end with a single quote. Instead, refer to section 3.6 "The integrity attribute", which references hash-algo (not hash-source) from CSP2.

8. The example script "alert('Hello, world.');" in the text of section 3.1 uses "smart-quotes", characters U+2018 and U+2019 (LEFT and RIGHT SINGLE QUOTATION MARK) whereas it is supposed to use ' U+0027 (APOSTROPHE). This matters because the bytes are hashed. Cut-n-pasting the example fails. The correct U+0027 is used when the script is repeated in the note.
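To see why the quote characters matter, the following sketch hashes both spellings of the script. It assumes the content is encoded as UTF-8 and that the integrity value is "sha256-" followed by the base64 of the digest; the helper name `sri_sha256` is made up for this illustration.

```python
import base64
import hashlib


def sri_sha256(content: str) -> str:
    """Return 'sha256-<base64 digest>' for the UTF-8 bytes of content."""
    digest = hashlib.sha256(content.encode("utf-8")).digest()
    return "sha256-" + base64.b64encode(digest).decode("ascii")


ascii_script = "alert('Hello, world.');"            # U+0027 APOSTROPHE
smart_script = "alert(\u2018Hello, world.\u2019);"  # U+2018 / U+2019

# The byte lengths already differ: each smart quote is 3 bytes in UTF-8.
print(len(ascii_script.encode("utf-8")), len(smart_script.encode("utf-8")))
print(sri_sha256(ascii_script))
print(sri_sha256(smart_script))
```

The two printed integrity values differ, so a reader who copies the smart-quoted text cannot reproduce the hash given for the U+0027 version.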

9. The examples in section 3.2.1 "Agility" have the wrong hash values. They should match section 3.1, where the example script "alert('Hello, world.');" is mentioned. Instead, 3.2.1 hashes the 13 characters "Hello, world." despite that supposedly being the content of a JavaScript file, hello_world.js.
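A small check of this kind of mismatch can be sketched as follows. The function `integrity_matches` is hypothetical, handles only sha256, and is not the validation algorithm from the draft; it just compares a claimed value with a freshly computed one.

```python
import base64
import hashlib


def integrity_matches(content: bytes, integrity: str) -> bool:
    """Return True if a 'sha256-<base64>' value matches the given bytes."""
    algorithm, _, expected = integrity.partition("-")
    if algorithm != "sha256":
        raise ValueError("this sketch only handles sha256")
    actual = base64.b64encode(hashlib.sha256(content).digest()).decode("ascii")
    return actual == expected


script = b"alert('Hello, world.');"  # what hello_world.js would contain
text = b"Hello, world."              # the 13 characters that were hashed instead

# Build an integrity value from the wrong input, as the 3.2.1 examples appear to do.
value_from_text = "sha256-" + base64.b64encode(hashlib.sha256(text).digest()).decode("ascii")

print(len(text))                                   # 13
print(integrity_matches(text, value_from_text))    # True
print(integrity_matches(script, value_from_text))  # False
```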

10. Options in hash-with-options at first glance look like they use the well-known URI query parameter syntax (?name=value&name=value). For instance, you might expect the following to be a valid integrity value:
sha256-C6CB9UYIS9UJeqinPHWTHVqh/E1uhG5Twh+Y5qFQmYg=?a=1&b=2
That is invalid. It has to be:
sha256-C6CB9UYIS9UJeqinPHWTHVqh/E1uhG5Twh+Y5qFQmYg=?a=1?b=2
This valid example uses two "?" to separate the two options.
"?" is nicer than "&" as it doesn't need to be escaped in XML. However, a syntax that is similar to, but different from, query strings seems guaranteed to be implemented incorrectly, particularly for a niche feature.

At a minimum, some examples with multiple options need to be included to highlight the difference.

Alternatively, define option-expression to be the same as a URI query component.
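The difference between the two syntaxes can be sketched as below. `split_integrity` is a hypothetical helper that only splits on "?" and "="; it does not validate option names or values against the draft's grammar, so it accepts the "&" form that the draft treats as invalid. `urllib.parse.parse_qs` stands in for ordinary URI query parsing.

```python
from urllib.parse import parse_qs


def split_integrity(value: str):
    """Split 'algo-digest?name=value?name=value' on '?' (no grammar validation)."""
    head, *option_parts = value.split("?")
    algorithm, _, digest = head.partition("-")
    options = {}
    for part in option_parts:
        name, _, option_value = part.partition("=")
        options[name] = option_value
    return algorithm, digest, options


digest = "C6CB9UYIS9UJeqinPHWTHVqh/E1uhG5Twh+Y5qFQmYg="

# "?"-separated options: two options, a and b.
print(split_integrity("sha256-" + digest + "?a=1?b=2")[2])  # {'a': '1', 'b': '2'}

# "&"-separated options: a "?"-only splitter sees one option whose value is "1&b=2".
print(split_integrity("sha256-" + digest + "?a=1&b=2")[2])  # {'a': '1&b=2'}

# Ordinary query-string parsing of the same option text gives two parameters.
print(parse_qs("a=1&b=2"))  # {'a': ['1'], 'b': ['2']}
```

An implementer who reaches for a stock query-string parser would therefore read the two forms in exactly the opposite way from the draft's grammar.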

11. The character restrictions on option-value look quite arbitrary. Where did the subset of 16 allowed symbols come from? ";" is not included so an option-value cannot be a parameterized media type (eg ?ct=text/plain;charset=utf-8) despite this being the most likely use of an option. ":" is not included so an option-value cannot be a URI. Non-ASCII chars are not supported so an option-value cannot be text intended for humans. No escaping mechanism is defined (eg %xx is not defined as an escape for a byte of a UTF-8 encoding), though I guess individual options could specify that their value uses %xx escaping.
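For reference, this is what %xx escaping of UTF-8 bytes would look like if an individual option chose to use it. The draft defines no such mechanism, and whether "%" itself falls within the allowed option-value characters is not checked here; the sketch only uses Python's standard `urllib.parse.quote` and `unquote`.

```python
from urllib.parse import quote, unquote

media_type = "text/plain;charset=utf-8"

# safe="" escapes every reserved character, including "/", ";" and "=".
escaped = quote(media_type, safe="")
print(escaped)           # text%2Fplain%3Bcharset%3Dutf-8
print(unquote(escaped))  # text/plain;charset=utf-8

# Non-ASCII text is escaped byte by byte from its UTF-8 encoding.
print(quote("caf\u00e9", safe=""))  # caf%C3%A9
```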

--
James Manger

Received on Thursday, 7 May 2015 03:07:59 UTC