[Integrity] Some comments on Cross-Origin leakage and content types from Arjan Veenstra on 2014-09-20 (public-webappsec@w3.org from September 2014)

From: Arjan Veenstra <arjan@veenstra.cx>
Date: Sat, 20 Sep 2014 10:19:11 +0200
To: public-webappsec@w3.org
Message-ID: <1d586749bdf94d7db911b94edaf5c781@d6.nl>

Hi,

I've been looking at this proposal mostly out of interest in the
improved caching of common resources it might provide, so my mindset
might be tainted somewhat. But looking at section 6.3, it occurred to me
that most of the risks mentioned there could be mitigated if the
document author could specify the intended usage of the resource. A
simple 'private' marker which tells the UA it's not allowed to add the
resource to its hash-based cache could protect sensitive resources from
these types of attacks. Or perhaps the spec should err on the safe side
and only allow caching when resources are marked public.

Behind this is the assumption that there are two types of resources you
want integrity checks on. The first is common public resources such as
JavaScript libraries; the second is resources specific to your
application which are hosted elsewhere. Wider caching is mostly useful
for resources in the first category, but the presence of such a resource
in the cache is unlikely to leak any usable information. As a common
resource it could have entered the cache from lots of places.
Resources in the second category generally won't benefit from caching
beyond the currently available caching mechanisms, so marking those
private won't hurt performance but does effectively remove any new
attack surface introduced by hash-based caching.

Of course there are edge cases, for instance a library which is 'public'
but not commonly used. Finding a cache hit might still give a
lower-certainty indication that a user visited a specific site. But an
additional flag would allow document authors to act according to their
own assessment of the risks.

I'm not sure whether a request for a resource marked private should be
allowed to be fulfilled from the cache. I'm guessing that if the hash is
secure, that shouldn't be an issue.
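
To make the 'private' marker idea concrete, here is a minimal sketch of
a hash-keyed cache that only stores resources explicitly marked public.
The class name HashCache, the 'visibility' argument and the choice of
SHA-256 are my own assumptions for illustration; none of them come from
the spec.

```python
import hashlib


class HashCache:
    """Toy content-addressed cache keyed by SHA-256 hex digest (hypothetical)."""

    def __init__(self):
        self._store = {}

    def add(self, body: bytes, expected_digest: str,
            visibility: str = "private") -> bool:
        """Verify the body, then cache it only if it is marked public.

        Returns True when the resource was stored, False when it passed
        the integrity check but was kept out of the shared cache.
        """
        actual = hashlib.sha256(body).hexdigest()
        if actual != expected_digest:
            raise ValueError("integrity check failed")
        if visibility != "public":
            # Erring on the safe side: anything not marked public is
            # never added, so its presence cannot be probed later.
            return False
        self._store[actual] = body
        return True

    def lookup(self, expected_digest: str):
        """Return the cached body for this digest, or None on a miss."""
        return self._store.get(expected_digest)
```

In this sketch lookup() does not care how the requesting document marked
the resource, so a request for a 'private' resource can still be served
from the cache if some other page stored the same bytes as public. That
matches my guess above, but it is a design choice, not a requirement.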

I'm also missing a description of how to handle scenarios where a
resource might be available in multiple content types. For instance, a
server might prefer to serve an image as SVG but fall back to serving a
PNG file when the Accept header doesn't include SVG. I could see
something similar happening in the future with alternative scripting
languages (e.g. Dart, CoffeeScript, TypeScript), where a server might
serve either the original script or the compiled-to-JavaScript version
based on the Accept header. In more abstract terms I'd say that since a
URL points to a resource which might be represented in different ways,
you'll always have to account for the possibility that a resource has
different representations.
I guess the obvious solution would be to allow different hashes with 
different content types to be specified. The spec doesn't seem to forbid 
this, but it doesn't explicitly allow it either and tends to speak about 
the content type in singular form.
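
As a rough sketch of what "different hashes with different content
types" could look like on the UA side: the author supplies one digest
per media type, and the UA checks the response against the digest that
matches the Content-Type it actually received. The function name
verify_representation and the dictionary layout are assumptions of mine,
not anything the spec defines.

```python
import hashlib


def verify_representation(body: bytes, content_type: str,
                          expected: dict) -> bool:
    """Check a response body against the digest declared for its media type.

    'expected' maps a lowercase media type (e.g. "image/png") to the
    SHA-256 hex digest the author declared for that representation.
    """
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    digest = expected.get(media_type)
    if digest is None:
        # No digest was declared for this representation: fail closed.
        return False
    return hashlib.sha256(body).hexdigest() == digest
```

Failing closed on an undeclared content type is one possible choice; a
spec could equally decide to fall back to a single default hash.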

The same applies to localized resources, as the content of an
Accept-Language header might also cause different content to be served.
Perhaps a language attribute needs to be added too.

Regards,
Arjan Veenstra

Received on Sunday, 21 September 2014 09:09:21 UTC