RE: http+aes from Ian Hickson on 2012-03-07 (uri@w3.org from March 2012)

From: Ian Hickson <ian@hixie.ch>
Date: Wed, 7 Mar 2012 00:23:45 +0000 (UTC)
To: "Manger, James H" <James.H.Manger@team.telstra.com>, Carsten Bormann <cabo@tzi.org>, Adrien de Croy <adrien@qbik.com>, Poul-Henning Kamp <phk@phk.freebsd.dk>, Willy Tarreau <w@1wt.eu>
cc: URI <uri@w3.org>, HTTP Working Group <ietf-http-wg@w3.org>, Anne van Kesteren <annevk@opera.com>
Message-ID: <Pine.LNX.4.64.1203062355160.6189@ps20323.dreamhostps.com>

On Tue, 6 Mar 2012, Manger, James H wrote:
>
> RE: http+aes <http://dev.w3.org/html5/spec/iana.html#http-aes-scheme>
> 
> A URL scheme that combines an address plus key material for securing the 
> content at that address has some merit. http+aes, however, has some 
> specific flaws:
> 
> 1. Encryption without integrity (as http+aes delivers) is almost 
> worthless. It rarely delivers the security that you expect.

The security that we expect here is exclusively that an untrusted CDN 
can't copy the data. As far as I can tell, this is indeed the security 
provided by this proposal.

The use case is specifically one that assumes that the attacker, in this 
case the untrusted CDN, is not in a position to damage the content.

This seems like a very reasonable assumption. If you provide a CDN 
service, and you corrupt data that you serve, you're likely to lose your 
business. It's not like you can blame anyone else for it. However, if 
content that you host happens to also turn up elsewhere on the net, then 
there's really nothing to trace the problem back to you.

(My assumption is that what we're really not trusting here isn't so much 
the CDN as a whole as rogue employees within the CDN who might want to 
go and harvest files being cached there.)


> The untrusted CDN can make all sorts of modifications: truncating the 
> content; toggling any bits of the content; etc. Many modifications will 
> cause errors that depend on the content. Watch which errors occur from 
> which modifications and you learn the content. These sorts of practical 
> attacks have occurred numerous times (often with CBC mode, but 
> decrypting without checking integrity is the root cause).

Certainly damaging the content can occur, but it isn't what we're trying 
to protect against here.
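
To make the kind of damage in question concrete, here is a minimal 
Python sketch. It uses a toy SHA-256-based counter keystream as a 
stand-in for AES in counter mode (Python's standard library has no AES); 
the function names, key, nonce, and message are all made up for 
illustration, and none of this is the algorithm as specced. It only shows 
that, with a counter-style stream cipher and no integrity check, a party 
without the key can toggle chosen bits of what the user agent decrypts.

```python
import hashlib


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy counter-mode keystream: SHA-256(key || nonce || counter).

    Stand-in for AES-CTR for illustration only; NOT a secure cipher.
    """
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        out += block
        counter += 1
    return bytes(out[:length])


def xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt by XORing data with the keystream (same operation)."""
    ks = keystream(key, nonce, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))


# Hypothetical values known to the hosting site and the user agent only.
key = b"k" * 16
nonce = b"n" * 8
plaintext = b"pay 100 to alice"

# What the CDN stores: same length as the plaintext, but unreadable to it.
ciphertext = xor_stream(key, nonce, plaintext)

# The CDN does not have the key, yet it can still toggle bits. XORing a
# ciphertext byte with (old ^ new) changes the decrypted byte from old to new.
tampered = bytearray(ciphertext)
tampered[4] ^= ord("1") ^ ord("9")

# The user agent decrypts the tampered bytes without noticing anything.
recovered = xor_stream(key, nonce, bytes(tampered))
print(recovered)  # b'pay 900 to alice'
```

Nothing in the decryption step fails or signals the change, which is 
the sense in which the proposal provides no integrity.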

There are other cases (e.g. distributed hosting for FTP sites) where the 
integrity concern is real and the privacy concern is not. For example, one 
can imagine a situation in which a Linux distribution is available on 
dozens of mirror sites, and one site is hostile and embeds malware into 
their copy of the distribution. This is an entirely separate and 
orthogonal issue, and not one that this proposal in any way attempts to 
address. If it should be addressed, then it should be addressed 
separately. (Proposals to address this do come up occasionally; so far 
none have caught the imagination of Web browser vendors.)


> 2. Hardwiring 1 specific algorithm (AES), 1 mode (counter), and 3 key 
> lengths (128-bit, 192-bit, & 256-bit) is very poor practise. We need 
> algorithm agility to cope with advances in crypto.

If we need a different algorithm, we can just create a new scheme (or 
set of schemes). That seems trivial.


> 3. What happens if the CDN returns an HTTP redirect?

This is defined in the relevant specifications: you follow the redirect 
and its semantics.


> Is the content after following the redirect supposed to be encrypted or 
> not?

It is not, just as when an HTTP site redirects to an FTP URL, you don't 
use the HTTP protocol with the FTP site.


> Does it matter if the redirect goes to a different origin?

That seems like an orthogonal issue.


> HTTP errors (with a response body) will, presumably, not be encrypted.

As specced, they would be decrypted (into garbage).


> I hope a 500 error with a response body containing javascript cannot get 
> the http+aes URL from, say, window.location.

A 500 error containing JS would be garbled and so couldn't access the URL.


> 4. I am not sure what an http+aes URL means to the user-agent. Obviously 
> if the user-agent gets an encrypted body it can decrypt it with the 
> given key. Should it reject a response with a cleartext body? Probably 
> not as http+aes was not offering any integrity or authentication anyway.

There is no way to determine if the body is cleartext or not. By 
definition, it is encrypted.


> Does an http+aes URL have the same "origin" as an http URL (for 
> same-origin purposes, the Origin HTTP header etc)?

Its origin is determined as for any other URL, resulting in a (scheme, 
host, port) tuple.
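
As a rough illustration, the tuple can be computed with Python's 
urllib.parse as below. The origin() helper and the DEFAULT_PORTS table 
are hypothetical, and the default port of 80 for http+aes is an 
assumption made for the example, not something stated in this thread. 
Under those assumptions, an http+aes URL and an http URL for the same 
host differ in the scheme component, so their tuples are not equal.

```python
from urllib.parse import urlsplit

# Assumption for illustration: http+aes defaults to port 80 like http.
DEFAULT_PORTS = {"http": 80, "https": 443, "http+aes": 80}


def origin(url: str):
    """Return a (scheme, host, port) tuple for the given URL."""
    parts = urlsplit(url)
    scheme = parts.scheme  # urlsplit already lowercases the scheme
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    # parts.hostname excludes any userinfo, so the key is not part of the origin.
    return (scheme, parts.hostname, port)


aes_origin = origin("http+aes://uEdF00VkBLCfriveitl6cv4H@cdn.com/tehmovie.mov")
http_origin = origin("http://cdn.com/tehmovie.mov")

print(aes_origin)   # ('http+aes', 'cdn.com', 80)
print(http_origin)  # ('http', 'cdn.com', 80)
print(aes_origin == http_origin)  # False
```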


On Tue, 6 Mar 2012, Carsten Bormann wrote:
> 
> The more interesting observation for me is that there is an HTTP URI 
> stuck in there trying to get out.
> 
> I.e., instead of
> 
> http+aes://uEdF00VkBLCfriveitl6cv4H@cdn.com/tehmovie.mov
> 
> one would really like to see
> 
> frob:hixie-3:uEdF00VkBLCfriveitl6cv4H:http://cdn.com/tehmovie.mov

That's not a bad idea, but it seems unfortunate to introduce yet more 
variation in URL scheme syntaxes, which is why I didn't go in that 
direction with the proposal and instead used Kornel's http+aes:// idea.


On Tue, 6 Mar 2012, Adrien de Croy wrote:
> 
> Have the URI point to a small file which contains the information 
> required - target URI of encrypted content, and key, encoding, and 
> checksum for integrity.

This would require significantly more complexity to implement, not to 
mention an extra round-trip to use (unless we used data: URLs, at which 
point we're really just back to a complicated-looking URL scheme).

The advantage of http+aes:// is that it fits anywhere you can put a URL, 
with no change to the underlying infrastructure.


On Tue, 6 Mar 2012, Poul-Henning Kamp wrote:
> 
> You expect baby-parents to know about AES keys in hex ?

It's the hosting site that would be doing the encryption, not the parents. 
The parents already trust the hosting site, but the hosting site might not 
trust all the CDNs they might want to use.


On Tue, 6 Mar 2012, Willy Tarreau wrote:
> On Tue, Mar 06, 2012 at 09:25:42AM +0100, Anne van Kesteren wrote:
> > On Tue, 06 Mar 2012 07:55:07 +0100, Willy Tarreau <w@1wt.eu> wrote:
> > >So you mean that it's the *real* decryption key which is passed in 
> > >userinfo? It appeared obvious to me that it was just an identifier 
> > >for a key that the client had fetched somewhere else (eg: on the same 
> > >site via https or at least without passing via the CDN). If the real 
> > >key is passed in the response, then I fail to get the use case since 
> > >your CDN gets the key as well :-/
> > 
> > How? A resource on server S links to a resource on CDN C using 
> > http+aes.  C's resource is encrypted. C does not know the key. The key 
> > is hosted on S's resource as part of the http+aes link. When the user 
> > agent fetches C's resource it does not include the key, but decrypts 
> > it as data comes in. So C never knows anything about the bits it is 
> > hosting, S and the user agent do.
> 
> This is what I understood first but Ian's explanation seems to imply 
> that the key itself is passed along with the response because it would 
> otherwise be too complex to manage keys. Maybe there is something that I 
> did not understand then.

Anne's description is correct. I apologise for not conveying it as 
precisely as he did.
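
A minimal sketch of that flow on the user agent side, in Python, might 
look like the following. The split_http_aes() helper is hypothetical; it 
assumes that the resource is requested from the CDN as a plain http URL 
with the userinfo removed, and it treats the key material as an opaque 
string rather than guessing at its encoding.

```python
from urllib.parse import urlsplit, urlunsplit


def split_http_aes(url: str):
    """Split an http+aes URL into (URL to request from the CDN, key material).

    The key material never appears in the returned request URL, so the
    CDN does not see it.
    """
    parts = urlsplit(url)
    if parts.scheme != "http+aes":
        raise ValueError("not an http+aes URL")
    key_material = parts.username  # kept inside the user agent
    # Drop everything up to and including "@" so that only host[:port] remains.
    host_and_port = parts.netloc.rpartition("@")[2]
    # Assumption: the fetch itself is an ordinary http request; the fragment
    # is dropped because it is never sent to a server anyway.
    fetch_url = urlunsplit(("http", host_and_port, parts.path, parts.query, ""))
    return fetch_url, key_material


fetch_url, key_material = split_http_aes(
    "http+aes://uEdF00VkBLCfriveitl6cv4H@cdn.com/tehmovie.mov"
)
print(fetch_url)     # http://cdn.com/tehmovie.mov
print(key_material)  # uEdF00VkBLCfriveitl6cv4H
```

The user agent would then decrypt the response body with the key 
material as the data comes in; the CDN only ever sees the request for 
fetch_url.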

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Wednesday, 7 March 2012 00:24:10 UTC