enc: URL scheme

I'd like to propose an enhancement of the http+aes scheme (in a WHATWG
draft) and a generalization of the ClearKey Encrypted Media proposal.

This URL scheme is for providing encryption and integrity guarantees on  
top of other protocols.
It is intended to enable secure hosting of content on untrusted servers,  
e.g. storing private photos, commercial videos or shared JavaScript  
libraries on 3rd party HTTP content delivery networks.


The scheme uses attribute-value pairs to define the type of encryption and
the digests used, and includes the absolute URI of the resource to be
fetched.


User agents are expected to fetch the resource using the (sub)protocol  
specified (following all redirects) and then decrypt it and/or verify its  
integrity according to the specified attribute(s).

As a start, the "sha1", "sha256" and "aes-ctr-key" attributes are defined.


> (intended to be similar to data: URI)

     encurl        := "enc:" parameter *( ";" parameter ) "," absoluteURI
     parameter     := attribute "=" value
     value         := 1*pchar
     attribute     := optional_attr | required_attr
     optional_attr := 1*unreserved "?"
     required_attr := 1*unreserved

"`absoluteURI`", "`pchar`" and "`unreserved`" are the corresponding tokens  
 from [RFC2396].

The order of attributes is significant.
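
To make the grammar concrete, here is a minimal Python sketch of a parser. The names `EncParam` and `parse_enc_url` are hypothetical, and the sketch assumes that the first "," ends the parameter list (so values must not contain an unescaped comma). It does not check the character classes from [RFC2396]. Parameters are returned as a list so that their order is preserved.

```python
from typing import NamedTuple


class EncParam(NamedTuple):
    name: str
    value: str
    optional: bool  # True when the attribute name ended in "?"


def parse_enc_url(url: str) -> tuple[list[EncParam], str]:
    """Split an enc: URL into its ordered parameters and the embedded URI."""
    scheme, sep, rest = url.partition(":")
    if not sep or scheme.lower() != "enc":
        raise ValueError("not an enc: URL")
    # Assumption: the first "," separates the parameters from the absolute URI.
    param_part, sep, inner = rest.partition(",")
    if not sep or not inner:
        raise ValueError("missing embedded absolute URI")
    params = []
    for raw in param_part.split(";"):
        # Split on the first "=" only, so base64 padding stays in the value.
        attr, eq, value = raw.partition("=")
        if not eq or not value:
            raise ValueError(f"malformed parameter: {raw!r}")
        optional = attr.endswith("?")
        name = attr[:-1] if optional else attr
        if not name:
            raise ValueError(f"empty attribute name in {raw!r}")
        params.append(EncParam(name, value, optional))
    return params, inner
```

For example, `parse_enc_url("enc:sha1?=abc,http://example.com/")` yields one optional `sha1` parameter and the embedded URI `http://example.com/`.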


A UA MUST NOT fetch a resource from an `enc:` URL that has a required
attribute the UA does not support.

A UA MAY fetch the resource if it does not support an optional attribute.

A UA MUST NOT use the resource if the checksum specified in an optional
attribute that it supports does not match the fetched data (e.g. if the UA
is able to determine that the resource doesn't match the `sha9999` digest,
then it should reject the resource).
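
The rules above can be sketched as a small decision function. This is only an illustration: `SUPPORTED`, `may_fetch` and `attributes_to_apply` are hypothetical names, and the input is assumed to be a list of `(name, optional)` pairs in URL order.

```python
# Hypothetical set of attributes this user agent implements.
SUPPORTED = {"sha1", "sha256", "aes-ctr-key"}


def may_fetch(params, supported=SUPPORTED):
    """Return False if any required attribute is unsupported."""
    return all(optional or name in supported for name, optional in params)


def attributes_to_apply(params, supported=SUPPORTED):
    """Attributes the UA processes; unsupported optional ones are skipped.

    A supported attribute is always enforced, even when marked optional.
    """
    return [name for name, _ in params if name in supported]
```

With these definitions, a URL carrying `sha9999?` and `sha256` may be fetched and is checked against `sha256` only, while a URL carrying a required `sha9999` is not fetched at all.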

###The `sha1` and `sha256` attributes

These attributes contain SHA-1 or SHA-256 digests of the content or the  
ciphertext. If the attribute is specified before an encryption attribute,  
it's a digest of unencrypted content. If it's specified after an  
encryption attribute, it's a digest of the ciphertext.

> (If there's a good reason to only hash before/after encryption, then it  
> should be defined as that instead, and then order of attributes could be  
> meaningless)

The UA must fetch the entire resource and verify that it matches the given
hash. If it doesn't match, the UA MUST NOT use the resource and must act
as if the resource could not be obtained due to a network error.

     value         := sha_base64 | sha_hex

`sha_base64` is the digest base64-encoded as described in Section 6.8 of
[RFC2045]. `sha_hex` is the digest written as a string of 40 (`sha1`) or
64 (`sha256`) hexadecimal characters (case-insensitive).
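
A rough Python sketch of the check, using only the standard library. The function names are hypothetical. It tells the two encodings apart by length, which is unambiguous here because a base64 digest is 28 (`sha1`) or 44 (`sha256`) characters long, never 40 or 64.

```python
import base64
import binascii
import hashlib
import hmac
import string

DIGEST_SIZES = {"sha1": 20, "sha256": 32}  # digest length in bytes


def decode_digest(attr: str, value: str) -> bytes:
    """Decode a sha_hex or sha_base64 attribute value to raw digest bytes."""
    size = DIGEST_SIZES[attr]
    # sha_hex: exactly 40 or 64 hexadecimal characters, case-insensitive.
    if len(value) == size * 2 and all(c in string.hexdigits for c in value):
        return bytes.fromhex(value)
    try:
        raw = base64.b64decode(value, validate=True)
    except binascii.Error:
        raise ValueError("digest is neither valid hex nor base64") from None
    if len(raw) != size:
        raise ValueError("digest has the wrong length")
    return raw


def matches(attr: str, value: str, data: bytes) -> bool:
    """True if the whole resource body hashes to the given digest."""
    expected = decode_digest(attr, value)
    actual = hashlib.new(attr, data).digest()
    return hmac.compare_digest(expected, actual)
```

A caller would treat a `False` result (or a `ValueError`) the same way as a network error and discard the fetched data.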


Other specifications may define digest methods that are suitable for  
partial requests or streaming of content (e.g. hash trees).


####Security and privacy considerations

This attribute enables user agents to avoid fetching resources if they
already have a cached resource matching the digest. An attacker could use
such an implementation to test whether users have certain known files
cached (e.g. an image shown to users logged in to a particular website).

Attackers could also use this to obtain the contents of cached files whose
checksum is all they know (e.g. an attacker may have seen a digital
signature of a secret document and attempt to retrieve the document by
including its digest in a bogus `enc:` URL).

User agents may want to limit sharing of cached files to files with  
`Cache-Control: public` or avoid sharing cached files across origins.

###The `aes-ctr-key` attribute

     value        := aes128_base64 | aes192_base64 | aes256_base64

The key is provided as 16, 24, or 32 bytes, base64-encoded as described in
Section 6.8 of [RFC2045].

The message body must be decrypted by applying the AES-CTR algorithm using  
the key specified, and using a zero nonce.

If the base64-decoded value does not consist of exactly 16, 24, or 32  
bytes, then the user agent must act as if the resource could not be  
obtained due to a network error, and may report the problem to the user.
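
The key-length rule can be sketched as follows. `decode_aes_ctr_key` is a hypothetical helper; the sketch only validates the key and does not perform the AES-CTR decryption itself, which would need a cryptography library.

```python
import base64
import binascii


def decode_aes_ctr_key(value: str) -> bytes:
    """Decode the aes-ctr-key value, rejecting anything but AES key sizes."""
    try:
        key = base64.b64decode(value, validate=True)
    except binascii.Error as exc:
        raise ValueError("aes-ctr-key is not valid base64") from exc
    # 16, 24 and 32 bytes select AES-128, AES-192 and AES-256 respectively.
    if len(key) not in (16, 24, 32):
        raise ValueError("aes-ctr-key must decode to 16, 24, or 32 bytes")
    return key
```

A user agent built this way would map the `ValueError` to the network-error behaviour described above.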


####Security considerations

URLs using this attribute contain sensitive information (the key used to  
decrypt the referenced content) and as such should be handled with care,  
e.g. only sent over TLS-encrypted connections, and only sent to users who  
are authorized to access the encrypted content.

User agents are encouraged to not show the full `enc:` URLs in user  
interface elements where the URL is displayed, as it could be used to  
obscure the domain name.

This attribute enables the content of a particular resource to be  
encrypted. If the protocol used to fetch the resource is not itself encrypted,
it may leak private information through metadata, e.g. information held in  
HTTP headers. The length and name of the resource may still be visible.
The rate at which the data is transmitted is also unobscured. If this  
scheme is used to obscure private information, it is important to consider  
how these side channels might leak information.

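Each resource encrypted in this fashion must use a fresh key. Otherwise,
because the nonce is always zero, resources sharing a key are encrypted
with the same keystream, and an attacker can use commonalities in the
resources' plaintexts (or any known plaintext) to decrypt all the
resources sharing that key, even without learning the key itself.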
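
One way to see the risk of key reuse: with a fixed zero nonce, the same key always produces the same CTR keystream. The Python sketch below does not run AES; it uses random bytes as a stand-in for that keystream (an assumption made purely for illustration) and shows that XORing two ciphertexts cancels it out.

```python
import os


def xor(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


# Stand-in for the keystream that one key plus the zero nonce would generate.
keystream = os.urandom(16)

p1 = b"attack at dawn!!"
p2 = b"retreat at nine!"
c1 = xor(p1, keystream)  # first resource, as stored on the server
c2 = xor(p2, keystream)  # second resource, encrypted with the same key

# The keystream cancels: the attacker learns p1 XOR p2 from ciphertexts alone.
leak = xor(c1, c2)

# Knowing (or guessing) one plaintext then reveals the other.
recovered = xor(leak, p1)
```

Here `recovered` equals `p2` even though the attacker never saw the key or the keystream.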

The encryption does not guarantee integrity. An attacker will be able to
truncate or corrupt the resource unless a cryptographically strong
checksum is used as well.

> (Encryption without integrity is sufficient if you just want to protect  
> against the CDN being hacked/accidentally leaking all their files,  
> rather than malicious CDNs.)


For the purposes of the Same-Origin Policy, the URL embedded in the `enc:`
URL should be used.

`enc:x=y,http://example.com` and `http://example.com` are same origin.

`enc:x=y,http://example.com` and `enc:x=y,https://example.com` are not  
same origin.
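
The two examples above can be checked with a short Python sketch. The helper names are hypothetical, the origin is taken to be the usual (scheme, host, port) triple, and the sketch again assumes that the first "," ends the parameter list.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def embedded_url(url: str) -> str:
    """Return the URL embedded in an enc: URL, or the URL itself otherwise."""
    if url.lower().startswith("enc:"):
        # Assumption: the first "," separates parameters from the embedded URL.
        _, sep, inner = url[4:].partition(",")
        if not sep:
            raise ValueError("missing embedded absolute URI")
        return inner
    return url


def origin(url: str):
    """Compute (scheme, host, port) from the embedded URL."""
    parts = urlsplit(embedded_url(url))
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, (parts.hostname or "").lower(), port)


def same_origin(a: str, b: str) -> bool:
    return origin(a) == origin(b)
```

Note that the attributes play no part in the comparison; only the embedded URL does.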

regards, Kornel Lesiński

Received on Wednesday, 13 June 2012 23:45:44 UTC