Origin vs Authority; use of HTTPS (draft-nottingham-site-meta-01)

Reading draft-nottingham-site-meta-01...

> 4. Discovering host-meta Files

> The metadata for a given authority can be discovered by  
> dereferencing the path /host-meta on the same authority. For  
> example, for an HTTP URI [RFC2616], the following request would  
> obtain metadata for the authority "www.example.com:80";

Editorial nit: That semicolon wants to be a colon.

> GET /host-meta HTTP/1.1
> Host: www.example.com

It is somewhat unclear what the scope of the host-meta file is, or  
more precisely, how the URI for the host-meta file is derived from the  
URI of the resource that the metadata apply to.

Section 4 seems to suggest that the URI is obtained by resolving the
relative URI reference /host-meta against the resource's URI as the
base URI, but it doesn't say so clearly; the use of "authority"
suggests that the choice of protocol is actually up to the
implementation.
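
To make the two readings concrete, here is a rough Python sketch. The function names are mine, not the draft's, and neither is meant as the draft's actual algorithm. The first resolves /host-meta against the resource's URI, so scheme and port carry over; the second keeps only the authority and leaves the scheme to the caller.

```python
from urllib.parse import urljoin, urlsplit


def host_meta_by_resolution(resource_uri: str) -> str:
    """Reading 1: resolve the reference /host-meta against the resource URI."""
    # urljoin performs relative-reference resolution: the scheme and
    # authority of the base are kept, the path is replaced, the query dropped.
    return urljoin(resource_uri, "/host-meta")


def host_meta_by_authority(resource_uri: str, scheme: str) -> str:
    """Reading 2: keep only the authority; the implementation picks the scheme."""
    authority = urlsplit(resource_uri).netloc
    return f"{scheme}://{authority}/host-meta"
```

Under the first reading, a resource at https://www.example.com/a/b maps to https://www.example.com/host-meta; under the second, nothing stops an implementation from fetching http://www.example.com/host-meta for that same resource.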

From the previous apps-discuss thread, it seems like the main use  
case for permitting metadata to leak across schemes (and therefore,  
typically ports -- though ports and schemes are strictly speaking  
orthogonal) lies with URI schemes that do not have a resource  
retrieval operation readily available, e.g., mailto.

On the other hand, I'm extremely wary about anything near HTTP that  
might tear down origin boundaries without a great deal of care.  E.g.,  
a purely authority-based approach might permit metadata to leak from  
the HTTP part of a site (where no integrity protection is given) into  
its HTTPS part (where integrity protection and authenticity of data are  
deemed important), possibly permitting attacks against web  
applications that are ostensibly protected -- as is alluded to in the  
security considerations.
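
A small Python sketch of the leak I have in mind, assuming a client that caches host-meta results. The cache layout and all names here are hypothetical, purely for illustration. Keying the cache on the authority alone lets metadata fetched over plain HTTP answer a lookup for an HTTPS resource; keying on the full (scheme, host, port) triple does not.

```python
from urllib.parse import urlsplit

# Default ports, used only to normalise the origin triple.
DEFAULT_PORTS = {"http": 80, "https": 443}


def authority_key(uri):
    """Key on the authority alone (host plus any explicit port)."""
    return urlsplit(uri).netloc.lower()


def origin_key(uri):
    """Key on scheme, host, and port, so HTTP and HTTPS stay separate."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, parts.hostname, port)


def lookup(cache_key, fetched_from, applied_to, metadata):
    """Store metadata under fetched_from, then look it up for applied_to."""
    cache = {cache_key(fetched_from): metadata}
    return cache.get(cache_key(applied_to))
```

With authority_key, metadata stored for http://www.example.com/ is returned for https://www.example.com/login; with origin_key the same lookup comes back empty.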

The obvious solution to that part of the puzzle is to let the  
mechanism default to the same URI scheme, unless there is a specific  
convention to the contrary.  That should cover any URI schemes for  
which a safe retrieval operation is defined (HTTP, HTTPS, FTP come to  
mind).

For other URI schemes, one could either punt on this issue completely,  
define a default fall-back to HTTP (or HTTPS, depending on which of  
the two better matches the security properties of the protocol in  
question), or state explicitly what the correct scheme is.
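
As a strawman, the policy could look roughly like the Python below. The set of retrievable schemes is just the three examples above, and the mailto entry is an invented placeholder for a per-scheme convention, not something the draft or anyone else has specified.

```python
# Schemes assumed to define a safe retrieval operation (the examples above).
RETRIEVABLE_SCHEMES = {"http", "https", "ftp"}

# Hypothetical explicit conventions for schemes without retrieval.
# The mailto mapping is a placeholder, not a proposal.
EXPLICIT_CONVENTIONS = {"mailto": "https"}


def host_meta_scheme(resource_scheme, fallback=None):
    """Pick the scheme used to fetch host-meta for a resource.

    Returns None when no rule applies and no fallback is given, which
    corresponds to punting on the issue.
    """
    scheme = resource_scheme.lower()
    if scheme in RETRIEVABLE_SCHEMES:
        # Default: stay on the same scheme, so HTTP never feeds HTTPS.
        return scheme
    if scheme in EXPLICIT_CONVENTIONS:
        return EXPLICIT_CONVENTIONS[scheme]
    return fallback
```

Calling host_meta_scheme("https") stays on https, while an unlisted scheme yields None unless the caller supplies a fallback such as "https".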

Thoughts?

--
Thomas Roessler, W3C  <tlr@w3.org>

Received on Tuesday, 10 February 2009 14:57:15 UTC