Re: ACTION-308 (part 2) Updates to 'The Self-Describing Web' from John Kemp on 2010-01-07 (www-tag@w3.org from January 2010)

From: John Kemp <john@jkemp.net>
Date: Thu, 7 Jan 2010 09:43:17 -0500
To: Larry Masinter <masinter@adobe.com>
Cc: "www-tag@w3.org WG" <www-tag@w3.org>
Message-Id: <EFB76B5A-1364-4BDC-AECC-DAFB4A43D5F0@jkemp.net>

On Jan 6, 2010, at 4:33 PM, Larry Masinter wrote:

> I am strongly opposed to promoting content-type sniffing to be
> an architectural principle.

I agree.

> 
> I find it only marginally acceptable to ALLOW content-type sniffing
> by conforming receiving agents, when there is clear, compelling and
> overwhelming evidence that there is a significant amount
> of content that *needs* sniffing, and in that case, the "sniffing"
> specification should not *mandate* sniffing but merely allow it,
> and discourage its use.

The edits that I have proposed simply aim to say "if you are going to sniff, then do it this way". They are deliberately NOT intended to revise the basic story in SDW and Authoritative Metadata; those documents already clearly state the benefits of following that model, and the problems which arise if one does not. AuthMeta clearly advises against the use of unreliable heuristics.

I do think, however, there is some benefit in saying essentially: "if you're going to sniff, then follow a broadly accepted and secure algorithm". From our discussion of the proposed IETF sniffing draft (http://tools.ietf.org/html/draft-abarth-mime-sniff-03) in September (http://www.w3.org/2001/tag/2009/09/24-minutes#item03) I believe we were broadly in agreement that this document was an improvement over the situation in which implementations each sniff using a different algorithm (which makes privilege escalation more likely and works against interoperability).
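
To make the "if you're going to sniff, do it this way" idea concrete, here is a minimal sketch of the ordering I have in mind: a declared Content-Type is treated as authoritative, and content inspection happens only when no label is present. The function name and the small signature table are hypothetical and chosen purely for illustration; this is NOT the algorithm or the table from draft-abarth-mime-sniff-03.

```python
from typing import Optional

# Hypothetical signature table, for illustration only. It is NOT the table
# defined in draft-abarth-mime-sniff; it just lists a few well-known magic numbers.
_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def effective_media_type(declared: Optional[str], body: bytes) -> str:
    """Return the media type a receiver would act on (hypothetical helper).

    A non-empty declared Content-Type is authoritative and is never overridden.
    The body is inspected only when no label was supplied at all.
    """
    if declared and declared.strip():
        # Drop parameters such as "; charset=utf-8" and normalise case.
        return declared.split(";", 1)[0].strip().lower()

    # No label supplied: fall back to one fixed, shared set of signatures.
    for magic, media_type in _SIGNATURES:
        if body.startswith(magic):
            return media_type

    # Nothing matched: make no guess beyond the generic binary type.
    return "application/octet-stream"
```

With this sketch, a body beginning with "%PDF-" that is labelled "text/plain; charset=utf-8" is still treated as text/plain, while the same bytes with no label at all are treated as application/pdf. The point is only that every receiver applying the same fixed rules reaches the same answer.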

> 
> However, no future design, context, application, W3C recommendation
> or other specification should be encouraged to "sniff" content
> and interpret message content based on unreliable heuristics
> overriding unambiguous content labels.

Again agreed. As far as I can tell, neither SDW nor AuthMeta encourages these actions. My proposed edits deliberately don't change that story. AuthMeta already describes some of the consequences of not following the authoritative metadata supplied in the content container, and I explicitly reference that section in my proposed edits to SDW (see below). 

- johnk

> 
> Larry
> --
> http://larry.masinter.net
> 
> 
> -----Original Message-----
> From: www-tag-request@w3.org [mailto:www-tag-request@w3.org] On Behalf Of John Kemp
> Sent: Monday, January 04, 2010 12:47 PM
> To: www-tag@w3.org WG
> Subject: ACTION-308 (part 2) Updates to 'The Self-Describing Web'
> 
> Hello,
> 
> As the second part of ACTION-308, I propose the following updates to 'The Self-Describing Web' finding [SelfDescWeb], to acknowledge the reality of content-type sniffing. I shall now mark ACTION-308 to be 'pending review'.
> 
> Regards,
> 
> - johnk
> 
> [SelfDescWeb] - http://www.w3.org/2001/tag/doc/selfDescribingDocuments.html
> [ACTION-308] - http://www.w3.org/2001/tag/group/track/actions/308
> [F2FMinutesSep2009] - http://www.w3.org/2001/tag/2009/09/24-minutes#item03
> 
> (begin proposed changes)
> 
> 1.
> 
> Section 1: Introduction
> 
> After bullet point:
> 
> Each representation should include standard machine-readable indications, such as HTTP Content-type headers, XML encoding declarations, etc., of the standards and conventions used to encode it. 
> 
> Add:
> 
> ... and every effort should be made to ensure that the intentions of the content author and publisher regarding interpretation of the content are accurately conveyed in such indications.
> 
> 2.
> 
> Section 2: The Web's Standard Retrieval Algorithm
> 
> After paragraph:
> 
> Consider instead a different example, in which Bob clicks on a link to ftp://example.com/todaysnews. Although Bob's browser can easily open an FTP connection to retrieve a file, there is no way for the browser to reliably determine the nature of the information received. Even if the URI were ftp://example.com/todaysnews.html the browser would be guessing if it assumed that the file's contents were HTML, since no normative specification ensures that data from ftp URIs ending in .html is in any particular format. 
> 
> Add:
> 
> As noted above, and for other reasons (such as content aggregation), it may not be possible for a browser to reliably determine, via inspection of a Content-Type HTTP header or other external metadata alone, the intended interpretation of Web content. In such cases, a browser may inspect the content directly (commonly known as "sniffing"). The consequences of such an action are described in [AuthoritativeMetadata]. In particular, sniffing Web content should only be done using an accepted and secure algorithm, such as [BarthSniff].
> 
> 3.
> 
> References:
> 
> Add:
> 
> [BarthSniff] http://tools.ietf.org/html/draft-abarth-mime-sniff-03
>
Received on Thursday, 7 January 2010 14:55:12 UTC