Re: ACTION-308 (part 2) Updates to 'The Self-Describing Web' from John Kemp on 2010-01-07 (www-tag@w3.org from January 2010)

From: John Kemp <john@jkemp.net>
Date: Thu, 7 Jan 2010 10:57:32 -0500
To: Eric J. Bowman <eric@bisonsystems.net>
Cc: "www-tag@w3.org WG" <www-tag@w3.org>
Message-Id: <57E17F45-9AA1-4691-85E5-205413D9EBD2@jkemp.net>
On Jan 5, 2010, at 9:33 PM, Eric J. Bowman wrote:

> Is ACTION-308 saying that the common implementation of XSLT on the Web
> is wrong, and that all the major browsers are broken?

ACTION-308 says only that we should revise [SDW] and [AuthMeta] to acknowledge that content sniffing occurs and that, if you have to sniff, you should do so in a secure and accepted manner.
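
A minimal Python sketch of that ordering, for illustration only: trust the Content-Type header when one is present, and inspect the bytes only when it is missing. The function name effective_type and the small signature table are hypothetical; this is not the [BarthSniff] algorithm, which is considerably more detailed.

```python
from typing import Optional

# Hypothetical, deliberately tiny signature table; NOT the [BarthSniff] rules.
SIGNATURES = [
    (b"<?xml", "text/xml"),
    (b"<!doctype html", "text/html"),
    (b"<html", "text/html"),
]


def effective_type(content_type: Optional[str], body: bytes) -> str:
    """Prefer the authoritative header; sniff only when it is missing."""
    if content_type and content_type.strip():
        # Drop parameters such as "; charset=utf-8" and normalise case.
        return content_type.split(";", 1)[0].strip().lower()
    # No usable header: look at the first bytes, ignoring leading whitespace.
    head = body[:512].lstrip().lower()
    for prefix, media_type in SIGNATURES:
        if head.startswith(prefix):
            return media_type
    # Nothing recognised: fall back to an opaque, non-rendered type.
    return "application/octet-stream"
```

In this sketch a declared type always wins, so a body that merely looks like HTML is never reinterpreted when the server has labelled it.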

>  Most XSLT-based
> Web systems out there follow the longstanding model described here:
> 
> http://www.w3schools.com/xsl/
> http://www.w3schools.com/xsl/cdcatalog_with_xsl.xml
> 
> Take an .xml document, embed an XML PI for the XSLT transformation,
> serve as text/xml, and voila!  Works in all major browsers, none of
> which consult the user about the privilege escalation consequences of
> treating the XSLT output as text/html, right down to allowing
> Javascript to execute, despite the authoritative type being text/xml.

I believe that the IETF content sniffing draft [BarthSniff] says that both sniffed text/html and sniffed text/xml are scriptable, and thus there should be no privilege escalation between text/xml and text/html (they would be considered the same from a security perspective). I'd also note that the authoritative type and the sniffed type seem to agree in the case you mention: if I understand your case correctly, the browser executes the transformation only after it has received the content.
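
The "same from a security perspective" reasoning can be sketched in Python as a comparison of privilege classes rather than of type names. The SCRIPTABLE set below simply encodes the reading of [BarthSniff] given above and is an assumption, as is treating every other type (for example text/plain) as non-scriptable; the function name escalates is hypothetical.

```python
# Assumption for illustration: only these two types are treated as scriptable,
# following the reading of [BarthSniff] in the paragraph above.
SCRIPTABLE = {"text/html", "text/xml"}


def escalates(authoritative: str, effective: str) -> bool:
    """True when the effective type can run script but the authoritative one cannot."""
    return effective in SCRIPTABLE and authoritative not in SCRIPTABLE


# text/xml rendered as text/html: both scriptable, so no escalation.
print(escalates("text/xml", "text/html"))    # False
# text/plain rendered as text/html would be an escalation under this assumption.
print(escalates("text/plain", "text/html"))  # True
```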

> 
> If this is so, does serving the .xml document as application/xml suffer
> the same fate?  Consider:
> 
> http://www.w3.org/TR/MathML2/overview.xml
> 
> Not that they're using Javascript, but there's no reason it wouldn't
> work if they did.  Is the TAG saying that HTML content must be served
> with an HTML media type?

If HTML content is served over the Web (between an HTTP client and server, say) then there are advantages, described in [AuthMeta] and [SDW], to using authoritative metadata to correctly label the content sent "over the wire" between the server and the client. Once the client has that content, however, it may certainly perform additional work on it, such as transforming it from one content type to another. The TAG findings do not, I believe, govern such transformations, executed only after content has been acquired and recognized.
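
As a sketch of that two-step view, the Python below takes a document that has already been received and recognized as XML, and only then looks for an xml-stylesheet processing instruction that a client might choose to act on. The function name stylesheet_pis, the sample document, and the href value cdcatalog.xsl are hypothetical; a real browser does far more than this.

```python
import re
from xml.dom import minidom


def stylesheet_pis(xml_text: str) -> list:
    """Return the pseudo-attributes of each xml-stylesheet PI in the prolog."""
    doc = minidom.parseString(xml_text)
    found = []
    for node in doc.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            # PI data looks like: type="text/xsl" href="..."
            found.append(dict(re.findall(r'([\w-]+)\s*=\s*"([^"]*)"', node.data)))
    return found


# Hypothetical document, labelled as XML by the server and parsed as XML here.
SAMPLE = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/xsl" href="cdcatalog.xsl"?>'
    '<catalog/>'
)

print(stylesheet_pis(SAMPLE))
```

The point of the sketch is the order of operations: the content is interpreted according to its label first, and any transformation is a later, client-side decision.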

Regards,

- johnk

[AuthMeta] http://www.w3.org/2001/tag/doc/mime-respect.html
[SDW] http://www.w3.org/2001/tag/doc/selfDescribingDocuments.html
[BarthSniff] http://tools.ietf.org/html/draft-abarth-mime-sniff-03

> 
> -Eric
> 
> John Kemp wrote:
>> 
>> Hello,
>> 
>> As the second part of ACTION-308, I propose the following updates to
>> 'The Self-Describing Web' finding [SelfDescWeb], to acknowledge the
>> reality of content-type sniffing. I shall now mark ACTION-308 to be
>> 'pending review'.
>> 
>> Regards,
>> 
>> - johnk
>> 
>> [SelfDescWeb] -
>> http://www.w3.org/2001/tag/doc/selfDescribingDocuments.html
>> [ACTION-308] - http://www.w3.org/2001/tag/group/track/actions/308
>> [F2FMinutesSep2009] -
>> http://www.w3.org/2001/tag/2009/09/24-minutes#item03
>> 
>> (begin proposed changes)
>> 
>> 1.
>> 
>> Section 1: Introduction
>> 
>> After bullet point:
>> 
>> Each representation should include standard machine-readable
>> indications, such as HTTP Content-type headers, XML encoding
>> declarations, etc., of the standards and conventions used to encode
>> it. 
>> 
>> Add:
>> 
>> ... and every effort should be made to ensure that the intentions of
>> the content author and publisher regarding interpretation of the
>> content are accurately conveyed in such indications.
>> 
>> 2.
>> 
>> Section 2: The Web's Standard Retrieval Algorithm
>> 
>> After paragraph:
>> 
>> Consider instead a different example, in which Bob clicks on a link
>> to ftp://example.com/todaysnews. Although Bob's browser can easily
>> open an FTP connection to retrieve a file, there is no way for the
>> browser to reliably determine the nature of the information received.
>> Even if the URI were ftp://example.com/todaysnews.html the browser
>> would be guessing if it assumed that the file's contents were HTML,
>> since no normative specification ensures that data from ftp URIs
>> ending in .html is in any particular format. 
>> 
>> Add:
>> 
>> As noted above, and for other reasons (such as content aggregation),
>> it may not be possible for a browser to reliably determine, via
>> inspection of a Content-Type HTTP header or other external metadata
>> alone, the intended interpretation of Web content. In such cases, a
>> browser may inspect the content directly (commonly known as
>> "sniffing"). The consequences of such an action are described in
>> [AuthoritativeMetadata]. In particular, sniffing Web content should
>> only be done using an accepted and secure algorithm, such as
>> [BarthSniff].
>> 
>> 3.
>> 
>> References:
>> 
>> Add:
>> 
>> [BarthSniff] http://tools.ietf.org/html/draft-abarth-mime-sniff-03
>
Received on Thursday, 7 January 2010 15:58:06 UTC