Re: Content Sniffing impact on HTTPbis - #155

From: Mark Nottingham <mnot@mnot.net>
Date: Sat, 6 Jun 2009 13:05:29 +1000
Cc: "William A. Rowe, Jr." <wrowe@rowe-clan.net>, Julian Reschke <julian.reschke@gmx.de>, Mark Baker <distobj@acm.org>, Bjoern Hoehrmann <derhoermi@gmx.net>, Roy Fielding <fielding@gbiv.com>, HTTP Working Group <ietf-http-wg@w3.org>
Message-Id: <2F22D15C-B04E-4035-805F-D5183AD49734@mnot.net>
To: Adam Barth <w3c@adambarth.com>
Sounds like we're moving towards consensus (because no one is  
particularly happy).

That would make the proposal to replace p3 3.2.1:

> When an entity-body is included with a message, the data type of that
> body is determined via the header fields Content-Type and Content-
> Encoding.  These define a two-layer, ordered encoding model:
>
>   entity-body := Content-Encoding( Content-Type( data ) )
>
> Content-Type specifies the media type of the underlying data.
> Content-Encoding may be used to indicate any additional content
> codings applied to the data, usually for the purpose of data
> compression, that are a property of the requested resource.  There is
> no default encoding.
>
> Any HTTP/1.1 message containing an entity-body SHOULD include a
> Content-Type header field defining the media type of that body.  If
> and only if the media type is not given by a Content-Type field, the
> recipient MAY attempt to guess the media type via inspection of its
> content and/or the name extension(s) of the URI used to identify the
> resource.  If the media type remains unknown, the recipient SHOULD
> treat it as type "application/octet-stream".

with:

"""
When an entity-body is included with a message, the data type of that
body is determined via the header fields Content-Type and Content-
Encoding.  These define a two-layer, ordered encoding model:

   entity-body := Content-Encoding( Content-Type( data ) )

Content-Type specifies the media type of the underlying data. Any
HTTP/1.1 message containing an entity-body SHOULD include a
Content-Type header field defining the media type of that body. If
the Content-Type header field is not present, it indicates that the
sender does not know the media type of the data; recipients MAY
either assume that it is "application/octet-stream" or examine the
content to determine its type.

Content-Encoding may be used to indicate any additional content
codings applied to the data, usually for the purpose of data
compression, that are a property of the requested resource.  There is
no default encoding.

Note that neither the interpretation of the data type of a message nor
the behaviours caused by it are defined by HTTP; this potentially
includes examination of the content to override any indicated type
("sniffing").
"""
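
To make the two layers concrete, here is a rough sketch of how a recipient might process a message under the wording above. It is only an illustration, not part of the proposed text: the function names, the lowercase-keyed header dict and the optional `sniff` hook are all assumptions made for the example, and only the gzip and deflate codings are handled.

```python
import gzip
import zlib

DEFAULT_TYPE = "application/octet-stream"


def remove_content_codings(headers, body):
    """Undo the outer layer: the codings listed in Content-Encoding.

    `headers` is assumed to be a dict with lowercase field names.
    Codings are listed in the order applied, so they are removed in reverse.
    """
    value = headers.get("content-encoding", "")
    codings = [c.strip().lower() for c in value.split(",") if c.strip()]
    for coding in reversed(codings):
        if coding in ("gzip", "x-gzip"):
            body = gzip.decompress(body)
        elif coding == "deflate":
            body = zlib.decompress(body)
        else:
            raise ValueError("unsupported content coding: " + coding)
    return body


def media_type(headers, data, sniff=None):
    """Inner layer: choose the media type of the underlying data."""
    declared = headers.get("content-type")
    if declared:
        # Drop parameters such as charset; a sketch, not a full parser.
        return declared.split(";", 1)[0].strip().lower()
    # No Content-Type: the sender does not know the type. The recipient MAY
    # examine the content (hypothetical `sniff` hook) or assume octet-stream.
    if sniff is not None:
        guessed = sniff(data)
        if guessed:
            return guessed
    return DEFAULT_TYPE
```

Note that this sketch never overrides a Content-Type that is present; whether a recipient does so is exactly the behaviour the final paragraph of the proposed text leaves outside HTTP.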

Question -- the wording above explicitly allows sniffing of both  
content-type and content-encoding; do we want to allow C-E?

I'm not sure what "that are a property of the requested resource"  
means in the context of content-encoding; it seems to be a variation  
of the "requested variant" problem. Perhaps we can deal with that  
then...


On 06/06/2009, at 5:40 AM, Adam Barth wrote:

> On Fri, Jun 5, 2009 at 12:28 PM, William A. Rowe, Jr.
> <wrowe@rowe-clan.net> wrote:
>> Adam Barth wrote:
>>> On Fri, Jun 5, 2009 at 7:10 AM, William A. Rowe, Jr.
>>> <wrowe@rowe-clan.net> wrote:
>>>> Server misrepresentation of Content-Type will cease, once browsers
>>>> stop misrepresenting the content type.  Until web authors and
>>>> administrators (including mass vhosters) become aware that they have
>>>> misrepresented the data they are serving, they will continue to
>>>> generate the 3% (IIRC) of mislabeled content.
>>>
>>> This is unlikely to ever occur given the market dynamics of browsers.
>>
>> Which is what makes the entire discussion so entirely laughable.
>>
>> I concur with Roy and Julian, leave Content-Type null with an undefined
>> content type.  While user agents persist in nonsense (such as decoding
>> UTF-7 when presented an explicit charset), this is out of our
>> server-side and authors' hands, and no spec is going to correct their
>> misbehavior or unsafe practices.
>>
>> For some tiny minority who care, the very absence of Content-Type
>> conveys new, useful metadata that should not break 2616 definitions.
>
> I don't share your pessimistic view, but that resolution is workable
> from my perspective.
>
> Thanks,
> Adam


--
Mark Nottingham     http://www.mnot.net/
Received on Saturday, 6 June 2009 03:06:12 GMT