Re: NEW ISSUE: content sniffing from Adrien de Croy on 2009-04-03 (ietf-http-wg@w3.org from April to June 2009)

From: Adrien de Croy <adrien@qbik.com>
Date: Fri, 03 Apr 2009 15:13:04 +1300
To: Ian Hickson <ian@hixie.ch>
CC: Shane McCarron <shane@aptest.com>, Adam Barth <w3c@adambarth.com>, "Roy T. Fielding" <fielding@gbiv.com>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <49D570B0.6000708@qbik.com>

I agree with Roy and Shane.

The issue of next-level content is not an issue for the HTTP (transport) 
layer; it's an issue for the next layer up.

Take a look at the network stack.

Start with an Ethernet frame.  There is a frame type field (the 
EtherType) in its header which tells the stack which next-level handler 
to pass the packet to.

Assuming the frame type is IP, the packet is then passed to the IP 
protocol handler.

In the IP header there is a protocol field as well, which specifies the 
next-level protocol handler.  The IP protocol handler looks at this. 
Supposing this is TCP, the packet is then passed up to the TCP protocol 
handler.

At no stage does the Ethernet handler VALIDATE that the next-level 
header is actually an IP header just because the frame type is IP.  This 
is left to the IP handler.

At no stage does the IP handler VALIDATE that the next-level header is 
actually a TCP header just because the protocol ID is 6.
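
To make the dispatch above concrete, here is a rough sketch of it in 
Python.  It is a toy, not how any real stack is written: the handler 
names, the lookup tables and the helper builders are all made up for 
illustration, and real handlers check far more (checksums, lengths, 
options).  The only point is that each layer reads its own "next type" 
field and hands the payload up without inspecting it.

```python
import struct

ETHERTYPE_IPV4 = 0x0800  # EtherType value that means "payload is IPv4"
IPPROTO_TCP = 6          # IP protocol number that means "payload is TCP"


def handle_tcp(segment: bytes) -> str:
    # Only the TCP handler decides whether these bytes look like TCP.
    if len(segment) < 20:
        return "tcp: rejected (too short for a TCP header)"
    src_port, dst_port = struct.unpack("!HH", segment[:4])
    return f"tcp: segment {src_port} -> {dst_port}"


def handle_ip(packet: bytes) -> str:
    # The IP handler validates its OWN header, nothing above it.
    if len(packet) < 20 or packet[0] >> 4 != 4:
        return "ip: rejected (not a valid IPv4 header)"
    header_len = (packet[0] & 0x0F) * 4
    if header_len < 20 or len(packet) < header_len:
        return "ip: rejected (bad header length)"
    protocol = packet[9]  # the protocol field names the next handler
    handler = IP_HANDLERS.get(protocol)
    if handler is None:
        return f"ip: no handler for protocol {protocol}"
    return handler(packet[header_len:])  # payload passed up unexamined


def handle_ethernet(frame: bytes) -> str:
    if len(frame) < 14:
        return "ethernet: rejected (frame too short)"
    (ethertype,) = struct.unpack("!H", frame[12:14])
    handler = ETHERNET_HANDLERS.get(ethertype)
    if handler is None:
        return f"ethernet: no handler for type 0x{ethertype:04x}"
    return handler(frame[14:])  # payload passed up unexamined


# Hypothetical dispatch tables: type identifier -> next-level handler.
ETHERNET_HANDLERS = {ETHERTYPE_IPV4: handle_ip}
IP_HANDLERS = {IPPROTO_TCP: handle_tcp}


def build_frame(ethertype: int, payload: bytes) -> bytes:
    # Zeroed MAC addresses; only the type field matters for this sketch.
    return bytes(12) + struct.pack("!H", ethertype) + payload


def build_ipv4(protocol: int, payload: bytes) -> bytes:
    header = bytearray(20)
    header[0] = 0x45       # version 4, header length 5 words (20 bytes)
    header[9] = protocol
    return bytes(header) + payload


tcp_segment = struct.pack("!HH", 49152, 80) + bytes(16)
good = build_frame(ETHERTYPE_IPV4, build_ipv4(IPPROTO_TCP, tcp_segment))
mislabelled = build_frame(ETHERTYPE_IPV4, b"this is not an IP packet")

print(handle_ethernet(good))         # tcp: segment 49152 -> 80
print(handle_ethernet(mislabelled))  # ip: rejected (not a valid IPv4 header)
```

In the mislabelled case the Ethernet handler does exactly what the frame 
type tells it to.  The bad payload is caught by the IP handler, i.e. by 
the layer that actually has to interpret it.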

So why should HTTP validate that the next level identifier 
(Content-Type) is correct for the entity body?

There are zillions of content types.  Why should HTTP care about them?  
It's an impossible task.

Now in a browser there is a next-level processor above HTTP. It's the 
thing that actually gets the resource.  It's free to do whatever it 
likes with that content.  However, how the content was transported is 
irrelevant to that processing.

HTTP is only a very small part of the function of a browser.  It's 
purely used to transfer resources around.  You could build a browser on 
top of pretty much any transfer protocol.  Blurring the lines between 
the transport and what is transported is a really bad idea.  Apply that 
idea to other scenarios and you see how bad it really is.  Imagine if 
SMTP had to be concerned about validating email attachments.  Imagine if 
FTP had to be concerned about validating the content of files.  Imagine 
if the postal system had to be concerned about validating the content of 
letters.  You could argue that in these 3 cases they actually tried to a 
certain extent.  But you'd have a hard time claiming any success at it.

By all means develop a framework and guidelines for how to process (e.g. 
sniff) transported data, but that's an issue for the layer above HTTP.  
Referring to it in HTTP would be like referring to TCP in the IP spec, and once 
you start that, where do you stop? ICMP? IGMP? UDP?  SIP? GRE? L2TP?  
Every time you created a new higher-level protocol you'd need to update 
the specs for each lower one it could be layered on.

Implementors need to be able to easily find what they need to 
do.  It's like there needs to be some higher-level resource which 
contains a reference to all the things you should consider when creating 
an HTTP agent.  If there were such a resource, it would contain a 
reference to the transport spec (HTTP) as well as other considerations, 
and could be expanded to refer to issues such as sniffing or any other 
issue that confronted a browser developer.

just my 3.5c



Ian Hickson wrote:
> On Thu, 2 Apr 2009, Shane McCarron wrote:
>   
>> Roy was not "ignoring the reality of existing Web content".  He is 
>> saying the same thing every other expert from this community has said - 
>> that the error handling mechanism you are proposing to codify at the 
>> *protocol* layer is not a protocol issue.
>>     
>
> With all due respect, not "every other expert from this community" agrees 
> with Roy. I don't, nor do many others.
>
> Personally I think that whatever spec defines Content-Type should define 
> how Content-Type works in reality. I would have no problem with HTTP 
> itself not defining the Content-Type header at all. What I personally 
> object to is the HTTP spec defining Content-Type in a way which is 
> incompatible with billions of deployed resources.
>
>
>   
>> If you and others like you want to carefully define how your 
>> applications deal with situations where the underlying layers have 
>> provided you with mis-information, that is completely up to you.
>>     
>
> Could we please not make this an "us vs them" discussion?
>
>
>   
>> However, it is inappropriate to foist those "solutions" onto a community 
>> that is clearly saying "this is not an issue for our layer"!
>>     
>
> By the same logic, it would be inappropriate to foist the solution 
> currently in the HTTP specification onto a community that is clearly 
> saying "this IS an issue for this layer".
>
>
>   
>> Moreover, it would be inappropriate to attempt to define your solution 
>> in such a way that encourages the continued transmission of that 
>> mis-information.
>>     
>
> The current specification text, as it has stood since the dawn of the Web, 
> has failed to discourage the continued transmission of this 
> mis-information. If we are trying to remove specification text that 
> encourages the transmission of such mis-information, we should change the 
> spec!
>
>
>   
>> The protocol provides mechanisms for servers to declare the type of a 
>> payload. If some servers lie about it, that's their mistake.  We should 
>> not be trying to insist that every endpoint deal with errors from the 
>> broadcaster.  That way lies madness.  Look at it this way - if there 
>> were billions of devices that could magically capture music out of the 
>> air (call them radios), and hundreds or even thousands of sources for 
>> that music, and tens of those sources were sending the music out in such 
>> a way that the devices couldn't capture it... what would happen?  Would 
>> all the devices get changed?  Absolutely not!  The sources would fix 
>> themselves or they would disappear.  Either way, problem solved.
>>     
>
> If there were billions of devices called radios, and of thousands of 
> sources, tens of those sources were sending the music out in such a way 
> that the devices couldn't capture it, but one of those sources was the 
> incredibly popular BBC, then new devices called radios would immediately 
> appear that DID handle those few incorrect sources. This is elementary 
> capitalism.
>
>
>   
>> I am sure that the solution you are proposing works for the small subset 
>> of the data and limited collection of data processors that you are 
>> considering. In the greater collection of data that is the entire 
>> Internet, and the greater collection of data processors that are all 
>> agents that use HTTP, the solution just has no general place.  At least 
>> that's my opinion.  I could be wrong.
>>     
>
> I believe you are.
>
>   

-- 
Adrien de Croy - WinGate Proxy Server - http://www.wingate.com
Received on Friday, 3 April 2009 02:10:44 UTC