Re: TICKET 259: 'treat as invalid' not defined from Adam Barth on 2010-11-07 (ietf-http-wg@w3.org from October to December 2010)

From: Adam Barth <ietf@adambarth.com>
Date: Sun, 7 Nov 2010 13:59:33 -0800
To: Julian Reschke <julian.reschke@gmx.de>
Cc: httpbis <ietf-http-wg@w3.org>
Message-ID: <AANLkTin5eYmDXNJT=c7xOFrhfO=gwn6x3UZb1r_r4O56@mail.gmail.com>
On Sun, Nov 7, 2010 at 12:50 PM, Julian Reschke <julian.reschke@gmx.de> wrote:
> On 07.11.2010 21:32, Adam Barth wrote:
>>> On 02.11.2010 03:56, Adam Barth wrote:
>>>>
>>>> ...
>>>> The browser use case proceeds from the following premises.
>>>>
>>>> 1) Many servers send invalid messages to user agents.
>>>
>>> No data was provided that this is indeed the case for C-D.
>>
>> No data was provided that this isn't the case.  Given that we see
>> invalid messages everywhere else, common sense tells us that we will
>> see invalid messages here too.
>
> In the absence of data telling me something else, I'll assume that servers
> do sane things. I may be wrong. I just don't see why a server would *ever*
> send two disposition types, given it's a waste of bytes and that it doesn't
> cause the same thing to happen in different UAs.

Why would a server ever send two Content-Type headers?  Why would an
HTML document ever mis-nest tags?  Why would a server ever send nonsense
characters instead of an HTTP header?  All these things happen in
practice because not everyone who operates servers is perfect.

>> ...
>>>>
>>>> 3) If a new user agent wishes to compete in the market, that user
>>>> agent needs to handle the invalid messages in the same way as the
>>>> existing user agents.
>>>
>>> As my tests show, there is no "same" behavior right now. It's totally not
>>> clear that this is a problem in practice.
>>
>> It is a problem in practice.
>
> Why?

Lack of interoperability is usually a problem.  Here's an example that
has been causing me a lot of pain recently.  Consider this syntax in
an HTML document:

<script src="..." />

Should we interpret that as a self-closing script tag or as an open
script tag?  Turns out, historically, WebKit was the only major HTML
parser that treated it as a self-closing tag.  Consequently, content
that was authored primarily for WebKit assumed it was a self-closing
tag and content that was authored more generally assumed it was an open
tag.

This put WebKit in a bind.  It couldn't change its behavior to be
compatible with general web content because that would break
WebKit-specific content.  Now, when we implemented the HTML5 parsing
algorithm in WebKit, we changed WebKit's behavior to treat this syntax
as an open tag, but the pain this has caused is, and continues to be,
very real.
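
To make the two readings concrete, here is a toy sketch in Python.  It
is not WebKit's parser or the HTML5 algorithm; the function name
visible_text and its self_closing_allowed flag are hypothetical, and
the regular expressions only handle simple markup.  It just shows how
much of a page changes depending on which reading a parser picks.

```python
import re

# Hypothetical, simplified patterns: real HTML tokenizers are far more involved.
SCRIPT_OPEN = re.compile(r'<script\b[^>]*?(/?)>', re.IGNORECASE)
SCRIPT_CLOSE = re.compile(r'</script\s*>', re.IGNORECASE)

def visible_text(html, self_closing_allowed):
    """Return the markup left outside script elements under one reading.

    self_closing_allowed=True  -> '<script ... />' ends the element at once.
    self_closing_allowed=False -> the '/' is ignored and the element runs
                                  until the next '</script>'.
    """
    out = []
    pos = 0
    while True:
        m = SCRIPT_OPEN.search(html, pos)
        if m is None:
            out.append(html[pos:])
            break
        out.append(html[pos:m.start()])
        if self_closing_allowed and m.group(1) == '/':
            pos = m.end()      # element already closed
            continue
        close = SCRIPT_CLOSE.search(html, m.end())
        if close is None:
            break              # rest of the document is swallowed as script text
        pos = close.end()
    return ''.join(out)

doc = '<script src="a.js" /><p>Hello</p><script src="b.js"></script>'
print(repr(visible_text(doc, True)))
print(repr(visible_text(doc, False)))
```

Under the self-closing reading the paragraph survives; under the
open-tag reading it is swallowed into the first script element along
with the second script tag.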

Notice that the lack of interoperability in this area was a problem
even before there were compatibility constraints.  The compatibility
constraints come later and make it harder / more painful to fix the
lack of interoperability.  Browser vendors have seen this story play
out many, many times in different aspects of HTML, JavaScript APIs,
and even HTTP.  That's why we're so passionate about nailing down
error-handling behavior.  We want to avoid this pain in the future.

>>>> Use case: Users benefit when there is competition among browser
>>>> vendors.  Without specifying how to handle invalid messages, new user
>>>> agents need to reverse engineer the behavior of existing user agents,
>>>> making it more difficult to compete in the marketplace.
>>>
>>> My tests show that no, they do not need to compete.
>>
>> Of course user agents need to compete with each other for market
>> share.  Your statement doesn't make any sense.
>
> So you claim that user agents need to process invalid headers for market
> share. But they do not do the same thing. In the absence of a better
> explanation, my conclusion is that it may not matter.

It does matter.  Please see above.

>> In that case, perhaps we should make including the % character invalid
>> as the results are not predictable from the point of view of servers.
>
> We currently have a warning. I would *love* to tell server implementors
> about a way to work around this problem, but there isn't (for released
> versions of Chrome and IE). So what do you propose to tell senders? "%xx" is
> not allowed in filenames?

Yes, just like we tell them that U+2665 isn't allowed.
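
As a rough illustration of what that advice means for a sender, here
is a minimal Python sketch.  It assumes the unpredictability is that
some user agents percent-decode the filename parameter while others
keep it literally; which user agent does which is not stated in this
thread.  The helper names possible_saved_names and sender_should_avoid
are hypothetical.

```python
import re
from urllib.parse import unquote

# A '%' followed by two hex digits, i.e. something a receiver might decode.
PERCENT_ESCAPE = re.compile(r'%[0-9A-Fa-f]{2}')

def possible_saved_names(filename):
    """Names a file could end up with, assuming a receiver either keeps
    the filename literally or percent-decodes it (an assumption)."""
    return {filename, unquote(filename)}

def sender_should_avoid(filename):
    """True if the filename contains a %xx sequence and is therefore
    ambiguous under the assumption above."""
    return PERCENT_ESCAPE.search(filename) is not None

print(possible_saved_names("report%20final.txt"))
print(sender_should_avoid("report%20final.txt"))
print(sender_should_avoid("plain.txt"))
```

A sender that rejects such names up front never depends on which
interpretation a given user agent chooses.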

Adam
Received on Sunday, 7 November 2010 22:00:37 UTC