Re: Content-Disposition next steps from Bjoern Hoehrmann on 2010-12-01 (ietf-http-wg@w3.org from October to December 2010)

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Wed, 01 Dec 2010 23:32:52 +0100
To: Adam Barth <ietf@adambarth.com>
Cc: Mark Nottingham <mnot@mnot.net>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <ucbdf69oa5o15atvhoirl7kso88mc9f2ar@hive.bjoern.hoehrmann.de>
* Adam Barth wrote:
>On Wed, Dec 1, 2010 at 3:12 AM, Mark Nottingham <mnot@mnot.net> wrote:
>> Adam, do you have a proposal?
>
>Yeah.  Please find my proposal below.  It's certainly not beautiful,
>and it likely needs more polish, but it should be a starting point.
>
>I tried to be as "grammatical" as I could, but couldn't quite figure
>out how to avoid all the algorithmic aspects.  The proposal is based on
>what Chrome does, but cleaned up slightly.

I would have preferred that, instead of dragging the working group through
philosophical debates about browser vendors' needs for two months, you had
just said you would like the specification to say the header should be
processed as Chrome processes it, and detailed any exceptions. I note that
you are not actually telling us what the differences should be even now.

>The rules for determining the disposition-type are particularly goofy.
> I wanted to do more homework to figure out if we can make those more
>aesthetic, but I ran out of time.

You will excuse me if I remind you of your response to concerns about the
lack of resources to work on this, "We have many problems in life, but
a lack of engineering resources isn't one of them". But let's see about
the goofy rules for the disposition type:

>== Determining the Disposition ==
>
>To determine the disposition-type, parse the Content-Disposition
>header field using
>the following grammar:
>
>  unparsed-string  = *LWS nominal-type *CHAR
>  nominal-type = "inline" / "filename" / "name" / ";"
>
>If the Content-Disposition header field parser fails to parse, then the
>disposition type is "attachment".  Otherwise, the disposition-type is "inline".

It is incorrect to specify the *LWS here, as the draft uses the RFC 2616
implied LWS rules. This also does not do what you intend, as *CHAR only
matches US-ASCII octets (0 through 127). It also mishandles a common
case, the empty string as value, which fails to parse but is handled
pretty much universally as "inline", contrary to your proposal. Not
to mention that it's silly to treat `x=y; filename=example.txt` as if
it had an unrecognized disposition type and should thus be handled as
"attachment", which, say, Internet Explorer and Opera don't do, when
you treat plain `filename=example.txt` as having no disposition type.
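
For reference, the quoted rule can be sketched in a few lines of Python to
check the cases above. This is only a rough reading of the proposal, not
Chrome's actual code: the name `proposed_disposition_type`, the use of a
regular expression, and the case-insensitive matching of the literals (as
for RFC 2616 ABNF literals) are assumptions made here for illustration.

```python
import re

# Hypothetical sketch of the quoted grammar:
#   unparsed-string = *LWS nominal-type *CHAR
#   nominal-type    = "inline" / "filename" / "name" / ";"
# LWS is [CRLF] 1*( SP | HT ); CHAR is any US-ASCII octet (0-127).
_PROPOSED = re.compile(
    r"(?:(?:\r\n)?[ \t])*"         # *LWS, one whitespace octet at a time
    r"(?:inline|filename|name|;)"  # nominal-type
    r"[\x00-\x7f]*",               # *CHAR: US-ASCII only
    re.IGNORECASE | re.ASCII,
)


def proposed_disposition_type(header_value: str) -> str:
    """Apply the quoted rule: parse failure means "attachment", else "inline"."""
    if _PROPOSED.fullmatch(header_value) is None:
        return "attachment"
    return "inline"


if __name__ == "__main__":
    for value in ["", "filename=example.txt", "x=y; filename=example.txt",
                  "inline; filename=\u00e4.txt"]:
        print(repr(value), "->", proposed_disposition_type(value))
```

Under this reading the empty value and `x=y; filename=example.txt` both come
out as "attachment", plain `filename=example.txt` comes out as "inline", and
a non-ASCII character anywhere after the type also forces "attachment".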

>One of the ground rules was that my proposal should only differ from
>the current draft in error-handling cases.  I believe that's the case,
>but I'm not 100% sure.  Please let me know if I've screwed that up.

I can't really make heads or tails of the rest of your proposal. For
instance, if you go by the processing rules already in the draft, you
would not need to discuss quote marks; but you seem to have your own
rules for processing parameters and parameter values, in which case
you would need to discuss quote marks, and your proposal does not.

But the impression I am getting is that you have provisions for both
RFC 2047 encoding and some url-escaping in quoted-string values, both
of which are non-standard interpretations of properly formed headers.
That does not belong in any proposal for error handling, as I said in
http://lists.w3.org/Archives/Public/ietf-http-wg/2010OctDec/0395.html

Let me say again: if the idea behind the i18n behavior is to emulate
certain browsers, we need to know exactly how those browsers behave,
and that includes the configuration of those browsers, which may vary
depending on things like the system locale. We also need better evidence
for specifying behavior that is inconsistent with them than Google being
unaware of bug reports; Chrome needs a much larger market share than
it has at the moment to make the lack of bug reports an argument.

>== Extracting Parameter Values From Header Fields ==
>
>To extract the value for a given parameter-name from an unparsed-string, parse
>the unparsed-string using the following grammar:
>
>  unparsed-string = *CHAR name *LWS "=" value [ ";" *CHAR ]
>  value           = <CHAR, except ";">
>
>where the name production is a grammatical production that is a case-insensitive
>match for the given parameter-name.  If the unparsed-string can be parsed by
>the grammar in multiple ways, choose the one in which name appears as close to
>the beginning of the string as possible.  If the unparsed-string cannot be
>parsed by the grammar above, return the empty string.

This does not work as you intend, as CHAR is US-ASCII 0x00 through 0x7F,
so you would never get to filename parameter values that are not US-ASCII.

Not to mention that this is utterly silly: if you have "x<name>=...",
this would be handled as if the header value had a `name` parameter with
the empty string as value, as opposed to the semantically correct result,
which would be "there is no `name` parameter". And as you lack higher
level control logic to actually separate parameters, this results in
`example="filename=example.txt"` having the filename `example.txt"`,
as opposed to the correct result, namely that there is no filename.
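
The same kind of rough Python sketch shows the extraction problems. It is
not taken from the proposal or from any browser: `proposed_extract` is a
hypothetical name, and it assumes that `value` was meant as zero or more
CHARs other than ";" (the grammar as written allows exactly one) and that a
lazy prefix is an acceptable way to pick the leftmost `name`.

```python
import re

_CHAR = r"[\x00-\x7f]"                  # CHAR: any US-ASCII octet
_VALUE_CHAR = r"[\x00-\x3a\x3c-\x7f]"   # CHAR except ";" (0x3B)
_LWS = r"(?:(?:\r\n)?[ \t])"            # one step of linear whitespace


def proposed_extract(unparsed: str, parameter_name: str) -> str:
    """Sketch of: *CHAR name *LWS "=" value [ ";" *CHAR ].

    Returns the empty string when the grammar does not match, as the
    quoted text specifies.
    """
    pattern = re.compile(
        _CHAR + r"*?"                   # shortest prefix, so name is leftmost
        + re.escape(parameter_name)     # case-insensitive via the flags below
        + _LWS + r"*="
        + r"(" + _VALUE_CHAR + r"*)"    # assumed: zero or more value CHARs
        + r"(?:;" + _CHAR + r"*)?",
        re.IGNORECASE | re.ASCII,
    )
    match = pattern.fullmatch(unparsed)
    return match.group(1) if match else ""


if __name__ == "__main__":
    for header in ['example="filename=example.txt"',
                   "xfilename=a.txt",
                   "attachment; filename=\u00e4.txt"]:
        print(repr(header), "->", repr(proposed_extract(header, "filename")))
```

With these assumptions the quoted-string case yields `example.txt"` including
the stray quote mark, `xfilename=a.txt` is reported as carrying a `filename`
parameter although it has none, and the non-ASCII value yields the empty
string because it falls outside CHAR.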

Mark, please count me as opposed to delaying submission of the draft to
the IESG any longer to wait for Adam to come up with a proposal; when it's
finished, it may just as well be published as a separate specification.
We've been on this for two months now with essentially no progress.
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Received on Wednesday, 1 December 2010 22:33:36 UTC