Re: Content-Disposition next steps from Mark Nottingham on 2010-12-02 (ietf-http-wg@w3.org from October to December 2010)

From: Mark Nottingham <mnot@mnot.net>
Date: Thu, 2 Dec 2010 11:41:08 +1100
To: Adam Barth <ietf@adambarth.com>
Cc: HTTP Working Group <ietf-http-wg@w3.org>
Message-Id: <75BA519D-EC95-4EDB-9ACF-8919D1190EC4@mnot.net>
Hi Adam,

Thanks. 

I think this is a workable starting point for a definition of an optional algorithm for parsing the header and handling errors encountered in that process. In that way, it could be similar to other RFC appendices that give example code for parsing; the only difference here is that it's example pseudo-code.

A few things need to happen next:

* We need to ensure that it doesn't conflict with the rest of the C-D spec, adjusting either it or the spec as necessary, and documenting where we don't have interop. Based on the discussion so far with Julian and Bjoern, it seems that's under way.

* We need to get other UA vendors on board; just having it reflect Chrome's behaviour isn't productive. Many are on-list, but I'll ping those I know to make sure they're aware. Please talk to those you know and make sure they know it's important that we have their input / buy-in here.

* One way or another, I'd like to get the C-D draft submitted for IETF LC by the holidays. If we can get this appendix hammered out by then, we can include it; if not, we can work on it as a separate document. 

To help move us along, it might be good to get the draft text somewhere where it can be collaboratively edited and viewed. How about on the WG Wiki?

Regards,



On 02/12/2010, at 6:50 AM, Adam Barth wrote:

> On Wed, Dec 1, 2010 at 3:12 AM, Mark Nottingham <mnot@mnot.net> wrote:
>> Adam, do you have a proposal?
> 
> Yeah.  Please find my proposal below.  It's certainly not beautiful,
> and it likely needs more polish, but it should be a starting point.
> 
> I tried to be as "grammatical" as I could, but couldn't quite figure
> out how to avoid all the algorithmic aspects.  The proposal is based on
> what Chrome does, but cleaned up slightly.  There's some sadness I
> couldn't quite figure out how to avoid, but I'm certainly open to
> talking about it more.
> 
> The rules for determining the disposition-type are particularly goofy.
> I wanted to do more homework to figure out if we can make those more
> aesthetic, but I ran out of time.
> 
> One of the ground rules was that my proposal should only differ from
> the current draft in error-handling cases.  I believe that's the case,
> but I'm not 100% sure.  Please let me know if I've screwed that up.
> 
> Adam
> 
> 
> == Extracting Parameter Values From Header Fields ==
> 
> To extract the value for a given parameter-name from an unparsed-string, parse
> the unparsed-string using the following grammar:
> 
>  unparsed-string = *CHAR name *LWS "=" value [ ";" *CHAR ]
>  value           = <CHAR, except ";">
> 
> where the name production is a grammatical production that is a
> case-insensitive match for the given parameter-name.  If the unparsed-string
> can be parsed by the grammar in multiple ways, choose the one in which name
> appears as close to the beginning of the string as possible.  If the
> unparsed-string cannot be parsed by the grammar above, return the empty string.
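
The extraction rule above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not Chrome's code: the function name extract_parameter is made up, LWS is simplified to spaces and tabs, and value is read as a run of zero or more characters other than ";" (the grammar as written could also be read as a single character).

```python
import re


def extract_parameter(unparsed: str, parameter_name: str) -> str:
    """Return the raw value for parameter_name, or "" if the grammar fails.

    Assumptions: LWS is approximated by spaces and tabs, and the value is
    every character up to (not including) the next ";".
    """
    pattern = re.compile(
        re.escape(parameter_name) + r"[ \t]*=([^;]*)",
        re.IGNORECASE,  # the name is matched case-insensitively
    )
    # re.search returns the leftmost match, i.e. the parse in which the
    # name appears as close to the beginning of the string as possible.
    match = pattern.search(unparsed)
    return match.group(1) if match else ""
```

Note that the value is returned untouched: surrounding quotes and leading whitespace are kept, because the grammar does not strip them. Note also that, since the grammar allows any characters before name, asking for "name" will match inside "filename=".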
> 
> 
> == Decoding the File Name ==
> 
> To filename-decode an encoded-string, parse the encoded-string using the
> following grammar:
> 
>  encoded-string = word *( 1*delimiter word )
>  delimiter      = LWS
>  word           = <CHAR, except delimiter>
> 
> Consider each grammatical element (either a delimiter or a word) in the order
> they appear in the encoded-string:
> 
>  1) If the grammatical element is a delimiter, process the element as follows:
> 
>       a) If the previous grammatical element was an RFC2047-value, ignore this
>          grammatical element.
> 
>       b) Otherwise, emit a SP character.
> 
>  2) If the grammatical element is a word, process the element as follows:
> 
>       a) If the word contains non-ASCII characters, process the element as
>          follows:
> 
>            i)  If the word is a well-formed UTF-8 string, emit the word
>                (decoded as UTF-8) and proceed to the next grammatical element.
> 
>            ii) Otherwise, *sadness*.  Apparently what we're supposed to do
>                here is to use the "referrer" charset, if we have one.
>                Otherwise, we fall back to the OS codepage.
> 
>       b) If the word is an RFC2047-value, emit the RFC2047 decoding of the
>          word and proceed to the next grammatical element.
> 
>       c) Let the url-unescaped-word be the word %-unescaped.
> 
>       d) Emit the url-unescaped-word (decoded as UTF-8) and proceed to the
>          next grammatical element.  (There's actually more sadness here if
>          the url-unescaped-word isn't valid UTF-8.)
> 
> The emitted characters are the decoded file name.
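
A rough Python sketch of the decoding steps above, working on the raw header bytes. Several things here are assumptions rather than part of the proposal: the name filename_decode and the fallback_charset parameter (a stand-in for the "referrer" charset or OS codepage) are hypothetical, a run of spaces or tabs is treated as a single delimiter, and the "sadness" cases are papered over with replacement characters.

```python
import re
from email.header import decode_header
from urllib.parse import unquote_to_bytes

# Shape of an RFC 2047 encoded-word: =?charset?B-or-Q?payload?=
RFC2047_WORD = re.compile(rb"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=")


def filename_decode(encoded: bytes, fallback_charset: str = "latin-1") -> str:
    out = []
    prev_was_2047 = False
    # With a capturing group, re.split yields words at even indexes and
    # delimiter runs at odd indexes.
    for index, element in enumerate(re.split(rb"([ \t]+)", encoded)):
        if index % 2 == 1:
            # Step 1: a delimiter is dropped after an RFC2047-value,
            # otherwise it becomes a single SP.
            if not prev_was_2047:
                out.append(" ")
            continue
        prev_was_2047 = False
        if any(byte > 0x7F for byte in element):
            # Step 2a: non-ASCII word, try UTF-8 and then the fallback.
            try:
                out.append(element.decode("utf-8"))
            except UnicodeDecodeError:
                out.append(element.decode(fallback_charset, errors="replace"))
        elif RFC2047_WORD.fullmatch(element):
            # Step 2b: RFC 2047 decoding via the standard library.
            raw, charset = decode_header(element.decode("ascii"))[0]
            out.append(raw.decode(charset or "ascii", errors="replace"))
            prev_was_2047 = True
        else:
            # Steps 2c and 2d: %-unescape, then decode as UTF-8.
            unescaped = unquote_to_bytes(element)
            out.append(unescaped.decode("utf-8", errors="replace"))
    return "".join(out)
```

Read literally, step 1a means the whitespace after an encoded-word is dropped even when the next word is plain text, so `=?UTF-8?Q?a?= b` decodes to "ab" in this sketch. Malformed base64 or an unknown charset in an encoded-word will raise here; a real implementation would need to decide what to do in those cases.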
> 
> 
> == Determining the File Name ==
> 
> To determine the file name indicated by a Content-Disposition header field, use
> the following algorithm:
> 
>  1) Let filename-star be the value extracted from the Content-Disposition
>     header field for the "filename*" parameter.
> 
>  2) If filename-star parses as an RFC5987-value, return the RFC5987-value of
>     filename-star and abort these steps.
> 
>  3) Let filename be the value extracted from the Content-Disposition header
>     field for the "filename" parameter.
> 
>  4) If filename is empty, instead let filename be the value extracted from the
>     Content-Disposition header field for the "name" parameter.
> 
>  5) If filename is empty, return the empty string and abort these steps.
> 
>  6) Return the filename-decoding of filename.
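
The six steps above might look like this in Python. It is a sketch under assumptions: extract is a compact stand-in for the parameter extraction step, rfc5987_decode is a deliberately loose reading of the RFC 5987 ext-value syntax that only accepts the UTF-8 and ISO-8859-1 charsets, and the filename-decoding step is passed in as a callable (defaulting to the identity function as a placeholder). All of these names are hypothetical.

```python
import re
from urllib.parse import unquote_to_bytes


def extract(header: str, name: str) -> str:
    # Stand-in for the parameter extraction step (spaces/tabs as LWS).
    match = re.search(re.escape(name) + r"[ \t]*=([^;]*)", header, re.IGNORECASE)
    return match.group(1) if match else ""


# Loose ext-value shape: charset ' [language] ' percent-encoded-value
RFC5987_VALUE = re.compile(r"(utf-8|iso-8859-1)'[a-z0-9-]*'(.*)", re.IGNORECASE)


def rfc5987_decode(value: str):
    """Return the decoded ext-value, or None if it does not parse."""
    match = RFC5987_VALUE.fullmatch(value.strip())
    if not match:
        return None
    charset, encoded = match.groups()
    try:
        return unquote_to_bytes(encoded).decode(charset)
    except UnicodeDecodeError:
        return None


def determine_filename(header: str, filename_decode=lambda value: value) -> str:
    # Steps 1 and 2: a well-formed filename* wins outright.
    decoded = rfc5987_decode(extract(header, "filename*"))
    if decoded is not None:
        return decoded
    # Steps 3 and 4: fall back to filename, then to name.
    filename = extract(header, "filename") or extract(header, "name")
    # Step 5: nothing usable was found.
    if not filename:
        return ""
    # Step 6: apply the filename-decoding step.
    return filename_decode(filename)
```

For example, `determine_filename("attachment; filename*=UTF-8''na%C3%AFve.txt; filename=plain.txt")` takes the filename* branch, while a header carrying only `name=old.txt` falls through to step 4.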
> 
> 
> == Determining the Disposition ==
> 
> To determine the disposition-type, parse the Content-Disposition header field
> using the following grammar:
> 
>  unparsed-string = *LWS nominal-type *CHAR
>  nominal-type    = "inline" / "filename" / "name" / ";"
> 
> If the Content-Disposition header field fails to parse under this grammar,
> then the disposition-type is "attachment".  Otherwise, the disposition-type
> is "inline".
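
As a quick illustration of how this rule behaves, here is a minimal Python sketch. The function name disposition_type is made up, LWS is simplified to spaces and tabs, and the literals are matched case-insensitively on the assumption that they follow the usual ABNF convention.

```python
import re

# *LWS nominal-type *CHAR: only the prefix matters, since *CHAR accepts
# whatever follows.
NOMINAL_TYPE = re.compile(r"[ \t]*(inline|filename|name|;)", re.IGNORECASE)


def disposition_type(header: str) -> str:
    # A header that parses under the grammar is "inline"; anything else,
    # including an empty or unrecognised value, is "attachment".
    return "inline" if NOMINAL_TYPE.match(header) else "attachment"
```

So `disposition_type("attachment; filename=x")` gives "attachment", while a header that starts directly with a parameter, such as `filename=x` or `; filename=x`, is treated as "inline". An unknown type such as `foo; filename=x` ends up as "attachment".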
> 
> 
> == Processing the Content-Disposition Header Field ==
> 
> To process the Content-Disposition header field, use the following algorithm:
> 
>  1) Determine the disposition-type.
> 
>  2) If the disposition-type is "inline", then ...
> 
>  3) If the disposition-type is "attachment", then let filename be the file
>     name indicated by the header field.  ...

--
Mark Nottingham   http://www.mnot.net/
Received on Thursday, 2 December 2010 00:41:41 UTC