Re: Content-Disposition next steps from Adam Barth on 2010-12-01 (ietf-http-wg@w3.org from October to December 2010)

From: Adam Barth <ietf@adambarth.com>
Date: Wed, 1 Dec 2010 11:50:23 -0800
To: Mark Nottingham <mnot@mnot.net>
Cc: HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <AANLkTinTvXxSHaNq4jrdCKFaTztWwz2AVPywmnQ0Y0Z=@mail.gmail.com>
On Wed, Dec 1, 2010 at 3:12 AM, Mark Nottingham <mnot@mnot.net> wrote:
> Adam, do you have a proposal?

Yeah.  Please find my proposal below.  It's certainly not beautiful,
and it likely needs more polish, but it should be a starting point.

I tried to be as "grammatical" as I could, but couldn't quite figure
out how to avoid all the algorithmic aspects.  The proposal is based on
what Chrome does, but cleaned up slightly.  There's some sadness I
couldn't quite figure out how to avoid, but I'm certainly open to
talking about it more.

The rules for determining the disposition-type are particularly goofy.
I wanted to do more homework to figure out if we can make those more
aesthetic, but I ran out of time.

One of the ground rules was that my proposal should only differ from
the current draft in error-handling cases.  I believe that's the case,
but I'm not 100% sure.  Please let me know if I've screwed that up.

Adam


== Extracting Parameter Values From Header Fields ==

To extract the value for a given parameter-name from an unparsed-string, parse
the unparsed-string using the following grammar:

  unparsed-string = *CHAR name *LWS "=" value [ ";" *CHAR ]
  value           = <CHAR, except ";">

where the name production is a grammatical production that is a
case-insensitive match for the given parameter-name.  If the unparsed-string
can be parsed by the grammar in multiple ways, choose the one in which name
appears as close to the beginning of the string as possible.  If the
unparsed-string cannot be parsed by the grammar above, return the empty string.
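
For illustration, here is a minimal Python sketch of the extraction rule above.
The function name extract_parameter is hypothetical, and the sketch assumes
that value is meant to be a run of zero or more characters other than ";"
(the grammar as written could be read as a single character).  It also does
not enforce that every character is a CHAR.

```python
import re


def extract_parameter(unparsed: str, parameter_name: str) -> str:
    """Return the raw value for parameter_name, or "" if the grammar fails."""
    # name *LWS "=" value, where value runs up to the next ";" (assumption).
    pattern = re.compile(
        re.escape(parameter_name) + r"[ \t\r\n]*=([^;]*)",
        re.IGNORECASE,  # the name is matched case-insensitively
    )
    # re.search returns the leftmost match, i.e. the parse in which name
    # appears as close to the beginning of the string as possible.
    match = pattern.search(unparsed)
    return match.group(1) if match else ""
```

Note that, as in the grammar, nothing is trimmed or unquoted: extracting
"filename" from `attachment; FileName = a` yields " a", leading space included.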


== Decoding the File Name ==

To filename-decode an encoded-string, parse the encoded-string using the
following grammar:

  encoded-string = word *( 1*delimiter word )
  delimiter      = LWS
  word           = <CHAR, except delimiter>

Consider each grammatical element (either a delimiter or a word) in the order
they appear in the encoded-string:

  1) If the grammatical element is a delimiter, process the element as follows:

       a) If the previous grammatical element was an RFC2047-value, ignore this
          grammatical element.

       b) Otherwise, emit a SP character.

  2) If the grammatical element is a word, process the element as follows:

       a) If the word contains non-ASCII characters, process the element as
          follows:

            i)  If the word is a well-formed UTF-8 string, emit the word
                (decoded as UTF-8) and proceed to the next grammatical element.

            ii) Otherwise, *sadness*.  Apparently what we're supposed to do
                here is to use the "referrer" charset, if we have one.
                Otherwise, we fall back to the OS codepage.

       b) If the word is an RFC2047-value, emit the RFC2047 decoding of the
          word and proceed to the next grammatical element.

       c) Let the url-unescaped-word be the word %-unescaped.

       d) Emit the url-unescaped-word (decoded as UTF-8) and proceed to the
          next grammatical element.  (There's actually more sadness here if
          the url-unescaped-word isn't valid UTF-8.)

The emitted characters are the decoded file name.
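
The steps above can be sketched roughly as follows in Python.  This is only an
illustration under several assumptions: the name filename_decode and the
fallback_charset parameter are hypothetical, the fallback stands in for the
"referrer" charset / OS codepage, invalid UTF-8 after %-unescaping is replaced
with U+FFFD, and malformed RFC 2047 payloads or unknown charsets are not
handled.

```python
import re
from email.header import decode_header
from urllib.parse import unquote_to_bytes

LWS_RUN = re.compile(rb"([ \t\r\n]+)")
RFC2047_VALUE = re.compile(rb"=\?[^?]+\?[bBqQ]\?[^?]*\?=")


def filename_decode(encoded: bytes, fallback_charset: str = "latin-1") -> str:
    emitted = []
    previous_was_rfc2047 = False
    # Splitting on a capturing group keeps the delimiters in the result.
    for element in LWS_RUN.split(encoded):
        if not element:
            continue
        if element[:1] in b" \t\r\n":
            # Step 1: a delimiter becomes one SP unless it follows an
            # RFC2047-value, in which case it is ignored.
            if not previous_was_rfc2047:
                emitted.append(" ")
            previous_was_rfc2047 = False
            continue
        previous_was_rfc2047 = False
        if any(byte > 0x7F for byte in element):
            # Step 2a: non-ASCII word, UTF-8 first, then the fallback.
            try:
                emitted.append(element.decode("utf-8"))
            except UnicodeDecodeError:
                emitted.append(element.decode(fallback_charset, errors="replace"))
        elif RFC2047_VALUE.fullmatch(element):
            # Step 2b: RFC 2047 encoded-word.
            for raw, charset in decode_header(element.decode("ascii")):
                emitted.append(raw.decode(charset or "ascii", errors="replace"))
            previous_was_rfc2047 = True
        else:
            # Steps 2c and 2d: %-unescape, then decode as UTF-8.
            emitted.append(unquote_to_bytes(element).decode("utf-8", errors="replace"))
    return "".join(emitted)
```

The function takes bytes rather than str because the UTF-8 well-formedness
check in step 2a only makes sense on the raw octets of the header.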


== Determining the File Name ==

To determine the file name indicated by a Content-Disposition header field, use
the following algorithm:

  1) Let filename-star be the value extracted from the Content-Disposition
     header field for the "filename*" parameter.

  2) If filename-star parses as an RFC5987-value, return the RFC5987-value of
     filename-star and abort these steps.

  3) Let filename be the value extracted from the Content-Disposition header
     field for the "filename" parameter.

  4) If filename is empty, instead let filename be the value extracted from the
     Content-Disposition header field for the "name" parameter.

  5) If filename is empty, return the empty string and abort these steps.

  6) Return the filename-decoding of filename.
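
As a rough illustration, the six steps might be sketched in Python as below.
All names here are hypothetical.  The sketch assumes that only the UTF-8 and
ISO-8859-1 charsets are accepted in an RFC5987-value, that surrounding
whitespace is stripped before that check, and that the filename-decoding
routine is passed in as a callable (the default simply returns its input).

```python
import re
from urllib.parse import unquote

# ext-value shape from RFC 5987: charset "'" [language] "'" value-chars
RFC5987_VALUE = re.compile(
    r"(utf-8|iso-8859-1)'[a-z0-9-]*'((?:%[0-9a-f]{2}|[a-z0-9!#$&+.^_`|~-])*)",
    re.IGNORECASE,
)


def extract_parameter(unparsed, parameter_name):
    # Leftmost "name *LWS = value" match; "" when nothing matches.
    match = re.search(
        re.escape(parameter_name) + r"[ \t\r\n]*=([^;]*)", unparsed, re.IGNORECASE
    )
    return match.group(1) if match else ""


def rfc5987_decode(candidate):
    # Returns None when candidate does not parse as an RFC5987-value.
    match = RFC5987_VALUE.fullmatch(candidate.strip())
    if match is None:
        return None
    charset, value_chars = match.groups()
    return unquote(value_chars, encoding=charset, errors="replace")


def determine_filename(header, filename_decode=lambda value: value):
    # Steps 1 and 2: filename* wins if it is a well-formed RFC5987-value.
    decoded = rfc5987_decode(extract_parameter(header, "filename*"))
    if decoded is not None:
        return decoded
    # Steps 3 and 4: fall back to filename, then to name.
    filename = extract_parameter(header, "filename")
    if not filename:
        filename = extract_parameter(header, "name")
    # Step 5: nothing usable.
    if not filename:
        return ""
    # Step 6: hand off to the filename-decoding routine.
    return filename_decode(filename)
```

With this sketch, a header carrying both `filename=fallback.txt` and
`filename*=UTF-8''na%C3%AFve.txt` yields the filename* value regardless of the
order in which the two parameters appear.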


== Determining the Disposition ==

To determine the disposition-type, parse the Content-Disposition header field
using the following grammar:

  unparsed-string = *LWS nominal-type *CHAR
  nominal-type    = "inline" / "filename" / "name" / ";"

If the Content-Disposition header field fails to parse, then the
disposition-type is "attachment".  Otherwise, the disposition-type is "inline".
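
A minimal Python sketch of this rule, for illustration only.  The function name
disposition_type is hypothetical, the quoted literals are assumed to match
case-insensitively (as is usual for quoted strings in this style of grammar),
and the trailing *CHAR is not checked for non-ASCII octets.

```python
import re

# *LWS nominal-type, anchored at the start of the header field value.
NOMINAL_TYPE = re.compile(r"[ \t\r\n]*(?:inline|filename|name|;)", re.IGNORECASE)


def disposition_type(header: str) -> str:
    # Anything that does not start with a nominal-type is an attachment.
    return "inline" if NOMINAL_TYPE.match(header) else "attachment"
```

So `filename=x` and `; filename=x` both come out as "inline", while
`attachment; filename=x` (or any unrecognized leading token) comes out as
"attachment".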


== Processing the Content-Disposition Header Field ==

To process the Content-Disposition header field, use the following algorithm:

  1) Determine the disposition-type.

  2) If the disposition-type is "inline", then ...

  3) If the disposition-type is "attachment", then let filename be the file
     name indicated by the header field.  ...
Received on Wednesday, 1 December 2010 19:51:46 UTC