Comments on draft-ietf-httpbis-content-disp from Adam Barth on 2010-11-01 (ietf-http-wg@w3.org from October to December 2010)

From: Adam Barth <ietf@adambarth.com>
Date: Mon, 1 Nov 2010 01:30:26 -0700
To: httpbis <ietf-http-wg@w3.org>
Message-ID: <AANLkTin2KkDJyMpdw0BwkPG4iVwSfzJ1bipC3qKHpa70@mail.gmail.com>

== Disclaimers ==

1) I'm aware that there are more implementors of user agents than
browsers.  I'm not interested in being reminded of that fact.  Browser
user agents, however, are one important group of user agents.

2) I'm aware that this working group does not share my perspective on
what constitutes a useful specification for user agent implementors.
I'm not interested in discussing whether the level of precision I'm
requesting is valuable.

3) I'm aware that this document reflects business-as-usual in the
IETF.  My position is that business-as-usual does not meet the needs
of browser user agent implementors, largely because browser user agent
implementors have been effectively absent from the IETF process for the
better part of a decade.

== Comments ==

The comments below are relative to
http://tools.ietf.org/html/draft-ietf-httpbis-content-disp-03, which
is the most recent version I could find on the IETF web site.

http://tools.ietf.org/html/draft-ietf-httpbis-content-disp-03#section-3.1

This section defines a grammar for the Content-Disposition header
field.  However, the document does not define how a user agent should
interpret Content-Disposition header fields that do not conform to
this grammar.  To foster interoperability between user agent
implementations, the document should define how user agents are to
process every sequence of bytes they could receive in a
Content-Disposition header field.

=> Parameter names MUST NOT be repeated.

The document should not phrase normative requirements in the passive
voice.  Instead, the document should make clear which protocol
participants are bound by each requirement.  For example, this
requirement probably should read "servers MUST NOT generate
Content-Disposition header field values with multiple instances of the
same parameter name."

=> a header field value with multiple instances of the same parameter
SHOULD be treated as invalid.

Similarly, this requirement probably should read "user agents SHOULD
treat a header field value with multiple instances of the same
parameter as invalid."  Furthermore, the document should define what
treating a header field value as invalid means.  Presumably the author
intends that user agents ought to ignore such header field values.
I'm skeptical that is the optimum behavior for user agents.  I would
have expected user agents to either use the first or the last instance
of each parameter.
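
The three candidate behaviors (ignore the header, use the first
instance, use the last instance) can be sketched in Python as follows.
This is only an illustration: the function name resolve_duplicates and
the policy names are hypothetical, the input is assumed to be a list of
already-parsed (name, value) pairs in header order, and parameter names
are assumed to compare case-insensitively.

```python
def resolve_duplicates(params, policy="first"):
    """Collapse repeated parameter names according to a policy.

    params: list of (name, value) pairs in header order (assumed input).
    policy: "first", "last", or "invalid" (hypothetical names).
    Returns a dict, or None when policy is "invalid" and a name repeats.
    """
    result = {}
    for name, value in params:
        key = name.lower()  # assumption: names are case-insensitive
        if key in result:
            if policy == "invalid":
                return None  # treat the whole header field value as invalid
            if policy == "first":
                continue  # keep the value seen earlier
        result[key] = value  # first sighting, or "last" overwriting
    return result
```

Whichever policy is chosen, stating it explicitly is what lets two
user agents agree on the outcome for the same input.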

http://tools.ietf.org/html/draft-ietf-httpbis-content-disp-03#section-3.2

This section does not define how user agents ought to process header
field values with multiple disposition types.  According to this test
case <http://greenbytes.de/tech/tc2231/#attandinline2>, user agents
MUST use the first disposition type.
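
A rule of the form "use the first disposition type" might look like the
sketch below.  The helper name first_disposition_type is hypothetical,
and the sketch assumes the type is whatever precedes the first ";" and
that types compare case-insensitively; it is not a full parser.

```python
def first_disposition_type(header_value):
    # Hypothetical helper: take everything before the first ';' as the
    # disposition type. Any later bare token (such as a second type) is
    # simply not consulted.
    head = header_value.split(";", 1)[0]
    return head.strip().lower()
```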

http://tools.ietf.org/html/draft-ietf-httpbis-content-disp-03#section-3.3

This section provides very little guidance about how to extract a file
name from the filename parameter.  For example, it fails to instruct
the user agent about how to handle the following test cases:

http://greenbytes.de/tech/tc2231/#attwithasciifnescapedquote
http://greenbytes.de/tech/tc2231/#attwithasciifilenamenqws
http://greenbytes.de/tech/tc2231/#attwithutf8fnplain
http://greenbytes.de/tech/tc2231/#attwithfnrawpctenca
http://greenbytes.de/tech/tc2231/#attwith2filenames
http://greenbytes.de/tech/tc2231/#attfnbrokentoken
http://greenbytes.de/tech/tc2231/#attbrokenquotedfn

In particular, this document should define an algorithm that takes as
input a sequence of bytes obtained by parsing the Content-Disposition
header field value and returns a sequence of characters which is the
file name requested by the server.  The user agent, of course, can
then treat this requested file name as advisory (e.g., altering the
file name according to platform-specific constraints).

The document defines filename* by referring to RFC5987, but RFC5987
does not define a precise algorithm for computing the file name from a
sequence of input bytes.

http://code.google.com/p/chromium/issues/detail?id=57830

Jungshik Shin writes:

[[
As for RFC 5987, I'm aware that it's a profile of RFC 2231 (it's good
that it's simpler than the full RFC 2231), but I wrote that it's
unnecessarily 'complex' and not many web servers would adopt that
anytime soon. That's why I advocated a much simpler approach of using
(percent-encoded) UTF-8. I'm aware that it has its own share of
issues, but I suspect that it's got a better chance of being adopted
by web servers.
]]

I agree with his assessment.  We should simply use percent-encoded
UTF-8 instead of letting the server specify whatever crazy encoding it
dreams up.  Also, we should remove the language tagging facility
because it is gratuitous.
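
For illustration, decoding a percent-encoded UTF-8 value takes very
little code.  The sketch below uses Python's urllib.parse.unquote; the
function name decode_utf8_filename and the choice to return None for
bytes that are not valid UTF-8 are assumptions, not something any draft
specifies.

```python
from urllib.parse import unquote


def decode_utf8_filename(raw):
    """Percent-decode raw and interpret the resulting bytes as UTF-8.

    Returns the decoded name, or None if the bytes are not valid UTF-8
    (the None result is an assumption made for this sketch).
    """
    try:
        return unquote(raw, encoding="utf-8", errors="strict")
    except UnicodeDecodeError:
        return None
```

With this sketch, "%E2%82%AC%20rates.txt" decodes to a name beginning
with the euro sign, while "foo-%E4.html" is rejected because the byte
0xE4 alone is not valid UTF-8.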

http://tools.ietf.org/html/draft-ietf-httpbis-content-disp-03#appendix-C.2

As far as I can tell, this is actually the biggest interoperability
problem with the Content-Disposition header field.  Unfortunately,
this document does nothing to resolve this issue.  I recommend that
this document take a position with respect to how to handle
percent-encoded values in the filename parameter.  Specifically, I
recommend that the document instruct user agents to decode
percent-encoded values using the user's preferred encoding.  Yes, that's ugly,
but it's the way Content-Disposition works in the real world and the
most likely requirement to actually be implemented by user agents.
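
A minimal sketch of that behavior, and of why it is ugly, follows.  The
function name decode_with_preferred_encoding is hypothetical, and
locale.getpreferredencoding is used only as a stand-in for however a
user agent determines the user's preferred encoding.

```python
import locale
from urllib.parse import unquote


def decode_with_preferred_encoding(raw, encoding=None):
    """Percent-decode raw using the given or the locale's encoding."""
    if encoding is None:
        # Stand-in for the user's preferred encoding (an assumption).
        encoding = locale.getpreferredencoding(False)
    # Undecodable bytes become U+FFFD rather than raising an error.
    return unquote(raw, encoding=encoding, errors="replace")
```

The same header bytes yield different file names for different users:
"foo-%E4.html" decodes to "foo-ä.html" under iso-8859-1 but to
"foo-д.html" under cp1251.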

At a higher level, this document does not define the
Content-Disposition header field in sufficient detail for user agent
implementors to implement the header field in a manner that
interoperates with each other or with existing servers.  Specifically,
the document does not define an algorithm for parsing the
Content-Disposition header field value, nor does it define an
algorithm for computing the requested file name from a parsed header
field value.  The document does not tackle the biggest
interoperability issue with current user agents.

In short, this document does not address the needs of browser user
agent implementors.  This objection can be resolved in two ways:

1) The document can be modified to precisely define the behavior of user agents.

2) The document can be modified to clearly state that it defines a
profile of the Content-Disposition header field for use by servers
(with the definition of the user agent behavior to follow in some
future document).

Adam
Received on Monday, 1 November 2010 08:31:37 UTC