- From: Brian Smith <brian@briansmith.org>
- Date: Wed, 16 Apr 2008 20:20:30 -0700
- To: "'Mark Nottingham'" <mnot@mnot.net>, "'HTTP Working Group'" <ietf-http-wg@w3.org>
Mark Nottingham wrote:
> > Characters outside of ISO8859-1 MAY be included where the encoded-
> > word rule (as defined in RFC2047, Section 2) is specified. The
> > encoded-word rule is only used for descriptive field contents and
> > values that are not intended to be interpreted by the message
> > parser. When used in HTTP, encoded-word has no specified
> > length limit.

RFC2047 without the length limit is not RFC2047 any more. Besides, the length limit is not the only RFC2047-imposed limit you would have to remove, as explained below.

> > * p1, 2.2:
> > Old:
> >> comment = "(" *( ctext | quoted-pair | comment ) ")"
> >
> > New:
> > """
> > comment = "(" *( ctext | quoted-pair | comment | encoded-word ) ")"
> > """

ctext matches everything that encoded-word matches, so the given grammar is ambiguous. Further, comments are not interpreted by applications, so I see little point in defining encoding/decoding for them. In fact, the specification should be explicitly recommending that we avoid producing comments as they are not properly supported by many deployed implementations. I recommend keeping the old rule as it was.

> > * p1, 4.2:
> > Old:
> >> field-content = <field content>
> >>                 ; the OCTETs making up the field-value
> >>                 ; and consisting of either *TEXT or combinations
> >>                 ; of token, separators, and quoted-string
> >
> > New:
> > """
> > field-content = <field content>
> >                 ; the OCTETs making up the field-value,
> >                 ; according to the syntax specified by the field.
> > """

A parser should be able to parse field-content without knowing the syntax of any field. In particular, the field-content rule is needed to parse unknown headers. It should be specified by real machine-processable ABNF.
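
Returning to the ambiguity in the proposed comment rule: a minimal Python sketch may make it concrete. The two regular expressions are simplified approximations of ctext (RFC 2616) and encoded-word (RFC 2047), not the exact grammars, and the name matching_rules is made up for this illustration.

```python
import re

# Simplified approximation of a run of ctext from RFC 2616:
# any TEXT octet except "(" and ")". Control characters are
# excluded here and LWS folding is ignored (assumption).
CTEXT_RUN = re.compile(r'[^()\x00-\x1f\x7f]+')

# Simplified approximation of encoded-word from RFC 2047, Section 2:
# "=?" charset "?" encoding "?" encoded-text "?="
ENCODED_WORD = re.compile(r'=\?[^?\s()]+\?[^?\s()]+\?[^?\s()]+\?=')


def matching_rules(s):
    """Return the names of the alternatives that match all of s."""
    rules = []
    if CTEXT_RUN.fullmatch(s):
        rules.append("ctext")
    if ENCODED_WORD.fullmatch(s):
        rules.append("encoded-word")
    return rules


# The same comment content satisfies both alternatives, so a parser
# cannot tell from the grammar alone which one was intended.
print(matching_rules("=?ISO-8859-1?Q?caf=E9?="))  # ['ctext', 'encoded-word']
print(matching_rules("plain text"))               # ['ctext']
```

Under these simplified definitions, every string accepted as an encoded-word is also accepted as ctext, which is the ambiguity in question.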
> > * p3, B.1:
> > Old:
> >> filename-parm = "filename" "=" quoted-string
> >
> > New:
> > """
> > filename-parm = "filename" "=" quoted-string | encoded-word
> > """

RFC2047 says "An 'encoded-word' MUST NOT be used in parameter of a MIME Content-Type or Content-Disposition field, or in any structured field body except within a 'comment' or 'phrase'." That is what RFC2231 is for. Also, this is not a backward-compatible change. Different products use differing syntaxes for the filename parameter, and nobody is using RFC2047. See http://lists.w3.org/Archives/Public/public-html/2008Mar/0113.html. I suggest making a separate issue for this.

> > * p6, 16.6:
> > Old:
> >> warn-text = quoted-string
> > New:
> > """
> > warn-text = quoted-string | encoded-word
> > """

Again, RFC2047 says it cannot be used in a structured field body, and this is not a backward-compatible change anyway. Nobody has mentioned any existing implementations that do this.

> > I think the *-extension and parameter value ones are
> > straightforward; if a particular extension wants to specify use of
> > encoded-word, it should; we shouldn't specify use of encoded-word in
> > the generic extension construct, but leave it to the specific
> > instances. I.e., they still conform to TEXT, it's up to them to
> > specify if that content can contain encoded-words.

Again, that would not be a backward-compatible change because the RFC2616 grammar explicitly says "token | quoted-string" for values, and that is precisely what existing implementations expect.

Like I said before, although RFC 2616 vaguely stated that RFC 2047 could be used, the actual BNF productions in the grammar did not allow it anywhere. Attempting to use RFC 2047 encoding anywhere now would be an incompatible change that deployed implementations are not likely to handle well.
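
As a rough illustration of why a bare encoded-word does not fit "token | quoted-string", here is a minimal Python sketch of how a strict RFC 2616 parser might validate a value. The function name is_rfc2616_value and the sample values are assumptions for the example, and the quoted-string pattern is simplified (no LWS folding).

```python
import re

# RFC 2616, Section 2.2: token = 1*<any CHAR except CTLs or separators>.
# The separators include "=" and "?", which every encoded-word contains.
TOKEN = re.compile(r"[!#$%&'*+\-.0-9A-Z^_`a-z|~]+")

# quoted-string = <"> *( qdtext | quoted-pair ) <">
# Simplified: qdtext is anything except '"', "\" and control characters
# other than HT; quoted-pair is a backslash followed by any CHAR.
QUOTED_STRING = re.compile(r'"(?:[^"\\\x00-\x08\x0a-\x1f\x7f]|\\[\x00-\x7f])*"')


def is_rfc2616_value(s):
    """True if s matches token | quoted-string (simplified)."""
    return bool(TOKEN.fullmatch(s) or QUOTED_STRING.fullmatch(s))


print(is_rfc2616_value("example.txt"))                # True  (token)
print(is_rfc2616_value('"example file.txt"'))         # True  (quoted-string)
print(is_rfc2616_value("=?UTF-8?Q?f=C3=BCr.txt?="))   # False (neither)
```

A recipient that validates values this way would treat the unquoted encoded-word as a syntax error rather than as text to decode.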
In order to maximize backward-compatibility, it is likely that a new syntax for UTF-8 support is needed; this new syntax will have to be parsable as a quoted-string by applications that do not understand it. (RFC2047 encoding does not have that property, because RFC2047 encoding is not quoted; in fact, quoting encoded-words is explicitly forbidden by RFC2047.) Accordingly, any references to RFC 2047 and anything it defines should just be removed.

- Brian
Received on Thursday, 17 April 2008 03:21:04 UTC