RE: Proposal for i111 / i63 from Brian Smith on 2008-04-17 (ietf-http-wg@w3.org from April to June 2008)

From: Brian Smith <brian@briansmith.org>
Date: Wed, 16 Apr 2008 20:20:30 -0700
To: "'Mark Nottingham'" <mnot@mnot.net>, "'HTTP Working Group'" <ietf-http-wg@w3.org>
Message-ID: <004601c8a039$ff672930$0302a8c0@T60>
Mark Nottingham wrote:
> > Characters outside of ISO8859-1 MAY be included where the
> > encoded-word rule (as defined in RFC2047, Section 2) is specified.
> > The encoded-word rule is only used for descriptive field contents
> > and values that are not intended to be interpreted by the message
> > parser. When used in HTTP, encoded-word has no specified length
> > limit.

RFC2047 without the length limit is not RFC2047 any more. Besides, the
length limit is not the only RFC2047-imposed limit you would have to
remove, as explained below.

> > * p1, 2.2:
> > Old:
> >> comment = "(" *( ctext | quoted-pair | comment ) ")"
> >
> > New:
> > """
> > comment = "(" *( ctext | quoted-pair | comment | encoded-word ) ")"
> > """

ctext matches everything that encoded-word matches, so the given grammar
is ambiguous. Further, comments are not interpreted by applications, so
I see little point in defining encoding/decoding for them. In fact, the
specification should be explicitly recommending that we avoid producing
comments as they are not properly supported by many deployed
implementations. I recommend keeping the old rule as it was.
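
To illustrate the ambiguity, here is a rough Python sketch. The two regular
expressions are simplified approximations of the RFC 2616 ctext rule and the
RFC 2047 encoded-word rule (LWS folding and quoted-pair are ignored), and the
names CTEXT_RUN, ENCODED_WORD and matches_both are made up for this example.

```python
import re

# ctext (RFC 2616, 2.2): any TEXT excluding "(" and ")".
# Assumption: TEXT is approximated as "anything except control characters",
# with HT allowed; header folding is not modelled.
CTEXT_RUN = re.compile(r"[^\x00-\x08\x0a-\x1f\x7f()]*")

# encoded-word (RFC 2047, Section 2):
#   "=?" charset "?" encoding "?" encoded-text "?="
# Assumption: charset and encoded-text are approximated as runs of
# characters other than "?" and whitespace.
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[BbQq]\?[^?\s]*\?=")


def matches_both(s):
    """True if s is derivable both as encoded-word and as a run of ctext."""
    return bool(ENCODED_WORD.fullmatch(s)) and bool(CTEXT_RUN.fullmatch(s))


sample = "=?UTF-8?Q?caf=C3=A9?="
print(matches_both(sample))
```

Any string for which matches_both returns True can be derived from the
proposed comment rule in two different ways, which is the ambiguity in
question.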

> > * p1, 4.2:
> > Old:
> >>   field-content  = <field content>
> >>                    ; the OCTETs making up the field-value
> >>                    ; and consisting of either *TEXT or combinations
> >>                    ; of token, separators, and quoted-string
> >
> > New:
> > """
> > field-content = <field content>
> > ; the OCTETs making up the field-value,
> > ; according to the syntax specified by the field.
> > """

A parser should be able to parse field-content without knowing the
syntax of any field. In particular, the field-content rule is needed to
parse unknown headers. It should be specified by real
machine-processable ABNF.
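
As a rough sketch of what "without knowing the syntax of any field" means in
practice, a generic parser can separate the field name from an opaque
field-content at the first colon. The function name split_header_line is
hypothetical, and folding and character-set validation are deliberately left
out.

```python
from typing import Tuple


def split_header_line(line: str) -> Tuple[str, str]:
    """Split 'Name: value' without any per-field knowledge.

    Assumption: the line is already unfolded and has no trailing CRLF.
    The value is returned as opaque text with surrounding SP/HT trimmed.
    """
    name, sep, value = line.partition(":")
    # Reject lines with no colon, an empty name, or whitespace around the name.
    if not sep or not name or name != name.strip():
        raise ValueError("not a valid header line: %r" % line)
    return name, value.strip(" \t")


print(split_header_line("X-Unknown: some (opaque) value"))
```

Nothing here depends on what X-Unknown means, which is the property a generic
field-content rule has to preserve.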

> > * p3, B.1:
> > Old:
> >> filename-parm = "filename" "=" quoted-string
> >
> > New:
> > """
> > filename-parm = "filename" "=" quoted-string | encoded-word
> > """

RFC2047 says "An 'encoded-word' MUST NOT be used in parameter of a MIME
Content-Type or Content-Disposition field, or in any structured field
body except within a 'comment' or 'phrase'." That is what RFC2231 is
for. Also, this is not a backward-compatible change. Different products
use differing syntaxes for the filename parameter, and nobody is using
RFC2047. See
http://lists.w3.org/Archives/Public/public-html/2008Mar/0113.html. I
suggest making a separate issue for this.
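
For comparison only, this is roughly what the RFC2231 extended-parameter form
looks like, using encode_rfc2231 from Python's standard email.utils module.
The filename and the helper rfc2231_filename_param are made up for the
example; this is not a claim that any deployed product sends or accepts this
form.

```python
from email.utils import encode_rfc2231
from urllib.parse import unquote


def rfc2231_filename_param(filename, charset="utf-8"):
    """Build a 'filename*=' parameter in the RFC 2231 extended form.

    Hypothetical helper: charset'language'percent-encoded-value, with the
    language left empty.
    """
    return "filename*=" + encode_rfc2231(filename, charset=charset)


param = rfc2231_filename_param("na\u00efve.txt")
print(param)

# The value after the two apostrophes is ordinary percent-encoding.
print(unquote(param.split("''", 1)[1]))
```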

> > * p6, 16.6:
> > Old:
> >> warn-text = quoted-string
> > New:
> > """
> > warn-text = quoted-string | encoded-word
> > """

Again, RFC2047 says it cannot be used in a structured field body, and
this is not a backward-compatible change anyway. Nobody has mentioned
any existing implementations that do this. 

> > I think the *-extension and parameter value ones are
> > straightforward; if a particular extension wants to specify use of
> > encoded-word, it should; we shouldn't specify use of encoded-word in
> > the generic extension construct, but leave it to the specific
> > instances. I.e., they still conform to TEXT, it's up to them to
> > specify if that content can contain encoded-words.

Again, that would not be a backward-compatible change because the
RFC2616 grammar explicitly says "token | quoted-string" for values, and
that is precisely what existing implementations expect.

Like I said before, although RFC 2616 vaguely stated that RFC 2047 could
be used, the actual BNF productions in the grammar did not allow it
anywhere. Attempting to allow RFC 2047 encoding anywhere now would be an
incompatible change that deployed implementations are not likely to
handle well. In order to maximize backward-compatibility, it is likely
that a new syntax for UTF-8 support is needed; this new syntax will have
to be parsable as a quoted-string by applications that do not understand
it. (RFC2047 encoding does not have that property, because RFC2047
encoding is not quoted; in fact, quoting encoded-words is explicitly
forbidden by RFC2047.) 
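
The compatibility point can be checked mechanically. The sketch below
approximates the RFC 2616 token and quoted-string rules with regular
expressions (simplified; LWS folding inside quoted-string is not modelled)
and tests a bare encoded-word against them. The names TOKEN, QUOTED_STRING
and is_rfc2616_value are hypothetical.

```python
import re

# token: 1*<any CHAR except CTLs or separators>
# RFC 2616 separators: ( ) < > @ , ; : \ " / [ ] ? = { } SP HT
TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")

# quoted-string: <"> *( qdtext | quoted-pair ) <">   (simplified)
QUOTED_STRING = re.compile(
    r'"(?:[^"\\\x00-\x08\x0a-\x1f\x7f]|\\[\x00-\x7f])*"'
)


def is_rfc2616_value(s):
    """True if s matches 'token | quoted-string' as approximated above."""
    return bool(TOKEN.fullmatch(s) or QUOTED_STRING.fullmatch(s))


encoded = "=?UTF-8?Q?caf=C3=A9?="
print(is_rfc2616_value("utf-8"))          # a plain token
print(is_rfc2616_value('"hello world"'))  # a quoted-string
print(is_rfc2616_value(encoded))          # a bare encoded-word
```

A bare encoded-word contains "=" and "?", both of which are separators, so it
is neither a token nor a quoted-string. Wrapping it in double quotes would
make it parse, but that is exactly what RFC2047 forbids.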

Accordingly, any references to RFC 2047 and anything it defines should
just be removed.

- Brian
Received on Thursday, 17 April 2008 03:21:04 UTC