new BNF-TEXT issue: use of CRLF in TEXT

During a discussion with Jim and Dave about the note at the end of
section 2.2, I realized that an important part of the field parsing
algorithm is only stated indirectly in the HTTP/1.1 spec.

Section 2.2 currently says:

   HTTP/1.1 headers can be folded onto multiple lines if the continuation
   line begins with a space or horizontal tab. All linear white space,
   including folding, has the same semantics as SP.

       LWS            = [CRLF] 1*( SP | HT )

   The TEXT rule is only used for descriptive field contents and values
   that are not intended to be interpreted by the message parser. Words of
   *TEXT MAY contain characters from character sets other than ISO 8859-1
   [22] only when encoded according to the rules of RFC 2047 [14].

       TEXT           = <any OCTET except CTLs,
                        but including LWS>

whereas it should say

   HTTP/1.1 header field values can be folded onto multiple lines if the
   continuation line begins with a space or horizontal tab.  All linear
   white space, including folding, has the same semantics as SP.
   A recipient MAY replace any linear white space with a single SP before
   interpreting the field value or forwarding the message downstream.

       LWS            = [CRLF] 1*( SP | HT )

   The TEXT rule is only used for descriptive field contents and values
   that are not intended to be interpreted by the message parser. Words of
   *TEXT MAY contain characters from character sets other than ISO 8859-1
   [22] only when encoded according to the rules of RFC 2047 [14].

       TEXT           = <any OCTET except CTLs,
                        but including LWS>

   A CRLF is allowed in the definition of TEXT only as part of a
   header field continuation.  It is expected that the folding LWS will
   be replaced with a single SP before interpretation of the TEXT value.
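
To make the intended parsing step concrete, here is a minimal sketch in
Python of replacing LWS with a single SP. It is an illustration only, not
text for the spec. The name collapse_lws is hypothetical, and the sketch
assumes that adjacent LWS occurrences (for example SP CRLF SP) can be
treated as one run.

```python
import re

# LWS = [CRLF] 1*( SP | HT ), per the BNF quoted above.
# Assumption: adjacent LWS occurrences are treated as a single run.
LWS_RUN = re.compile(r"(?:(?:\r\n)?[ \t]+)+")


def collapse_lws(field_value: str) -> str:
    """Replace each run of linear white space, folding included, with one SP."""
    return LWS_RUN.sub(" ", field_value)


# A folded quoted-string, like the one in the note at the end of section 2.2.
folded = '"part of\r\n a\r\n quoted-string"'
unfolded = collapse_lws(folded)
```

Here unfolded is '"part of a quoted-string"'. A CRLF that is not followed
by SP or HT is not LWS, so the sketch leaves it alone; such a CRLF ends the
header field rather than continuing it.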

And the note at the end of section 2.2:

   Note: CRLF in a quoted string is legal, but only in a strange way:
   as part of a header continuation, as in

      "part of
      a
      quoted-string".

   This is strange, and CRLF's ought to be allowed, but backward
   compatibility constraints mean that they are not allowed in
   general.

needs to be deleted because it is wrong.

Also, for good measure, the following should be added to section 4.2
just after the BNF definition of field-content.

   The field-content does not include any leading or trailing LWS:
   linear white space occurring before the first non-whitespace
   character of the field-value or after the last non-whitespace
   character of the field-value.  Such leading or trailing LWS
   MAY be removed without changing the semantics of the field value.
   Any LWS that occurs between field-content MAY be replaced with
   a single SP before interpreting the field value or forwarding
   the message downstream.
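
As a rough illustration of the paragraph above, a recipient might
normalize a field value as follows. This is a sketch under stated
assumptions, not a definitive implementation: normalize_field_value is a
hypothetical name, and the input is assumed to be the raw field-value
with any trailing line terminator already removed.

```python
import re

# LWS = [CRLF] 1*( SP | HT ); adjacent occurrences are treated as one run
# (an assumption of this sketch).
LWS_RUN = re.compile(r"(?:(?:\r\n)?[ \t]+)+")


def normalize_field_value(raw: str) -> str:
    """Collapse interior LWS to one SP and drop leading/trailing LWS."""
    # After collapsing, any leading or trailing LWS is exactly one SP,
    # so stripping SP at the edges removes it.
    return LWS_RUN.sub(" ", raw).strip(" ")


value = normalize_field_value("  text/html;\r\n\tcharset=utf-8 \r\n ")
```

Here value is "text/html; charset=utf-8". Collapsing first and trimming
second keeps the trimming step trivial, since every LWS run at either edge
has already been reduced to a single SP.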
   
....Roy

Received on Friday, 4 September 1998 17:57:34 UTC