i74 proposal take 2 from Mark Nottingham on 2008-03-28 (ietf-http-wg@w3.org from January to March 2008)

From: Mark Nottingham <mnot@mnot.net>
Date: Fri, 28 Mar 2008 12:19:14 +1100
To: HTTP Working Group <ietf-http-wg@w3.org>
Message-Id: <FE13F790-16A0-4002-B752-95719CCE80D3@mnot.net>

* p1, 2.2:
Old:
>   The TEXT rule is only used for descriptive field contents and values
>   that are not intended to be interpreted by the message parser.  Words
>   of *TEXT MAY contain characters from character sets other than
>   ISO-8859-1 [ISO-8859-1] only when encoded according to the rules of
>   [RFC2047].
>     TEXT           = %x20-7E | %x80-FF | LWS
>                    ; any OCTET except CTLs, but including LWS
>   A CRLF is allowed in the definition of TEXT only as part of a header
>   field continuation.  It is expected that the folding LWS will be
>   replaced with a single SP before interpretation of the TEXT value.


New:
"""
Words of *TEXT MUST NOT contain characters from character sets other  
than ISO-8859-1 [ISO-8859-1].

     TEXT           = %x20-7E | %x80-FF | LWS
                    ; any OCTET except CTLs, but including LWS

A CRLF is allowed in the definition of TEXT only as part of a header  
field continuation.  It is expected that the folding LWS will be  
replaced with a single SP before interpretation of the TEXT value.

Characters outside of ISO-8859-1 MAY be included where the encoded-word  
rule (as defined in RFC2047, Section 2) is specified. The encoded-word  
rule is only used for descriptive field contents and values that are  
not intended to be interpreted by the message parser. When used in  
HTTP, encoded-word has no specified length limit.
"""

One question to consider here -- should %x80-9F be included in TEXT?  
They don't fall into the syntactic definition of CTLs in RFC 2616, but  
they are semantically control characters, AFAIK.
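One way to check the point about %x80-9F is to ask the Unicode database how those octets are classified once decoded. The sketch below assumes Python's "iso-8859-1" codec, which maps octets 0x80-0x9F to U+0080-U+009F (the C1 controls); the function name is hypothetical.

```python
import unicodedata

def latin1_category(octet: int) -> str:
    """Unicode general category of a single octet decoded as ISO-8859-1."""
    return unicodedata.category(bytes([octet]).decode("iso-8859-1"))

# Octets whose decoded character is a control character (category "Cc").
control_octets = [o for o in range(0x100) if latin1_category(o) == "Cc"]

# True if every octet in %x80-9F is classified as a control character.
c1_all_controls = all(latin1_category(o) == "Cc" for o in range(0x80, 0xA0))
```

Under that mapping the whole %x80-9F range comes out as control characters, alongside %x00-1F and %x7F, which supports the "semantically control characters" reading.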


* p1, 2.2:
Old:
> comment = "(" *( ctext | quoted-pair | comment ) ")"


New:
"""
comment = "(" *( ctext | quoted-pair | comment | encoded-word ) ")"
"""


* p1, 4.2:
Old:
>     field-content  = <field content>
>                      ; the OCTETs making up the field-value
>                      ; and consisting of either *TEXT or combinations
>                      ; of token, separators, and quoted-string

New:
"""
field-content = <field content>
	; the OCTETs making up the field-value,
	; consisting of either *TEXT or combinations
	; of token, separators, quoted-string and encoded-word,
	; according to the syntax specified by the field.
"""


* p3, B.1:
Old:
> filename-parm = "filename" "=" quoted-string

New:
"""
filename-parm = "filename" "=" ( quoted-string | encoded-word )
"""


* p6, 16.6:
Old:
> warn-text = quoted-string

New:
"""
warn-text = quoted-string | encoded-word
"""


Note that I have NOT suggested the use of encoded-word in the  
following places:

p1, 3.4 (Transfer Codings -- parameter values), p1, 6.1.1  
(Reason-Phrase), p2, 10.2 (expect-extensions), p3, 3.3 (Media Types --  
parameter values), p3, 6.1 (accept-extension), p4, 3 (ETag  
opaque-tag), p6, 16.2 (cache-extension), p6, 16.4 (extension-pragma).

I think the *-extension and parameter value ones are straightforward;  
if a particular extension wants to specify use of encoded-word, it  
should; we shouldn't specify use of encoded-word in the generic  
extension construct, but leave it to the specific instances.

I don't see a use case for ETags being internationalised -- does  
anyone else? Reason-Phrase may be necessary, though.

Also, I haven't addressed From (p2, 10.3). Anybody want to take a stab  
at that?

Cheers,

--
Mark Nottingham     http://www.mnot.net/
Received on Friday, 28 March 2008 01:19:55 UTC