Re: i74 proposal take 2

Mark Nottingham wrote:
> 
> * p1, 2.2:
> Old:
>>   The TEXT rule is only used for descriptive field contents and values
>>   that are not intended to be interpreted by the message parser.  Words
>>   of *TEXT MAY contain characters from character sets other than ISO-
>>   8859-1 [ISO-8859-1] only when encoded according to the rules of
>>   [RFC2047].
>>     TEXT           = %x20-7E | %x80-FF | LWS
>>                    ; any OCTET except CTLs, but including LWS
>>   A CRLF is allowed in the definition of TEXT only as part of a header
>>   field continuation.  It is expected that the folding LWS will be
>>   replaced with a single SP before interpretation of the TEXT value.
> 
> 
> New:
> """
> Words of *TEXT MUST NOT contain characters from character sets other 
> than ISO-8859-1 [ISO-8859-1].
> 
>     TEXT           = %x20-7E | %x80-FF | LWS
>                    ; any OCTET except CTLs, but including LWS
> 
> A CRLF is allowed in the definition of TEXT only as part of a header 
> field continuation.  It is expected that the folding LWS will be 
> replaced with a single SP before interpretation of the TEXT value.
> 
> Characters outside of ISO-8859-1 MAY be included where the encoded-word 
> rule (as defined in RFC2047, Section 2) is specified. The encoded-word 
> rule is only used for descriptive field contents and values that are not 
> intended to be interpreted by the message parser. When used in HTTP, 
> encoded-word has no specified length limit.
> """
> 
> One question to consider here -- should %x80-%x9F be included in TEXT? 
> They don't fall into the syntactic definition of CTLs in 2616, but they 
> are semantically control characters, AFAIK.

I think it would be the right thing to forbid them.
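
To make the difference concrete, here is a rough Python sketch of the two variants of TEXT: the rule as quoted above, and a stricter one that also excludes %x80-9F. The names (TEXT_2616, TEXT_NO_C1, is_text, allow_c1) are made up for illustration, and the stricter pattern is only one possible reading of "forbid them", not agreed text.

```python
import re

# TEXT as quoted above: %x20-7E | %x80-FF | LWS, with
# LWS = [CRLF] 1*( SP | HT ) as in RFC 2616. A lone SP or HT is LWS by
# itself, and a CRLF is only valid when followed by SP or HT, so the rule
# collapses to "one allowed octet, or CRLF plus one SP/HT", repeated.
TEXT_2616 = re.compile(rb"(?:[\t\x20-\x7e\x80-\xff]|\r\n[ \t])*")

# Hypothetical stricter variant: the same rule minus the C1 range %x80-9F.
TEXT_NO_C1 = re.compile(rb"(?:[\t\x20-\x7e\xa0-\xff]|\r\n[ \t])*")


def is_text(value: bytes, allow_c1: bool = True) -> bool:
    """Return True if every octet of value fits the chosen TEXT variant."""
    pattern = TEXT_2616 if allow_c1 else TEXT_NO_C1
    return pattern.fullmatch(value) is not None
```

With this, is_text(b"a\x85b") is accepted by the current rule and rejected with allow_c1=False, while ordinary ISO-8859-1 text such as b"caf\xe9" passes both.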

> * p1, 2.2:
> Old:
>> comment = "(" *( ctext | quoted-pair | comment ) ")"
> 
> 
> New:
> """
> comment = "(" *( ctext | quoted-pair | comment | encoded-word ) ")"
> """

OK, but then we'll have to state somewhere where encoded-word comes 
from; <http://tools.ietf.org/html/rfc2047#section-2>?

Also, do we really Really REALLY want to require support for everything 
that's in there?
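
For a feel of what "everything" covers, here is a small Python sketch that decodes encoded-words with the standard library's email.header.decode_header, which implements RFC 2047 for mail. The helper name decode_words is hypothetical, and falling back to ISO-8859-1 for unlabelled bytes is my assumption based on the TEXT rule, not something RFC 2047 says.

```python
from email.header import decode_header


def decode_words(value: str) -> str:
    """Decode any RFC 2047 encoded-words in value to a Python str."""
    out = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Assumption: unlabelled octets are treated as ISO-8859-1.
            out.append(chunk.decode(charset or "iso-8859-1"))
        else:
            # No encoded-word was present; the value comes back unchanged.
            out.append(chunk)
    return "".join(out)


# Two encodings ("Q" and "B") and an arbitrary charset label per word:
print(decode_words("=?ISO-8859-1?Q?caf=E9?="))
print(decode_words("=?UTF-8?B?4oKs?="))
```

Even this minimal use already needs a Q decoder, a base64 decoder, and a charset registry on the recipient side.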

> * p1, 4.2:
> Old:
>>     field-content  = <field content>
>>                      ; the OCTETs making up the field-value
>>                      ; and consisting of either *TEXT or combinations
>>                      ; of token, separators, and quoted-string
> 
> New:
> """
> field-content = <field content>
>     ; the OCTETs making up the field-value,
>     ; consisting of either *TEXT or combinations
>     ; of token, separators, quoted-string and encoded-word,
>     ; according to the syntax specified by the field.
> """

I prefer the simplified version you posted later:

field-content = <field content>
   ; the OCTETs making up the field-value,
   ; according to the syntax specified by the field.


> * p3, B.1:
> Old:
>> filename-parm = "filename" "=" quoted-string
> 
> New:
> """
> filename-parm = "filename" "=" quoted-string | encoded-word
> """

I'd prefer to make C-D a special case where we specify *exactly* what's 
needed, nothing more (which means: RFC2231 encoding of utf-8, no line 
folding/continuation lines).
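
As an illustration of how small that profile would be, here is a Python sketch that builds the parameter in RFC 2231 extended form (charset, empty language, percent-encoded octets), always UTF-8 and always on one line. The function name filename_star is made up, and the choice to percent-encode everything except unreserved characters is an assumption on my side.

```python
from urllib.parse import quote


def filename_star(name: str) -> str:
    """Build an RFC 2231 style filename* parameter, UTF-8 only, no folding."""
    # quote() encodes the string as UTF-8 and percent-escapes every octet
    # except ASCII letters, digits and "_.-~" (safe="" also escapes "/").
    return "filename*=UTF-8''" + quote(name, safe="")


print("Content-Disposition: attachment; " + filename_star("\u20ac rates.txt"))
```

A recipient only has to split on the two single quotes and percent-decode; none of the RFC 2231 continuation machinery is involved.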

> * p6, 16.6:
> Old:
>> warn-text = quoted-string
> New:
> """
> warn-text = quoted-string | encoded-word
> """
> 
> 
> Note that I have NOT suggested the use of encoded-word in the following 
> places:
> 
> p1, 3.4 (Transfer Codings -- parameter values), p1, 6.1.1 
> (Reason-Phrase), p2, 10.2 (expect-extensions), p3, 3.3 (Media Types -- 
> parameter values), p3, 6.1 (accept-extension), p4, 3 (ETag opaque-tag), 
> p6, 16.2 (cache-extension), p6, 16.4 (extension-pragma).

OK, except probably for Reason-Phrase.

> I think the *-extension and parameter value ones are straightforward; if 
> a particular extension wants to specify use of encoded-word, it should; 
> we shouldn't specify use of encoded-word in the generic extension 
> construct, but leave it to the specific instances.

Yes.

> I don't see a use case for ETags being internationalised -- does anyone 
> else? Reason-Phrase may be necessary, though.

No. Yes.

> Also, I haven't addressed From (p2, 10.3). Anybody want to take a stab 
> at that?

BR, Julian

Received on Friday, 28 March 2008 19:18:40 UTC