Re: Proposal for i111 / i63

There seemed to be implicit agreement with this approach when we were  
discussing these parts of i74, but there has been no discussion since  
they were split away from the more difficult parts of that issue.

In a nutshell, this replaces the blanket RFC2047 encoding declaration  
with targeted use of the encoded-word rule, and removes the length  
limit from that rule.

Comments?


On 03/04/2008, at 6:29 PM, Mark Nottingham wrote:

>
> These are the parts of my revised proposal for i74 that are specific  
> to i111 and i63. I've mostly used HTTP-style BNF, not ABNF, for  
> purposes of comparison.
>
> * p1, 2.2:
> Old:
>> The TEXT rule is only used for descriptive field contents and values
>> that are not intended to be interpreted by the message parser.  Words
>> of *TEXT MAY contain characters from character sets other than ISO-
>> 8859-1 [ISO-8859-1] only when encoded according to the rules of
>> [RFC2047].
>>   TEXT           = %x20-7E | %x80-FF | LWS
>>                  ; any OCTET except CTLs, but including LWS
>> A CRLF is allowed in the definition of TEXT only as part of a header
>> field continuation.  It is expected that the folding LWS will be
>> replaced with a single SP before interpretation of the TEXT value.
>
> New:
> """
> Words of *TEXT MUST NOT contain characters from character sets other  
> than ISO-8859-1 [ISO-8859-1].
>
>   TEXT           = %x20-7E | %x80-FF | LWS
>                  ; any OCTET except CTLs, but including LWS
>
> A CRLF is allowed in the definition of TEXT only as part of a header  
> field continuation.  It is expected that the folding LWS will be  
> replaced with a single SP before interpretation of the TEXT value.
>
> Characters outside of ISO-8859-1 MAY be included where the encoded- 
> word rule (as defined in RFC2047, Section 2) is specified. The  
> encoded-word rule is only used for descriptive field contents and  
> values that are not intended to be interpreted by the message  
> parser. When used in HTTP, encoded-word has no specified length limit.
> """
>
> Note that I've taken a minimal approach to i63 here, and that the  
> outcome of i74 may change this.
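
For illustration only, here is a minimal Python sketch of what the text
above asks of an implementation: checking a value against TEXT, and
decoding a single encoded-word without enforcing any length limit. The
function names are made up for this example, it assumes folding LWS has
already been replaced with SP, and it ignores the RFC 2231 language
suffix on the charset.

```python
import base64
import re

# TEXT = %x20-7E | %x80-FF | LWS, checked after any folding LWS has been
# replaced by a single SP, so only SP and HT remain as whitespace.
_TEXT = re.compile(rb"[\x20-\x7E\x80-\xFF\t]*")

# encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
# Deliberately no length check, per the proposed text.
_ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=")
_HEX_OCTET = re.compile(rb"=([0-9A-Fa-f]{2})")


def is_text(value: bytes) -> bool:
    """True if every octet of value is allowed by the TEXT rule."""
    return _TEXT.fullmatch(value) is not None


def decode_encoded_word(word: str) -> str:
    """Decode one RFC 2047 encoded-word ("B" or "Q" encoding)."""
    match = _ENCODED_WORD.fullmatch(word)
    if match is None:
        raise ValueError("not an encoded-word: %r" % word)
    charset, encoding, payload = match.groups()
    if encoding in "Bb":
        raw = base64.b64decode(payload)
    else:
        # "Q" encoding: "_" stands for SP, "=XX" is a hex-encoded octet.
        octets = payload.encode("ascii").replace(b"_", b" ")
        raw = _HEX_OCTET.sub(lambda m: bytes([int(m.group(1), 16)]), octets)
    return raw.decode(charset)
```

With this sketch, decode_encoded_word("=?UTF-8?Q?caf=C3=A9?=") returns
"café", and an encoded-word longer than RFC 2047's 75 characters is
accepted like any other.
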
>
>
> * p1, 2.2:
> Old:
>> comment = "(" *( ctext | quoted-pair | comment ) ")"
>
> New:
> """
> comment = "(" *( ctext | quoted-pair | comment | encoded-word ) ")"
> """
>
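As a rough sketch of how a recipient might handle the revised comment
rule, the Python below scans a nested comment (honouring quoted-pairs)
and then picks out any encoded-words inside it. parse_comment and
encoded_words_in are hypothetical names, and the sketch relies on the
RFC 2047 restriction that an encoded-word inside a comment contains no
parentheses.

```python
import re
from typing import List, Tuple

_ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[BbQq]\?[^?\s]*\?=")


def parse_comment(s: str, pos: int = 0) -> Tuple[str, int]:
    """Return (inner text, index just past the closing ")")."""
    if pos >= len(s) or s[pos] != "(":
        raise ValueError("comment must start with '('")
    depth = 0
    out = []
    i = pos
    while i < len(s):
        ch = s[i]
        if ch == "\\":
            # quoted-pair: keep the backslash and the next character as-is
            if i + 1 >= len(s):
                raise ValueError("dangling quoted-pair")
            out.append(s[i:i + 2])
            i += 2
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                # drop the outermost "(" collected at the start
                return "".join(out)[1:], i + 1
        out.append(ch)
        i += 1
    raise ValueError("unterminated comment")


def encoded_words_in(comment_text: str) -> List[str]:
    """List the encoded-words found in already-extracted comment text."""
    return [m.group(0) for m in _ENCODED_WORD.finditer(comment_text)]
```

Nested comments come back with their inner parentheses intact, so
parse_comment("(a (b) c) tail") yields ("a (b) c", 9).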
>
> * p1, 4.2:
> Old:
>>   field-content  = <field content>
>>                    ; the OCTETs making up the field-value
>>                    ; and consisting of either *TEXT or combinations
>>                    ; of token, separators, and quoted-string
>
> New:
> """
> field-content = <field content>
> ; the OCTETs making up the field-value,
> ; according to the syntax specified by the field.
> """
>
> N.B. depending on how we resolve i74, we may want to add a  
> constraint regarding character encodings, so that people don't start  
> minting headers in random ones.
>
>
> * p3, B.1:
> Old:
>> filename-parm = "filename" "=" quoted-string
>
> New:
> """
> filename-parm = "filename" "=" ( quoted-string | encoded-word )
> """
>
> N.B.
>
>
> * p6, 16.6:
> Old:
>> warn-text = quoted-string
> New:
> """
> warn-text = quoted-string | encoded-word
> """
>
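To make the alternation concrete, here is a hedged Python sketch of a
recipient handling warn-text as either a quoted-string or an
encoded-word. parse_warn_text is a hypothetical helper; the
encoded-word branch leans on the standard library's
email.header.decode_header, and falling back to ISO-8859-1 when no
charset is reported is an assumption of this sketch.

```python
import re
from email.header import decode_header

# quoted-string = <"> *( qdtext | quoted-pair ) <">
_QUOTED_STRING = re.compile(r'"((?:[^"\\]|\\.)*)"')


def parse_warn_text(value: str) -> str:
    """Return the text carried by a quoted-string or an encoded-word."""
    quoted = _QUOTED_STRING.fullmatch(value)
    if quoted is not None:
        # Undo quoted-pairs: "\" CHAR becomes CHAR.
        return re.sub(r"\\(.)", r"\1", quoted.group(1))
    if value.startswith("=?") and value.endswith("?="):
        pieces = []
        for raw, charset in decode_header(value):
            if isinstance(raw, bytes):
                raw = raw.decode(charset or "iso-8859-1")
            pieces.append(raw)
        return "".join(pieces)
    raise ValueError("neither quoted-string nor encoded-word")
```

A field parser for filename-parm could dispatch on the value the same
way.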
>
> Note that I have NOT suggested the use of encoded-word in the  
> following places:
>
> p1, 3.4 (Transfer Codings -- parameter values), p1, 6.1.1 (Reason- 
> Phrase), p2, 10.2 (expect-extensions), p3, 3.3 (Media Types --  
> parameter values), p3, 6.1 (accept-extension), p4, 3 (ETag opaque- 
> tag), p6, 16.2 (cache-extension), p6, 16.4 (extension-pragma).
>
> I think the *-extension and parameter value ones are  
> straightforward: if a particular extension wants to specify use of  
> encoded-word, it should. We shouldn't specify use of encoded-word in  
> the generic extension construct, but leave it to the specific  
> instances. I.e., they still conform to TEXT; it's up to them to  
> specify whether that content can contain encoded-words.


--
Mark Nottingham     http://www.mnot.net/

Received on Thursday, 17 April 2008 02:16:55 UTC