- From: Mark Nottingham <mnot@mnot.net>
- Date: Fri, 28 Mar 2008 12:19:14 +1100
- To: HTTP Working Group <ietf-http-wg@w3.org>
* p1, 2.2:

Old:

> The TEXT rule is only used for descriptive field contents and values
> that are not intended to be interpreted by the message parser. Words
> of *TEXT MAY contain characters from character sets other than
> ISO-8859-1 [ISO-8859-1] only when encoded according to the rules of
> [RFC2047].
>
>     TEXT = %x20-7E | %x80-FF | LWS
>          ; any OCTET except CTLs, but including LWS
>
> A CRLF is allowed in the definition of TEXT only as part of a header
> field continuation. It is expected that the folding LWS will be
> replaced with a single SP before interpretation of the TEXT value.

New:

"""
Words of *TEXT MUST NOT contain characters from character sets other
than ISO-8859-1 [ISO-8859-1].

    TEXT = %x20-7E | %x80-FF | LWS
         ; any OCTET except CTLs, but including LWS

A CRLF is allowed in the definition of TEXT only as part of a header
field continuation. It is expected that the folding LWS will be
replaced with a single SP before interpretation of the TEXT value.

Characters outside of ISO-8859-1 MAY be included where the encoded-word
rule (as defined in RFC2047, Section 2) is specified. The encoded-word
rule is only used for descriptive field contents and values that are
not intended to be interpreted by the message parser. When used in
HTTP, encoded-word has no specified length limit.
"""

One question to consider here -- should %x80-%x9F be included in TEXT?
They don't fall into the syntactic definition of CTLs in RFC 2616, but
they are semantically control characters, AFAIK.
* p1, 2.2:

Old:

>     comment = "(" *( ctext | quoted-pair | comment ) ")"

New:

"""
    comment = "(" *( ctext | quoted-pair | comment | encoded-word ) ")"
"""

* p1, 4.2:

Old:

>     field-content = <field content>
>                   ; the OCTETs making up the field-value
>                   ; and consisting of either *TEXT or combinations
>                   ; of token, separators, and quoted-string

New:

"""
    field-content = <field content>
                  ; the OCTETs making up the field-value,
                  ; consisting of either *TEXT or combinations
                  ; of token, separators, quoted-string and encoded-word,
                  ; according to the syntax specified by the field.
"""

* p3, B.1:

Old:

>     filename-parm = "filename" "=" quoted-string

New:

"""
    filename-parm = "filename" "=" quoted-string | encoded-word
"""

* p6, 16.6:

Old:

>     warn-text = quoted-string

New:

"""
    warn-text = quoted-string | encoded-word
"""

Note that I have NOT suggested the use of encoded-word in the following
places:

  - p1, 3.4 (Transfer Codings -- parameter values)
  - p1, 6.1.1 (Reason-Phrase)
  - p2, 10.2 (expect-extensions)
  - p3, 3.3 (Media Types -- parameter values)
  - p3, 6.1 (accept-extension)
  - p4, 3 (ETag opaque-tag)
  - p6, 16.2 (cache-extension)
  - p6, 16.4 (extension-pragma)

I think the *-extension and parameter value ones are straightforward;
if a particular extension wants to specify use of encoded-word, it
should; we shouldn't specify use of encoded-word in the generic
extension construct, but leave it to the specific instances.

I don't see a use case for ETags being internationalised -- does anyone
else? Reason-Phrase may be necessary, though.

Also, I haven't addressed From (p2, 10.3). Anybody want to take a stab
at that?

Cheers,

--
Mark Nottingham     http://www.mnot.net/
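
For illustration only, here is a rough Python sketch of the two rules discussed above: an octet-level check against the TEXT production, and an RFC 2047 encoded-word round trip using the standard library's email.header module. The function names (is_text, has_c1_controls, to_encoded_word, decode_encoded_word) are hypothetical and not part of any draft, and the sketch assumes header folding has already been replaced by a single SP, so LWS is reduced to SP and HT.

```python
import re
from email.header import Header, decode_header

# TEXT = %x20-7E | %x80-FF | LWS
# Assumption of this sketch: the value has already been unfolded, so the only
# LWS octets left are SP (%x20) and HT (%x09). CR and LF are rejected.
_TEXT_OCTETS = re.compile(rb"[\x20-\x7E\x80-\xFF\t]*")


def is_text(octets: bytes) -> bool:
    """Return True if every octet is allowed by the TEXT rule as sketched."""
    return _TEXT_OCTETS.fullmatch(octets) is not None


def has_c1_controls(octets: bytes) -> bool:
    """Return True if the value contains octets in %x80-9F.

    These octets pass is_text() today; the open question in the message is
    whether they should be excluded from TEXT.
    """
    return any(0x80 <= b <= 0x9F for b in octets)


def to_encoded_word(value: str, charset: str = "utf-8") -> str:
    """Encode a short string as an RFC 2047 encoded-word.

    Long values may be split across several encoded-words by the library;
    this sketch only deals with short values.
    """
    return Header(value, charset=charset).encode()


def decode_encoded_word(word: str) -> str:
    """Decode a string that may contain RFC 2047 encoded-words."""
    parts = []
    for chunk, charset in decode_header(word):
        if isinstance(chunk, bytes):
            parts.append(chunk.decode(charset or "ascii"))
        else:
            parts.append(chunk)
    return "".join(parts)


if __name__ == "__main__":
    latin1_value = "café".encode("latin-1")
    print(is_text(latin1_value), has_c1_controls(latin1_value))

    word = to_encoded_word("Grüße")
    # An encoded-word is plain ASCII, so it also satisfies TEXT.
    print(word, is_text(word.encode("ascii")), decode_encoded_word(word))
```

Note that an encoded-word consists only of printable ASCII, so it always satisfies TEXT at the octet level; the proposed grammar changes only determine where a recipient is expected to look for one.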
Received on Friday, 28 March 2008 01:19:55 UTC