Re: PROPOSAL: i74: Encoding for non-ASCII headers

Mark Nottingham wrote:
> I know people are getting fatigued by this one, and I'd like to at least 
> get a shared understanding of the scope of this issue, and maybe carve 
> off some parts to make it more manageable. Here are a few statements 
> that I believe capture where we're at; if you disagree, please say so 
> (hopefully without reopening the entire discussion).
> 
> 1. We are considering allowing UTF-8 in content, specifically (a) in 
> newly defined headers, and/or (b) in places where TEXT is now.

Yes.

> 2. We intend to remove the "blanket" RFC2047 encoding associated with 
> TEXT and (if kept) move it to the definitions of the individual rules, 
> so that it's clear where such encoding may occur. Candidates for this 
> include Reason-Phrase, filename-parm, warn-text, as well as the comments 
> in field-content.

Yes.

> 3. If RFC2047 encoding is used / referenced, we need to more carefully 
> specify its use; e.g., regarding what encoding forms are allowable, line 
> length limits, charsets used, folding.

Yes.
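
For reference, RFC 2047 defines two encoded-word forms, "B" (base64) and "Q" (quoted-printable-like), each tagged with a charset. The sketch below is illustrative only: it uses Python's standard email package, and the helper names encode_word and decode_word as well as the sample string are made up for this example. It only shows what the two forms look like for one UTF-8 value; it says nothing about which forms HTTP ought to allow.

```python
from email.charset import BASE64, QP, Charset
from email.header import Header, decode_header


def encode_word(text: str, encoding: int) -> str:
    """Encode text as an RFC 2047 encoded-word using UTF-8.

    `encoding` is email.charset.BASE64 (the "B" form) or
    email.charset.QP (the "Q" form). Hypothetical helper.
    """
    charset = Charset("utf-8")
    charset.header_encoding = encoding  # force one specific form
    return Header(text, charset=charset).encode()


def decode_word(encoded: str) -> str:
    """Decode a header value that may contain encoded-words."""
    return "".join(
        part.decode(cs or "ascii") if isinstance(part, bytes) else part
        for part, cs in decode_header(encoded)
    )


sample = "Grüße"  # arbitrary non-ASCII sample value
b_form = encode_word(sample, BASE64)
q_form = encode_word(sample, QP)

print(b_form)
print(q_form)
print(decode_word(b_form) == sample, decode_word(q_form) == sample)
```

Both forms carry the same octets; they differ only in how those octets are made ASCII-safe, which is one of the things that would need to be pinned down per rule.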

> 4. From also deserves a look.

Ok.

> 5. Either the definition of TEXT or CTL may need the C1 control 
> characters added.

TEXT already allows C1 controls, and always has; see
<http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p1-messaging-latest.html#rule.TEXT>:

   TEXT           = %x20-7E | %x80-FF | LWS
                  ; any OCTET except CTLs, but including LWS

That being said, I'd like it to exclude C1 controls.
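
To make the overlap concrete, here is a small sketch. It assumes the octets are read as ISO-8859-1, where the C1 controls occupy 0x80-0x9F; the function names are invented for illustration, LWS is not modelled, and the tightened range (%x20-7E | %xA0-FF) is just one possible way to write the exclusion, not agreed text.

```python
def in_text_octets(octet: int) -> bool:
    """True if the octet is in %x20-7E or %x80-FF, per the TEXT rule above.

    LWS is deliberately left out of this sketch.
    """
    return 0x20 <= octet <= 0x7E or 0x80 <= octet <= 0xFF


def is_c1_control(octet: int) -> bool:
    """True for C1 controls, assuming an ISO-8859-1 reading (0x80-0x9F)."""
    return 0x80 <= octet <= 0x9F


def in_text_without_c1(octet: int) -> bool:
    """Hypothetical tightened rule: %x20-7E | %xA0-FF."""
    return in_text_octets(octet) and not is_c1_control(octet)


# Every C1 control octet is matched by the current TEXT ranges.
c1_allowed = [o for o in range(0x100) if is_c1_control(o) and in_text_octets(o)]
print(len(c1_allowed))  # 32
```
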

> As Roy states, #1 should be approached conservatively. I think we can 
> quickly get a decision on 2, 3, and 5. If we later decide to allow 
> UTF-8, we can readjust the text (and TEXT) to suit, but if we're going 
> to punt on this, I'd like to at least nail down the parts of it that we 
> know we have to deal with.

Agreed.

BR, Julian

Received on Thursday, 3 April 2008 18:59:28 UTC