Re: i74: Encoding for non-ASCII headers

Mark Nottingham wrote:
> 
> I think we've actually made progress on this; AFAICT, we seem to be 
> moving towards removing the generic text WRT RFC2047 encoding and 
> replacing it* with something that says that individual headers need to 
> nominate an encoding mechanism directly, and giving guidance on when 
> they should do so (roughly, wherever something is a candidate for 
> display and/or user input).

Right. Let's clarify which of the HTTP/1.1 headers allow RFC2047-style 
encoding; and let's also document a sane solution for new headers.

> Is that where we're at?
> 
> If so, the next step would be to craft recommendations / requirements 
> about what that mechanism will be. Possibilities discussed:
> 
> a) RFC2047

I haven't seen any evidence of this being implemented.
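
For reference, this is roughly what option (a) would put on the wire. A
minimal sketch using Python's standard email.header module; the helper
names rfc2047_encode and rfc2047_decode are made up for illustration,
and nothing here implies that HTTP implementations actually do this.

```python
from email.header import Header, decode_header


def rfc2047_encode(text: str) -> str:
    """Encode text as RFC 2047 encoded-word(s), using UTF-8 as the charset."""
    return Header(text, charset="utf-8").encode()


def rfc2047_decode(value: str) -> str:
    """Decode a value that may contain RFC 2047 encoded-words."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Encoded-words come back as bytes plus their declared charset.
            parts.append(chunk.decode(charset or "ascii"))
        else:
            # Plain unencoded text comes back as str.
            parts.append(chunk)
    return "".join(parts)
```

The encoded form is pure ASCII of the shape =?charset?encoding?data?=,
so it fits the current grammar, but a recipient only recovers the text
if it knows to look for encoded-words in that particular header.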

> b) UTF-8

Unfortunately, RFC2616, Section 4.2 currently states:

     message-header = field-name ":" [ field-value ]
     field-name     = token
     field-value    = *( field-content | LWS )
     field-content  = <the OCTETs making up the field-value
                      and consisting of either *TEXT or combinations
                      of token, separators, and quoted-string>

TEXT, per Section 2.2, is limited to ISO-8859-1 unless RFC2047 encoding 
is used. Thus, if we take that as the final word, we can't use anything 
but Latin-1, and thus need to encode non-Latin-1 characters.
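
To make the constraint concrete, here is a small sketch. It only checks
the character repertoire, not the exclusion of control characters, and
the function names are invented for this example.

```python
def fits_latin1(value: str) -> bool:
    """Return True if every character of value is representable in ISO-8859-1."""
    try:
        value.encode("iso-8859-1")
    except UnicodeEncodeError:
        return False
    return True


def utf8_seen_as_latin1(value: str) -> str:
    """Show what a Latin-1 recipient would see if raw UTF-8 octets were sent."""
    return value.encode("utf-8").decode("iso-8859-1")
```

Under these assumptions fits_latin1("Grüße") holds while
fits_latin1("日本語") does not, and raw UTF-8 sent to a recipient that
assumes Latin-1 is silently misread rather than rejected.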

> c) Something from BCP137 section 5

...which would be \u'nnnnnn' or &#xnnnnnn;...
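
A rough sketch of producing both forms, assuming code points are written
with at least four uppercase hex digits and that only non-ASCII
characters get escaped. The function name and the style parameter are
hypothetical, and escaping of literal backslashes or ampersands already
present in the input is not handled.

```python
def escape_non_ascii(text: str, style: str = "backslash") -> str:
    """Replace each non-ASCII character with an ASCII escape of its code point.

    style="backslash" yields \\u'nnnn'; style="xml" yields &#xnnnn;.
    """
    if style not in ("backslash", "xml"):
        raise ValueError("style must be 'backslash' or 'xml'")
    out = []
    for ch in text:
        code_point = ord(ch)
        if code_point < 0x80:
            out.append(ch)  # ASCII passes through unchanged
        elif style == "backslash":
            out.append("\\u'%04X'" % code_point)
        else:
            out.append("&#x%04X;" % code_point)
    return "".join(out)
```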

> d) IRI->URI

> Separately, we'd need to open new issues for specifying these encodings 
> for the field-values of:
>   - From

...this one is currently defined in terms of RFC2822, Section 3.4...

>   - Warning

Currently explicitly refers to RFC2047.

>   - Content-Location
>   - Location
>   - Referer

These are URI references. No non-ASCII characters anyway.
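
That is, anything non-ASCII has already been percent-encoded by the time
it reaches the header, along the lines of option (d). A simplified
sketch of that step for a path only; a full IRI-to-URI conversion also
has to deal with the host name and the other components, which this
ignores. The function name is made up.

```python
from urllib.parse import quote


def iri_path_to_uri_path(path: str) -> str:
    """Percent-encode the UTF-8 octets of non-ASCII characters in a path.

    "/" is kept as the segment separator and "%" is left alone so that
    escapes already present are not encoded twice.
    """
    return quote(path, safe="/%", encoding="utf-8")
```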

>   - Content-Disposition (?)

Content-Disposition uses I18N *inside* the parameters, for which there 
already is RFC2231.
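
For illustration, an RFC2231-style extended parameter can be built like
this. A minimal sketch with hypothetical helper names; it does not cover
RFC2231 continuations, and it says nothing about which user agents
actually understand the result.

```python
from urllib.parse import quote, unquote


def rfc2231_param(name: str, value: str, charset: str = "UTF-8",
                  language: str = "") -> str:
    """Build name*=charset'language'percent-encoded-value."""
    encoded = quote(value, safe="", encoding=charset)
    return f"{name}*={charset}'{language}'{encoded}"


def rfc2231_decode_value(ext_value: str) -> str:
    """Decode the charset'language'value part back into text."""
    charset, _language, encoded = ext_value.split("'", 2)
    return unquote(encoded, encoding=charset)
```

With these helpers, the file name "€ rates.txt" becomes
filename*=UTF-8''%E2%82%AC%20rates.txt, which stays within ASCII and so
within the existing header grammar.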

> Am I overlooking anything?

Reason-Phrase, for instance. In general, we need to answer whether 
RFC2047 applies to everything using "comment" or "quoted-string".

> * It isn't actually replacing it, it's moving it to something specific 
> to the field-value of headers. I don't hear anyone talking about 
> internationalising other protocol elements at this point...


BR, Julian

Received on Monday, 17 March 2008 12:49:13 UTC