Character encodings in headers [i74][was: Straw-man charter for http-bis]

On 10/06/2007, at 6:05 PM, Martin Duerst wrote:
> - RFC 2616 prescribes that headers containing non-ASCII have to use
>   either iso-8859-1 or RFC 2047. This is unnecessarily complex and
>   not necessarily followed. At the least, new extensions should be
>   allowed to specify that UTF-8 is used.

My .02;

I'm concerned about allowing UTF-8; it may break existing  
implementations.

I'd like to see the text just require that the actual character set  
be 8859-1, but to allow individual extensions to nominate encodings  
*like* 2047, without being restricted to it. For example, the encoding  
specified in 3987 is appropriate for URIs. However, it *has* to be  
explicit; I've heard some people read this requirement and think that  
they need to check *every* header for 2047 encoding.

So, I think this means;

1) Change
   "Words of *TEXT MAY contain characters from character sets other  
than ISO-8859-1 [22] only when encoded according to the rules of RFC  
2047 [14]."
to
   "Words of *TEXT MUST NOT contain characters from character sets  
other than ISO-8859-1 [22]."
and,

2) Identify headers that may have non-8859-1 content and explicitly say  
how to encode them (IRI, 2047, whatever; the existing ones will have  
to be 2047, I believe), modifying their BNF to suit.

3) When we document extensibility, require new headers to nominate  
any encoding explicitly.
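
To make (2) and (3) concrete, here is a rough sketch of what "explicit
per-header encoding" could look like on the receiving side. It is
Python, purely illustrative, and not taken from any implementation. The
header names (x-example-title, x-example-link) and the registry are
hypothetical; the percent-decoding step only approximates the 3987
mapping and assumes the value is otherwise plain ASCII.

```python
from email.header import decode_header
from urllib.parse import unquote


def decode_latin1(raw: bytes) -> str:
    # Default rule: the value is ISO-8859-1, with no further decoding.
    return raw.decode("iso-8859-1")


def decode_rfc2047(raw: bytes) -> str:
    # Only for headers that explicitly nominate RFC 2047 encoded-words.
    text = raw.decode("iso-8859-1")
    parts = []
    for chunk, charset in decode_header(text):
        if isinstance(chunk, bytes):
            # Unencoded runs come back as bytes with charset None.
            parts.append(chunk.decode(charset or "iso-8859-1"))
        else:
            parts.append(chunk)
    return "".join(parts)


def decode_percent_utf8(raw: bytes) -> str:
    # Only for headers that nominate URI-style escaping of UTF-8 octets
    # (an approximation of the RFC 3987 mapping, assumed here).
    return unquote(raw.decode("ascii"), encoding="utf-8", errors="strict")


# Hypothetical registry: each header that may carry non-8859-1 content
# names its encoding explicitly. Anything not listed is plain 8859-1.
HEADER_ENCODINGS = {
    "x-example-title": decode_rfc2047,
    "x-example-link": decode_percent_utf8,
}


def decode_header_value(name: str, raw: bytes) -> str:
    decoder = HEADER_ENCODINGS.get(name.lower(), decode_latin1)
    return decoder(raw)
```

The point of the registry is that a recipient never has to sniff
arbitrary headers for 2047 encoded-words; a header that has not
nominated an encoding is passed through as 8859-1, even if its value
happens to look like an encoded-word.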

--
Mark Nottingham     http://www.mnot.net/

Received on Monday, 20 August 2007 03:40:59 UTC