Re: PROPOSAL: i74: Encoding for non-ASCII headers

On 27/03/2008, at 9:17 PM, Martin Duerst wrote:

> At 14:41 08/03/27, Mark Nottingham wrote:
>>
>> My reading is that HTTP is limited to iso-8859-1 *on the wire*, and
>> requires RFC2047 encoding for characters outside of that range. Do you
>> disagree with that?
>
> That's what's written in RFC 2616. The question is whether and to
> what extent that's (still) sensible in practice.
>
>> My intent was not to disallow RFC2047, but rather to allow other
>> encodings into iso-8859-1 where appropriate.
>
> What do you mean by "other encodings into iso-8859-1"?
> Please explain.

I thought I had, but obviously not well.

Roy said (to paraphrase) that IRIs do not show up in HTTP -- that  
they're just URIs. I agree with that, but only as far as you can view  
IRIs as an encoding into ASCII (albeit an imperfect one, because you  
can't round-trip them, since there's a bit of ambiguity).
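
To make the round-trip problem concrete, here is a rough sketch in Python
(standard library only). It is a simplification of the IRI-to-URI mapping,
not the full algorithm: it only percent-encodes the UTF-8 bytes of a path.
The helper name iri_to_uri_path and the example paths are made up, and the
ambiguity it shows is only one plausible reading of the kind meant here.

```python
from urllib.parse import quote


def iri_to_uri_path(path: str) -> str:
    # Percent-encode the UTF-8 bytes of every non-ASCII character,
    # leaving "/" and any existing "%" escapes untouched.
    return quote(path, safe="/%")


# An IRI path with a literal e-acute ...
from_literal = iri_to_uri_path("/caf\u00e9")
# ... and one where the same bytes were already percent-escaped.
from_escaped = iri_to_uri_path("/caf%C3%A9")

print(from_literal)
print(from_literal == from_escaped)
```

Both inputs end up as the same ASCII string, so looking at the URI alone
you cannot tell which IRI it came from.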

RFC2047 is also an encoding into ASCII; it is not a character encoding  
in its own right. In that sense, it's a peer of BCP137 and other  
schemes that do similar things. They all end up taking characters from  
a set greater than that available to iso-8859-1 and encoding them into  
a subset of it (usually ASCII) using escape sequences.
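
As an illustration of "peer" schemes, the sketch below (Python, standard
library only) pushes the same non-ASCII string through an RFC2047
encoded-word and through percent-escaping. Percent-escaping is used here
only as a stand-in for "some other escape scheme"; the helper
decode_rfc2047 and the sample string are assumptions made for the example.

```python
from email.header import Header, decode_header
from urllib.parse import quote, unquote

text = "caf\u00e9"

# RFC 2047 encoded-word: a charset label plus Q or B encoding, all ASCII.
encoded_word = Header(text, "utf-8").encode()

# Percent-escaping of the UTF-8 bytes: different syntax, also all ASCII.
percent_form = quote(text)


def decode_rfc2047(value: str) -> str:
    # decode_header yields (chunk, charset) pairs; chunks are bytes when
    # the value contained at least one encoded-word.
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii")
        parts.append(chunk)
    return "".join(parts)


print(encoded_word)
print(percent_form)
print(decode_rfc2047(encoded_word) == unquote(percent_form) == text)
```

Neither output is a character encoding in its own right; each is just an
escape syntax layered over ASCII that a recipient has to know to undo.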

That being the case, my question is this: is it realistic to require
all headers to use RFC2047 encoding, to the exclusion of BCP137, etc.?

I could understand such a requirement if we had a blanket requirement  
that RFC2047 encoding could occur anywhere, so that implementations  
could blindly decode/encode headers as necessary, whether they  
recognised them or not. However, we're not going in that direction,
because it's not reasonable to implement. In any case, the encoding
is already tied somewhat to the semantics of the headers: you have to
recognise a header and understand its structure well enough to know
where TEXT may appear (i.e., it's not a complete blanket, just an
uneven one over TEXT).
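
A minimal sketch of what "you have to recognise the header" means for an
implementation, in Python. Everything named here is hypothetical: the
TEXT_HEADERS registry, the X-Example-Note header, and the simplifying
assumption that the whole value of a known header is TEXT (in real
headers TEXT only appears in particular parts of the value).

```python
from email.header import decode_header

# Hypothetical registry of header names whose entire value is assumed
# to be TEXT and may therefore carry RFC 2047 encoded-words.
TEXT_HEADERS = {"x-example-note"}


def decode_words(value: str) -> str:
    # Decode any RFC 2047 encoded-words found in value.
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii")
        parts.append(chunk)
    return "".join(parts)


def read_header(name: str, value: str) -> str:
    # Only decode headers we recognise; pass everything else through
    # untouched, because we do not know where TEXT may appear in it.
    if name.lower() in TEXT_HEADERS:
        return decode_words(value)
    return value


raw = "=?utf-8?q?caf=C3=A9?="
print(read_header("X-Example-Note", raw))
print(read_header("X-Unknown", raw))
```

The unknown header comes back unchanged, which is the "uneven blanket":
decoding depends on knowing the header, not just on spotting the syntax.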

Given that, I can't help but see the RFC2047 requirement as spurious.
The most straightforward thing to do would seem to be to ditch it and
move on -- still allowing a particular header to specify RFC2047
encoding where that makes sense, but not disallowing other encodings
either.

Once again, I have no desire to lie down in the road on this part of  
the issue -- I'm happy to give this up and move on, or to be told how  
wrong I am (as i18n isn't my area by any means). I'm just a bit  
surprised at how hard it is to communicate this view (which leads me  
to believe that indeed I've got something wrong here).

Cheers,

--
Mark Nottingham     http://www.mnot.net/

Received on Thursday, 27 March 2008 11:01:04 UTC