
Re: PROPOSAL: i74: Encoding for non-ASCII headers

From: Mark Nottingham <mnot@mnot.net>
Date: Fri, 28 Mar 2008 10:45:05 +1100
Cc: "Martin Duerst" <duerst@it.aoyama.ac.jp>, "Jamie Lokier" <jamie@shareable.org>, "Roy T. Fielding" <fielding@gbiv.com>, "HTTP Working Group" <ietf-http-wg@w3.org>
Message-Id: <D529CE06-367E-4BD8-A905-F400FCABBD28@mnot.net>
To: Robert Brewer <fumanchu@aminus.org>


On 28/03/2008, at 4:28 AM, Robert Brewer wrote:
>>
>> I could understand such a requirement if we had a blanket requirement
>> that RFC2047 encoding could occur anywhere, so that implementations
>> could blindly decode/encode headers as necessary, whether they
>> recognised them or not. However, we're not going in that direction,
>> because it's not reasonable to implement...
>
> I don't understand. From where I sit that sounds like not only a snap to
> write from scratch, but has the potential to simplify a lot of
> codebases.

And slow them down; most implementations (client, server and  
intermediary) are performance-sensitive, and the number of headers  
where i18n content is useful is very small. If we specified blanket  
encoding, it wouldn't get implemented.

>> ...and in any case the encoding
>> is already tied to the semantics of the headers somewhat, since you
>> have to recognise the header to understand its structure enough to
>> know where TEXT may appear (i.e., it's not a complete blanket, just an
>> uneven one over TEXT).
>>
>> That being the case, I can't help but see the RFC2047 requirement as
>> spurious, and the most straightforward thing to do would seem to be to
>> ditch the spurious requirement and move on -- without disallowing
>> RFC2047 encoding from being specified in a particular header if that
>> makes sense, but not disallowing other encodings either.
>
> Hrm. I'm not sure what "other encodings" includes.


Concretely, our options at this point are:

1) Change the character encoding on the wire to UTF-8
2) Leave the character encoding on the wire at ISO-8859-1, document  
existing TEXT instances' encoding requirements on top of that, and
    a) Require new headers that need i18n content to specify RFC2047, or
    b) Require new headers that need i18n content to specify *some*  
encoding into ISO-8859-1 using character escapes (which explicitly MAY  
be RFC2047).

In either case, i18n content isn't allowed in just *any* header -- only
in those places where the header specifically allows it.
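
To make the options concrete, here is a rough Python sketch of what a single non-ASCII value looks like on the wire under (1), under the ISO-8859-1 baseline of (2), and as an RFC2047 encoded-word under (2a). It uses the standard library's email.header module. The sample value, the X-Example-Title header name, and the helpers decode_rfc2047 and read_header are made up for illustration; nothing here comes from a proposal or from any real HTTP implementation.

```python
from email.header import Header, decode_header

value = "Düsseldorf"  # hypothetical header value with one non-ASCII character

# Option (1): UTF-8 octets sent directly on the wire.
utf8_wire = value.encode("utf-8")

# Option (2) baseline: ISO-8859-1 octets; only works for Latin-1 characters.
latin1_wire = value.encode("iso-8859-1")

# Option (2a): an RFC2047 encoded-word, which is pure ASCII on the wire.
rfc2047_wire = Header(value, charset="utf-8").encode()


def decode_rfc2047(text):
    """Decode any RFC2047 encoded-words in text; plain text passes through."""
    parts = []
    for chunk, charset in decode_header(text):
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii")
        parts.append(chunk)
    return "".join(parts)


# Assumption: the set of headers whose definition allows encoded text.
I18N_HEADERS = {"x-example-title"}


def read_header(name, raw):
    """Decode only where the header specifically allows i18n content."""
    if name.lower() in I18N_HEADERS:
        return decode_rfc2047(raw)
    return raw


print(utf8_wire, latin1_wire, rfc2047_wire)
print(read_header("X-Example-Title", rfc2047_wire))
```

Note that read_header leaves every other header untouched, which is the point above: the decoding cost is only paid for headers that opt in.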

<chair_hat>
At this point, my reading is that we're leaning towards (2a). Some
people have spoken in favour of (1), but others have expressed
concerns about backwards-compatibility, etc. There also seems to be
good support for, and no strong objection to, (2a) -- except from
me, in favour of (2b), and as I said before, that's not a strong
preference.

At this point, I think we need a more exact proposal to move forward...
</chair_hat>

... which I'll work on. If someone wants to work on text for (1) for
comparison, feel free.

--
Mark Nottingham     http://www.mnot.net/
Received on Thursday, 27 March 2008 23:45:42 GMT
