- From: Mark Nottingham <mnot@mnot.net>
- Date: Wed, 26 Mar 2008 12:01:52 +1100
- To: Roy T. Fielding <fielding@gbiv.com>
- Cc: HTTP Working Group <ietf-http-wg@w3.org>
On 26/03/2008, at 11:40 AM, Roy T. Fielding wrote:

>> A secondary issue is what encoding should be used in those cases
>> where it is reasonable to allow it. I'm not sure what the value of
>> requiring that it be the same everywhere is; some payloads (e.g.,
>> IRIs, e-mail addresses) have well-defined "natural" encodings into
>> ASCII that are more appropriate.
>
> Unless we are going to change the protocol, the answer to that question
> is ISO-8859-1 or RFC2047. If we are going to change the protocol, then
> the answer would be raw UTF-8 (HTTP doesn't care about the content of
> TEXT as long as the encoding is a superset of ASCII, so the only
> compatibility issue here is understanding the intent of the sender).

What do you mean by ISO-8859-1 *or* RFC2047 here? Even if RFC2047 encoding is in effect, the actual character set in use is a subset of ISO-8859-1; no characters outside of that are actually on the wire, it's just an encoding of them into ASCII.

This is why I question whether it's realistic to require RFC2047, given that some applications -- e.g., headers that might want to carry an IRI -- are already using an encoding that's not RFC2047. Of course, you can say that they're not carrying non-ASCII characters, because it's just a URI, but I'd say that's just a way of squinting at the problem, and RFC2047 is yet another way of squinting; it looks like it's just ASCII as well.

>> Mind you, personally I'm not religious about this; I just think
>> that if we mandate RFC2047 encoding be used in new headers that
>> need an encoding, we're going to be ignored, for potentially good
>> reasons.
>
> What good reasons? In this case, we are not mandating anything.
> We are simply passing through the one and only defined i18n solution
> for HTTP/1.1 because it was the only solution available in 1994.
> If email clients can (and do) implement it, then so can WWW clients.

See above.
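(For illustration, a quick Python sketch of the two "squints" I mean -- the strings here are my own examples, not anything from the spec: an RFC 2047 encoded-word and a percent-encoded IRI are both plain ASCII by the time they hit the wire.)

```python
from email.header import Header
from urllib.parse import quote

# Squint 1: RFC 2047 wraps non-ASCII text in an ASCII encoded-word.
encoded_word = Header("caf\u00e9", charset="iso-8859-1").encode()
print(encoded_word)  # e.g. =?iso-8859-1?q?caf=E9?=
assert encoded_word.isascii()

# Squint 2: an IRI is serialised as a URI by percent-encoding its
# non-ASCII characters as UTF-8 octets.
uri_path = quote("/caf\u00e9", safe="/")
print(uri_path)  # /caf%C3%A9
assert uri_path.isascii()

# Either way, only ASCII is on the wire; the conventions differ solely
# in how the receiver is meant to recover the original characters.
```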
Specifically, what impact does the requirement to use RFC2047 have on other encodings -- is it saying that serialising an IRI as a URI in an HTTP header is non-conformant? That if another problem domain, for whatever reason, decides to mint a header that uses BCP137 instead of RFC2047, it also violates HTTP? This seems a stretch to me... I'd put forth that the requirement is spurious.

> People who want to fix that should start queueing for HTTP/1.2.

Please explain how removing the requirement that only RFC2047 be used to encode non-ISO-8859-1 characters in new headers requires a version bump.

>> 2) Constrain TEXT to contain only characters from iso-8859-1.
>
> No, that breaks compliant senders.

How? Are you saying that senders are already sending text that contains non-8859-1 characters (post-encoding)?

>> 3) Add advice that, for a particular context of use, other
>> characters MAY be encoded (whether that's strictly RFC2047, or more
>> fine-grained advice TBD) by specifying it in that context.
>>
>> 4) Add new issues for dealing with specific circumstances (e.g.,
>> From, Content-Disposition, Warning) as necessary. If the outcome of
>> #3 is to require RFC2047, this is relatively straightforward.
>
> There is no great need that has been established to support any
> changes to the allowed TEXT encoding other than to separate the
> rules that don't actually allow that encoding. IMO, changes to
> HTTP/1.1 must be motivated by actual implementations.

Could be. Again, my main concern here is to take the blanket requirement away and make it more focused.

--
Mark Nottingham     http://www.mnot.net/
Received on Wednesday, 26 March 2008 01:02:34 UTC