UTF-8 (was: PROPOSAL: i74: Encoding for non-ASCII headers)

Henrik Nordstrom wrote:

> My gut feeling is that the best long term move would be to move to
> UTF-8 and forget about 2047 and accept that some existing things
> MAY break, BUT as you say it can not be proved to be completely
> without problems for existing implementations. In fact it's very
> likely to cause problems in some areas:

> - Authentication (RFC2617, here ISO-8859-1 is actively supported
> today, but not really sufficient)

ACK; in theory that could be fixed by adopting RFC 2831 or 2831bis
magic.  I'm not exactly sure about Unicode 3.2 SASLPREP in 2831bis;
maybe RFC 5198 NFC, minus anything declared to be bad in 3987bis, is
good enough for a 2617bis.
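A rough sketch of what "NFC, then UTF-8" could mean for Basic credentials, assuming a hypothetical 2617bis that simply NFC-normalizes (as RFC 5198 calls for) before encoding. The function name `basic_credentials` and its `charset` parameter are made up for illustration, and the "minus anything declared bad in 3987bis" exclusion list is not modelled at all:

```python
import base64
import unicodedata


def basic_credentials(user: str, password: str, charset: str = "utf-8") -> str:
    """Hypothetical Basic credential builder: NFC first, then encode.

    Assumption: NFC normalization + UTF-8 is the rule; RFC 2617 itself
    says nothing about normalization. The 3987bis exclusions are NOT applied.
    """
    user_n = unicodedata.normalize("NFC", user)
    pass_n = unicodedata.normalize("NFC", password)
    # RFC 2617: user-pass = userid ":" password, userid excludes ":"
    if ":" in user_n:
        raise ValueError("user-id must not contain ':'")
    token = f"{user_n}:{pass_n}".encode(charset)
    return "Basic " + base64.b64encode(token).decode("ascii")
```

The point of normalizing first: a precomposed "Ren\u00e9" and a decomposed "Rene\u0301" yield the same header, while the same name sent as ISO-8859-1 (today's de-facto choice) yields different octets, which is exactly where existing servers would break.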

> - Cookie, for applications using/setting cookies in both client
> and server contexts (not just echoing what you got)

I cannot judge cookies.  I'm happy when I find the way to disable
double-analytics-tracker cookies from 3rd parties; for undisclosed
reasons FF2 makes that more difficult than IE6, but it is possible.

> For now I think the only possible outcome is to keep what we have;
> ISO-8859-1 as default, but clarifying that intermediaries should
> handle them as 8-bit ASCII strings and consider the C1
> set (0x80-0x9F) as just another set of octets of the string (not
> as control characters or invalid) and a note that future headers
> MAY be seen using UTF-8 encoding.

Point - I did not consider that when I proposed to exclude C1 from
Mark's 2B proposal.  Oddly, we all agree that we want UTF-8 "later",
but have different ideas about how to get there.
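Why C1 matters here: UTF-8 continuation octets are 0x80-0xBF, so any UTF-8 header value seen through ISO-8859-1 glasses can contain "C1 controls" (e.g. "ß" is 0xC3 0x9F). A minimal sketch of the handling Henrik describes, assuming an intermediary that treats field values as opaque octets. The helper names are hypothetical:

```python
from typing import Optional


def header_value_as_text(raw: bytes) -> str:
    # ISO-8859-1 maps each octet 0x00-0xFF to exactly one code point
    # U+0000-U+00FF, so 0x80-0x9F (C1) are kept, never rejected.
    return raw.decode("iso-8859-1")


def forward_header_value(raw: bytes) -> bytes:
    # A relaying intermediary: the ISO-8859-1 round trip is lossless,
    # so whatever octets came in (UTF-8 or not) go out unchanged.
    return header_value_as_text(raw).encode("iso-8859-1")


def maybe_utf8(raw: bytes) -> Optional[str]:
    # Hypothetical: for a future header explicitly defined as UTF-8,
    # an endpoint (not the intermediary) may try a strict decode.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None
```

With `raw = "Grüße".encode("utf-8")`, the ISO-8859-1 view contains U+009F, yet `forward_header_value(raw)` returns the same octets and `maybe_utf8(raw)` recovers "Grüße". An intermediary that rejected C1 as "control characters" would have dropped that header.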

 Frank

Received on Monday, 31 March 2008 18:11:03 UTC