- From: Frank Ellermann <nobody@xyzzy.claranet.de>
- Date: Sat, 16 Aug 2008 04:43:08 +0200
- To: ietf-http-wg@w3.org
Brian Smith wrote:

> RFC 2231 + UTF-8 is an especially bad interchange format
> for text since it requires over 9 bytes per letter

The length is no obstacle for HTTP, and as you write later,
we are talking about relatively short strings anyway. For
some languages legacy charsets will be "better" than UTF-8
wrt "compression", but I think interoperability is more
relevant for our purposes.

There is no way to jump from "raw Latin-1" to "raw UTF-8"
in HTTP/1.1 headers; any mixture would be a horrible mess.
If the WG meeting had a coherent transition strategy, e.g.,
"stick to Latin-1 in HTTP/1.1, do UTF-8 later in HTTP/1.2",
or "deprecate Latin-1 in HTTP/1.1 now, introduce UTF-8 in
HTTP/1.1 later", I'd like to see precise minutes about it.

JFTR, again, I think we need a clear transition strategy.
But so far we don't have it.

> there are no features for language tagging

Of course there are. Raw UTF-8 doesn't offer this, unless
you try the NOT RECOMMENDED obscure u+E00?? language tags.

> BIDI (needed for middle-eastern languages)

All charsets needing this, not limited to UTF-8, offer it.
The gibbous RFC 2231 percent-encoding doesn't change this.

> it is only suitable for short, language-neutral strings
> like (file and IRI) path fragments.

Do you propose to remove the optional [language] element
in the draft? It's a possibility, but some lines above
you said language tagging is essential.

> The draft references Unicode 4.0 indirectly through
> RFC3629.

Strong NAK. STD 63 is not, repeat NOT, bound to some
specific Unicode version. In a parallel universe where
the Unicode Consortium tried to redefine UTF-8 they'd be
disappointed when STD 63 sticks to the definition as it
was in Unicode 4. But that is bad science fiction. And
it doesn't affect the set of assigned code points; UTF-8
can do anything up to u+10FFFF as specified in STD 63.

Other non-IETF UTF-8 specifications are less relevant for
our purposes. (As you see I believe in bad science fiction,
after ISO 29500. Therefore it's IMO perfect to reference
STD 63.)

> I don't see the point of requiring ISO-8859-1.

See above, so far none of the proposals to ditch Latin-1
made it. As long as that doesn't change, Latin-1 is the
only permitted form of any non-ASCII octets in HTTP/1.1
headers.

 Frank
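
To illustrate the size argument at the top of this message, here is a
minimal Python sketch of an RFC 2231 extended value of the form
charset'language'percent-encoded-octets. The helper name
rfc2231_ext_value is hypothetical, and the sketch assumes that
urllib.parse.quote is an acceptable stand-in for an RFC 2231 encoder:
it escapes more characters than the RFC strictly requires, which is
still valid output.

```python
from urllib.parse import quote


def rfc2231_ext_value(text, charset="UTF-8", language=""):
    """Return charset'language'pct-encoded-octets (hypothetical helper)."""
    octets = text.encode(charset)
    # safe="" makes quote() escape everything except ASCII letters,
    # digits and "_.-~", so no raw non-ASCII octet can survive.
    return f"{charset}'{language}'{quote(octets, safe='')}"


# U+20AC needs three octets in UTF-8, hence nine ASCII bytes on the wire.
print(rfc2231_ext_value("€"))                  # UTF-8''%E2%82%AC
# The optional language tag sits between the two apostrophes.
print(rfc2231_ext_value("€", language="de"))   # UTF-8'de'%E2%82%AC
# A legacy charset can be shorter for the characters it covers.
print(rfc2231_ext_value("é", "ISO-8859-1"))    # ISO-8859-1''%E9
```

Whatever the length, the result is pure ASCII, so it does not depend on
how a recipient interprets raw non-ASCII octets in a header.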
Received on Saturday, 16 August 2008 02:42:09 UTC