Date: Tue, 15 Apr 1997 13:07:23 -0700 (PDT)
From: Chris Newman <Chris.Newman@innosoft.com>
Subject: Re: revised "generic syntax" internet draft
In-Reply-To: <email@example.com>
To: IETF URI list <firstname.lastname@example.org>
Message-Id: <Pine.SOL.3.95.970415124833.22015Jemail@example.com>

Here are the approaches to i18n I've seen:

(1) US-ASCII only
(2) ISO-8859-1 only
(3) Whatever localized character set is in use
(4) Explicit labelling of character set
(5) Unicode derivative

----

(1) Never works because it doesn't satisfy demand.

(2) Never works and is even worse than (1): not only does it fail to
    satisfy demand, it also uses up the "undefined" codepoints in such a
    way that an interoperable solution *can't* be deployed.

(3) Never works, because it doesn't interoperate. It results in a bunch
    of islands which can't communicate, except via US-ASCII.

(4) Works fine, but is very hard to support for ideographic characters.
    Dealing with mapping tables between ISO-2022, Unicode and whatever
    character set is supported by the display system is very hard.

(5) Works fine, and has the potential to be easier to support than (4).

----

The status quo in URLs is a mixture of (1), (2), and (3). This is
completely unacceptable for an interoperable solution. We *MUST* move
towards (4) or (5). Given that I've heard no proposals along the lines of
MIME header encoded words, the only solution on the table is (5).

I will also point out that when a URL contains unencoded 8-bit characters
and is embedded in a properly charset-labelled document, there are no
problems, as the interpretation is clear. We do need to deal with the
interpretation of %-encoded 8-bit characters. If we're ambitious, we can
also address the issue of unlabelled unencoded 8-bit characters, but I'd
be tempted to avoid that rathole.

The biggest failure of HTTP/HTML was choosing (2) above when MIME already
had a perfectly functional solution (4).
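
To make the point about %-encoded 8-bit characters concrete, here is a small sketch of the ambiguity. It is an illustration only, not part of any proposal: the choice of Python, the helper name `interpretations`, and the sample word "café" are all assumptions made for the example. It shows that the same word produces different escaped octets depending on the charset used, and that a receiver who guesses the charset wrong gets either garbage or a decoding failure.

```python
from urllib.parse import quote, unquote_to_bytes


def interpretations(escaped, charsets=("utf-8", "iso-8859-1")):
    """Decode the %-escaped octets of `escaped` under each candidate charset.

    Hypothetical helper for illustration. Returns a dict mapping charset
    name to the decoded text, or None where the octets are not valid in
    that charset.
    """
    raw = unquote_to_bytes(escaped)  # undo the %XX escapes, keep raw octets
    result = {}
    for charset in charsets:
        try:
            result[charset] = raw.decode(charset)
        except UnicodeDecodeError:
            result[charset] = None
    return result


# The same word, escaped under two different charset assumptions.
latin1_form = quote("café", encoding="iso-8859-1")  # approach (2)
utf8_form = quote("café", encoding="utf-8")         # approach (5)

print(latin1_form, interpretations(latin1_form))
print(utf8_form, interpretations(utf8_form))
```

Under these assumptions the ISO-8859-1 form is not valid UTF-8 at all, while the UTF-8 form decodes "successfully" as ISO-8859-1 but yields the wrong text. Nothing in the escaped string itself says which reading was intended, which is why the interpretation has to be fixed by the specification (5) or carried by an explicit label (4).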