Re: IRIs, IDNAbis, and HTTP [i74]

Martin Duerst wrote:
 
> We have had that discussion before, in another location.

On the validator list, and a few days ago somebody posted
a similar question about width and height:

<http://thread.gmane.org/gmane.org.w3c.validator/10560>
 
> the HTML DTD allows pretty much anything in attributes

That does not mean that only the DTD minus comments defines
what HTML is, or that URIs are more or less "anything goes".

In another article you wrote:

| we could just claim that we are doing an encoding on top
| of iso-8859-1. Eventually, these few implementations will
| catch up.

If you seriously want this, you can use "UTF-4". It encodes:
u+0000..u+007F as 0x00..0x7F (= US-ASCII)
u+00A0..u+00FF as 0xA0..0xFF (= visible iso-8859-1)

u+0080..u+009F as 0x829890 .. 0x82999F (encoded C1)
u+0100 ff.     as 0x83919090 ff. (excl. surrogates)

0x8N is the lead byte for N trail bytes 0x90..0x9F, each
encoding one hex digit of the (UTF-32BE) codepoint; the
first trail byte cannot be 0x90 (overlong encoding).
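
For illustration only, the scheme described above could be sketched
in Python roughly as follows. The names utf4_encode and utf4_decode
are made up for this example. Rejecting escaped forms of code points
that already have a one-byte form is an assumption on my part; the
description above only names a leading 0x90 as overlong.

```python
def _is_direct(cp: int) -> bool:
    """True if the code point is stored as a single byte (ASCII or 0xA0..0xFF)."""
    return cp <= 0x7F or 0xA0 <= cp <= 0xFF


def utf4_encode(text: str) -> bytes:
    """Encode text with the "UTF-4" scheme sketched in this message."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"surrogate U+{cp:04X} cannot be encoded")
        if _is_direct(cp):
            out.append(cp)
            continue
        digits = format(cp, "X")          # hex digits, no leading zeros
        out.append(0x80 + len(digits))    # lead byte 0x8N, N = digit count
        out.extend(0x90 + int(d, 16) for d in digits)
    return bytes(out)


def utf4_decode(data: bytes) -> str:
    """Decode "UTF-4" bytes, rejecting malformed or overlong sequences."""
    chars = []
    i = 0
    while i < len(data):
        b = data[i]
        if b <= 0x7F or b >= 0xA0:
            chars.append(chr(b))
            i += 1
            continue
        # Only 0x82..0x86 can occur as lead bytes for U+0080..U+10FFFF.
        if not 0x82 <= b <= 0x86:
            raise ValueError(f"unexpected byte 0x{b:02X} at offset {i}")
        n = b - 0x80
        trail = data[i + 1:i + 1 + n]
        if len(trail) != n or any(not 0x90 <= t <= 0x9F for t in trail):
            raise ValueError(f"bad trail bytes at offset {i}")
        if trail[0] == 0x90:
            raise ValueError(f"overlong encoding at offset {i}")
        cp = 0
        for t in trail:
            cp = cp * 16 + (t - 0x90)
        # Assumption: escaped forms of directly encodable code points
        # are treated as invalid too.
        if _is_direct(cp) or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            raise ValueError(f"invalid code point at offset {i}")
        chars.append(chr(cp))
        i += 1 + n
    return "".join(chars)
```

With this sketch, utf4_encode("\u0080") yields the bytes 82 98 90 and
utf4_encode("\u0100") yields 83 91 90 90, matching the table above.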

In theory, because "UTF-4" doesn't use the minimal C1 set
SS2 and SS3 (0x8E and 0x8F), you can even retrofit it in
an ISO 4873 framework (like ISO-8859).

I could post one of these proposals (XHTM1-i18n or UTF-4)
as an I-D if it helps - don't laugh, I saw the "UTF-5" I-D ;-)

 Frank
-- 
<URL:http://purl.net/xyzzy/home/test/utf-4.xml>

Received on Monday, 17 March 2008 06:32:19 UTC