- From: Frank Ellermann <nobody@xyzzy.claranet.de>
- Date: Sat, 28 Jun 2008 15:17:21 +0200
- To: uri@w3.org
Elliotte Harold wrote:

> 1. All numeric character references should be considered to point to
> Unicode code points.

Done since about RFC 2070.

> 2. All percent escapes in documents should be considered to refer
> to UTF-8 bytes.

Not true, http://example.org/%C0%80 is a perfectly valid URI, and
certainly not UTF-8.

> 3. The browser should convert all IRIs to pure URIs using
> exclusively UTF-8 percent encoding as specified in the IRI spec.

Yes, since about RFC 3987. The IRI itself can of course use the
encoding of its context, e.g., KOI8-R in a KOI8-R document:
http://hmdmhdfmhdjmzdtjmzdtzktdkztdjz.googlepages.com/IDN-IRI-koi8-r.html

> 4. If this fails because the UTF-8 in step 2 is ill-formed, redo
> step 2 assuming the encoding is ISO-8859-1 and pray.

Nobody uses iso-8859-1 for real, ITYM windows-1252. For values of
"nobody" excluding RFC 2616 and 2617 among others, but when you're
going to guess try windows-1252.

> I'm not sure about step 4. Maybe there's better error handling
> to be done, but steps 1-3 are the only sane approaches to this.

Yes, skip the prayer, URIs have no "default charset", that's a
historical accident limited to HTTP.

> Any scheme that attempts to replicate existing browser URL-encoding
> behavior is doomed to failure, and will simply relegate us to ASCII
> only URIs for the foreseeable future.

URIs are "ASCII only" in the same sense as host names are ASCII only,
i.e. the proper subset as specified in STD 66, with a way to use any
octet in its percent-encoded form. And IRIs are not limited to UTF-8.
Everything is perfect (ignoring HTTP again).

"Redefining" URIs is a horror scenario. Maybe folks interested in
selling new hard- and software like such disasters :-(

Of course an IRI without context needs to be UTF-8, guess and pray is
no recipe. And it has to be transformed to an URI as specified in
RFC 3987 for practical purposes - preferably by the side knowing how
that works, e.g., the other side might not know the "punycode" fine
print for an <ihost>.

Frank
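
For illustration, the windows-1252 guess from step 4 and the IRI-to-URI mapping can be sketched in Python roughly as follows. This is a minimal sketch and not a complete RFC 3987 implementation. The names `decode_percent_escapes` and `iri_to_uri` are hypothetical, `bücher.example` is a made-up host, userinfo is assumed to be absent, and the host goes through Python's built-in `idna` codec (which implements IDNA 2003) instead of being percent-encoded.

```python
from urllib.parse import quote, unquote_to_bytes, urlsplit, urlunsplit

# Characters left untouched when percent-encoding. "%" stays in the set so
# that escapes already present (such as %C0%80) are not encoded twice.
SAFE_PATH = "/%:@!$&'()*+,;="
SAFE_QUERY = SAFE_PATH + "?"


def decode_percent_escapes(component: str) -> tuple[str, str]:
    """Decode percent escapes and report which charset was used.

    Strict UTF-8 is tried first. If the octets are not well-formed UTF-8,
    fall back to the windows-1252 guess. This is a guess only: URIs carry
    no default charset.
    """
    raw = unquote_to_bytes(component)
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # A few octets are undefined in windows-1252, hence errors="replace".
        return raw.decode("cp1252", errors="replace"), "windows-1252"


def iri_to_uri(iri: str) -> str:
    """Map an IRI (already a Unicode string) to a URI.

    Assumption: no userinfo part; only host and optional port are rebuilt.
    """
    parts = urlsplit(iri)
    netloc = ""
    if parts.hostname:
        # ToASCII ("punycode") conversion of each host label.
        netloc = parts.hostname.encode("idna").decode("ascii")
        if parts.port is not None:
            netloc += f":{parts.port}"
    # Non-ASCII characters elsewhere become percent-encoded UTF-8 octets.
    path = quote(parts.path, safe=SAFE_PATH)
    query = quote(parts.query, safe=SAFE_QUERY)
    fragment = quote(parts.fragment, safe=SAFE_QUERY)
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


if __name__ == "__main__":
    print(decode_percent_escapes("%C3%A9"))   # well-formed UTF-8
    print(decode_percent_escapes("%C0%80"))   # not UTF-8, falls back
    print(iri_to_uri("http://bücher.example/rés%C0%80?q=ü"))
```

Note that `%C0%80` passes through `iri_to_uri` unchanged: it is a valid URI octet sequence even though it is not UTF-8. An IRI found in a KOI8-R document would first be decoded from KOI8-R into a Unicode string (for example with `bytes.decode("koi8-r")`) and only then passed to `iri_to_uri`.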
Received on Saturday, 28 June 2008 13:19:27 UTC