- From: Ian Hickson <ian@hixie.ch>
- Date: Wed, 25 Jun 2008 00:07:25 +0000 (UTC)
- To: Frank Ellermann <hmdmhdfmhdjmzdtjmzdtzktdkztdjz@gmail.com>
- Cc: uri@w3.org
On Wed, 25 Jun 2008, Frank Ellermann wrote:
> Ian Hickson wrote:
> >
> > Is there any chance that the URI and IRI specifications might get
> > updated to handle these issues?
>
> RFC 3986 is a full Internet standard, software doing something else is
> broken. Of course old software has the excuse of being old. An update
> of RFC 3987 (IRIs) makes no sense until IDNAbis is ready. YMMV, IDNAbis
> won't affect query parts. Note that RFC 3987 is by definition an
> *immature* standard like any other proposed standard, expect changes
> when it is later promoted on standards track. The same goes for IDNA,
> but not STDs 63 + 66.

So to summarise, the URI and IRI specs are relatively stable and not likely to change before HTML5 reaches CR in 2012 or so, right?

> > It would be much cleaner if instead HTML5 could just defer to the URI
> > specs for everything URI-related.
>
> That is not only "much clearer", it is a minimal requirement to talk
> about it (unless you want HTML5 published by ECMA). There are no
> non-ASCII characters in an STD 66 URI. And there are no non-ASCII
> characters in the URI-representation of an RFC 3987 IRI. Just reference
> RFC 3986 + 3987 "as is" and be done with it, it's no rocket science, and
> any attempt to "redefine" these standards can only cause confusion.

That's what HTML5 did until about a week ago. The problem is that doing so leaves a number of behaviours undefined, as far as I can tell.

For example, what should following the link in this example do, in terms of the actual URI passed to the networking layer?

   <!DOCTYPE HTML>
   <title>Test</title>
   <meta charset="ISO-8859-13">
   <p><a href="results.cgi/Ž?Ž">Test</a>

Similarly, what should the script in the following example display in the alert dialog box, assuming a base URL of http://example.com/ ?

   <!DOCTYPE HTML>
   <title>Test</title>
   <p><a href="{{%%xx##">Test</a>
   <script>alert(document.links[0].href)</script>

Where is this defined?
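To make the two questions concrete, here is a short Python sketch. It is illustrative only and is not what HTML5, RFC 3987, or any browser specifies. The helpers escape_non_ascii, iri_to_uri_utf8, query_in_document_charset and looks_like_rfc3986_reference are hypothetical. The second mapping is only an assumed example of charset-dependent query encoding.

Under the RFC 3987 mapping (UTF-8 throughout), "Ž" (U+017D) becomes %C5%BD in both the path and the query. If the query is instead encoded in the page's ISO-8859-13 charset, where "Ž" is the single byte 0xDE, the query becomes %DE. The last helper is a rough check, not a full parser. It shows that "{{%%xx##" is not a valid RFC 3986 reference at all, so the RFC by itself does not say what the href attribute should return.

```python
import re
from urllib.parse import quote


def escape_non_ascii(text: str, encoding: str) -> str:
    """Percent-encode only the non-ASCII characters of text, using their
    bytes in the given encoding; ASCII characters pass through unchanged."""
    return "".join(
        ch if ord(ch) < 0x80 else quote(ch, safe="", encoding=encoding)
        for ch in text
    )


def iri_to_uri_utf8(href: str) -> str:
    """RFC 3987-style mapping: UTF-8 for every component."""
    return escape_non_ascii(href, "utf-8")


def query_in_document_charset(href: str, doc_charset: str) -> str:
    """Hypothetical alternative: UTF-8 before the '?', but the document's
    own charset for the query. Fragments are not handled in this sketch."""
    before, sep, query = href.partition("?")
    result = escape_non_ascii(before, "utf-8")
    if sep:
        result += "?" + escape_non_ascii(query, doc_charset)
    return result


# Characters that may appear in an RFC 3986 URI reference: unreserved,
# reserved ('[' and ']' are really only legal in the host) and '%'.
_ALLOWED = re.compile(r"[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]*")


def looks_like_rfc3986_reference(ref: str) -> bool:
    """Rough necessary-condition check, not a full RFC 3986 parser."""
    if not _ALLOWED.fullmatch(ref):
        return False  # e.g. '{' is not allowed anywhere
    if re.search(r"%(?![0-9A-Fa-f]{2})", ref):
        return False  # '%' must start a two-hex-digit escape
    return ref.count("#") <= 1  # '#' may only introduce the fragment


if __name__ == "__main__":
    href = "results.cgi/Ž?Ž"  # from the ISO-8859-13 example
    print(iri_to_uri_utf8(href))                          # results.cgi/%C5%BD?%C5%BD
    print(query_in_document_charset(href, "iso8859_13"))  # results.cgi/%C5%BD?%DE
    print(looks_like_rfc3986_reference("{{%%xx##"))       # False
```

The second function is only one guess at the charset-dependent behaviour being discussed. The actual rules browsers follow are not spelled out in this message.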
On Wed, 25 Jun 2008, Frank Ellermann wrote:
>
> One of the best things with IRIs is that they are KISS:
>
> They use one and only one charset, the document charset, wherever they
> contain non-ASCII characters.
>
> For document types permitting NCRs or similar entities it means whatever
> it means in this document type, i.e. typically Unicode points or *error*
> (e.g., using &uuml; in XML without definition).

That's what HTML5 used to say (by implication, since it just referred to the IRI specification). Unfortunately, this requirement was ignored by Web browser vendors since vast quantities of existing content rely on encoding behaviour that is somewhat more complex than that.

Maybe I have an ego problem, but I want the specs I work on to actually be widely used. :-) Thus, I am forced to specify rules that are compatible with the actual existing content on the Web, even when those rules are less than ideal.

On Wed, 25 Jun 2008, Frank Ellermann wrote:
>
>
> Taking false assumptions into account always results in "do what you
> like", as they let us prove in elementary courses. The IRI spec. is RFC
> 3987, not what some versions of Firefox did or do.

Sadly, what matters in practice isn't the spec if the vendors ignore it.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Wednesday, 25 June 2008 00:21:29 UTC