Re: Error handling in URIs

On Wed, 25 Jun 2008, Frank Ellermann wrote:
> Ian Hickson wrote:
> > 
> > Is there any chance that the URI and IRI specifications might get 
> > updated to handle these issues?
> 
> RFC 3986 is a full Internet standard, software doing something else is 
> broken. Of course old software has the excuse of being old.  An update 
> of RFC 3987 (IRIs) makes no sense until IDNAbis is ready.  YMMV, IDNAbis 
> won't affect query parts.  Note that RFC 3987 is by definition an 
> *immature* standard like any other proposed standard, expect changes 
> when it is later promoted on standards track.  The same goes for IDNA, 
> but not STDs 63 + 66.

So to summarise, the URI and IRI specs are relatively stable and not 
likely to change before HTML5 reaches CR in 2012 or so, right?


> > It would be much cleaner if instead HTML5 could just defer to the URI 
> > specs for everything URI-related.
> 
> That is not only "much clearer", it is a minimal requirement to talk 
> about it (unless you want HTML5 published by ECMA).  There are no 
> non-ASCII characters in an STD 66 URI.  And there are no non-ASCII 
> characters in the URI-representation of an RFC 3987 IRI.  Just reference 
> RFC 3986 + 3987 "as is" and be done with it, it's no rocket science, and 
> any attempt to "redefine" these standards can only cause confusion.

That's what HTML5 did until about a week ago. The problem is that doing so 
leaves a number of behaviours undefined, as far as I can tell. For 
example, what should following the link in this example do, in terms of 
the actual URI passed to the networking layer?

   <!DOCTYPE HTML>
   <title>Test</title>
   <meta charset="ISO-8859-13">
   <p><a href="results.cgi/&#x017d;?&#x017d;">Test</a>
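
To make the ambiguity concrete, here is a minimal Python sketch (mine, not from any spec) of two candidate URIs for that link. The first percent-encodes both path and query as UTF-8, which is my reading of what RFC 3987 asks for. The second assumes, as one plausible legacy behaviour, that the query is instead encoded in the document's charset. The helper name candidate() is hypothetical.

```python
from urllib.parse import quote

# Encoding declared by <meta charset="ISO-8859-13"> in the example above.
DOCUMENT_ENCODING = "iso8859_13"

# The href after the character references are decoded (U+017D is "Ž").
PATH = "results.cgi/\u017d"
QUERY = "\u017d"


def candidate(path, query, query_encoding):
    """Percent-encode the path as UTF-8 and the query in `query_encoding`."""
    encoded_path = quote(path, safe="/", encoding="utf-8")
    encoded_query = quote(query, safe="", encoding=query_encoding)
    return encoded_path + "?" + encoded_query


# Candidate 1: UTF-8 everywhere.
utf8_candidate = candidate(PATH, QUERY, "utf-8")

# Candidate 2 (assumption): query encoded in the document's charset.
legacy_candidate = candidate(PATH, QUERY, DOCUMENT_ENCODING)

print(utf8_candidate)    # results.cgi/%C5%BD?%C5%BD
print(legacy_candidate)  # results.cgi/%C5%BD?%DE
```

The two results differ only in the query, and that difference is exactly what is in question.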

Similarly, what should the script in the following example display in the 
alert dialog box, assuming a base URL of http://example.com/ ?

   <!DOCTYPE HTML>
   <title>Test</title>
   <p><a href="{{%%xx##">Test</a>
   <script>alert(document.links[0].href)</script>

Where is this defined?
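
For reference, a rough character-level check shows why RFC 3986 alone does not answer this: the href is simply outside the grammar. This is only a sketch under simplifying assumptions (it ignores where each reserved character may appear, e.g. that "[" and "]" belong only in the host), and rfc3986_violations() is a hypothetical helper, not a library function.

```python
import re

# Characters RFC 3986 permits somewhere in a URI reference: unreserved,
# reserved (gen-delims and sub-delims), and "%" to start a percent-encoding.
ALLOWED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    "-._~:/?#[]@!$&'()*+,;=%"
)
PCT_TRIPLET = re.compile(r"%[0-9A-Fa-f]{2}")


def rfc3986_violations(reference):
    """Return (index, reason) pairs for characters outside the grammar."""
    problems = []
    first_hash = reference.find("#")
    for i, ch in enumerate(reference):
        if ch not in ALLOWED:
            problems.append((i, "character %r is not allowed" % ch))
        elif ch == "%" and not PCT_TRIPLET.match(reference, i):
            problems.append((i, "'%' is not followed by two hex digits"))
        elif ch == "#" and i != first_hash:
            # Only the first "#" delimits the fragment; a fragment itself
            # may not contain "#".
            problems.append((i, "'#' is not allowed inside a fragment"))
    return problems


for index, reason in rfc3986_violations("{{%%xx##"):
    print(index, reason)
```

Every character except the "xx" and the first "#" is flagged, so a spec that only says "parse according to RFC 3986" leaves the value of .href here unspecified.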


On Wed, 25 Jun 2008, Frank Ellermann wrote:
> 
> One of the best things with IRIs is that they are KISS:
> 
> They use one and only one charset, the document charset, wherever they 
> contain non-ASCII characters.
> 
> For document types permitting NCRs or similar entities it means whatever 
> it means in this document type, i.e. typically Unicode points or *error* 
> (e.g., using &uuml; in XML without definition).

That's what HTML5 used to say (by implication, since it just referred to 
the IRI specification). Unfortunately, this requirement was ignored by Web 
browser vendors since vast quantities of existing content rely on encoding 
behaviour that is somewhat more complex than that. Maybe I have an ego 
problem, but I want the specs I work on to actually be widely used. :-) 
Thus, I am forced to specify rules that are compatible with the actual 
existing content on the Web, even when those rules are less than ideal.


On Wed, 25 Jun 2008, Frank Ellermann wrote:
>
> Taking false assumptions into account always results in "do what you 
> like", as they let us prove in elementary courses.  The IRI spec. is RFC 
> 3987, not what some versions of Firefox did or do.

Sadly, what matters in practice isn't the spec if the vendors ignore it.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Wednesday, 25 June 2008 00:21:29 UTC