- From: Boris Zbarsky <bzbarsky@MIT.EDU>
- Date: Tue, 28 Apr 2009 12:02:18 -0700
- To: Ian Hickson <ian@hixie.ch>
- CC: HTML WG <public-html@w3.org>
Ian Hickson wrote:
> Just because the URL is invalid doesn't mean it has to be canonicalised.
> There are plenty of other URLs that are syntactically invalid that Gecko
> doesn't fix up, for example:
>
>    http://example.com/%

That might well not be intentional...

> The RFCs don't say how to do error handling, so they're somewhat
> irrelevant here.

They're relevant if the browser tries to send valid URIs on the wire, in general (less of an issue for fragment IDs, since those don't go on the wire from the browser).

> Anyway. Is the algorithm at:
>
>    http://www.whatwg.org/specs/web-apps/current-work/#the-indicated-part-of-the-document
>
> Satisfactory?

Could you point me to the part of the spec that defines what a UA is to do with <a href>, exactly? It's hard to evaluate this algorithm without a reference for how that's handled on hand.

That said, there's one case I can think of offhand where the proposed algorithm has undesirable behavior. Any time the browser is given a URI (not IRI) with a fragment (e.g. a Location HTTP header with a fragment), the only way to make that fragment match an id is to have the ID URI-escaped, and in particular have all non-ASCII characters URI-escaped. Then that same ID is a pain to match from IRIs (they also end up needing to have those characters escaped).

It's an obvious consequence of treating an IRI and its corresponding URI differently, and maybe one we can live with here... I don't do enough computing in languages other than English to say.

-Boris
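
The IRI/URI mismatch described above can be illustrated with a small sketch. It assumes, as the message implies, that the fragment is compared to the ID literally with no percent-decoding; the function name `matches_literal`, the example URLs, and the ID `café` are hypothetical and are not taken from the spec text.

```python
from urllib.parse import quote, urlsplit


def matches_literal(url: str, element_id: str) -> bool:
    """Compare the URL's fragment to an ID as-is.

    Assumption: no percent-decoding is applied before the comparison.
    urlsplit() returns the fragment without decoding it.
    """
    return urlsplit(url).fragment == element_id


# Hypothetical ID containing a non-ASCII character.
plain_id = "caf\u00e9"
# The same ID with non-ASCII characters percent-encoded as UTF-8.
escaped_id = quote(plain_id)

# An IRI carries the non-ASCII character directly in the fragment.
iri = "http://example.com/page#" + plain_id
# The corresponding URI (e.g. from a Location header) carries it escaped.
uri = "http://example.com/page#" + escaped_id

# Under literal comparison each form matches only "its own" spelling of the ID.
print(matches_literal(iri, plain_id))    # IRI vs. unescaped ID
print(matches_literal(uri, plain_id))    # URI vs. unescaped ID
print(matches_literal(uri, escaped_id))  # URI vs. escaped ID
print(matches_literal(iri, escaped_id))  # IRI vs. escaped ID
```

Under this assumed literal comparison, a document has to pick one spelling of the ID, and links written in the other form stop matching, which is the trade-off raised in the message.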
Received on Tuesday, 28 April 2009 19:04:01 UTC