Re: fragid navigation and pct-encoded from Boris Zbarsky on 2009-04-28 (public-html@w3.org from April 2009)

From: Boris Zbarsky <bzbarsky@MIT.EDU>
Date: Tue, 28 Apr 2009 12:02:18 -0700
To: Ian Hickson <ian@hixie.ch>
CC: HTML WG <public-html@w3.org>
Message-ID: <49F752BA.20505@mit.edu>

Ian Hickson wrote:
> Just because the URL is invalid doesn't mean it has to be canonicalised. 
> There are plenty of other URLs that are syntactically invalid that Gecko 
> doesn't fix up, for example:
> 
>    http://example.com/%

That might well not be intentional...

> The RFCs don't say how to do error handling, so they're somewhat 
> irrelevant here.

They're relevant if the browser tries to send valid URIs on the wire, in 
general (less of an issue for fragment IDs, since those don't go on the 
wire from the browser).
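
As a rough illustration of why fragments are less of a concern on the wire, here is a minimal Python sketch. The helper `request_target` and the example URL are hypothetical, not taken from any spec or browser; the sketch only shows that the fragment is split off before the request target is built.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Hypothetical helper: build the path-and-query part an HTTP client
    would put on the request line. The fragment is deliberately left out."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


# Illustrative URL only; the fragment is already percent-encoded here.
url = "http://example.com/doc?x=1#caf%C3%A9"

on_the_wire = request_target(url)      # path and query only
kept_locally = urlsplit(url).fragment  # stays in the browser, undecoded
```

The fragment is only ever interpreted client-side, so whether it is a valid URI fragment matters for matching against the document, not for what the server receives.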

> Anyway. Is the algorithm at:
> 
>    http://www.whatwg.org/specs/web-apps/current-work/#the-indicated-part-of-the-document
> 
> Satisfactory?

Could you point me to the part of the spec that defines what a UA is to 
do with <a href>, exactly?  It's hard to evaluate this algorithm without 
a reference for how that's handled on hand.

That said, there's one case I can think of offhand where the proposed 
algorithm has undesirable behavior.  Any time the browser is given a URI 
(not an IRI) with a fragment (e.g. a Location HTTP header with a 
fragment), the only way to make that fragment match an ID is to have the 
ID itself URI-escaped, and in particular to have all its non-ASCII 
characters URI-escaped.  But then that same ID is a pain to match from 
IRIs (they also end up needing to have those characters escaped).  It's 
an obvious consequence of treating an IRI and its corresponding URI 
differently, and maybe one we can live with here...  I don't do enough 
computing in languages other than English to say.
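
To make the mismatch concrete, here is a minimal Python sketch. It assumes the fragment is compared with the ID character for character, with no percent-decoding; that is my reading of the case described above, not a quote from the spec. The helper `matches_literally` and the sample ID are hypothetical.

```python
from urllib.parse import quote


def matches_literally(fragment: str, element_id: str) -> bool:
    # Assumption: plain string comparison, no percent-decoding of the fragment.
    return fragment == element_id


iri_fragment = "caf\u00e9"                   # fragment as written in an IRI
uri_fragment = quote(iri_fragment, safe="")  # UTF-8 percent-encoded URI form

plain_id = "caf\u00e9"       # ID authored with the raw non-ASCII character
escaped_id = "caf%C3%A9"     # ID authored in URI-escaped form

results = {
    "IRI vs plain ID": matches_literally(iri_fragment, plain_id),
    "URI vs plain ID": matches_literally(uri_fragment, plain_id),
    "URI vs escaped ID": matches_literally(uri_fragment, escaped_id),
    "IRI vs escaped ID": matches_literally(iri_fragment, escaped_id),
}
```

Under that assumption, no single ID value matches both the IRI form and the URI form of the same fragment: the author has to pick one, and links using the other form miss.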

-Boris

Received on Tuesday, 28 April 2009 19:04:01 UTC