- From: Michael A. Puls II <shadow2531@gmail.com>
- Date: Tue, 29 Sep 2009 19:10:38 -0400
- To: Martin J. Dürst <duerst@it.aoyama.ac.jp>, "Erik van der Poel" <erikv@google.com>
- Cc: "Larry Masinter" <masinter@adobe.com>, "PUBLIC-IRI@W3.ORG" <PUBLIC-IRI@w3.org>
On Tue, 29 Sep 2009 06:50:58 -0400, Martin J. Dürst <duerst@it.aoyama.ac.jp> wrote:

> Hello Erik, others,
>
> On 2009/09/26 3:23, Erik van der Poel wrote:
>> On Fri, Sep 25, 2009 at 10:42 AM, Larry Masinter<masinter@adobe.com>
>> wrote:
>>> Some mail didn't get sent to public-IRI which should have been:
>>>
>>> On 2009/09/03 7:33, Larry Masinter wrote:
>>>> Sorry to be rehashing what I think are old topics, but the discussion
>>>> of these things seems to be scattered around on a zillion mailing
>>>> lists:
>>>>
>>>>
>>>> * I'm not sure why
>>>> http://example.com/%<http://example.com/%25> should be illegal as
>>>> an IRI. I remember some discussion of this, but not the resolution.
>>>> Why not update IRI to allow it, since it seems to work in most
>>>> systems?
>>
>> I think this got garbled along the way, but I assume you're talking
>> about a percent sign (%) in the path part that is not followed by two
>> hex digits. This does not "work in most systems". Our automated tests
>> show that IE8 will not send the HTTP request, Safari4 escapes % as
>> %25, while Firefox, Chrome and Opera leave the % as is.
>
> Oh, interesting. I think Larry and I were assuming that there was some
> uniform behavior at least for major browsers that we could document
> (instead of HTML5). If there's such variation, my first proposal would
> be to go with the most conservative variant (single percents are simply
> illegal -> don't send request,...). (My second proposal would be to
> mention more lenient processing only as a MAY.)

On a general note, whenever I have something that has %HH in it, I have
a preprocessing step like:

    escapeInvalidHH : function (s) {
        // Replace each % that is not followed by 2 hex digits with %25.
        return s.replace(/%(?![0-9A-F]{2})/gi, function () {
            return "%25";
        });
    }

which takes each % not followed by 2 hex digits and replaces it with %25
so that the % is, in essence, treated literally.
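
The preprocessing step above can be sketched as a standalone function and
run against a strict decoder. This is only an illustrative sketch:
decodeURIComponent is used here as a stand-in for "a strict %HH decoder"
(that choice is an assumption, not something named in this thread), and
strictDecode is a hypothetical helper name.

```javascript
// Escape every "%" that is not followed by two hex digits, so that a
// later strict decoder reads it as a literal percent sign.
function escapeInvalidHH(s) {
  return s.replace(/%(?![0-9A-Fa-f]{2})/g, "%25");
}

// Hypothetical stand-in for a strict %HH decoder (assumption):
// decodeURIComponent throws a URIError on a malformed escape.
function strictDecode(s) {
  return decodeURIComponent(s);
}

const raw = "%ty%";

// Without preprocessing, the strict decoder rejects the input.
let threw = false;
try {
  strictDecode(raw);
} catch (e) {
  threw = e instanceof URIError;
}

// With preprocessing, each stray "%" becomes "%25" and decoding succeeds.
const escaped = escapeInvalidHH(raw);
const decoded = strictDecode(escaped);

console.log(threw);   // true
console.log(escaped); // "%25ty%25"
console.log(decoded); // "%ty%"
```

Valid escapes are left untouched, so running the function a second time on
its own output changes nothing: "a%20b%zz" becomes "a%20b%25zz" and stays
that way.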

I do this escaping so that if a strict %HH decoder (one that throws on an
invalid %HH instead of treating it literally) is later used on one of the
parts of the string, that decoder will be happy and won't throw, for
example. I do that because I believe that in "%ty%", for example, the
generator of that string meant "%25ty%25" and could not have meant
anything else. So, imo, fwiw, there's only one way to handle that.

Now, for browsers specifically, I think the intent is that they leave the
invalid %HH alone and do not convert the % to %25 unless the browser
actually needs to decode part of the URL with a strict decoder. In other
words, if the browser is not consuming the URL, it should just pass it
along as-is (expanding, resolving, etc. aside).

Safari doing what it does makes sense, as it's just doing some
normalizing. However, I think other browsers not converting to %25 is
intentional, as a kind of "you only mess with it when and if you consume
it" rule. But that's just a guess.

Personally, I like Safari's % -> %25 way, the only problem being that it
screws up testing of URLs in the browser when you want to load a URL
as-is to see how other things (like some server or client) handle it.

--
Michael
Received on Tuesday, 29 September 2009 23:11:24 UTC