Re: FW: single percent from Michael A. Puls II on 2009-09-29 (public-iri@w3.org from September 2009)

From: Michael A. Puls II <shadow2531@gmail.com>
Date: Tue, 29 Sep 2009 19:10:38 -0400
To: Martin J. Dürst <duerst@it.aoyama.ac.jp>, "Erik van der Poel" <erikv@google.com>
Cc: "Larry Masinter" <masinter@adobe.com>, "PUBLIC-IRI@W3.ORG" <PUBLIC-IRI@w3.org>
Message-ID: <op.u01jzyjn1ejg13@sandra-svwliu01>

On Tue, 29 Sep 2009 06:50:58 -0400, Martin J. Dürst  
<duerst@it.aoyama.ac.jp> wrote:

> Hello Erik, others,
>
> On 2009/09/26 3:23, Erik van der Poel wrote:
>> On Fri, Sep 25, 2009 at 10:42 AM, Larry Masinter<masinter@adobe.com>   
>> wrote:
>>> Some mail didn't get sent to public-IRI which should have been:
>>>
>>> On 2009/09/03 7:33, Larry Masinter wrote:
>>>> Sorry to be rehashing what I think are old topics, but the discussion  
>>>> of these things seems to be scattered around on a zillion mailing  
>>>> lists:
>>>>
>>>>
>>>>    *   I'm not sure why   
>>>> http://example.com/%<http://example.com/%25>    should be illegal as  
>>>> an IRI. I remember some discussion of this, but not the resolution.  
>>>> Why not update IRI to allow it, since it seems to work in most  
>>>> systems?
>>
>> I think this got garbled along the way, but I assume you're talking
>> about a percent sign (%) in the path part that is not followed by two
>> hex digits. This does not "work in most systems". Our automated tests
>> show that IE8 will not send the HTTP request, Safari4 escapes % as
>> %25, while Firefox, Chrome and Opera leave the % as is.
>
> Oh, interesting. I think Larry and I were assuming that there was some  
> uniform behavior at least for major browsers that we could document  
> (instead of HTML5). If there's such variation, my first proposal would  
> be to go with the most conservative variant (single percents are simply  
> illegal -> don't send request,...). (My second proposal would be to  
> mention more lenient processing only as a MAY.)

As a general note, whenever I have something that has %HH in it, I have a  
preprocessing step like:

// Replace each "%" that is not followed by two hex digits with "%25".
function escapeInvalidHH(s) {
     return s.replace(/%(?![0-9A-F]{2})/gi, function () {
         return "%25";
     });
}

, which takes each % not followed by 2 hexdigits and replaces it with %25  
so that the % is, in essence, treated literally. I do this so that if a  
strict %HH decoder (that doesn't just treat an invalid %HH literally) is  
used later on one of the parts of the string, that decoder will be happy  
and won't throw, for example.
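
As a rough sketch of the difference this makes, assuming the "strict %HH decoder" is JavaScript's built-in decodeURIComponent (my choice for illustration; strictDecodes is just a hypothetical helper name):

```javascript
// Sketch only: decodeURIComponent stands in for "a strict %HH decoder".
function escapeInvalidHH(s) {
    // Replace each "%" that is not followed by two hex digits with "%25".
    return s.replace(/%(?![0-9A-F]{2})/gi, "%25");
}

// Hypothetical helper: reports whether the strict decoder accepts s.
function strictDecodes(s) {
    try {
        decodeURIComponent(s);
        return true;
    } catch (e) {
        return false; // decodeURIComponent throws URIError on a bad escape
    }
}

const raw = "%ty%";
const escaped = escapeInvalidHH(raw);          // "%25ty%25"
const rawOk = strictDecodes(raw);              // false: the decoder throws
const decoded = decodeURIComponent(escaped);   // "%ty%"
const untouched = escapeInvalidHH("a%20b");    // "a%20b": valid %HH is left alone
```

Note that this only guards against a malformed %HH. decodeURIComponent can still throw on well-formed escapes whose bytes are not valid UTF-8.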

I do that because I believe that in "%ty%", for example, the generator of  
that string meant "%25ty%25" and could not have meant anything else.

So, imo, fwiw, there's only one way to handle that.

Now, for browsers specifically, I think it's intended that they should  
leave the invalid %HH alone and not convert the % to %25 unless the  
browser actually needs to decode part of the url with a strict decoder. In  
other words, if the browser is not consuming the url, it should just pass  
it along (expanding and resolving etc. aside) as-is.
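
To illustrate that "only touch it when you consume it" idea, here is a minimal sketch. passAlong and consume are hypothetical names I made up, not any browser's actual code, and decodeURIComponent again stands in for a strict decoder:

```javascript
// Sketch only: hypothetical helpers, not taken from any browser.
function escapeInvalidHH(s) {
    // Replace each "%" that is not followed by two hex digits with "%25".
    return s.replace(/%(?![0-9A-F]{2})/gi, "%25");
}

// Not consuming the URL part: hand it on exactly as received.
function passAlong(part) {
    return part;
}

// Consuming the URL part: escape stray "%" first, then decode strictly.
function consume(part) {
    return decodeURIComponent(escapeInvalidHH(part));
}

const path = "/100%/a%20b";
const forwarded = passAlong(path);   // "/100%/a%20b", stray "%" untouched
const consumed = consume(path);      // "/100%/a b", stray "%" treated literally
```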

Safari doing what it does makes sense as it's just doing some normalizing.  
However, I think other browsers not converting to %25 is intentional as  
kind of a "you only mess with it when and if you consume it" rule. But,  
that's just guessing.

Personally, I like Safari's %->%25 way, with the only problem being that  
it screws up testing of urls in the browser when you want to load a URL  
as-is to see how other things (like some server or client) handle it.

-- 
Michael

Received on Tuesday, 29 September 2009 23:11:24 UTC