Re: FW: single percent

Hello Erik, others,

On 2009/09/26 3:23, Erik van der Poel wrote:
> On Fri, Sep 25, 2009 at 10:42 AM, Larry Masinter<masinter@adobe.com>  wrote:
>> Some mail didn't get sent to public-IRI which should have been:
>>
>> On 2009/09/03 7:33, Larry Masinter wrote:
>>> Sorry to be rehashing what I think are old topics, but the discussion of these things seems to be scattered around on a zillion mailing lists:
>>>
>>>
>>>    *   I'm not sure why  http://example.com/%<http://example.com/%25>    should be illegal as an IRI. I remember some discussion of this, but not the resolution. Why not update IRI to allow it, since it seems to work in most systems?
>
> I think this got garbled along the way, but I assume you're talking
> about a percent sign (%) in the path part that is not followed by two
> hex digits. This does not "work in most systems". Our automated tests
> show that IE8 will not send the HTTP request, Safari4 escapes % as
> %25, while Firefox, Chrome and Opera leave the % as is.

Oh, interesting. I think Larry and I were assuming that there was some
uniform behavior, at least among the major browsers, that we could
document (instead of HTML5). If there's such variation, my first
proposal would be to go with the most conservative variant (single
percents are simply illegal -> don't send the request, ...). (My second
proposal would be to mention more lenient processing only as a MAY.)
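
To make the two proposals concrete, here is a minimal Python sketch. The
function names are purely illustrative and not taken from any spec or
browser. The lenient variant only mirrors the Safari4 behavior Erik
reports (lone % becomes %25); it is an assumption about what a MAY could
look like, not agreed text.

```python
import re

# A '%' that is NOT followed by two hex digits (the "single percent" case).
LONE_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def has_lone_percent(iri: str) -> bool:
    """Return True if the string contains a '%' not followed by two hex digits."""
    return LONE_PERCENT.search(iri) is not None


def check_conservative(iri: str) -> str:
    """First proposal: treat a lone '%' as illegal and refuse to proceed."""
    if has_lone_percent(iri):
        raise ValueError("lone '%' is not allowed: " + iri)
    return iri


def escape_lenient(iri: str) -> str:
    """Possible MAY behavior (assumption): rewrite each lone '%' as '%25'.

    Valid escapes such as '%25' or '%C3' are left untouched.
    """
    return LONE_PERCENT.sub("%25", iri)
```

With this sketch, check_conservative("http://example.com/%") raises
ValueError, while escape_lenient on the same input yields
"http://example.com/%25".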

>> Martin:
>>
>> It's illegal in URIs, too. The URI and IRI syntaxes should be as
>> parallel as possible. In terms of implementations, it may be easy for
>> consumers, but for producers, it's not. It's much easier to just escape
>> than to go and check whether (one or) two hex digits are following
>> (which would change the meaning totally).
>
> Surely that depends on the type of producer. For HTML form
> submissions, % should be escaped as %25,

Yes, if you have a '%' which is simply data, you should convert it to '%25'.
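
As a small illustration of the producer side, here is a sketch using
Python's standard urllib.parse.quote. The helper name escape_data is
hypothetical. The point is that a producer holding raw data never has to
look at what follows the '%'; it escapes unconditionally.

```python
from urllib.parse import quote, unquote


def escape_data(segment: str) -> str:
    """Percent-encode raw data for use in a URI component.

    safe="" means no reserved character is left as is, so every '%'
    in the data becomes '%25' regardless of what follows it.
    """
    return quote(segment, safe="")


# A '%' that is plain data is escaped even when two hex digits follow,
# so the data round-trips instead of being read as an escape for 'A'.
encoded = escape_data("%41")
decoded = unquote(encoded)
```

Here encoded is "%2541" and decoded is the original "%41", whereas
checking for following hex digits first would have changed the meaning.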

> but for HTML hrefs, the
> producer is also a consumer and should first check whether two hex
> digits follow.

I'm not sure what you mean here by "the producer is also a consumer". 
Can you explain?

> The big question is what to do about a % sign that is
> not followed by two hex digits. The major browsers currently handle
> this differently, so producers would be wise to avoid this,

Very much so indeed. Even if the major browsers all handled this the same
way, there is much more software than just the major browsers that
processes URIs or IRIs.

> but it is
> not clear to me what advice should be given to consumer/producer
> implementers. Is it better to be conservative like IE and reject it?
> Or is it better to be forgiving like Firefox and just send out the
> lone % sign? (Note: this particular case is interesting, because IE is
> usually the forgiving one, while Firefox is the conservative one.)

Well, there's always the hope for progress.

>> Martin:
>>
>> I think the purpose is to %-encode '[' and ']' except for the authority
>> part, where they are needed for IPv6. The encoding is done because '['
>> and ']' are not allowed elsewhere than in IP-literal.
>
> I don't see why [ and ] should be disallowed in the path and query
> parts,

Well, currently the specs say so (the URI spec says so, and the IRI spec 
follows it).
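
For illustration, what the specs currently imply can be sketched as
follows in Python: %-encode '[' and ']' everywhere except in the
authority, where an IP-literal needs them. The function names are
hypothetical, and the splitting relies on the standard urllib.parse
module rather than on a full RFC 3986 parser.

```python
from urllib.parse import urlsplit, urlunsplit


def encode_brackets(component: str) -> str:
    """Percent-encode square brackets in a single URI component."""
    return component.replace("[", "%5B").replace("]", "%5D")


def encode_brackets_outside_authority(uri: str) -> str:
    """Encode '[' and ']' in path, query and fragment only."""
    parts = urlsplit(uri)
    return urlunsplit((
        parts.scheme,
        parts.netloc,  # left untouched: may hold an IP-literal such as [::1]
        encode_brackets(parts.path),
        encode_brackets(parts.query),
        encode_brackets(parts.fragment),
    ))
```

For example, "http://[::1]/a[1]?q=[x]" becomes
"http://[::1]/a%5B1%5D?q=%5Bx%5D", with the IPv6 literal preserved.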

> but the major browsers currently handle those characters
> differently in the path/query. (Some browsers %-encode, others don't.)

Can you give details?

Overall, I'm more and more wondering how we as editors, or a potential 
IETF IRI WG, would deal with the kind of variability between browsers 
that Erik is bringing up here. I thought we could just work from what 
HTML5 had, because that reflected wide current practice among browsers, 
but that doesn't seem to really be true.

Regards,    Martin.

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp

Received on Tuesday, 29 September 2009 10:52:11 UTC