Re: FW: single percent

On Fri, Sep 25, 2009 at 10:42 AM, Larry Masinter <masinter@adobe.com> wrote:
> Some mail didn't get sent to public-IRI which should have been:
>
> On 2009/09/03 7:33, Larry Masinter wrote:
>> Sorry to be rehashing what I think are old topics, but the discussion of these things seems to be scattered around on a zillion mailing lists:
>>
>>
>>   *   I'm not sure why  http://example.com/%<http://example.com/%25>  should be illegal as an IRI. I remember some discussion of this, but not the resolution. Why not update IRI to allow it, since it seems to work in most systems?

I think this got garbled along the way, but I assume you're talking
about a percent sign (%) in the path part that is not followed by two
hex digits. This does not "work in most systems". Our automated tests
show that IE8 will not send the HTTP request, Safari 4 escapes % as
%25, while Firefox, Chrome and Opera leave the % as is.
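To make "not followed by two hex digits" precise: RFC 3986 defines pct-encoded as "%" HEXDIG HEXDIG, so a % sign is a lone percent whenever the next two characters are not both hex digits. Below is a minimal Python sketch of that check. The function name is hypothetical and is not taken from any browser or spec.

```python
import re

# A '%' starts a valid escape only if it is followed by exactly two hex
# digits (RFC 3986: pct-encoded = "%" HEXDIG HEXDIG). The negative
# lookahead matches every '%' that does NOT meet that condition.
_LONE_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")

def lone_percent_positions(path: str) -> list:
    """Return the index of every '%' not followed by two hex digits."""
    return [m.start() for m in _LONE_PERCENT.finditer(path)]

print(lone_percent_positions("/%"))              # [1]
print(lone_percent_positions("/a%20b"))          # []
print(lone_percent_positions("/50%off"))         # [3]
print(lone_percent_positions("/100%effective"))  # []
```

The last case shows why this is fragile for producers. "%ef" happens to be a valid escape, so a literal "100%effective" would silently turn into a byte 0xEF rather than being flagged.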

> Martin:
>
> It's illegal in URIs, too. The URI and IRI syntaxes should be as
> parallel as possible. In terms of implementations, it may be easy for
> consumers, but for producers, it's not. It's much easier to just escape
> than to go and check whether (one or) two hex digits are following
> (which would change the meaning totally).

Surely that depends on the type of producer. For HTML form
submissions, % should be escaped as %25, but for HTML hrefs, the
producer is also a consumer and should first check whether two hex
digits follow. The big question is what to do about a % sign that is
not followed by two hex digits. The major browsers currently handle
this differently, so producers would be wise to avoid this, but it is
not clear to me what advice should be given to consumer/producer
implementers. Is it better to be conservative like IE and reject it?
Or is it better to be forgiving like Firefox and just send out the
lone % sign? (Note: this particular case is interesting, because IE is
usually the forgiving one, while Firefox is the conservative one.)
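The consumer/producer options discussed above can be sketched as three policies for a lone %, plus the pure-producer case (form data) where every % is literal data. This is a rough sketch only. The policy names and functions are hypothetical, and the mapping to browsers just restates the test results reported above. It is not a claim about how any browser is implemented.

```python
import re
from urllib.parse import quote_plus

# '%' not followed by two hex digits (see RFC 3986 pct-encoded).
_LONE_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")

class LonePercentError(ValueError):
    """Raised by the hypothetical 'reject' policy."""

def normalize_percent(path: str, policy: str) -> str:
    """Handle lone '%' signs in an href-like string (consumer/producer case).

    Hypothetical policies, loosely matching the reported behaviour:
      "reject" - refuse the reference (IE8-like)
      "escape" - rewrite only lone '%' as '%25', keep valid escapes (Safari 4-like)
      "pass"   - leave the string unchanged (Firefox/Chrome/Opera-like)
    """
    if policy == "pass":
        return path
    if policy == "reject":
        if _LONE_PERCENT.search(path):
            raise LonePercentError(f"lone '%' in {path!r}")
        return path
    if policy == "escape":
        return _LONE_PERCENT.sub("%25", path)
    raise ValueError(f"unknown policy {policy!r}")

def escape_form_value(value: str) -> str:
    """Pure producer (form data): every '%' is data, so all of them get escaped."""
    return quote_plus(value)

print(normalize_percent("/50%off", "escape"))  # /50%25off
print(normalize_percent("/a%20b%", "escape"))  # /a%20b%25
print(normalize_percent("/50%off", "pass"))    # /50%off
print(escape_form_value("50% off"))            # 50%25+off
```

The "escape" policy has to look ahead two characters, which is exactly the extra check a consumer/producer needs. A pure producer such as form submission can escape unconditionally.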

> Larry:
>>   *   I'm not sure why U+0023 NUMBER SIGN should be disallowed in the characters allowed in the <fragment> production. Again, same question...
>
> Martin:
>
> I seem to remember something to the effect that some implementations
> parsed URIs from the back to chop off the fragment part.

Ugh. Just say No to such implementations. Always parse forward.
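Here is why forward parsing matters if # were allowed inside <fragment>. Splitting at the first # keeps the whole fragment intact. Chopping at the last # (the backward style Martin recalls) moves part of the fragment into the path. A minimal sketch using Python's str.partition and str.rpartition follows. The function names and the example IRI are made up for illustration.

```python
from typing import Optional, Tuple

def split_fragment(iri: str) -> Tuple[str, Optional[str]]:
    """Forward parse: the fragment starts at the FIRST '#'."""
    before, sep, after = iri.partition("#")
    return (before, after) if sep else (iri, None)

def split_fragment_from_back(iri: str) -> Tuple[str, Optional[str]]:
    """Backward parse: chops at the LAST '#' (shown only for contrast)."""
    before, sep, after = iri.rpartition("#")
    return (before, after) if sep else (iri, None)

s = "http://example.com/doc#sec#2"
print(split_fragment(s))            # ('http://example.com/doc', 'sec#2')
print(split_fragment_from_back(s))  # ('http://example.com/doc#sec', '2')
```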

> Larry:
>>   *   I don't understand how current processors handle [] square brackets.  I'm reading
>
> Martin:
> Where did you read that?
>
> Larry:
>> If w begins with either of:
>>     a string matching the <scheme> production, followed by "://"
>>     the string "//"
>>
>> then percent-encode any left or right square brackets (U+005B, U+005D, "[" and "]") following the first occurrence of "/", "?", or "#" which follows the first occurrence of "//".
>>
>>
>> What the heck is this about?
>
> Martin:
>
> I think the purpose is to %-encode '[' and ']' except for the authority
> part, where they are needed for IPV6. The encoding is done because '['
> and ']' are not allowed elsewhere than in IP-literal.

I don't see why [ and ] should be disallowed in the path and query
parts, but the major browsers currently handle those characters
differently in the path/query. (Some browsers %-encode, others don't.)
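For reference, here is the quoted bracket rule as I read it, written as a small Python function. It is a sketch, not reference code from the draft. The function name is hypothetical. The scheme regex follows RFC 3986 (scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )). Brackets before the first "/", "?" or "#" after the "//" are left alone, which matches Martin's reading that the authority keeps them for IPv6 literals.

```python
import re

# RFC 3986 scheme production followed by "://", anchored at the start.
_SCHEME_SLASHES = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*://")
# First path, query or fragment delimiter after the authority starts.
_AFTER_AUTHORITY = re.compile(r"[/?#]")

def encode_brackets_outside_authority(w: str) -> str:
    """Percent-encode '[' and ']' after the authority, per the quoted rule."""
    if w.startswith("//"):
        start = 2                      # network-path reference
    else:
        m = _SCHEME_SLASHES.match(w)
        if not m:
            return w                   # rule does not apply
        start = m.end()                # just past the first "//"
    d = _AFTER_AUTHORITY.search(w, start)
    if not d:
        return w                       # only an authority follows "//"
    cut = d.start()
    tail = w[cut:].replace("[", "%5B").replace("]", "%5D")
    return w[:cut] + tail

print(encode_brackets_outside_authority("http://[::1]:8080/a[b]?c=[d]#[e]"))
# http://[::1]:8080/a%5Bb%5D?c=%5Bd%5D#%5Be%5D
print(encode_brackets_outside_authority("//[::1]/x[1]"))  # //[::1]/x%5B1%5D
print(encode_brackets_outside_authority("a[b]"))          # a[b]
```

Read literally, the rule leaves strings that contain no "//" prefix untouched, such as the relative reference a[b] above. It also leaves non-hierarchical forms like mailto:x[1] untouched.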

Erik

Received on Friday, 25 September 2009 18:23:55 UTC