Re: aftermath on Bug 8953 – URL decomp. IDL attributes when parsing fails

On 18.02.2010 23:23, Maciej Stachowiak wrote:
>
> On Feb 18, 2010, at 2:06 PM, Julian Reschke wrote:
>
>>
>> In this case it would mean removing the special case in step 3 of
>> Section 2 of <http://www.w3.org/html/wg/href/draft>. So, instead of:
>>
>> "If w begins with either of:
>>
>> * a string matching the <scheme> production, followed by "://"
>> * the string "//"
>>
>> then percent-encode any left or right square brackets (U+005B, U+005D,
>> "[" and "]") following the first occurrence of "/", "?", or "#" which
>> follows the first occurrence of "//".
>>
>> Otherwise, percent-encode all left and right square brackets."
>>
>> it would simply be:
>>
>> "Percent-encode all left and right square brackets."
>
> I believe percent-encoding all square brackets will break processing of
> web addresses with an IPv6 IP address as the hostname. It needs to at
> minimum not percent-escape them when they delimit the allowed syntax for
> a URI authority IPv6 address.

The percent-escaping is undone in step 6.
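
To make the difference concrete, here is a rough Python sketch of the two variants of step 3 quoted above. The function names are mine and do not come from the draft, the scheme pattern is my reading of the <scheme> production in RFC 3986, and step 6 (which undoes the escaping) is not modelled at all:

```python
import re

# Optional "<scheme>:" followed by "//". Assumption: <scheme> is the
# RFC 3986 production ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ).
_PREFIX = re.compile(r"(?:[A-Za-z][A-Za-z0-9+.-]*:)?//")


def encode_brackets(s):
    """Percent-encode every "[" (U+005B) and "]" (U+005D) in s."""
    return s.replace("[", "%5B").replace("]", "%5D")


def step3_current(w):
    """Step 3 as quoted: leave brackets in the authority part alone."""
    m = _PREFIX.match(w)
    if m is None:
        # Neither "<scheme>://" nor "//" at the start: encode everything.
        return encode_brackets(w)
    # Find the first "/", "?" or "#" after the leading "//".
    rest = re.search(r"[/?#]", w[m.end():])
    if rest is None:
        return w  # nothing follows the authority, so nothing to encode
    cut = m.end() + rest.start()
    return w[:cut] + encode_brackets(w[cut:])


def step3_simplified(w):
    """Proposed replacement: percent-encode all square brackets."""
    return encode_brackets(w)
```

With "http://[::1]/a[1]" as input, the first variant leaves the brackets around the IPv6 literal untouched, while the second encodes them as well and relies on a later step to restore them.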

> I suspect the URL you mentioned fails only as an accidental side effect
> of trying to handle IPv6 addresses correctly.

Potentially. I'm trying to find out why that special case is there, and 
whether it's really needed. After all, we were told "this is how things 
work in reality".

As far as I can tell so far, this whole double-escaping and un-escaping
mess could be replaced by either

- parsing with the regexp from RFC 3986, Appendix B
(<http://greenbytes.de/tech/webdav/rfc3986.html#rfc.section.B>), or

- expanding the set of allowable characters, as proposed in
<http://tools.ietf.org/html/draft-ietf-iri-3987bis-00#section-7.2>.
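
For the first option, a minimal Python sketch of what parsing with the RFC 3986 Appendix B regexp looks like. The helper name split_uri and the dictionary keys are my own; only the regular expression itself is taken from the RFC. Note that it splits on ":", "/", "?" and "#" only, so square brackets need no special treatment before parsing:

```python
import re

# Regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri(ref):
    """Split a URI reference into its five components.

    Components that are absent come back as None; the path is always
    a string (possibly empty).
    """
    m = URI_RE.match(ref)  # every part is optional, so this always matches
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }
```

For example, split_uri("http://[2001:db8::1]:8080/a[1]?q=[x]#f") yields the authority "[2001:db8::1]:8080" and the path "/a[1]", with the brackets intact in both places.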

  Best regards, Julian

Received on Thursday, 18 February 2010 22:35:24 UTC