Re: aftermath on Bug 8953 – URL decomp. IDL attributes when parsing fails from Maciej Stachowiak on 2010-02-18 (public-html@w3.org from February 2010)

From: Maciej Stachowiak <mjs@apple.com>
Date: Thu, 18 Feb 2010 14:46:43 -0800
To: Julian Reschke <julian.reschke@gmx.de>
Cc: Ian Hickson <ian@hixie.ch>, "public-html@w3.org" <public-html@w3.org>
Message-id: <A1411637-24E8-4706-8A91-9B78113F3E08@apple.com>

On Feb 18, 2010, at 2:34 PM, Julian Reschke wrote:

> On 18.02.2010 23:23, Maciej Stachowiak wrote:
>
>> I suspect the URL you mentioned fails only as an accidental side
>> effect of trying to handle IPv6 addresses correctly.
>
> Potentially. I'm trying to find out why that special case is there,  
> and whether it's really needed. After all, we were told "this is how  
> things work in reality".
>
> As far as I can tell so far, all this double-escaping and
> un-escaping mess can be substituted by either
>
> - using the RFC 3986 regexp for parsing
>   (<http://greenbytes.de/tech/webdav/rfc3986.html#rfc.section.B>), or
>
> - by expanding the set of allowable characters, as proposed in
>   <http://tools.ietf.org/html/draft-ietf-iri-3987bis-00#section-7.2>.

I think all the escaping and unescaping is there solely so these  
algorithms could be written as a layer on top of previous IRI/URI  
RFCs. I believe it would be better for IRIbis to define error-tolerant  
Web address parsing directly, rather than via escaping and then  
applying another algorithm. The regexp looks reasonable to me but I am  
not sure if there are mysterious edge cases.
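
To make the regexp option concrete, here is a minimal sketch in Python, assuming the expression from RFC 3986, Appendix B is used unchanged. The names URI_RE and split_uri are hypothetical, chosen only for this illustration; this is not what any browser or draft actually specifies.

```python
import re

# Regular expression from RFC 3986, Appendix B ("Parsing a URI Reference
# with a Regular Expression"). It splits a reference into components but
# does not validate them.
URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri(reference):
    """Split a URI reference into its five components (hypothetical helper).

    Components that are absent come back as None; the path is always a
    string, possibly empty.
    """
    # Every group in the expression is optional, so it matches any input.
    match = URI_RE.match(reference)
    return {
        "scheme": match.group(2),
        "authority": match.group(4),
        "path": match.group(5),
        "query": match.group(7),
        "fragment": match.group(9),
    }


if __name__ == "__main__":
    # Example reference taken from RFC 3986, Appendix B.
    print(split_uri("http://www.ics.uci.edu/pub/ietf/uri/#Related"))
    # A malformed IPv6 literal is still split rather than rejected.
    print(split_uri("http://[::1/x"))
```

Since the expression never fails to match, a malformed authority such as "[::1" is simply returned as-is, with no escaping or unescaping pass. Whether that is the desired error handling in every edge case is exactly the open question.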

Regards,
Maciej

Received on Thursday, 18 February 2010 22:47:19 UTC