Re: aftermath on Bug 8953 – URL decomp. IDL attributes when parsing fails

On Feb 18, 2010, at 2:34 PM, Julian Reschke wrote:

> On 18.02.2010 23:23, Maciej Stachowiak wrote:
>
>> I suspect the URL you mentioned fails only as an accidental side
>> effect of trying to handle IPv6 addresses correctly.
>
> Potentially. I'm trying to find out why that special case is there,  
> and whether it's really needed. After all, we were told "this is how  
> things work in reality".
>
> As far as I can tell so far, all this double-escaping and
> unescaping mess can be replaced by either
>
> - using the RFC 3986 regexp for parsing
>   (<http://greenbytes.de/tech/webdav/rfc3986.html#rfc.section.B>), or
>
> - expanding the set of allowable characters, as proposed in
>   <http://tools.ietf.org/html/draft-ietf-iri-3987bis-00#section-7.2>.

I think all the escaping and unescaping is there solely so these  
algorithms could be written as a layer on top of previous IRI/URI  
RFCs. I believe it would be better for IRIbis to define error-tolerant  
Web address parsing directly, rather than via escaping and then  
applying another algorithm. The regexp looks reasonable to me but I am  
not sure if there are mysterious edge cases.
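
For reference, here is a minimal sketch of what parsing with the RFC 3986
Appendix B regexp could look like, in Python. The pattern is the one given
in that appendix; the function name split_uri and the sample inputs are
only illustrative assumptions and do not come from any spec text. It shows
that the regexp never rejects its input, it only splits it, which is why it
is attractive for error-tolerant parsing.

```python
import re

# Pattern from RFC 3986, Appendix B. Every component is optional, so
# the expression matches any input string (possibly with empty parts).
URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri(reference):
    """Split a URI reference into its five generic components.

    split_uri is a hypothetical helper name used for illustration.
    Components that are absent come back as None; the path is always
    a string, possibly empty.
    """
    m = URI_RE.match(reference)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


if __name__ == "__main__":
    # An IPv6 literal stays inside the authority component; the regexp
    # does not look inside it or validate it.
    print(split_uri("http://[::1]:8080/a?b#c"))
    # Characters that are not valid in a URI are passed through as-is.
    print(split_uri("http://example.org/a b?x y"))
```

Note that the regexp does no validation and no percent-encoding at all, so
any handling of IPv6 literals, spaces, or non-ASCII characters would still
have to be defined as a separate step on the individual components.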

Regards,
Maciej

Received on Thursday, 18 February 2010 22:47:19 UTC