Re: [Bug 6670] Allow unescaped &s, at least in attributes that accept URLs

Ian Hickson wrote:
> On Fri, 29 May 2009, Julian Reschke wrote:
>> Ian Hickson wrote:
>>> On Fri, 29 May 2009, Julian Reschke wrote:
>>>>> I've tried to do this. The spec text for this is highly unintuitive, but
>>>>> I hope it matches practical intuition more than the previous text. I'm
>>>>> not completely convinced that this is a good idea, so let me know if you
>>>>> think this should be changed back.
>>>> If this means that it's not conforming: -1!
>>> I don't understand what you mean.
>> I meant: "If this means that it becomes conforming (= "valid"): -1".
> 
> What exactly do you mean by "it", and do you have any technical 
> objections other than negative numbers?

You are making unescaped ampersands in attribute values "valid", which 
makes parsing href attributes significantly harder and increases the 
risk of these kinds of references leaking into other formats.
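To make the parsing concern concrete: in the HTML5 parsing algorithm, a named character reference inside an attribute value that lacks its ";" is left as literal text when it is followed by "=" or an ASCII letter or digit. A tool that pulls URLs out of href attributes therefore cannot simply run a generic entity decoder. The Python sketch below (Python is an illustrative choice; the thread contains no code) contrasts the standard library's html.unescape, which applies the text-content rules, with a hypothetical unescape_attribute helper that applies the attribute special case. It assumes named references only; numeric references and the rest of the tokenizer are out of scope.

```python
import html
import re
from html.entities import html5

# A named character reference: '&', an ASCII letter, then letters/digits,
# optionally terminated by ';'. Numeric references (&#...;) are not handled.
_NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*)(;?)")


def unescape_attribute(value):
    """Hypothetical helper: decode named references the way an HTML5 parser
    does inside an attribute value (simplified sketch, named refs only)."""

    def replace(m):
        name, semi = m.group(1), m.group(2)
        # A properly terminated reference such as "&amp;" is always decoded.
        if semi and name + ";" in html5:
            return html5[name + ";"]
        # Otherwise find the longest legacy entity (listed in the table
        # without ';') that is a prefix of the name.
        for i in range(len(name), 1, -1):
            if name[:i] in html5:
                next_char = name[i:i + 1] or semi or value[m.end():m.end() + 1]
                # Attribute special case: an unterminated reference followed
                # by '=' or an ASCII alphanumeric is kept as literal text.
                if next_char == "=" or (next_char.isascii() and next_char.isalnum()):
                    return m.group(0)
                return html5[name[:i]] + name[i:] + semi
        # Not a known entity at all: keep the text unchanged.
        return m.group(0)

    return _NAMED_REF.sub(replace, value)


if __name__ == "__main__":
    href = "/search?q=html5&copy=1"
    print(html.unescape(href))       # text-content rules decode "&copy"
    print(unescape_attribute(href))  # attribute rules keep "&copy=1" as-is
```

The two functions disagree on the same string, which is the kind of divergence a consumer of href values has to get right once raw "&" is conforming.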

The only reason you have given is hearsay about reducing the size of 
certain documents; I have mentioned other ways that reduce the size more 
significantly, one of which could be deployed right away.

>>>> There are better ways to achieve shorter href attributes, such as 
>>>> using ";" instead of "&" as query parameter delimiter.
>>> Unfortunately we are stuck with "&" for systems that use form 
>>> submission.
>> How many of the escaped ampersands in the response to a Google Search 
>> are contained in href attributes that point to Google services? Fix 
>> those to accept ";" instead of "&", and replace them in the links.
> 
> Supporting both '&' and ';' seems like an exercise in bug creation. Parsing 
> URIs is hard enough to do right as it is without making things even more 
> complicated and adding even more edge cases.

But that's exactly what you are doing, except here it applies to parsing 
href attributes, not URIs.
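For comparison, the ";" proposal on the server side could look like the Python sketch below. The helper name parse_query is hypothetical; it normalizes a literal ";" to "&" and then uses the standard library's urllib.parse.parse_qsl. A percent-encoded "%3B" is not touched by the normalization and is decoded back to ";" inside the value, but an unencoded ";" that was meant as data would be split, which is the kind of edge case being argued about. Python's own urllib.parse illustrates the trade-off: older versions split on both characters by default, and later security releases changed the default to "&" only.

```python
from urllib.parse import parse_qsl


def parse_query(qs: str) -> list[tuple[str, str]]:
    """Hypothetical helper: split a query string on either '&' or ';'.

    Only a literal ';' is treated as a delimiter; '%3B' survives the
    normalization and is decoded to ';' by parse_qsl afterwards.
    """
    return parse_qsl(qs.replace(";", "&"))


if __name__ == "__main__":
    print(parse_query("q=html5;hl=en"))   # ';' used as the delimiter
    print(parse_query("q=html5&hl=en"))   # '&' still accepted
    print(parse_query("q=a%3Bb&hl=en"))   # encoded ';' stays in the value
```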

>> Furthermore, there are other means to optimize for size, such as using 
>> gzip (which saves ~75% on the response I just tried), or using a 
>> client-side include mechanism for static parts of the page (this was 
>> proposed several times in the past, and rejected as not needed).
> 
> And once those have been done, you can still save more bytes by not 
> including the "amp;" each time.

I just tried the Google search results page for "html5".

Original size: 29722 bytes (gzipped: 7753 bytes, 26%)
After replacing all "&amp" by "&": 29518 bytes, 99% of original (gzipped: 7755 bytes, 26%)

So replacing all "&amp" instances saved 1%, not 7%, whereas gzip reduces 
the size by three quarters. Also note that in this particular case, 
gzipping the "optimized" version yielded a *longer* result (7755 vs. 
7753 bytes).
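A rough way to repeat this comparison on a saved copy of any page is sketched below in Python. The function name size_report is hypothetical. It replaces "&amp;" (with the semicolon, matching the "amp;" wording above) by "&" and compresses both versions with the standard gzip module at its default level 9; the server that produced the figures above may have used different settings, so exact byte counts will differ. The replacement is a size estimate only: it also rewrites "&amp;" in text content, where the escaping may be required.

```python
import gzip


def size_report(page: str) -> dict:
    """Hypothetical helper: compare raw and gzipped sizes of a page before
    and after dropping the 'amp;' from every '&amp;'."""
    raw = page.encode("utf-8")
    stripped = page.replace("&amp;", "&").encode("utf-8")
    return {
        "original": len(raw),
        "original_gzip": len(gzip.compress(raw)),
        "stripped": len(stripped),
        "stripped_gzip": len(gzip.compress(stripped)),
    }
```

Usage: read a saved results page into a string (for example with `open(path, encoding="utf-8").read()`) and pass it to `size_report`; the ratio of `stripped` to `original` gives the saving from the escaping alone, and the two `_gzip` entries show how much of that remains after compression.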

> But that's academic; this is one of the most common syntax errors and 
> there really is no reason why it should be one, as far as I can tell. Why 
> would we want to make this a parse error?

See above.

BR, Julian

Received on Monday, 1 June 2009 10:00:36 UTC