Re: Bug 7034 from Boris Zbarsky on 2010-03-23 (public-html@w3.org from March 2010)

From: Boris Zbarsky <bzbarsky@MIT.EDU>
Date: Mon, 22 Mar 2010 20:16:02 -0400
To: "Tab Atkins Jr." <jackalmage@gmail.com>
CC: HTMLwg WG <public-html@w3.org>
Message-ID: <4BA80842.9010802@mit.edu>

On 3/22/10 6:59 PM, Tab Atkins Jr. wrote:
> On Mon, Mar 22, 2010 at 3:39 PM, Maciej Stachowiak <mjs@apple.com> wrote:
>> Another possibility is to change parsing such that "...&copy=..." is not a
>> hazard. I believe you had some evidence that this would fix more sites than
>> it would break. It seems like it would also have the benefit of allowing us
>> to make authoring rules more lenient in a beneficial way, without at the
>> same time introducing undue complexity.
>
> If at all possible, this is what I'd prefer.  I've never consciously
> escaped an ampersand in a URL in my life (and luckily don't think that
> I've ever run into a situation where it got interpreted as a named
> entity).  I'd prefer, if possible, to continue avoiding escaping the
> ampersand.  Unicode exists for a reason, after all.  If I want a
> copyright symbol, I can just pop that character itself into the URL.

So as I see it, the options are:

1)  Disallow entity references in all attributes.
2)  Accept the fact that moving text from attribute A to attribute B
     via DOM manipulation may well not give the same results as having
     the text in attribute B to start with.
3)  Allow entity references in all attributes, and require &amp; in
     URIs as needed.
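
To make the difference between #2 and #3 concrete, here is a rough sketch in Python. It is only an approximation: html.unescape() applies the HTML5 text-content decoding rules, which I'm using as a stand-in for "decode entity references in every attribute". The function names, the is_url_attr flag, and the sample URL are made up for illustration; no browser or spec is confirmed to work this way.

```python
import html


def parse_attr_decode_all(source_text: str) -> str:
    """Option 3 (approximation): decode references in every attribute."""
    return html.unescape(source_text)


def parse_attr_url_exempt(source_text: str, is_url_attr: bool) -> str:
    """Option 2 (hypothetical rule): skip decoding for URL attributes only."""
    return source_text if is_url_attr else html.unescape(source_text)


raw = "search?q=1&copy=2"          # what authors tend to type
escaped = "search?q=1&amp;copy=2"  # what option 3 requires

# Option 3: the unescaped form is a hazard, the escaped form is safe.
hazard = parse_attr_decode_all(raw)
safe = parse_attr_decode_all(escaped)

# Option 2: the same text behaves differently depending on the attribute.
href_value = parse_attr_url_exempt(raw, is_url_attr=True)
# A DOM setAttribute-style copy stores the string as-is, with no parsing.
title_via_dom = href_value
# The same text written directly into a non-URL attribute is decoded.
title_in_source = parse_attr_url_exempt(raw, is_url_attr=False)

print(hazard)
print(safe)
print(title_via_dom == title_in_source)
```

The last comparison is the weirdness in #2: a value copied from href to title through the DOM keeps its "&copy", while the identical text written into title in the markup would have been turned into a copyright sign.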

#3 is what we have right now, right?  Is there any indication that #1 is
safe to do (as in, doesn't break the web)?  Is the weirdness of #2
outweighed by not needing to escape '&' in URIs?

My gut feel is that #1 is not actually a viable option and that one's 
take on #2 depends heavily on the authoring workflow and the extent to 
which pages are scripted...

-Boris

Received on Tuesday, 23 March 2010 00:16:38 UTC