Re: [VE][325] Error Message Feedback from olivier Thereaux on 2006-11-27 (www-validator@w3.org from November 2006)

From: olivier Thereaux <ot@w3.org>
Date: Mon, 27 Nov 2006 15:58:15 +0900
To: neal.p.murphy@alum.wpi.edu
Cc: www-validator@w3.org
Message-Id: <33984906-D0DC-4882-959A-1E884DD24CCF@w3.org>

On Nov 27, 2006, at 14:13 , Neal Murphy wrote:
> Why does the use of plain & in URLs need to be fixed? Can you give me one or
> more solid reasons why a browser should change an & that it knows is inside
> an URL?

The simple reason why ampersands should be encoded is that the value
of an href attribute is of type CDATA. CDATA can contain entity
references (e.g. &eacute;), and the ampersand is the delimiter that
starts an entity reference, which is why when you want to write just "&"
you need to write it as "&amp;".
See also http://www.w3.org/TR/html4/appendix/notes.html#h-B.2.2
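
To make that concrete, here is a rough sketch in Python (my choice for
illustration only, nothing to do with how the validator is implemented).
The helper name href_attr and the example URL are made up; escape() and
unescape() come from Python's standard html module.

```python
from html import escape, unescape


def href_attr(url: str) -> str:
    """Escape a URL for use inside a double-quoted href attribute.

    href_attr is a hypothetical helper name; html.escape turns "&"
    into "&amp;" (and also escapes <, > and quotes).
    """
    return escape(url, quote=True)


# Hypothetical URL with a plain "&" separating query parameters.
raw = "http://example.com/search?q=cats&lang=en"
encoded = href_attr(raw)

# This is what belongs in the markup.
print('<a href="%s">my link</a>' % encoded)

# Decoding the attribute value gives back the original URL, which is
# what a user agent does before following the link.
assert unescape(encoded) == raw
```

The URL the browser ends up requesting still contains a plain "&"; only
its representation in the HTML source changes.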

But let's go a little further and ask ourselves "why should entities
be allowed in href?"
The fact is, URIs are not just "plain ASCII" any more: we have IRIs now, and
the following is perfectly legit:
http://éxample.com/foo/bar/
which can be written in an HTML link as
<a href="http://&eacute;xample.com/foo/bar/">my link</a>

And I don't want my HTML user-agent to think I'm linking to
http://&eacute;xample.com/foo/bar/
but to
http://éxample.com/foo/bar/

Hence, a use case for entities in href attribute values, and
therefore the need to escape ampersands.
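
As a small illustration of the decoding step, assuming Python's standard
html.unescape as a stand-in for what an HTML user agent does with an
attribute value (the variable names and the query string are mine):

```python
from html import unescape

# The attribute value exactly as it is written in the HTML source.
href_value = "http://&eacute;xample.com/foo/bar/"

# Entity references are resolved first, so the link target is the IRI.
target = unescape(href_value)
print(target)

# The same decoding step is why a literal "&" has to be written "&amp;":
# both kinds of reference are resolved in one pass.
with_query = unescape("http://&eacute;xample.com/foo/bar/?a=1&amp;b=2")
print(with_query)
```

The first print should show the é form of the address, not the
&eacute; form.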

Note that I am not saying this was what people who built HTML had in  
mind when deciding to use CDATA as the type for this particular  
attribute value, I am just saying that by thinking for a few minutes,  
I have found a use case that shows me the decision still makes sense  
today.

regards,
-- 
olivier

Received on Monday, 27 November 2006 06:58:28 UTC