- From: olivier Thereaux <ot@w3.org>
- Date: Mon, 27 Nov 2006 15:58:15 +0900
- To: neal.p.murphy@alum.wpi.edu
- Cc: www-validator@w3.org
On Nov 27, 2006, at 14:13, Neal Murphy wrote:
> Why does the use of plain & in URLs need to be fixed? Can you give me one or
> more solid reasons why a browser should change an & that it knows is inside
> an URL?

The simple reason why ampersands should be encoded is that the type of data for the value of href attributes is CDATA. CDATA can contain entities (e.g. &eacute;), and the ampersand is the delimiter that marks the beginning of an entity, which is why, when you just want to write "&", you need to write it as "&amp;".

See also http://www.w3.org/TR/html4/appendix/notes.html#h-B.2.2

But let's go a little further and ask ourselves "why should entities be allowed in href?"

The fact is, URIs are not just "plain ASCII": we have IRIs now, and the following is perfectly legit:
  http://éxample.com/foo/bar/
which can be written in an HTML link as
  <a href="http://&eacute;xample.com/foo/bar/">my link</a>

And I don't want my HTML user-agent to think I'm linking to
  http://&eacute;xample.com/foo/bar/
but to
  http://éxample.com/foo/bar/

Hence, a use case for entities in href attribute values, and therefore the need to escape ampersands.

Note that I am not saying this was what the people who built HTML had in mind when they decided to use CDATA as the type for this particular attribute value. I am just saying that, by thinking for a few minutes, I have found a use case that shows me the decision still makes sense today.

regards,
-- 
olivier
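The round trip described above (write "&amp;" in the source, get "&" back once the attribute value is parsed) can be sketched in Python with the standard `html` module. This is only an illustration: the helper name `href_attr` and the example.com URLs are made up for the example, and `html.unescape` follows Python's own decoding rules, which are not necessarily those of any particular browser.

```python
from html import escape, unescape


def href_attr(url: str) -> str:
    """Escape a URL for use inside a quoted href attribute (hypothetical helper)."""
    # escape() turns "&" into "&amp;" (and also escapes <, > and quotes).
    return escape(url, quote=True)


raw = "http://example.com/search?q=html&lang=en"
encoded = href_attr(raw)
print(encoded)

# Decoding the attribute value gives back the URL that was intended.
decoded = unescape(encoded)
print(decoded)

# An entity in an href is decoded too, which is the IRI use case.
iri = unescape("http://&eacute;xample.com/foo/bar/")
print(iri)

# A bare "&" can be mistaken for the start of an entity by a decoder.
ambiguous = unescape("http://example.com/?a=1&copy=2")
print(ambiguous)
```

In the last call, Python's decoder treats "&copy" as the copyright-sign entity even though the author meant a query parameter named "copy". Writing the separator as "&amp;" removes that ambiguity.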
Received on Monday, 27 November 2006 06:58:28 UTC