
Re: CDATA in elements and attributes

From: Christian Smith <csmith@barebones.com>
Date: Thu, 13 Apr 2000 23:41:19 -0400
To: TimP <tim@paneris.co.uk>
cc: W3C Validator <www-validator@w3.org>
Message-ID: <auto-000000377191@barebones.com>
On Friday, April 14, 2000 at 01:49, tim@paneris.co.uk (TimP) wrote:

> Thank you, I know that that is a proposed solution, but I think it is
> very ugly.
> 
> I am trying to get an answer to a deeper point.
> 
> The definition of CDATA within SGML depends upon whether it is used in
> the context of an element content definition or an attribute definition.
> This 'asymmetry'[1] has led us into the position where we have to encode
> VALID urls within HTML.
> 
> I want to understand why this asymmetry exists and why it is tolerated. I
> really like SGML, and have used it successfully in a few projects,
> (though I would not claim to know it in detail), but I cannot persuade
> my colleagues of its benefits whilst it forces the requirement to encode
> URLs upon them.

I don't know where you picked up this "asymmetry" idea, or exactly what
you mean by it, but let me try to cover some points here.

This is a valid URI:

http://www.company.com/cgi-bin/search?foo&bar

Now, the definition of the HREF attribute of the A element states that its
value is CDATA. The definition of CDATA has a number of rules, and one of
them is that an & MUST be encoded as &amp; (or its numeric equivalent).

The HTML spec also notes in a comment that an HREF takes a URI as its
value.

But because the content of an HREF must be CDATA, you must HTML
entity-encode certain characters if they appear in the URI.
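
As a minimal sketch of that encoding step (assuming Python and its
standard html module, neither of which comes from this thread; the link
text "search" is a placeholder), the URI above could be prepared for use
as an HREF value like this:

```python
from html import escape

# The URI from above. It is valid as a URI but not yet safe to place
# inside an HTML attribute, because of the bare "&".
uri = "http://www.company.com/cgi-bin/search?foo&bar"

# escape() rewrites "&", "<" and ">" as entity references; quote=True
# also escapes quotation marks so the value cannot end the attribute early.
href_value = escape(uri, quote=True)

anchor = f'<a href="{href_value}">search</a>'
print(anchor)
```

The URI itself is unchanged; only its representation inside the HTML
source differs.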

Let's look at some examples.

Example 1:

bad - <a href="http://www.company.com/search?foo/bar">

In the above example the content of the href is CDATA, but it is NOT a
valid URI because it contains a / that is not being used in its reserved
role and therefore needs to be URI encoded.

good - <a href="http://www.company.com/search?foo%2Fbar">
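
For illustration only, the URI encoding in Example 1 can be sketched with
Python's urllib.parse.quote (my choice of tool, not something named in
this thread). Passing safe="" matters here, because by default quote()
leaves "/" untouched:

```python
from urllib.parse import quote

# The query data from Example 1, where "/" is meant as plain data.
query = "foo/bar"

# safe="" tells quote() to percent-encode "/" as well.
encoded_query = quote(query, safe="")

url = "http://www.company.com/search?" + encoded_query
print(url)
```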


Example 2:

bad - <a href="http://www.company.com/search?foo&bar">

In the above example the content of the href is a valid URI but it is NOT
CDATA because we have an & which is not HTML encoded.

good - <a href="http://www.company.com/search?foo&amp;bar">
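
To see that the &amp; is purely a matter of HTML syntax, here is a rough
sketch using Python's standard html.parser module (an assumption on my
part; HrefCollector is a hypothetical helper name). The parser hands back
attribute values with entity references already decoded, so the
application receives the original URI:

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the decoded href value of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; entity references in
        # the values have already been replaced by the parser.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.hrefs.append(value)


collector = HrefCollector()
collector.feed('<a href="http://www.company.com/search?foo&amp;bar">')
collector.close()
print(collector.hrefs[0])
```

In other words, the entity encoding lives only in the document source and
is undone before the URI is ever dereferenced.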


Perhaps this sheds some light on your confusion. I hope so.

-- 
Christian Smith  |  csmith@barebones.com  |  http://web.barebones.com
PGP Fingerprint  -  60E5 2216 97D2 1D1A B923 F036 00A9 CEC0 D411 FA89
Received on Thursday, 13 April 2000 23:40:55 GMT
