Re: CDATA in elements and attributes from Terje Bless on 2000-04-15 (www-validator@w3.org from April 2000)

From: Terje Bless <link@tss.no>
Date: Sat, 15 Apr 2000 12:08:04 +0200
To: W3C Validator <www-validator@w3.org>
cc: TimP <tim@paneris.co.uk>
Message-ID: <20000415121523-f01010601-9a0f4246@10.0.0.3>

[ I'll reply to this instead of your reply to me. It's more accurate. ]

On 14.04.00 at 16:22, TimP <tim@paneris.co.uk> wrote:

>http://www.w3.org/TR/1998/REC-html40-19980424/appendix/notes.html#h-B.3.2
>
>B.3.2 Specifying non-HTML data
[...]
> Note. The DTD defines script and style data to be CDATA for both element
> content and attribute values. SGML rules do not allow character
> references in CDATA element content but do allow them in CDATA attribute
> values.

Right! Now we're on the same page here.

You are correct that the rules for element content and attribute values are
different (I thought you were talking about PCDATA). You may even be
correct that this is a bad thing (I'm no SGML expert so I wouldn't know).
But look at it this way: you're getting your panties all in a bunch because
you have to write "&amp;" instead of "&" in URLs in your HTML files. Why is
this such a problem?
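
To make the point concrete, here is a minimal sketch of what writing "&amp;"
instead of "&" amounts to when a page is generated by a program. It uses
Python's standard html module; the URL and the variable names are made up
for illustration and are not taken from your pages.

```python
import html

# Hypothetical CGI-style URL with a raw "&" between query parameters.
url = "http://example.com/cgi-bin/search?q=sgml&page=2"

# Escape the URL before placing it inside an attribute value.
escaped = html.escape(url, quote=True)

# The markup that would go into the HTML file.
anchor = '<a href="{}">search</a>'.format(escaped)

# A conforming parser turns "&amp;" back into "&" when reading the attribute.
roundtrip = html.unescape(escaped)

print(anchor)
print(roundtrip == url)
```

The escaping is a single function call at the point where the attribute is
written, and the browser still sees the original URL.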

If it were truly SGML you were working with, then maybe I would understand
it, but from what you write, it's not. You are just using HTML for
publishing on the web or a CD or whatever. Sure it's inconvenient and badly
thought out (the CGI spec anyway), but it's a really minor issue. Hardly
one worth making a fuss about, in my book anyway.

Why is it such a problem?

>Yes, when the keyword CDATA is used in an attribute definition, when it is
>used in an element definition then it means the opposite, vis entities
>will not be recognised.

Not really. It just allows for interpretation of character entities in
attributes where you would normally expect them to not be allowed and
certainly not necessary.
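
The difference between the two uses of CDATA can be observed with any parser
that follows the same rule. The sketch below assumes Python's standard
html.parser as a stand-in (it is not SP, and it follows HTML rather than
full SGML rules); the Collector class and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class Collector(HTMLParser):
    """Records href attribute values and the raw text inside <script>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.attr_values = []
        self.script_chunks = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
        for name, value in attrs:
            if name == "href":
                # Attribute values arrive with references already decoded.
                self.attr_values.append(value)

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            # Script content is passed through without decoding.
            self.script_chunks.append(data)


doc = '<a href="/cgi?a=1&amp;b=2">x</a><script>if (a &amp;&amp; b) {}</script>'

parser = Collector()
parser.feed(doc)
parser.close()

href = parser.attr_values[0]
script = "".join(parser.script_chunks)

print(href)
print(script)
```

In the attribute the reference is decoded to a plain "&", while inside the
script element the same characters are left exactly as written.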

>This leads to my claim that SGML is broken

That's kinda overdoing it, wouldn't you say? I'm sure SGML is "broken" in
many interesting ways, but condemning the whole system because one small
aspect of it is not how you expected it to be?!?!

>and as such only usable if one hacks the validator.

After which it wouldn't be an HTML validator any more. It'd be a "TimP's
Markup Language Validator". BTW, the validator as such doesn't actually do
any of this processing. We rely on James Clark's excellent SGML processor
SP for that.

>I do understand how to mutilate my HTML to force it through the validator,
>but this is not acceptable to myself or my colleagues.

Why?

>I don't think that I can hope to persuade you or the ISO committee that
>this flaw makes SGML unusable, but it has really embarrassed me.

Us? We're pretty much irrelevant to this issue. It's ISO you'd need to
convince. Maybe you'd be better off writing in XML anyway. AFAIC it has
really nice and strict rules for what is allowed and what isn't. The stated
goal of XML was to get rid of as much as possible of the cruft from SGML.

Received on Saturday, 15 April 2000 06:18:02 UTC