Re: CDATA in elements and attributes

At 23:41 13/04/00 -0400, Christian Smith wrote:
>On Friday, April 14, 2000 at 01:49, tim@paneris.co.uk (TimP) wrote:
>
>> Thank you. I know that that is a proposed solution, but I think it is
>> very ugly.
>> 
>> I am trying to get an answer to a deeper point.
>> 
>> The definition of CDATA within SGML depends upon whether it is used in
>> the context of an element content definition or an attribute definition.
>> This 'asymmetry'[1] has led us into the position where we have to encode
>> valid URLs within HTML.
>> 
>> I want to understand why this asymmetry exists and why it is tolerated. I
>> really like SGML, and have used it successfully in a few projects,
>> (though I would not claim to know it in detail), but I cannot persuade
>> my colleagues of its benefits whilst it forces the requirement to encode
>> URLs upon them.
>
>I don't know where you picked up this "asymmetry" idea or what exactly you
>mean by it, but let me try to cover some points here.

http://www.w3.org/TR/1998/REC-html40-19980424/appendix/notes.html#h-B.3.2

B.3.2 Specifying non-HTML data

Script and style data may appear as element content or attribute values. The following sections describe the boundary between HTML markup
and foreign data. 

  Note. The DTD defines script and style data to be CDATA for both element content and attribute values. SGML rules do not allow
  character references in CDATA element content but do allow them in CDATA attribute values. Authors should pay particular attention
  when cutting and pasting script and style data between element content and attribute values.

  This asymmetry also means that when transcoding from a richer to a poorer character encoding, the transcoder cannot simply replace
  unconvertible characters in script or style data with the corresponding numeric character references; it must parse the HTML document
  and know about each script and style language's syntax in order to process the data correctly.
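
A minimal sketch of the transcoding problem described in that note, assuming Python and its built-in "xmlcharrefreplace" error handler as a stand-in for a naive transcoder (the function name and the sample markup are hypothetical, chosen only for illustration):

```python
def naive_transcode(text, target="ascii"):
    """Replace every unconvertible character with a numeric character
    reference, without looking at where in the document it occurs."""
    return text.encode(target, errors="xmlcharrefreplace").decode(target)


# In a CDATA attribute value the reference is recognised, so a parser
# still recovers the original character.
attr = naive_transcode('<a title="caf\u00e9">')

# In CDATA element content the reference is NOT recognised, so the
# script's string literal now holds the six characters "&#233;".
script = naive_transcode('<script>var s = "caf\u00e9";</script>')

print(attr)
print(script)
```

The same replacement is harmless in the attribute and changes the data in the script, which is why the transcoder has to know which context it is in.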


>Now, the definition for the HREF attribute of the A element states that it
>is CDATA. The definition of CDATA has a number of items and one of these
>is that an & MUST be encoded as &amp; (or its numeric equivalent).

Yes, when the keyword CDATA is used in an attribute definition. 
When it is used in an element definition it means the opposite, 
viz. entity references will not be recognised.
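
To make the two meanings concrete, here is a rough sketch using Python's standard html.parser module. It is an HTML parser, not an SGML one, so take it as an illustration of the behaviour rather than of the SGML rules themselves. The class name, the helper function and the sample markup are my own assumptions.

```python
from html.parser import HTMLParser


class CdataProbe(HTMLParser):
    """Record the HREF of an A element and the raw content of SCRIPT."""

    def __init__(self):
        super().__init__()
        self.href = None
        self.script_chunks = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Attribute values arrive with references already decoded.
            self.href = dict(attrs).get("href")
        elif tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            # Script content arrives verbatim; references are not decoded.
            self.script_chunks.append(data)


def probe(document):
    parser = CdataProbe()
    parser.feed(document)
    parser.close()
    return parser.href, "".join(parser.script_chunks)


SAMPLE = (
    '<a href="search?x=1&amp;y=2">link</a>'
    '<script>var s = "x=1&amp;y=2";</script>'
)

href, script = probe(SAMPLE)
print(href)    # the attribute value, with &amp; decoded to &
print(script)  # the script text, with &amp; left exactly as written
```

The same five characters, &amp;, are a reference in the attribute and literal text in the element content.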

This leads to my claim that SGML is broken and as such only usable if one 
hacks the validator.

I do understand how to mutilate my HTML to force it through the validator, 
but this is not acceptable to me or my colleagues. 

I don't think that I can hope to persuade you or the ISO committee that 
this flaw makes SGML unusable, but it has really embarrassed me.

yours
timp


Member of http://www.paneris.org/

Received on Friday, 14 April 2000 11:22:31 UTC