Re: A7: CDATA, RCDATA, TEMP marked sections? from Harvey Bingham on 1996-10-09 (w3c-sgml-wg@w3.org from October 1996)

From: Harvey Bingham <hbingham@ACM.org>
Date: Wed, 09 Oct 1996 12:08:45 -0400
To: "Peter Sharpe" <peter@sqwest.bc.ca>
Cc: w3c-sgml-wg@w3.org
Message-Id: <2.2.32.19961009160845.00692c70@tiac.net>

At 19:15 1996-10-08 -0700, Peter Sharpe wrote:
>On Oct 9,  1:44am, Charles F. Goldfarb wrote:
>> Here's how I would tell users to address these requirements using DTD-less
>> XML:
>>
>> Chapter 5. The CLEARDATA Tag.
>>
>> When you need to put scripts or other data in your document that isn't SGML,
>> you mark it with special tags called CLEARDATA tags (CDATA for short).
>>
>> The CLEARDATA start-tag is: <![CDATA[
>> The CLEARDATA end-tag is: ]]>
>>
>An elegant solution.

I note this solution allows no attributes on CLEARDATA. 
>
>However, it doesn't solve the correct problem. I'm not so much talking about
>the case where you just want to escape some characters, but where you also
>want to label those characters. No semantic information other than "this
>is clear text" can be attached to the CLEARDATA start-tag (sic). So you need
>additional markup. This means that you are asking the HTML author, for
>example, to use markup like
>
>  <SCRIPT><![CDATA[
>	...
>	var1 = "<EM>Hello world</EM>"
>	...
>  ]]></SCRIPT>
>
>My point is that that kind of markup will be non-intuitive and considered
>completely unnecessary by the author.
>
I agree. A tool could hack the excess markup into place if and when the
XML is moved back into SGML.
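
As a rough illustration of such a tool, here is a minimal Python sketch
that wraps the content of each SCRIPT element in a CDATA marked section.
The function name wrap_script_content and the regular expression are my
own assumptions for illustration, not part of any proposal in this
thread. It assumes the script body contains neither "</SCRIPT>" nor
"]]>".

```python
import re

# Hypothetical pattern: a SCRIPT start-tag, its content, and the end-tag.
# Assumes the content itself never contains "</SCRIPT>".
SCRIPT_RE = re.compile(r"(<SCRIPT\b[^>]*>)(.*?)(</SCRIPT>)",
                       re.DOTALL | re.IGNORECASE)

def wrap_script_content(text):
    """Wrap each SCRIPT element's content in <![CDATA[ ... ]]>."""
    def repl(match):
        start_tag, body, end_tag = match.groups()
        # Leave content alone if it is already inside a marked section.
        if body.lstrip().startswith("<![CDATA["):
            return match.group(0)
        # "]]>" would close the marked section early, so refuse to guess.
        if "]]>" in body:
            raise ValueError("content contains ']]>' and cannot be wrapped")
        return f"{start_tag}<![CDATA[{body}]]>{end_tag}"
    return SCRIPT_RE.sub(repl, text)
```

Applied to Peter's example without the marked section, this would add
the <![CDATA[ and ]]> delimiters around the script text and leave
already-wrapped content unchanged.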

>I might also note that Appendix B of the standard (B.13.1.1, the paragraph
>starting at 29) says that CDATA content is only ended by an end-tag which
>matches the start-tag (or ancestor!?). So the SGML committee must have
>considered that a good idea at one time. (Or the author of the appendix had an
>intuitive, but incorrect, understanding of CDATA declared content.) If there
>was a good reason why the committee changed its mind about that, I'd be
>very interested to hear it.
>
Annex B does not form an integral part of 8879.
The relevant specification of that unfortunate behavior is found in 7.6, 
lines 6-11.

"The content of an element declared as character data or replaceable
character data is terminated only by an etago delimiter-in-context (which
need not open a valid end-tag) or a valid net. Such termination is an error
if it would have been an error had the content been mixed content."

It seems to me that terminating on the first etago that occurs, and only
then checking whether it is a proper termination (an error if not), is
less desirable than requiring the proper matching end-tag.
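
To make the difference concrete, here is a small Python sketch that
compares the two termination rules on Peter's SCRIPT example. It is a
simplification under stated assumptions: etago is taken to be "</"
and is treated as a delimiter-in-context only when followed by a
letter, and net and the empty end-tag are ignored. The function names
are hypothetical.

```python
import re

# Simplifying assumption: "</" counts as etago in context only when a
# letter (a name start character) follows it.
ETAGO_IN_CONTEXT = re.compile(r"</(?=[A-Za-z])")

def end_by_first_etago(text, start):
    """Offset where CDATA content ends under the 7.6 rule, or -1."""
    match = ETAGO_IN_CONTEXT.search(text, start)
    return match.start() if match else -1

def end_by_matching_end_tag(text, start, gi):
    """Offset where content ends if only the matching end-tag closes it."""
    pattern = re.compile(r"</" + re.escape(gi) + r"\s*>", re.IGNORECASE)
    match = pattern.search(text, start)
    return match.start() if match else -1

doc = '<SCRIPT>var1 = "<EM>Hello world</EM>"</SCRIPT>'
start = len("<SCRIPT>")

# Rule in 7.6: content stops at "</EM>", which is then an error.
print(doc[start:end_by_first_etago(doc, start)])
# Matching end-tag rule: the whole script text is content.
print(doc[start:end_by_matching_end_tag(doc, start, "SCRIPT")])
```

Under the first rule the content is cut off just before "</EM>";
under the second it runs up to "</SCRIPT>".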

Regards. Harvey Bingham

Received on Wednesday, 9 October 1996 12:10:47 UTC