Re: A7: CDATA, RCDATA, TEMP marked sections?

From: Peter Sharpe <peter@sqwest.bc.ca>
Date: Tue, 8 Oct 1996 17:42:45 -0700
Message-Id: <9610081742.ZM2148@west.sq.com>
To: Paul Prescod <papresco@calum.csclub.uwaterloo.ca>
Cc: w3c-sgml-wg@w3.org
On Oct 4,  5:36pm, Paul Prescod wrote:
> >A.7 Should XML have CDATA, RCDATA, and TEMP marked sections or not?
> It would be really handy to have some mechanism, to allow arbitrary non-SGML
> data (in the same character encoding).
I agree.
Javascript, VBScript, Denali, <anybody's scripting language> invariably
contain what an SGML parser would believe to be markup.

There are several requirements for the mechanism by which the markup is
hidden from the parser:
1. It has to be simple and intuitive.
   I strongly believe that CDATA marked sections violate this requirement.
   HTML authors are used to having both syntax and semantics for their
   markup. If they use SCRIPT, I believe they would naturally expect the
   "parser" to understand that it should ignore everything until it sees
   "</SCRIPT>". To have to add additional markup would neither be intuitive
   nor welcomed.
2. It has to be easily parseable by a simple lexer on the server side.
   There is no reason to expect that the server-side application will be
   using an XML parser.
   While it is not rocket science to detect the marked section start and
   end, it certainly complicates things.

I do not believe that there is an acceptable solution to these requirements
using SGML. The choices are very few: CDATA elements, CDATA marked sections
and "structured comments". CDATA elements fail to hide markup which looks
like end-tags. CDATA marked sections are too much of a burden. And
"structured comments"...well, that's the worst kind of hack, in my opinion.
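
To illustrate the CDATA element problem, here is a minimal Python sketch of
the SGML rule as I understand it: inside a CDATA element, "</" followed by
a name-start character is recognized as the start of an end-tag, whatever
name follows. The function name and the sample document are made up for
illustration only.

```python
import re

# "</" counts as markup only when a name-start character follows it.
_ETAGO_IN_CONTEXT = re.compile(r"</[A-Za-z]")

def sgml_cdata_content_end(text, start):
    """Return the index where SGML-style CDATA element content ends.

    `start` is the index just past the start-tag. Returns len(text) if
    no end-tag-like sequence is found. (Hypothetical helper.)
    """
    match = _ETAGO_IN_CONTEXT.search(text, start)
    return match.start() if match else len(text)

doc = '<SCRIPT>document.write("<B>hi</B>");</SCRIPT>'
start = len("<SCRIPT>")
end = sgml_cdata_content_end(doc, start)
print(doc[start:end])  # prints: document.write("<B>hi
```

The "</B>" inside the string literal is taken for an end-tag, so the script
is cut off in the middle of a statement.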

I do believe that there is a fairly simple solution that would cover almost
all cases, and the cases it doesn't cover would be obvious to the author:
Proposal: The only markup which terminates the content of a CDATA element
is an end-tag that matches the element's start-tag. For example, the only
markup that would end a SCRIPT element would be "</SCRIPT>".
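
A rough Python sketch of how a simple lexer might apply this rule follows.
It is only one possible reading of the proposal: the function name is
hypothetical, and the case-insensitive name match and the optional
whitespace before ">" are my assumptions, not part of the proposal.

```python
import re

def matching_end_tag_span(text, name, start):
    """Find the end-tag that closes element `name`, scanning from `start`.

    Returns (content_end, after_end_tag), or None if there is no matching
    end-tag. Assumptions: names match case-insensitively and whitespace
    may precede the closing ">".
    """
    pattern = re.compile(r"</" + re.escape(name) + r"\s*>", re.IGNORECASE)
    match = pattern.search(text, start)
    return (match.start(), match.end()) if match else None

doc = '<SCRIPT>if (a < b) document.write("<B>hi</B>");</SCRIPT><P>after'
start = len("<SCRIPT>")
content_end, resume = matching_end_tag_span(doc, "SCRIPT", start)
print(doc[start:content_end])  # prints: if (a < b) document.write("<B>hi</B>");
print(doc[resume:])            # prints: <P>after
```

Neither the "<" in the comparison nor the "</B>" in the string ends the
element. The case this does not cover is a script that itself contains the
literal text "</SCRIPT>", which the author can see and work around.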

In the case where there is no DTD, there either would be no possibility of
CDATA elements or else there would be some alternate way to indicate the
content type.

And, finally, I would propose that some term other than CDATA be used to
describe these elements.


PS These comments relate to A10 as well. And there are implications for

Peter Sharpe, Chief Scientist, SoftQuad Inc.  Tel: +1 604 585 1999 ext. 312
#108-10070 King George Highway, Surrey, B.C., CANADA V3T 2W4  Fax: 585 1926
Internet: peter@sq.com or peter@sqwest.bc.ca     World Wide Web: www.sq.com
Received on Tuesday, 8 October 1996 20:59:04 UTC