- From: Peter Sharpe <peter@sqwest.bc.ca>
- Date: Tue, 8 Oct 1996 17:42:45 -0700
- To: Paul Prescod <papresco@calum.csclub.uwaterloo.ca>
- Cc: w3c-sgml-wg@w3.org
On Oct 4, 5:36pm, Paul Prescod wrote:
> >A.7 Should XML have CDATA, RCDATA, and TEMP marked sections or not?
>
> It would be really handy to have some mechanism, to allow arbitrary non-SGML
> data (in the same character encoding).
>

I agree. JavaScript, VBScript, Denali, <anybody's scripting language>
invariably contains what an SGML parser would believe to be markup.

There are several requirements for the mechanism by which the markup is
escaped:

1. It has to be simple and intuitive. I strongly believe that CDATA marked
   sections violate this requirement. HTML authors are used to having both
   syntax and semantics for their markup. If they use SCRIPT, I believe they
   would naturally expect the "parser" to understand that it should ignore
   everything until it sees "</SCRIPT>". Having to add additional markup
   would be neither intuitive nor welcome.

2. It has to be easily parseable by a simple lexer on the server side. There
   is no reason to expect that the server-side application will be using an
   XML parser. While it is not rocket science to detect the marked section
   start and end, it certainly complicates things.

I do not believe that there is an acceptable solution to these requirements
using SGML. The choices are very few: CDATA elements, CDATA marked sections
and "structured comments". CDATA elements fail to hide markup which looks
like end-tags. CDATA marked sections are too much of a burden. And
"structured comments"...well, that's the worst kind of hack, in my opinion.

I do believe that there is a fairly simple solution that would cover almost
all cases, and the cases it doesn't cover would be obvious to the author:

Proposal: The only markup which terminates the content of a CDATA element is
an end-tag that matches the element's start-tag. For example, the only markup
that would end a SCRIPT element would be "</SCRIPT>".
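As a rough illustration of how little a "simple lexer" would need under this proposal, here is a minimal sketch in Python. The function name `find_element_end` is hypothetical, and treating element names as case-insensitive and allowing whitespace before the closing ">" are assumptions, not part of the proposal.

```python
import re


def find_element_end(text, name, start=0):
    """Scan the content of a CDATA-style element.

    `start` is the index just past the element's start-tag. Everything is
    treated as plain data until an end-tag matching `name` is found.
    Returns (content, index_just_past_the_end_tag).
    """
    # Assumption: names match case-insensitively, and whitespace may
    # appear between the name and the closing ">".
    end_tag = re.compile(r"</" + re.escape(name) + r"\s*>", re.IGNORECASE)
    match = end_tag.search(text, start)
    if match is None:
        raise ValueError("no matching end-tag for <%s>" % name)
    return text[start:match.start()], match.end()


doc = '<SCRIPT>if (a < b && c > d) document.write("</B>");</SCRIPT><P>after'
content, end = find_element_end(doc, "SCRIPT", len("<SCRIPT>"))
print(content)    # script text, with "<", "&&" and "</B>" left untouched
print(doc[end:])  # the rest of the document, after the end-tag
```

Note that the "</B>" inside the script does not end the element. A script that itself contains the literal text "</SCRIPT>" would still be cut short, which is presumably one of the uncovered cases that would be obvious to the author.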
In the case where there is no DTD, there either would be no possibility of
CDATA elements or else there would be some alternate way to indicate the
content type.

And, finally, I would propose that some term other than CDATA be used to
describe these elements.

Peter

PS These comments relate to A10 as well. And there are implications for A9.

--
Peter Sharpe, Chief Scientist, SoftQuad Inc.
Tel: +1 604 585 1999 ext. 312
#108-10070 King George Highway, Surrey, B.C., CANADA V3T 2W4
Fax: 585 1926
Internet: peter@sq.com or peter@sqwest.bc.ca
World Wide Web: www.sq.com
Received on Tuesday, 8 October 1996 20:59:04 UTC