- From: Paul Prescod <papresco@calum.csclub.uwaterloo.ca>
- Date: Tue, 8 Oct 1996 21:13:21 -0400 (EDT)
- To: peter@sqwest.bc.ca (Peter Sharpe)
- Cc: w3c-sgml-wg@w3.org
> On Oct 4, 5:36pm, Paul Prescod wrote:
> > >A.7 Should XML have CDATA, RCDATA, and TEMP marked sections or not?
> >
> > It would be really handy to have some mechanism, to allow arbitrary non-SGML
> > data (in the same character encoding).
>
> There are several requirements for the mechanism by which the markup is
> escaped:
> 1. It has to be simple and intuitive.
> I strongly believe that CDATA marked sections violate this requirement.

I have to disagree. The hardwired CDATA MS syntax somebody (Michael?)
proposed a few days ago is not that non-intuitive. I originally had an idea
like yours, but have come to embrace that convention.

> HTML authors are used to having both syntax and semantics for their
> markup. If they use SCRIPT, I believe they would naturally expect the
> "parser" to understand that it should ignore everything until it sees
> "</SCRIPT>". To have to add additional markup would neither be intuitive
> nor welcomed.

No, but they are already going to have to learn to "shape up" to move into
the XML world. If you look at it in a certain way, you might consider it
_easier_ for both the author and the parser to have a SINGLE syntax for
turning on and off CDATA content instead of a potentially infinite list
(<SCRIPT>, <CODE>, <STYLE>, ... ). I don't think that hard-coded CDATA
marked sections are harder to understand or to parse.

And what if a user wanted to include some "SGML code" in the middle of a
paragraph or somewhere else in an element that is not CDATA? I think that a
modeless, always-available mechanism for marking CDATA content is preferable
to a DTD-specific one. I also think that documents encoded in this manner
are more robust than those that depend on CDATA declarations in a DTD that
may or may not be available and may or may not change. I am strongly in
favour of de-emphasizing DTDs and application conventions in the reliable
parsing of XML documents (which is why I strongly oppose RE proposals that
"leave it up to the application").
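
To make the "single syntax, no DTD needed" point concrete, here is a rough sketch of how a parser might separate CDATA content from ordinary markup using only hard-coded delimiters. It assumes the standard SGML marked-section form `<![CDATA[` ... `]]>`; the exact hardwired syntax proposed on the list is not spelled out in this message, and the function name `split_cdata` is purely illustrative.

```python
# Assumed delimiters: the SGML CDATA marked-section open and close strings.
CDATA_OPEN = "<![CDATA["
CDATA_CLOSE = "]]>"


def split_cdata(text):
    """Split text into (kind, chunk) pairs, where kind is 'markup' or 'cdata'.

    No DTD or element list is consulted: the delimiters alone decide
    where markup recognition is switched off and on again.
    """
    parts = []
    pos = 0
    while pos < len(text):
        start = text.find(CDATA_OPEN, pos)
        if start == -1:
            # No further marked sections: the rest is ordinary markup.
            parts.append(("markup", text[pos:]))
            break
        if start > pos:
            parts.append(("markup", text[pos:start]))
        body_start = start + len(CDATA_OPEN)
        end = text.find(CDATA_CLOSE, body_start)
        if end == -1:
            raise ValueError("unterminated CDATA marked section")
        # Everything up to the first close delimiter is literal data.
        parts.append(("cdata", text[body_start:end]))
        pos = end + len(CDATA_CLOSE)
    return parts
```

The same loop works inside a paragraph or any other element, which is the "modeless" property argued for above. It also shows the known limitation: the first close delimiter always ends the section, so the data cannot itself contain that delimiter.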

> I do not believe that there is an acceptable solution to these requirements
> using SGML. The choices are very few: CDATA elements, CDATA marked sections
> and "structured comments". CDATA elements fail to hide markup which looks
> like end-tags. CDATA marked sections are too much of a burden. And
> "structured comments"...well, that's the worst kind of hack, in my opinion.
>
> I do believe that there is a fairly simple solution that would cover almost
> all cases, and the cases it doesn't cover would be obvious to the author:
> Proposal: The only markup which terminates the content of a CDATA element
> is an end-tag that matches the element's start-tag. For example, the only
> markup that would end a SCRIPT element would be "</SCRIPT>".
>
> In the case where there is no DTD, there either would be no possibility of
> CDATA elements or else there would be some alternate way to indicate the
> content type.

I think that our design should presume the non-availability of a DTD as the
"norm". So we should specify that "alternate way", and I suspect it will
turn out to be hard-coded CDATA marked sections.

I had the original non-SGML compatible proposal in this area (and it was
similar to yours). I now feel that we should settle for a reasonable,
workable SGML-compatible compromise: CDATA marked sections. Perhaps for
SGML 97 we can get something more flexible (i.e. something that would allow
us to embed ]]).

Paul Prescod

Received on Tuesday, 8 October 1996 21:14:29 UTC