Re: A7: CDATA, RCDATA, TEMP marked sections?
> On Oct 4, 5:36pm, Paul Prescod wrote:
> > >A.7 Should XML have CDATA, RCDATA, and TEMP marked sections or not?
> > It would be really handy to have some mechanism, to allow arbitrary non-SGML
> > data (in the same character encoding).
> There are several requirements for the mechanism by which the markup is
> 1. It has to be simple and intuitive.
> I strongly believe that CDATA marked sections violate this requirement.
I have to disagree. The hardwired CDATA MS syntax somebody (Michael?) proposed
a few days ago is not that non-intuitive. I originally had an idea like yours,
but have come to embrace that convention.
> HTML authors are used to having both syntax and semantics for their
> markup. If they use SCRIPT, I believe they would naturally expect the
> "parser" to understand that it should ignore everything until it sees
> "</SCRIPT>". To have to add additional markup would neither be intuitive
> nor welcomed.
No, but they are already going to have to learn to "shape up" to move into
the XML world. If you look at it in a certain way, you might consider it
_easier_ for both the author and the parser to have a SINGLE syntax for
turning on and off CDATA content instead of a potentially infinite list
(<SCRIPT>, <CODE>, <STYLE>, ... ). I don't think that hard-coded CDATA
marked sections are harder to understand or to parse.
And what if a user wanted to include some "SGML code" in the middle of
a paragraph or somewhere else in an element that is not CDATA? I think that
a modeless, always-available mechanism for marking CDATA content is
preferable to a DTD specific one.
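To make the "single, modeless syntax" point concrete, here is a minimal sketch of how little a parser needs in order to honour hardwired CDATA marked sections. It assumes the delimiters are "<![CDATA[" to open and "]]>" to close, and the function name split_cdata_sections is purely illustrative. Nothing here consults a DTD or depends on which element we are in: inside the section the only thing recognized is the closing "]]>".

```python
# Sketch only: assumes the hardwired marked-section delimiters are
# "<![CDATA[" (open) and "]]>" (close). The function name is hypothetical.
MS_OPEN = "<![CDATA["
MS_CLOSE = "]]>"


def split_cdata_sections(text):
    """Split text into ("markup", s) and ("cdata", s) chunks.

    Inside a CDATA marked section nothing is recognized except the
    closing "]]>", so the same rule applies in any element, with or
    without a DTD.
    """
    chunks = []
    pos = 0
    while True:
        start = text.find(MS_OPEN, pos)
        if start == -1:
            # No more marked sections: the rest is ordinary markup/text.
            if pos < len(text):
                chunks.append(("markup", text[pos:]))
            return chunks
        if start > pos:
            chunks.append(("markup", text[pos:start]))
        body_start = start + len(MS_OPEN)
        end = text.find(MS_CLOSE, body_start)
        if end == -1:
            raise ValueError("unterminated CDATA marked section")
        # Everything up to "]]>" is taken literally, tags included.
        chunks.append(("cdata", text[body_start:end]))
        pos = end + len(MS_CLOSE)
```

For example, split_cdata_sections("<P>Use <![CDATA[<B>bold</B>]]> tags.</P>") yields the markup chunk "<P>Use ", the literal chunk "<B>bold</B>", and the markup chunk " tags.</P>". The "<B>" inside the section is never treated as a tag, even though P is not a CDATA element.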
I also think that documents encoded in this manner are more robust than
those that depend on CDATA declarations in a DTD that may or may not be
available and may or may not change. I am strongly in favour of
de-emphasizing DTDs and application conventions in the reliable parsing of
XML documents, which is why I strongly oppose RE proposals that "leave
it up to the application".
> I do not believe that there is an acceptable solution to these requirements
> using SGML. The choices are very few: CDATA elements, CDATA marked sections
> and "structured comments". CDATA elements fail to hide markup which looks
> like end-tags. CDATA marked sections are too much of a burden. And
> "structured comments"...well, that's the worst kind of hack, in my opinion.
> I do believe that there is a fairly simple solution that would cover almost
> all cases, and the cases it doesn't cover would be obvious to the author:
> Proposal: The only markup which terminates the content of a CDATA element
> is an end-tag that matches the element's start-tag. For example, the only
> markup that would end a SCRIPT element would be "</SCRIPT>".
> In the case where there is no DTD, there either would be no possibility of
> CDATA elements or else there would be some alternate way to indicate the
> content type.
I think that our design should presume the non-availability of a DTD as the
"norm". So we should specify that "alternate way", and I suspect it will
turn out to be hard-coded CDATA marked sections.
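For comparison, the difference between the SGML rule for CDATA elements and the proposal quoted above can be sketched as two ways of finding where CDATA content ends. This is a simplification: it assumes the reference concrete syntax, where the SGML rule ends CDATA content at the first "</" followed by a letter, and it treats element names case-insensitively. Both function names are hypothetical.

```python
import re

# Simplified SGML rule: CDATA content ends at the first "</" followed by a
# name-start character (a letter, in the reference concrete syntax),
# whatever element name follows.
SGML_ETAGO_RE = re.compile(r"</[A-Za-z]")


def cdata_end_sgml(text, start):
    """Index where CDATA content ends under the (simplified) SGML rule, or -1."""
    m = SGML_ETAGO_RE.search(text, start)
    return -1 if m is None else m.start()


def cdata_end_proposed(text, start, gi):
    """Index where content ends if only the matching end-tag terminates it, or -1.

    Assumes names are case-insensitive and allows whitespace before ">".
    """
    end_tag = re.compile(r"</" + re.escape(gi) + r"\s*>", re.IGNORECASE)
    m = end_tag.search(text, start)
    return -1 if m is None else m.start()
```

With s = '<SCRIPT>document.write("</B>");</SCRIPT>' and start = len("<SCRIPT>"), the SGML rule cuts the content off at 'document.write("', while the proposed rule keeps the whole 'document.write("</B>");'. Note that both functions still need to know, from a DTD or elsewhere, that SCRIPT is a CDATA element. That is exactly the dependency a hardwired marked section avoids.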
I had the original non-SGML-compatible proposal in this area (and it was
similar to yours). I now feel that we should settle for a reasonable, workable
SGML-compatible compromise: CDATA marked sections. Perhaps for SGML 97 we
can get something more flexible (i.e. something that would allow us to