Re: A7: CDATA, RCDATA, TEMP marked sections? from lee@sq.com on 1996-10-11 (w3c-sgml-wg@w3.org from October 1996)

From: <lee@sq.com>
Date: Thu, 10 Oct 96 23:34:27 EDT
To: Charles@sgmlsource.com
Cc: w3c-sgml-wg@w3.org
Message-Id: <9610110334.AA22878@sqrex.sq.com>
> A design principle of SGML is that different kinds of information have
> different syntaxes.

It's very valuable to hear comments like this -- that sort of principle
wasn't obvious to me from the standard.  Since it's counter to practice in
both programming languages and natural language, perhaps it's not surprising.
We have an opportunity to diverge from this route in XML.

> Processing instructions are different from comments are different from
> elements are different from specially-parsed data.

Nouns are different from verbs, but they are not distinguished lexically in
English.  In Sanskrit and Ancient Greek and Latin they are distinguished by
endings; morphological tagging has tended to weaken with time, however, so
that languages are getting less complex in this regard.

In programming languages, consider
	process(w)
where w may be an integer or a string, but the same syntax is used.
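
A minimal sketch of that point, assuming Python and its standard
functools.singledispatch (the handler bodies and return strings are
hypothetical, chosen only for illustration): the call site reads
process(w) whatever the type of w, and the distinction lives in the
definitions rather than in the syntax.

```python
from functools import singledispatch


@singledispatch
def process(w):
    # Fallback for types with no registered handler.
    raise TypeError(f"unsupported type: {type(w).__name__}")


@process.register
def _(w: int):
    # Chosen when w is an integer.
    return f"integer {w}"


@process.register
def _(w: str):
    # Chosen when w is a string.
    return f"string {w!r}"


# The caller writes the same thing in both cases.
results = [process(3), process("abc")]
```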

This uniform syntax is VERY highly prized in computer science.

> The syntactic distinctions
> emphasize the semantic ones to remind the user that different rules are in
> effect and different behaviors will apply.

If the same rules were used everywhere, the language would be simpler,
no different syntax would be needed, and the user would not get confused.

> The syntactic differences also allow
> SGML to avoid impinging on the user's name space, as it would have to do if,
> say, comments were element types.

It is true that the parser needs to have enough context to know what is
a comment.  E.g.:

<!Comment editnote - - CDATA>
<editnote>this is now a comment.</editnote>

Now there is no syntactic distinction, and yet the DTD-writer's namespace
is unaffected.

It would be more productive to introduce the idea of scope -- e.g.

<!Element ABSTRACT - -
    <!Element Title - -
        (#PCDATA)
    >
    <!Element P - -
        (#PCDATA)
    >
    (title, P*)
>

in order to protect namespaces.  (I am not proposing this syntax here, but
only trying to point out that since SGML has only global names, it's in no
position to be proud about the user's namespace!)

It would be far better to have a simpler syntax.
There may have been some merit in having a separate comment syntax,
if it had been a good one.
Since the SGML comment syntax has the same open and close (at
least by default) and suffers from non-robust parity errors, as well
as failing if a typewriter em-dash is included (which can't be escaped),
I don't accept that the comment syntax is a good one.  It's less robust
than the usual element syntax (which is very good, by the way).

If you want to have a syntactical distinction, you could use a similar
syntax to elements, on the principle of least surprises, or you could use
a comment that goes to the end of a line (or record, if you must) as in
many programming languages.   Element-style comments could nest, which
removes some of the need for IGNORED marked sections.
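
As a rough sketch of why nesting helps, assuming Python and the
hypothetical editnote element from the example earlier (this is not a
real SGML parser, and strip_comments is a made-up helper): a depth
counter is enough to skip a comment that itself contains commented-out
material, because the open and close tags differ.

```python
import re


def strip_comments(text, name="editnote"):
    """Drop <name>...</name> spans, tracking nesting with a depth counter."""
    tag = re.compile(r"<(/?)%s>" % re.escape(name), re.IGNORECASE)
    kept = []
    depth = 0  # how many comment elements are currently open
    pos = 0    # end of the last tag seen
    for m in tag.finditer(text):
        if depth == 0:
            # Text outside any comment is kept.
            kept.append(text[pos:m.start()])
        if m.group(1):
            # Close tag; a stray one at depth 0 is simply dropped.
            depth = max(depth - 1, 0)
        else:
            depth += 1
        pos = m.end()
    if depth == 0:
        kept.append(text[pos:])
    # An unclosed comment swallows the rest of the input.
    return "".join(kept)


nested = "a<editnote>x<editnote>y</editnote>z</editnote>b"
visible = strip_comments(nested)
```

With a delimiter that is the same at both ends, the inner comment's
close would be indistinguishable from the outer one's.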

But I expect that XML will have SGML's weird multiple syntaxes... :-(

Lee
Received on Thursday, 10 October 1996 23:34:46 UTC