Re: SGML and XML

At 3:16 PM 12/2/96, Michael Sperberg-McQueen wrote:
>On Mon, 2 Dec 1996 16:02:35 -0500 David G. Durand said:
>>First, we have a compriomise on this issue that is far from perfect,
>>but took us at least a month of intense debate to reach: why are we
>>re-opening it? Let's leave not-so-well-enough alone! I'm wistful for
>>a better (different) solution, but given that everyone (including me)
>>seems to find the compromise livable, shouldn't we leave it in place?
>I think some people are interested in re-opening it because Charles
>seems to have found a way to get RE Delenda Est behavior out of SGML
>processors.  At least, it's the behavior I thought was intended for
>REDE:  record ends are always passed through in mixed content, and
>all white space is eaten in element content.  Since many people gave
>up on REDE solely for compatibility reasons, Charles's new proposal
>seems to open the possibility of getting something materially better
>than the compromise we have now.
Well, that's not what I intended. I have always advocated a first-line
approach that says that _all_ characters not in a tag or a comment are
significant, and part of the parse tree. This is a _simple rule_. Anything
else is a complicated rule, although I agree that the behavior Charles is
claiming would provide a much-less complex rule. But I don't like the fact
that we would have different answers about the contents of the document
depending on the processor. I suppose it's no worse than the current mess.

OK. You've convinced me. Now can someone check to see if the standard is
unabiguous enought about shortrefs and whitespace handling that this hack
will actually work? At least sgmls, nsgmls, and some of the commercial
products need to be checked.

>>This is far too complex: two kinds of parsing are enough.
>Which two do you choose?
>I suggested we distinguish the three concepts of validatiion,
>generation of correct ESIS, and minimal parsing because we seem to
>be using all three of these distinct concepts while pretending to
>ourselves that there are only two concepts involved.  If one of
>these three is unnecessary, by all means tell us which it is so we
>can do without it.

We should still have two kinds; I think three is too many. We should simply
abandon the notion of ESIS-equivalent parsing. We have minimal parsing
(different ESIS), and validating. If people find that they want only some
of the functionality of a validating parser (e.g. whitespace handling only)
they will implement it into their minimal parser, but we need not have a
formal compliance level for that. We can delcare that implementing some
validation features in a minimal parser is _not_ an error in conformance.
This is very non-ISO, but quite IETF in intent, I think.

   -- David

  Can we start interpreting "RE Delenda est" as a statement, rather than en

I am not a number. I am an undefined character.
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________