Re: RS/RE, again (sorry) from Paul Grosso on 1996-12-17 (w3c-sgml-wg@w3.org from December 1996)

From: Paul Grosso <paul@arbortext.com>
Date: Tue, 17 Dec 96 10:05:37 CST
To: w3c-sgml-wg@w3.org
Message-Id: <9612171605.AA14532@atiaus.arbortext.com>
> From: bosak@atlantic-83.eng.sun.com (Jon Bosak)
> 
> [Chris Maden:]
> 
> | 3) A dichotomy between "DTD-ful" and DTD-less parsing will make any
> |    sibling-based relationship difficult at best; this will affect some
> |    TEI or HyQ based hyperlinks, as well as sibling-based stylistic
> |    decisions.
> 
> Sorry to be so slow here, but what's the connection with sibling
> relationships?  My idea of a well-formed XML document is one for which
> there is just one possible tree structure; 

Precisely what I would hope too.  That was one of the reasons behind
my earlier posting on this, part of which you quote below.

>                                            what's different about
> sibling relationships if a DTD is provided?

Whitespace can be significant (i.e., contribute to the grove) in one
case and not in the other.  When the DTD is not available, whitespace
in element content is not known to be in element content and is
therefore considered to be significant; when the DTD indicates that
the whitespace is in element content, it is not.  You will then have
what HyTime considers to be pseudo-elements in the first case and not
in the second.

For example, consider:

<book><section><p>Paragraph one.</p> <p>Paragraph two.</p></section></book>

If <section> is known to have element content, then the <p> element
whose content is "Paragraph two." is the second child of <section>
whereas if the section has mixed content, that same <p> is the third
child of <section>.  A HyTime treeloc of 1 1 2 that should properly
address the "Paragraph two." <p> element when <section> is known to
have element content would instead address the " " pseudo-element
when <section> is assumed to have mixed content.
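
To make the difference concrete, here is a rough sketch in Python (not
a HyTime engine).  It assumes xml.dom.minidom as the parser; the helper
names children(), treeloc() and describe() are made up for the
illustration, and the element_content flag merely stands in for "the
DTD says this is element content" on every element along the path.

```python
from xml.dom import minidom

DOC = ("<book><section><p>Paragraph one.</p> "
       "<p>Paragraph two.</p></section></book>")


def children(node, element_content):
    """Return the child nodes that count toward the tree.

    element_content=True models a parse where the DTD says the parent
    has element content, so whitespace-only text is dropped.
    element_content=False models a DTD-less parse that keeps all text.
    """
    kids = list(node.childNodes)
    if element_content:
        kids = [k for k in kids
                if not (k.nodeType == k.TEXT_NODE and not k.data.strip())]
    return kids


def treeloc(document, path, element_content):
    """Resolve a treeloc-style list of 1-based positions.

    The first step must be 1 (the root); each later step picks the
    Nth child of the node selected so far.
    """
    if path[0] != 1:
        raise ValueError("the first step must select the root")
    node = document.documentElement
    for step in path[1:]:
        node = children(node, element_content)[step - 1]
    return node


def describe(node):
    """Summarize a node as (kind, text) for easy comparison."""
    if node.nodeType == node.TEXT_NODE:
        return ("text", node.data)
    text = "".join(c.data for c in node.childNodes
                   if c.nodeType == c.TEXT_NODE)
    return (node.tagName, text)


doc = minidom.parseString(DOC)
with_dtd = describe(treeloc(doc, [1, 1, 2], element_content=True))
without_dtd = describe(treeloc(doc, [1, 1, 2], element_content=False))
print(with_dtd)     # ('p', 'Paragraph two.')
print(without_dtd)  # ('text', ' ')
```

The same location 1 1 2 lands on two different nodes depending only on
whether the whitespace was kept.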


> From: bosak@atlantic-83.eng.sun.com (Jon Bosak)
> 
> [Paul Grosso:]
> 
> | If we have defined the concept of well-formed XML precisely so that we
> | can deal with XML instances without DTDs, then I suggest we refine the
> | definition of well-formedness to include what we might call
> | "normalized whitespacing." An XML document is well-formed (and
> | therefore can be properly processed without reference to a DTD) *only*
> | if it contains no (non-markup) whitespace that would be insignificant
> | if it were parsed with reference to its DTD.
> 
> I rather like this idea, but what do you mean by "its DTD"?  There are
> an infinite number of candidates.

By "its DTD" I really mean any one of its infinitely many possible DTDs.

In particular, a well-formed doc could never have whitespace that
could be significant in some possible DTD and insignificant in some
other possible DTD.  It could only have whitespace that would always
be significant (e.g., in between two words) or that would always be
insignificant (e.g., an RE immediately following a start tag or
immediately preceding an end tag).
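
A very rough sketch of what such a check might look like, in Python
with xml.dom.minidom.  This is only my reading of the rule, not a
definition: the function name ambiguous_whitespace() is hypothetical,
and it covers just one case, namely whitespace-only text between child
elements of a parent that has no other text.  It assumes that a lone
RE (seen by the parser as "\n") directly after the start tag or
directly before the end tag is ignored under any DTD, and that a
parent containing real text must be mixed content under any DTD.

```python
from xml.dom import minidom


def ambiguous_whitespace(xml_text):
    """Return (parent tag, whitespace) pairs whose significance would
    depend on which DTD the document is parsed against."""
    found = []

    def visit(elem):
        kids = list(elem.childNodes)
        has_elements = any(k.nodeType == k.ELEMENT_NODE for k in kids)
        has_real_text = any(
            k.nodeType == k.TEXT_NODE and k.data.strip() for k in kids)
        for i, k in enumerate(kids):
            if k.nodeType == k.ELEMENT_NODE:
                visit(k)
            elif k.nodeType == k.TEXT_NODE and not k.data.strip():
                # Assumption: a lone RE at either edge of the content
                # is insignificant no matter what the DTD says.
                edge_re = k.data == "\n" and i in (0, len(kids) - 1)
                # Assumption: real text in the parent forces mixed
                # content, so the whitespace is always significant.
                if has_elements and not has_real_text and not edge_re:
                    found.append((elem.tagName, k.data))

    visit(minidom.parseString(xml_text).documentElement)
    return found


sample = ("<book><section><p>Paragraph one.</p> "
          "<p>Paragraph two.</p></section></book>")
flagged = ambiguous_whitespace(sample)
print(flagged)
```

Under this reading the earlier <book> example would not be well-formed,
because of the space between the two <p> elements.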
Received on Tuesday, 17 December 1996 11:09:58 UTC