SGML Entities or Web-style objects in XML? from Paul Prescod on 1996-09-21 (w3c-sgml-wg@w3.org from September 1996)

From: Paul Prescod <papresco@calum.csclub.uwaterloo.ca>
Date: Sat, 21 Sep 1996 16:13:47 -0400
To: w3c-sgml-wg@w3.org
Message-Id: <1.5.4.32.19960921201347.00c28b90@csclub.uwaterloo.ca>

Before we get too far into a discussion of entities, I'd like to take a
second to reflect on XML's dual parentage. (risking Len's ire =) ) 

  The SGML Way:

In the SGML world, entity referencing and document parsing are intertwined,
though separate. A document cannot be considered "valid" until its entities
have been resolved (especially its external text entities!). Entities are
declared in the DTD or DTD subset, resolved (in theory) by the entity
manager, and used by the parser to create the ESIS. In all but a few SGML
systems, all document content reuse is done through this mechanism.
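To make the contrast concrete, here is a minimal Python sketch of the SGML model under simplifying assumptions: the `ENTITIES` dictionary is a hypothetical stand-in for entity declarations in the DTD, `None` models an external entity the entity manager cannot fetch, and only entity expansion is shown (tag recognition and ESIS generation are omitted). The point it illustrates is that resolution happens before, and as a precondition of, parsing, so a single unresolvable entity makes the whole document unusable.

```python
import re

# Hypothetical stand-in for entity declarations in the DTD or DTD subset.
# A value of None models an external entity the entity manager could not fetch.
ENTITIES = {
    "chap1": "<chapter>Intro</chapter>",
    "chap2": None,
}

ENTITY_REF = re.compile(r"&(\w+);")

def expand_entities(text, entities):
    """Replace every entity reference with its (recursively expanded) text.

    Resolution is a precondition of parsing: if any reference is undeclared
    or cannot be fetched, the whole document is rejected.
    """
    def expand(match):
        name = match.group(1)
        value = entities.get(name)
        if value is None:
            raise ValueError(f"cannot resolve entity '{name}'; document cannot be validated")
        # Entity text may itself contain references, so expand it too.
        return expand_entities(value, entities)
    return ENTITY_REF.sub(expand, text)

print(expand_entities("<book>&chap1;</book>", ENTITIES))
```

Here `"<book>&chap1;</book>"` expands to `<book><chapter>Intro</chapter></book>`, while `"<book>&chap2;</book>"` raises `ValueError`. The sketch has no guard against recursive entity definitions.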

  The Web Way:

In the Web world (especially in the HTML world), entities (or "objects")
are, conversely, resolved by the application AFTER parsing. So, the parser
parses the document and returns the ESIS to the application. The application
starts to process it and fetches (perhaps through an entity manager) any
objects it needs to complete that task. In reality, "transclusion" (or
"inclusion", or "fragment inclusion") is just a special case of linking. The
standardized mechanism for including HTML content is the same as for
including JAVA or Active-X content: <OBJECT>. In other words, in HTML,
document transclusion is handled exactly like any other link: the parser
doesn't know or care about it.
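A comparable sketch of the Web model, again in Python and purely illustrative: the parser (here the standard library's `xml.etree.ElementTree`) treats an `<object data="...">` element, loosely modeled on HTML's `<OBJECT>`, as an ordinary element, and a separate application pass decides what to fetch. The `fetch` callback, the `needed` predicate, and `fake_fetch` are hypothetical hooks for whatever retrieval mechanism and policy an application might use.

```python
import xml.etree.ElementTree as ET

def parse_document(text):
    # The parser knows nothing about inclusion: <object> is just another element.
    return ET.fromstring(text)

def render(root, fetch, needed=lambda obj: True):
    """Application-level pass over an already-parsed tree.

    The application chooses which objects to fetch (via `needed`) and decides
    how to handle objects that cannot be retrieved (here: a placeholder).
    """
    out = []
    for elem in root.iter():
        if elem.tag == "object":
            url = elem.get("data")
            if not needed(elem):
                out.append(f"[skipped {url}]")
                continue
            try:
                out.append(fetch(url))
            except OSError:
                # A failed fetch degrades the rendering; it does not invalidate the parse.
                out.append(f"[unavailable: {url}]")
        elif elem.text and elem.text.strip():
            out.append(elem.text.strip())
    return out

def fake_fetch(url):
    # Hypothetical fetcher: only "a.xml" is reachable.
    store = {"a.xml": "Fragment A"}
    if url not in store:
        raise OSError(f"cannot fetch {url}")
    return store[url]

doc = parse_document('<doc><p>Hello</p><object data="a.xml"/><object data="b.xml"/></doc>')
print(render(doc, fake_fetch))
```

Note that `parse_document` succeeds whether or not `b.xml` exists; only the application's rendering differs, and passing a different `needed` predicate lets it skip objects it does not require.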

  The Heresy:

Do we really need parse-time entities in XML? What do they "buy" us? In a
networked environment, the decision to resolve entities or not should be
left entirely up to the application (not the parser), because only the
application knows which entities it "needs", and the cost of resolving
entities it does not need is quite high. Furthermore, since entity-resolution
failures will be relatively frequent over the Internet, the application
should be able to choose which entities are strictly needed and which may
be absent.

In other words, I am proposing that we should not let entities affect the
parsing of their containing documents. Every XML document would be validated
without regard to the content or existence of its sub-documents. Further,
the content of sub-documents should not affect the parse-tree of the parent
document in any way. They would all be "opaque" to the containing document
at the parser level.

At the application level, of course, it might be invalid to transclude a
footnote as if it were a chapter, but that's the same as if you hyperlinked
to a chapter as if it were a footnote. But HyTime (and XTime =) ) has/will
have facilities for specifying those constraints at the "link manager" level.

If we can agree that the parser doesn't need to care about
fragments/entities/objects, then we can wait until the spring to talk about
them.

 Paul Prescod

Received on Saturday, 21 September 1996 16:18:42 UTC