RE: XML within XML - includes, transcludes, whatever from Laird Popkin on 2000-11-27 (xml-dist-app@w3.org from November 2000)

From: Laird Popkin <laird@io.com>
Date: Mon, 27 Nov 2000 07:24:33 -0500
To: "S. Mike Dierken" <mike@knownow.com>, "Laird Popkin" <laird@pop.mail.rcn.net>, <xml-dist-app@w3.org>
Message-ID: <NEBBKBONALGKCJKFOMICIEDECBAA.laird@io.com>
I don't have any problem with the logical multi-document model; in fact,
it's pretty important not to combine logical documents; the problem is that
physically transporting those multiple logical documents as multiple
files/messages causes all sorts of problems. What we want to do is package
multiple logical documents into a single physical document for transport.

Whether processing happens during or after parsing doesn't affect the
issue I'm raising -- either implementation should be doable on the same
messages. The problem is that when parsing I can't do the following
(simplifying the XML terribly):

<wrapper with wrapper.dtd>
   <header> ... </header>           -- elements defined by wrapper.dtd
   <body>
      <newsml with newsml.dtd>      -- start of new enclosed document
         <headline> ... </headline> -- elements defined by newsml.dtd
         <body> ... </body>
      </newsml>                     -- return to wrapping document context
   </body>
</wrapper>

Where the wrapper.dtd doesn't know about newsml, newsml.dtd doesn't know
about the wrapper, and both chunks of XML are validated.

Instead, we have to do either:

<wrapper with wrapper.dtd>
   <header> ... </header>           -- elements defined by wrapper.dtd
   <body>
      <reference to external news story/> -- reference element defined in wrapper.dtd
   </body>
</wrapper>

And a separate file:

<newsml with newsml.dtd>
   <headline> ... </headline> -- elements defined by newsml.dtd
   <body> ... </body>
</newsml>

Which raises all sorts of access control and synchronization issues, as well
as adding protocol overhead which could be substantial for small chunks of
data.

Or we could do:

<wrapper with wrapper.dtd>
   <header> ... </header>           -- elements defined by wrapper.dtd
   <body>
      <PCDATA[fhjksfyusahjksdhfuiw9huhc790hrw9fdb]]> -- base64 (or otherwise) encoded stuff
   </body>
</wrapper>

Where the encoded stuff decodes into a newsml news story, as above. This has
the performance drawback that you need to process all of the body in and out
of whatever encoding is required, and (given DOM or SAX) you need to hold
the entire message body in memory at one time in order to pass it to the
second processor. This is bad if the data is large.
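
To make the encoded option concrete, here is a minimal Python sketch of the
round trip. It is an illustration only: the element names mirror the
simplified examples above, the story content and the wrap/unwrap helpers are
made up, and nothing here validates against a DTD. Note that unwrap() has to
pull the whole encoded body into memory before the second parse can start,
which is the drawback described above.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical inner document standing in for a NewsML story.
inner_xml = b"<newsml><headline>Example</headline><body>Text</body></newsml>"


def wrap(payload: bytes) -> bytes:
    """Build a wrapper message whose body carries the payload as base64 text."""
    wrapper = ET.Element("wrapper")
    ET.SubElement(wrapper, "header").text = "routing info"
    body = ET.SubElement(wrapper, "body")
    body.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(wrapper)


def unwrap(message: bytes) -> ET.Element:
    """Parse the wrapper, decode the whole body, then parse it a second time."""
    outer = ET.fromstring(message)
    encoded = outer.findtext("body")  # entire encoded body held in memory
    return ET.fromstring(base64.b64decode(encoded))


message = wrap(inner_xml)
story = unwrap(message)
print(story.findtext("headline"))
```

The first parser sees only opaque text inside <body>, so the two vocabularies
never meet, but every byte of the story is encoded, decoded, and parsed again.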

To be honest, I can't see any real advantage to XML not being able to keep
track of nested documents (or PCDATA's). Parsers get trivially more complex,
I suppose, but compared to writing a validating parser, implementing a stack
of document and PCDATA contexts is pretty minor. And it would make the XML
Protocol Working Group's work much easier.
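
As a rough sketch of what I mean by a stack of document contexts, here is a
small Python SAX handler. It does no validation at all; it only tracks which
DTD would govern each element. The NESTED_ROOTS table and the
ContextStackHandler class are assumptions made up for this illustration, not
part of any existing parser.

```python
import xml.sax

# Hypothetical table: root element of a nested document -> the DTD governing it.
NESTED_ROOTS = {"newsml": "newsml.dtd"}


class ContextStackHandler(xml.sax.ContentHandler):
    """Track which document context each element belongs to (no validation)."""

    def __init__(self, outer_context):
        super().__init__()
        self.contexts = [outer_context]  # stack of active document contexts
        self.boundaries = []             # element depth where each nested context began
        self.depth = 0
        self.seen = []                   # (element name, governing context) pairs

    def startElement(self, name, attrs):
        self.depth += 1
        if name in NESTED_ROOTS:
            # Start of an enclosed document: push its context.
            self.contexts.append(NESTED_ROOTS[name])
            self.boundaries.append(self.depth)
        self.seen.append((name, self.contexts[-1]))

    def endElement(self, name):
        if self.boundaries and self.boundaries[-1] == self.depth:
            # End of the enclosed document: return to the wrapping context.
            self.boundaries.pop()
            self.contexts.pop()
        self.depth -= 1


doc = b"<wrapper><header/><body><newsml><headline/><body/></newsml></body></wrapper>"
handler = ContextStackHandler("wrapper.dtd")
xml.sax.parseString(doc, handler)
for name, context in handler.seen:
    print(name, context)
```

Both <body> elements come out attributed to different contexts, which is the
bookkeeping a validating parser would need in order to check each chunk
against its own DTD.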

-----Original Message-----
From: xml-dist-app-request@w3.org [mailto:xml-dist-app-request@w3.org]On
Behalf Of S. Mike Dierken
Sent: Monday, October 30, 2000 12:49 PM
To: Laird Popkin; xml-dist-app@w3.org
Cc: laird@io.com
Subject: RE: XML within XML - includes, transcludes, whatever




> It's an interesting document, but unless I misread it completely it pretty
> much said that you shouldn't *want* to wrap independent, validated XML
> within validated XML, since SGML, and thus XML, is meant to be used within
> one document with one DTD, and that instead what you should want to do is
> build the wrapped data by extending the wrapping DTD, or by not
> validating.

[from http://www.nyct.net/~aray/notes/wek-namespaces.txt]
"The processing of documents happens *after* parsing. It's no more difficult
to process a set of related documents than it is to process a single
document. Therefore, there's no need to create a single document from
multiple documents *before* parsing.
By doing the combining *after* parsing you avoid all issues of syntactic
combination, including the need to distinguish elements from different name
spaces, because you haven't removed the original document boundaries, which
defined the name space distinctions in the first place."

I don't think Eliot was suggesting 'extending the wrapping DTD'. I think he
was suggesting keeping the original document boundaries, which implies a
multi-document approach rather than a single-document approach.


Mike
Received on Wednesday, 1 November 2000 07:27:25 UTC