- From: Laird Popkin <laird@io.com>
- Date: Mon, 27 Nov 2000 07:24:33 -0500
- To: "S. Mike Dierken" <mike@knownow.com>, "Laird Popkin" <laird@pop.mail.rcn.net>, <xml-dist-app@w3.org>
I don't have any problem with the logical multi-document model; in fact, it's pretty important not to combine logical documents. The problem is that physically transporting those multiple logical documents as multiple files/messages causes all sorts of problems. What we want to do is package multiple logical documents into a single physical document for transport.

The issue of processing being during or after parsing doesn't affect the issue I'm raising -- either implementation should be doable on the same messages.

The problem is that when parsing I can't do the following (simplifying the XML terribly):

  <wrapper with wrapper.dtd>
    <header> ... </header>          -- elements defined by wrapper.dtd
    <body>
      <newsml with newsml.dtd>      -- start of new enclosed document
        <headline> ... </headline>  -- elements defined by newsml.dtd
        <body> ... </body>
      </newsml>                     -- return to wrapping document context
    </body>
  </wrapper>

Where the wrapper.dtd doesn't know about newsml, newsml.dtd doesn't know about the wrapper, and both chunks of XML are validated.

Instead, we have to do either:

  <wrapper with wrapper.dtd>
    <header> ... </header>          -- elements defined by wrapper.dtd
    <body>
      <reference to external news story/>  -- reference element defined in wrapper.dtd
    </body>
  </wrapper>

And a separate file:

  <newsml with newsml.dtd>
    <headline> ... </headline>      -- elements defined by newsml.dtd
    <body> ... </body>
  </newsml>

Which raises all sorts of access control and synchronization issues, as well as adding protocol overhead which could be substantial for small chunks of data.

Or we could do:

  <wrapper with wrapper.dtd>
    <header> ... </header>          -- elements defined by wrapper.dtd
    <body>
      <PCDATA[fhjksfyusahjksdhfuiw9huhc790hrw9fdb]]>  -- base64 (or otherwise) encoded stuff
    </body>
  </wrapper>

Where the encoded stuff decodes into a newsml news story, as above.
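As a rough illustration of that last option, here is a minimal Python sketch using only the standard library. The element names mirror the simplified examples above; the function names `wrap` and `unwrap` and the sample story are hypothetical, and ElementTree does not validate against a DTD, so this only shows the encode/decode and two-parse shape, not validation.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical enclosed document, standing in for a real NewsML story.
NEWS = "<newsml><headline>Example headline</headline><body>Story text</body></newsml>"


def wrap(inner_xml: str) -> str:
    """Build a wrapper document whose <body> carries the inner XML as base64 text."""
    wrapper = ET.Element("wrapper")
    ET.SubElement(wrapper, "header")
    body = ET.SubElement(wrapper, "body")
    body.text = base64.b64encode(inner_xml.encode("utf-8")).decode("ascii")
    return ET.tostring(wrapper, encoding="unicode")


def unwrap(wrapper_xml: str) -> ET.Element:
    """Parse the wrapper, then decode and parse the enclosed document separately."""
    outer = ET.fromstring(wrapper_xml)      # first parse: sees only opaque text in <body>
    encoded = outer.findtext("body")        # the whole payload is held in memory here
    inner_xml = base64.b64decode(encoded)   # full decode before the second parser can start
    return ET.fromstring(inner_xml)         # second parse: the enclosed document


message = wrap(NEWS)
story = unwrap(message)
print(story.tag, story.findtext("headline"))
```

Note that the first parser never sees the enclosed markup at all, and the second parser cannot begin until the complete payload has been extracted and decoded.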
This has the performance drawback that you need to process all of the body in and out of whatever encoding is required, and (given DOM or SAX) you need to hold the entire message body in memory at one time in order to pass it to the second processor. This is bad if the data is large.

To be honest, I can't see any real advantage to XML not being able to keep track of nested documents (or PCDATA's). Parsers get trivially more complex, I suppose, but compared to writing a validating parser, implementing a stack of document and PCDATA contexts is pretty minor. And it would make the XML Protocol Working Group's work much easier.

-----Original Message-----
From: xml-dist-app-request@w3.org [mailto:xml-dist-app-request@w3.org] On Behalf Of S. Mike Dierken
Sent: Monday, October 30, 2000 12:49 PM
To: Laird Popkin; xml-dist-app@w3.org
Cc: laird@io.com
Subject: RE: XML within XML - includes, transcludes, whatever

> It's an interesting document, but unless I misread it completely it pretty
> much said that you shouldn't *want* to wrap independent, validated XML
> within validated XML, since SGML, and thus XML, is meant to be used within
> one document with one DTD, and that instead what you should want to do is
> build the wrapped data by extending the wrapping DTD, or by not
> validating.

[from http://www.nyct.net/~aray/notes/wek-namespaces.txt]
"The processing of documents happens *after* parsing. It's no more difficult to process a set of related documents than it is to process a single document. Therefore, there's no need to create a single document from multiple documents *before* parsing. By doing the combining *after* parsing you avoid all issues of syntactic combination, including the need to distinguish elements from different name spaces, because you haven't removed the original document boundaries, which defined the name space distinctions in the first place."

I don't think Eliot was suggesting 'extending the wrapping DTD'. I think he was suggesting keeping the original document boundaries, which implies a multi-document approach rather than a single-document approach.

Mike
Received on Wednesday, 1 November 2000 07:27:25 UTC