- From: Norman Walsh <Norman.Walsh@Sun.COM>
- Date: Thu, 16 Feb 2006 09:48:33 -0500
- To: public-xml-processing-model-wg@w3.org
- Message-ID: <87r763z7em.fsf@nwalsh.com>
I've been trying to think of a way to simplify our underlying processing model. I'd like to avoid the whole notion of dependencies and backwards/forwards chaining, etc., if possible. Here's my current idea.

Imagine that we define a component in the system called the "pool manager". The pool manager's job is to provide infosets. You hand it a URI and it returns an XML infoset. We say nothing about how it builds the infoset; if it turns CSV files into XML, more power to it. All components get their infosets from the pool manager.

The pool manager has one distinguished infoset, the anonymous infoset. The initial value of this anonymous infoset is implementation dependent. Each component can consume the anonymous infoset and produce (exactly) one new anonymous infoset, which becomes the pool manager's anonymous infoset after the component finishes. The anonymous infoset acts like stdin/stdout, basically. When the pipeline finishes, what the pool manager does with the anonymous infoset it's left with is implementation dependent.

Components can naturally consume and produce other things as well, but everything except the anonymous infoset must be named with a URI. The components are responsible for notifying the pool manager about any new infosets that they create (if those infosets are expected to be available for subsequent processing, which I imagine to be the normal case).

Steps in the pipeline are processed in document order. If we allow sub-pipelines, we'll have to talk about the nature of processing in that case. And if we allow iteration or conditional processing, we'll have to outline that too. But the basic idea is document order. If a processor is smart and can work out better arrangements, fine, but the results must be as if the stages had been processed in document order.

So here's a valid pipeline:

    <p:pipeline>
      <p:stage name="validate"/>
      <p:stage name="xinclude"/>
      <p:stage name="validate"/>
    </p:pipeline>

Validation takes the anonymous infoset and produces a new one. Ditto XInclude.
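To make the pool manager idea concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than part of the proposal: ElementTree elements stand in for infosets, the names `PoolManager`, `get`, `put`, `take_anonymous`, `set_anonymous` and `csv_loader` are hypothetical, the CSV "file" lives in memory, and consuming the anonymous infoset is made destructive, which is one of the choices this proposal leaves open.

```python
import csv
import io
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# Assumption: an ElementTree element stands in for an XML infoset.
Infoset = ET.Element


@dataclass
class PoolManager:
    """Hypothetical pool manager: hands out infosets by URI and holds
    the single anonymous infoset (the pipeline's stdin/stdout)."""

    loader: Callable[[str], Infoset]          # how a URI becomes an infoset is left open
    anonymous: Optional[Infoset] = None       # initial value is implementation dependent
    named: Dict[str, Infoset] = field(default_factory=dict)

    def get(self, uri: str) -> Infoset:
        # Reuse an infoset a component registered earlier; otherwise build one.
        if uri not in self.named:
            self.named[uri] = self.loader(uri)
        return self.named[uri]

    def put(self, uri: str, infoset: Infoset) -> None:
        # Components call this to make a new named infoset available to later steps.
        self.named[uri] = infoset

    def take_anonymous(self) -> Infoset:
        # Destructive consumption (an assumption; the proposal leaves this open).
        if self.anonymous is None:
            raise RuntimeError("no anonymous infoset to consume")
        infoset, self.anonymous = self.anonymous, None
        return infoset

    def set_anonymous(self, infoset: Infoset) -> None:
        # A component's single anonymous result replaces the pool's anonymous infoset.
        self.anonymous = infoset


# Hypothetical in-memory "files" so the example runs without any I/O.
FAKE_FILES = {"data.csv": "a,b\n1,2\n"}


def csv_loader(uri: str) -> Infoset:
    """Turn a CSV resource into XML: <table><row><cell>a</cell>...</row>...</table>."""
    root = ET.Element("table", {"source": uri})
    for record in csv.reader(io.StringIO(FAKE_FILES[uri])):
        row = ET.SubElement(root, "row")
        for value in record:
            ET.SubElement(row, "cell").text = value
    return root


pm = PoolManager(loader=csv_loader)
table = pm.get("data.csv")       # built by the loader, then cached under its URI
pm.set_anonymous(table)
print(ET.tostring(pm.take_anonymous()).decode())
```

The loader is the only piece that knows how a URI turns into an infoset, which mirrors "we say nothing about how it builds the infoset": swapping CSV for any other source changes nothing else.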
So that pipeline performs validation, XInclude, and validation on the anonymous infoset and produces an anonymous infoset.

Here's another pipeline:

    <p:pipeline>
      <p:stage name="validate">
        <p:input href="someURI"/>
      </p:stage>
      <p:stage name="xinclude"/>
      <p:stage name="validate"/>
    </p:pipeline>

It doesn't consume the initial input; it starts with someURI.

And another:

    <p:pipeline>
      <p:stage name="validate"/>
      <p:stage name="xinclude"/>
      <p:stage name="validate">
        <p:output href="someOtherURI"/>
      </p:stage>
      <p:stage name="xslt">
        <p:input href="someOtherURI"/>
        <p:param name="stylesheet" href="style.xsl"/>
      </p:stage>
    </p:pipeline>

This pipeline performs validation, XInclude, and validation, then transforms the result. A clever processor could do the last two steps in parallel, but it doesn't have to.

This pipeline (probably) fails:

    <p:pipeline>
      <p:stage name="validate">
        <p:output href="someURI"/>
      </p:stage>
      <p:stage name="xinclude"/>
    </p:pipeline>

When the XInclude stage begins, there's no anonymous infoset to consume, which is probably an error. We could say that the original anonymous infoset is still available, I suppose; that is, that consumption isn't destructive. I dunno, though.

Finally, if we allowed recursion, you could do things like this:

    <p:pipeline>
      <p:stage name="validate"/>
      <p:stage name="xslt">
        <p:input href="someURI"/>
        <p:param name="stylesheet">
          <p:pipeline>
            <p:stage name="xinclude"/>
            <p:stage name="validate"/>
          </p:pipeline>
        </p:param>
      </p:stage>
    </p:pipeline>

This has the somewhat odd consequence of using the anonymous infoset, initially validated, then XIncluded and validated again, as the stylesheet instead of the input.

Anyway, assuming I haven't overlooked 11 different things, this model seems to result in fairly straightforward pipelines in the simple case, it leverages the common idiom of stdin/stdout, it allows arbitrarily complex pipelines (I think), and it can be optimized both statically and dynamically.
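The document-order rule can likewise be sketched as a tiny interpreter. This is a hypothetical Python illustration, not a specification: the namespace URI bound to `p:`, the stage functions (which simply wrap their input in an element named after the stage so the execution order is visible), the functions `make_stage` and `run_pipeline`, and the choice to treat a missing anonymous infoset as an error are all assumptions. Sub-pipelines and nested `p:pipeline` parameters are not handled.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Optional

# Assumption: the p: prefix is bound to this made-up namespace URI.
P = "{http://example.org/ns/pipeline}"

Infoset = ET.Element
Stage = Callable[[Infoset, Dict[str, str]], Infoset]


def make_stage(name: str) -> Stage:
    """Hypothetical stand-in component: wraps its input in an element named
    after the stage (params become attributes) so execution order is visible."""
    def run(doc: Infoset, params: Dict[str, str]) -> Infoset:
        out = ET.Element(name, params)
        out.append(doc)
        return out
    return run


def run_pipeline(pipeline_xml: str, stages: Dict[str, Stage],
                 anonymous: Optional[Infoset],
                 pool: Dict[str, Infoset]) -> Optional[Infoset]:
    """Run top-level stages strictly in document order and return the
    anonymous infoset left at the end. Sub-pipelines are not handled."""
    for step in ET.fromstring(pipeline_xml).findall(P + "stage"):
        name = step.get("name")
        inp = step.find(P + "input")
        if inp is not None:
            doc = pool[inp.get("href")]            # named input; anonymous is left alone
        elif anonymous is not None:
            doc, anonymous = anonymous, None       # consume the anonymous infoset
        else:
            raise RuntimeError(f"stage {name!r}: no anonymous infoset to consume")
        params = {p.get("name"): p.get("href") for p in step.findall(P + "param")}
        result = stages[name](doc, params)
        out = step.find(P + "output")
        if out is not None:
            pool[out.get("href")] = result         # register a new named infoset
        else:
            anonymous = result                     # becomes the new anonymous infoset
    return anonymous


PIPELINE = """<p:pipeline xmlns:p="http://example.org/ns/pipeline">
  <p:stage name="validate"/>
  <p:stage name="xinclude"/>
  <p:stage name="validate">
    <p:output href="someOtherURI"/>
  </p:stage>
  <p:stage name="xslt">
    <p:input href="someOtherURI"/>
    <p:param name="stylesheet" href="style.xsl"/>
  </p:stage>
</p:pipeline>"""

FAILING = """<p:pipeline xmlns:p="http://example.org/ns/pipeline">
  <p:stage name="validate">
    <p:output href="someURI"/>
  </p:stage>
  <p:stage name="xinclude"/>
</p:pipeline>"""

stages = {n: make_stage(n) for n in ("validate", "xinclude", "xslt")}
pool: Dict[str, Infoset] = {}
final = run_pipeline(PIPELINE, stages, ET.Element("doc"), pool)
print(ET.tostring(final).decode())

try:
    run_pipeline(FAILING, stages, ET.Element("doc"), {})
except RuntimeError as err:
    print("failed as expected:", err)
```

In this sketch the failing pipeline stops at the XInclude stage, matching the "(probably) fails" case; making consumption non-destructive would amount to not clearing `anonymous` when a stage reads it.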
Oh, and one last thing: this would also be valid:

    <p:pipeline>
      <p:stage name="validate"/>
      <p:stage name="xslt">
        <p:param name="stylesheet" href="style.xsl"/>
      </p:stage>
      <p:stage name="fo-processor">
        <p:output href="somefile.pdf"/>
      </p:stage>
    </p:pipeline>

That is, there's nothing that prevents a stage from producing non-XML. Note, however, that it can't do this anonymously. What flows through the pipeline is strictly XML. But there's nothing that prevents stages from writing and reading non-XML from other URIs if they wish.

Thoughts?

Be seeing you,
norm

-- 
Norman.Walsh@Sun.COM / XML Standards Architect / Sun Microsystems, Inc.
Received on Thursday, 16 February 2006 14:49:06 UTC