Re: Requirements Document Updated from Erik Bruchez on 2006-02-10 (public-xml-processing-model-wg@w3.org from February 2006)

From: Erik Bruchez <ebruchez@orbeon.com>
Date: Fri, 10 Feb 2006 23:11:55 +0100
To: public-xml-processing-model-wg@w3.org
Message-ID: <43ED0FAB.4070901@orbeon.com>

Murray,

I will just use XPL as an example below, as we designed it to solve
many of the use cases you now see in the requirements document, and we
have actually been using it for years and have written literally tens
of thousands of lines of it. But I am sure other group members have
similar experiences with their own XML pipeline languages:

o XPL only deals with XML infosets, yet it solves many use cases
   (probably all the use cases we have listed so far in this WG,
   although some possibly not optimally). This shows that relying on
   the XML infoset is not a showstopper, even though, as you know, we
   now have some inclination toward supporting the XDM.

o I don't think we are talking about "stdin" or "stdout" when we are
   talking about component inputs and outputs, but about component
   inputs and outputs connected to other components' outputs and
   inputs, respectively. In other words, this is mainly, if not
   exclusively, an in-pipeline concept.

   In XPL, a component input can be connected to a pipeline input, to
   another component's output, or simply dereference a URL (for
   accessing an XML file on disk, for example). A component output can
   be connected to a pipeline output or to another component's input.
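
   The three kinds of input connection can be sketched roughly as
   follows. This is Python rather than XPL syntax, it does not reflect
   how any XPL engine is implemented, and every name in it (Component,
   from_pipeline_input, from_output, from_url, run) is hypothetical and
   chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Hypothetical model: none of these names come from XPL itself.

class Component:
    """A step with named inputs and a single XML output."""
    def __init__(self, name, func):
        self.name = name
        self.func = func    # maps {input name: Element} to an Element
        self.inputs = {}    # input name -> (kind, reference)

    def connect(self, input_name, source):
        self.inputs[input_name] = source
        return self

def from_pipeline_input(name):
    return ("pipeline", name)        # input supplied by the pipeline's caller

def from_output(component):
    return ("component", component)  # output of another component

def from_url(url):
    return ("url", url)              # XML document dereferenced from a URL

def run(component, pipeline_inputs, cache=None):
    """Resolve each input of a component, then run it (once per component)."""
    cache = {} if cache is None else cache
    if component.name in cache:
        return cache[component.name]
    resolved = {}
    for input_name, (kind, ref) in component.inputs.items():
        if kind == "pipeline":
            resolved[input_name] = pipeline_inputs[ref]
        elif kind == "component":
            resolved[input_name] = run(ref, pipeline_inputs, cache)
        else:
            with urlopen(ref) as stream:
                resolved[input_name] = ET.parse(stream).getroot()
    cache[component.name] = component.func(resolved)
    return cache[component.name]

def count_items(inputs):
    out = ET.Element("count")
    out.text = str(len(inputs["data"].findall("item")))
    return out

def wrap(inputs):
    out = ET.Element("report")
    out.append(inputs["data"])
    return out

# counter reads the pipeline input; reporter reads counter's output.
counter = Component("counter", count_items).connect(
    "data", from_pipeline_input("source"))
reporter = Component("reporter", wrap).connect("data", from_output(counter))

doc = ET.fromstring("<items><item/><item/></items>")
result = run(reporter, {"source": doc})
print(ET.tostring(result, encoding="unicode"))
# prints <report><count>2</count></report>
```

   Nothing in this sketch refers to stdin or stdout: every connection
   is between named inputs and outputs inside the pipeline, or to a URL.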

o There are ways of accessing non-XML documents even under this
   scenario. For example, components in XPL that need to read or write
   non-XML data do it on their own. E.g. a "URL component" takes a URL
   as input ("parameter" in this WG terminology, but a configuration
   encapsulated in an XML document in the XPL way), fetches that URL on
   its own (i.e. without the XML pipeline engine knowing anything about
   it), and then produces an XML infoset on its output.

   Similarly, an XSL-FO processor takes an XML infoset as input,
   transforms that into e.g. PDF, and outputs the PDF content somewhere
   (file, HTTP response, or even Base64-encoded data within an XML
   document).
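
   As a rough illustration of a component doing its own non-XML I/O,
   here is a minimal Python sketch of a "URL component". The
   configuration format (<config><url>...</url></config>), the
   <document> wrapper element and its attributes, and the function
   name are all assumptions made for this example, not the actual XPL
   vocabulary.

```python
import base64
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path
from urllib.request import urlopen

def url_component(config):
    """Fetch the URL named in a (hypothetical) <config><url/></config> document.

    The caller only ever sees XML: the configuration goes in as an
    infoset, and an infoset comes out. Non-XML content is wrapped as
    Base64 text inside a hypothetical <document> element.
    """
    url = config.findtext("url")
    with urlopen(url) as response:
        payload = response.read()
        content_type = response.headers.get_content_type()
    try:
        return ET.fromstring(payload)   # the resource was already XML
    except ET.ParseError:
        wrapper = ET.Element(
            "document", {"content-type": content_type, "encoding": "base64"})
        wrapper.text = base64.b64encode(payload).decode("ascii")
        return wrapper

# Usage with a local CSV file, so the example needs no network access.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.csv"
    path.write_bytes(b"a,b\n1,2\n")
    config = ET.fromstring(f"<config><url>{path.as_uri()}</url></config>")
    result = url_component(config)

print(result.tag)                     # prints document
print(base64.b64decode(result.text))  # prints b'a,b\n1,2\n'
```

   A component going the other way (e.g. an XSL-FO processor embedding
   PDF in an XML document) could use the same Base64 wrapping on its
   output.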

That said, I still think we should not lose sight of what we are trying
to accomplish here: defining an XML processing model and language, not
a general-purpose data processing system. I think we should strive for
simplicity while still covering most of the use cases we have
gathered, and too bad if the language doesn't natively support
PostScript or Excel files ;-)

-Erik

Murray Maloney wrote:

 > If I understand the terminology that everybody is using, when we
 > talk about an 'input' it is as though we were talking about stdin,
 > except that we may also be talking about inputs that are specified
 > through command-line options, and even inputs that are consumed in
 > the course of processing, such as a style sheet that may be specified
 > within a document or from a user's environment.
 >
 > And when we talk about 'outputs' we are talking principally about
 > stdout, but possibly also artifacts of processing which can be
 > considered as outputs for the purpose of discussion, but are not
 > outputs in the sense that they cannot be 'piped' or re-directed.
 >
 > That leaves us with parameters, which I think of as options on a
 > command line or even environment variables and registry
 > entries. Does anybody attach a different meaning to 'parameters'?
 >
 > These are all familiar paradigms. Surely there is no problem
 > allowing for any and all of these paradigms, from a user's or an
 > implementor's perspective. Am I missing something?
 >
 > What I don't understand is how we can limit 'inputs' or 'outputs' to
 > XML info-sets.  Perhaps we only mean that the XML Processing
 > Specification will provide for steps which operate on a putative XML
 > Info-set. Or perhaps we mean that 'components' must accept/emit
 > reifications of an XML Info-Set at stdin/stdout.  Hopefully we don't
 > mean to prevent a component from emitting non-XML outputs, even to
 > stdout, because that would forbid components from rendering XML into
 > publishable formats such as PostScript et al. Hopefully we also don't
 > mean to prevent a component from accepting non-XML inputs, even
 > stdin, because that would forbid components that read a CSV file and
 > emit an XML rendition of that data as, for example, a table or an
 > address book.
 >
 > If we are saying that the XML Proc will provide mechanisms which
 > allow one to write specifications that operate on an XML Info-Set,
 > then I think that we may have a winner. In that case, my
 > stdin/stdout streams can be anything.  Assuming that my component is
 > capable of taking a CSV file as input, exposes an XML Info-Set
 > interface, and produces PostScript as output, then why should anyone
 > care? Perhaps a component should be required to publish its
 > interfaces so that anyone attempting to use the component will know
 > which interfaces are available.
 >
 > Does any of this make sense, or am I more confused than even I
 > thought?
 >
 > Regards,
 >
 > Murray
Received on Friday, 10 February 2006 22:12:10 UTC