Handling of system IDs from Toman_Vojtech@emc.com on 2007-12-03 (public-xml-processing-model-comments@w3.org from December 2007)

From: <Toman_Vojtech@emc.com>
Date: Mon, 3 Dec 2007 04:41:05 -0500
To: <public-xml-processing-model-comments@w3.org>
Message-ID: <6E216CCE0679B5489A61125D0EFEC78708B75371@CORPUSMX10A.corp.emc.com>

Hi all,

I have a question about system ID handling in XProc. The problem I have
is that the system IDs of the documents involved in the pipeline can get
lost while executing the pipeline. Suppose I have the following
pipeline:

<p:pipeline>
  <p:input port="source" sequence="true"/>
  <p:output port="result" sequence="true"/>
  <p:for-each>
    <p:xslt>...</p:xslt>
  </p:for-each>
</p:pipeline>

What I would like to achieve is this:
1. The XProc implementation starts the pipeline and binds (in an
   implementation-specific way) the "source" port to a sequence of
   documents (book.xml, chapters/ch1.xml, chapters/ch2.xml).
2. The for loop transforms each of the documents and produces the
   following documents in "result": book_processed.xml,
   chapter/ch1_processed.xml, chapter/ch2_processed.xml.
3. The sequence of documents (with the following system IDs "attached
   to them": book_processed.xml, chapter/ch1_processed.xml,
   chapter/ch2_processed.xml) appears on the "result" port of the
   pipeline.

How can I do that? Is this possible at all?

A similar problem occurs when I want to store the generated documents on
the file system (using the p:store step):

<p:pipeline>
  <p:input port="source" sequence="true"/>
  <p:output port="result" sequence="true"/>
  <p:for-each>
    <p:xslt>...</p:xslt>
    <p:store>
      <p:option name="href" value="?????" select="?????"/>
    </p:store>
  </p:for-each>
</p:pipeline>

How can I access the document names in the p:store step so I can
generate custom file names (based on the input file names)?
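
To make the naming rule concrete, here is a minimal sketch of the kind of
derivation meant by "custom file names based on the input file names". It
is plain Python, not XProc, and it does not show how a pipeline would get
at the name. The helper processed_name and the "_processed" suffix are
made up for illustration, and the sketch assumes the output stays in the
same directory as its input.

```python
import posixpath


def processed_name(system_id: str, suffix: str = "_processed") -> str:
    """Derive an output name from an input system ID (hypothetical helper)."""
    # Split off the last path segment so that only the file name changes.
    directory, filename = posixpath.split(system_id)
    # Separate the extension so the suffix lands in front of ".xml".
    stem, ext = posixpath.splitext(filename)
    return posixpath.join(directory, stem + suffix + ext)


# The three input documents from the example pipeline.
inputs = ["book.xml", "chapters/ch1.xml", "chapters/ch2.xml"]
outputs = [processed_name(s) for s in inputs]
```

The open question remains how the value of href could be computed like
this from inside the pipeline.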

The general problem is, unless I have overlooked something, that after
processing the source documents, the pipeline processor produces a
sequence of XML documents (which are essentially unnamed, with no system
ID) that must be processed by the client application somehow. But the
client application has no clue what these documents are (well, in the
examples above, it can assume that the documents represent a "result of
an XSL transformation", but without any further information attached to
them) so it may be hard to know which document is the "main" document,
for instance, or what names to give the output documents (or what folder
structure to use for storing the result documents).

I think the system ID should be passed along with the XML documents
somehow so it can be accessed by the steps in the pipeline. Obviously,
in some (or in most?) cases this would not be possible, for instance
when the pipeline works with "transient" documents created using queries
or viewports. (But even in these cases, for instance when working with a
"viewport" document, it would be nice to be able to access the system ID
of the viewport's source document... - but I can understand this would
not always be possible and it could also introduce lots of ugly issues
in the specification.)
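
As a rough model of this idea, the sketch below pairs each document with
an optional system ID and lets a loop carry that ID through a
transformation. It is plain Python and only an illustration under stated
assumptions: the Document class and the for_each function are
hypothetical names, not anything defined by XProc, and a system ID of
None stands for a "transient" document that has no name.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional


@dataclass(frozen=True)
class Document:
    content: str               # serialized XML, kept as a string for brevity
    system_id: Optional[str]   # None models a "transient" document


def for_each(docs: Iterable[Document],
             transform: Callable[[str], str]) -> Iterator[Document]:
    """Apply transform to each document's content, carrying its system ID along."""
    for doc in docs:
        yield Document(transform(doc.content), doc.system_id)


# One named document and one transient document; str.upper stands in
# for a real transformation such as XSLT.
source = [Document("<book/>", "book.xml"), Document("<chapter/>", None)]
result = list(for_each(source, str.upper))
```

In this model a later step, or the client application, can still read
system_id from each result and decide how to name or store the document.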

Regards,
Vojtech

--
Vojtech Toman
Principal Software Engineer
EMC Corporation

Aert van Nesstraat 45
3012 CA Rotterdam
The Netherlands

Toman_Vojtech@emc.com

Received on Monday, 3 December 2007 18:28:19 UTC