Re: Handling of system IDs from Norman Walsh on 2008-02-08 (public-xml-processing-model-comments@w3.org from February 2008)

From: Norman Walsh <ndw@nwalsh.com>
Date: Fri, 08 Feb 2008 11:45:44 -0500
To: public-xml-processing-model-comments@w3.org
Message-ID: <m2zlubpfbb.fsf@nwalsh.com>

At the 20 Dec 2007 telcon, we discussed this issue:

/ Toman_Vojtech@emc.com was heard to say:
| Hi all,
|
| I have a question about system ID handling in XProc. The problem I have
| is that the system IDs of the documents involved in the pipeline can get
| lost while executing the pipeline. Suppose I have the following
| pipeline:
|
| <p:pipeline>
|   <p:input port="source" sequence="true"/>
|   <p:output port="result" sequence="true"/>
|   <p:for-each>
|     <p:xslt>...</p:xslt>
|   </p:for-each>
| </p:pipeline>
|
| What I would like to achieve is this:
| 1. The XProc implementation starts the pipeline and binds (in an
|    implementation-specific way) the "source" port to a sequence of
|    documents (book.xml, chapters/ch1.xml, chapters/ch2.xml).
| 2. The for loop transforms each of the documents and produces the
|    following documents in "result": book_processed.xml,
|    chapter/ch1_processed.xml, chapter/ch2_processed.xml.
| 3. The sequence of documents (with following system IDs "attached to
|    them": book_processed.xml, chapter/ch1_processed.xml,
|    chapter/ch2_processed.xml) appears on the "result" port of the
|    pipeline.
|
| How can I do that? Is this possible at all?
|
| A similar problem occurs when I want to store the generated documents
| on the file system (using the p:store step):
|
| <p:pipeline>
|   <p:input port="source" sequence="true"/>
|   <p:output port="result" sequence="true"/>
|   <p:for-each>
|     <p:xslt>...</p:xslt>
|     <p:store>
|       <p:option name="href" value="?????" select="?????"/>
|     </p:store>
|   </p:for-each>
| </p:pipeline>
|
| How can I access the document names in the p:store step so I can
| generate custom file names (based on the input file names)?
|
| The general problem is, unless I have overlooked something, that after
| processing the source documents, the pipeline processor produces a
| sequence of XML documents (which are essentially unnamed, with no system
| ID) that must be processed by the client application somehow. But the
| client application has no clue what these documents are (well, in the
| examples above, it can assume that the documents represent a "result of
| an XSL transformation", but without any further information attached to
| them) so it may be hard to know which document is the "main" document,
| for instance, or what names to give the output documents (or what folder
| structure to use for storing the result documents).
|
| I think the system ID should be passed along with the xml documents
| somehow so it can be accessed by the steps in the pipeline. Obviously,
| in some (or in most?) cases this would not be possible, for instance
| when the pipeline works with "transient" documents created using queries
| or viewports. (But even in these cases, for instance when working with a
| "viewport" document, it would be nice to be able to access the system ID
| of the viewport's source document... - but I can understand this would
| not always be possible and it could also introduce lots of ugly issues
| in the specification.)

And concluded that it needed more thought:

  83. Handling of system IDs

   Norm: I think this is really about base URIs, which I'd thought we'd
   worked through, but now I'm not so sure.
   ... It might make sense to augment p:store so that it uses the xml:base
   value of the document element if it has one.
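
To make Norm's suggestion concrete, here is a minimal sketch of a result
document that carries its intended name in xml:base on the document
element. The element names and content are made up for illustration; only
the xml:base attribute matters. Under the suggested behavior, p:store
would presumably pick up book_processed.xml from that attribute, though
nothing in the discussion settles the details.

```xml
<!-- Hypothetical result document; element names are illustrative only. -->
<book xml:base="book_processed.xml">
  <title>Example</title>
</book>
```

The value here is a relative reference. A processor that resolves
xml:base against the document's original base URI would report an
absolute URI for the element instead.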

   Some discussion of how doctype declarations established by the
   serialization in XSLT might come into play.

   Henry: I was chasing this the other day and discovering that Saxon does
   better than most other processors about changing the base URI if you add
   an xml:base attribute.
   ... We did say that preserving base URI properties was something you
   should do, but we didn't directly address this point.

   Norm: Toman wants to change the base URI in a specific way.

   Richard: What's more, he probably wants it in unabsolutized form.

   Norm: Maybe we need to think about this some more...

   Richard: The clunky way to fix this is to start with a document that lists
   the documents rather than directly with the list of documents.
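
One way to picture Richard's workaround is a small manifest document fed
to the pipeline in place of the sequence itself. The element and
attribute names below are hypothetical; the discussion does not define
any such vocabulary.

```xml
<!-- Hypothetical manifest; documents, document, and href are illustrative names. -->
<documents>
  <document href="book.xml"/>
  <document href="chapters/ch1.xml"/>
  <document href="chapters/ch2.xml"/>
</documents>
```

Because the names are ordinary attribute values, the pipeline could read
them in their relative form with plain XPath and derive output names from
them, though it would then have to load each listed document itself.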

   Norm: Should we leave this one open a bit and see what progress we make?

   Let's take this one to email.

   Richard: The p:for-each step already does some things to the XPath
   environment, it could put something in which was the URI of the document
   in question.

   Some more discussion of the examples.

   Richard observes that the select in p:store could access the extension
   function.
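
As a rough sketch of how Richard's two remarks could combine, the p:store
from Vojtech's example might compute href from a function that p:for-each
places in the XPath environment. The function name
ex:current-document-uri() is invented here, since the discussion names no
such function, and the ex prefix binding is omitted just as the p prefix
binding is in the quoted snippets. The option syntax follows the example
quoted above.

```xml
<p:for-each>
  <p:xslt>...</p:xslt>
  <p:store>
    <!-- ex:current-document-uri() is hypothetical: assumed to return the URI
         of the document currently being iterated over, e.g. chapters/ch1.xml -->
    <p:option name="href"
              select="concat(substring-before(ex:current-document-uri(), '.xml'),
                             '_processed.xml')"/>
  </p:store>
</p:for-each>
```

With chapters/ch1.xml as the current document, the XPath expression
evaluates to chapters/ch1_processed.xml, which keeps the name in relative
form.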

The chair humbly solicits...more thought :-)

                                        Be seeing you,
                                          norm

-- 
Norman Walsh <ndw@nwalsh.com> | Some people will never learn anything,
http://nwalsh.com/            | for this reason, because they
                              | understand everything too soon.-- Pope
Received on Friday, 8 February 2008 16:46:00 UTC