Re: URIs as inputs and outputs from Jeni Tennison on 2006-04-13 (public-xml-processing-model-wg@w3.org from April 2006)

From: Jeni Tennison <jeni@jenitennison.com>
Date: Thu, 13 Apr 2006 14:01:05 +0100
To: public-xml-processing-model-wg@w3.org
Message-ID: <443E4B91.2030008@jenitennison.com>

Hi Alex,

Alessandro Vernet wrote:
> On 4/5/06, Jeni Tennison <jeni@jenitennison.com> wrote:
> I look positively at ideas that try to unify concepts, but in this
> case I have a hard time understanding how we can unify the concept of
> document available at a certain URI and a named sequence of document
> produced by a processor. Considering your example:
> 
>>    <p:step use="xslt2.0">
>>      <p:input name="source" href="a.xml" />
>>      <p:input name="stylesheet" href="b.xsl" />
>>      <p:output name="result" href="c.xml" />
>>    </p:step>
> 
> Just looking at this portion, is the input href="a.xml" a reference to
> a file in the same directory of our pipeline, or is it a reference to
> a <p:output name="..." href="a.xml" /> previously defined in the
> pipeline? How do we make the difference?

It's a reference to the document that the pipeline processor has 
associated with the URI "a.xml". This is the principle behind the idea 
of "pipeline engine as resource manager", as I understand it.

This follows the way that XSLT 2.0 and XQuery deal with the fact that 
the documents they access have to be stable: if a stylesheet calls the 
doc() function on the same URI twice, then the two documents returned 
have to be identical. XPath 2.0 defines available documents as part of 
the dynamic context:

   [Definition: Available documents. This is a mapping of strings onto
   document nodes. The string represents the absolute URI of a resource.
   The document node is the root of a tree that represents that resource
   using the data model. The document node is returned by the fn:doc
   function when applied to that URI.] The set of available documents is
   not limited to the set of statically known documents, and it may be
   empty.

The doc() function looks up the URI it's given as an argument within the 
available documents mapping and returns the document that's associated 
with that URI.
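
As a rough illustration of that lookup, here is a minimal Python sketch of an "available documents" mapping. The class name, the loader callback, and the example.org base URI are all hypothetical; this is not how any particular XSLT or XQuery processor is implemented, only a picture of the stability rule: the same URI always yields the same document.

```python
from urllib.parse import urljoin


class AvailableDocuments:
    """Hypothetical stand-in for the 'available documents' mapping."""

    def __init__(self, base_uri, loader):
        self._base_uri = base_uri
        self._loader = loader      # assumed callback: absolute URI -> document
        self._documents = {}       # absolute URI -> document

    def doc(self, uri):
        # Resolve against the base URI so the key is an absolute URI.
        absolute = urljoin(self._base_uri, uri)
        # Load at most once, so repeated calls return the identical document.
        if absolute not in self._documents:
            self._documents[absolute] = self._loader(absolute)
        return self._documents[absolute]


calls = []


def fake_loader(uri):
    # Stand-in for real retrieval and parsing; records each load.
    calls.append(uri)
    return {"uri": uri}


docs = AvailableDocuments("http://example.org/pipeline/", fake_loader)
first = docs.doc("a.xml")
second = docs.doc("a.xml")
```

Here `first` and `second` are the same object and the loader runs only once, which is the property the doc() function relies on.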

Most XSLT and XQuery processors will include all possible URLs within 
the available documents mapping: given a URL, they'll attempt to access 
the document using the relevant protocol. But they will also have a 
facility to allow the person running the processor to define other 
URI-to-document mappings; for example, Saxon allows you to supply a 
URIResolver that does the mapping for you.
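
To make the resolver idea concrete, here is a small Python sketch of user-supplied mappings that take precedence over ordinary retrieval. It is only loosely modelled on the (href, base) shape of a JAXP URIResolver; the function names, the urn:example:config URI, and the fake fetch function are assumptions made up for illustration, not Saxon's API.

```python
from urllib.parse import urljoin


def make_resolver(overrides, fetch):
    """Build a resolver that consults user-supplied mappings before fetching.

    overrides: dict of absolute URI -> document (hypothetical user mappings)
    fetch: assumed fallback callback that retrieves a document by URI
    """
    def resolve(href, base):
        absolute = urljoin(base, href)
        if absolute in overrides:
            return overrides[absolute]   # user-defined mapping wins
        return fetch(absolute)           # otherwise use the normal protocol
    return resolve


fetched = []


def fake_fetch(uri):
    # Stand-in for real retrieval; records which URIs were fetched.
    fetched.append(uri)
    return "<fetched uri='%s'/>" % uri


resolve = make_resolver({"urn:example:config": "<config/>"}, fake_fetch)
mapped = resolve("urn:example:config", "http://example.org/pipeline/")
remote = resolve("a.xml", "http://example.org/pipeline/")
```

The first call is answered from the user's mapping without any retrieval; only the second falls through to the fetch function.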

Similarly, XSLT 2.0 doesn't actually write documents to particular URIs: 
it's up to the processor what gets done with the result documents. Sure, 
the usual course of action is that the documents are written to their 
base URIs, but this isn't mandated (and in some environments, such as in 
a browser, it would cause problems).

Keeping this separation between a URI/document map and the physical 
reading/writing of a file to disk makes it a lot easier to avoid 
side-effects.
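
A minimal Python sketch of the "pipeline engine as resource manager" idea, under heavy assumptions: the ResourceManager class, run_step function, and the string-based fake transform are all hypothetical, and real documents would be trees rather than strings. The point is only that a step's output is bound to a URI in the engine's map, and a later input with the same URI reads from that map, without anything being written to disk.

```python
class ResourceManager:
    """Hypothetical URI-to-document map owned by the pipeline engine."""

    def __init__(self):
        self._documents = {}

    def bind(self, uri, document):
        # Associate a document with a URI; no file is written.
        self._documents[uri] = document

    def lookup(self, uri):
        return self._documents[uri]


def run_step(manager, transform, inputs, output):
    # Read each input from the map, then bind the result to the output URI.
    sources = [manager.lookup(uri) for uri in inputs]
    manager.bind(output, transform(*sources))


def fake_xslt(source, stylesheet):
    # Stand-in for a real transformation: "uppercase" just upper-cases the text.
    return source.upper() if stylesheet == "uppercase" else source


manager = ResourceManager()
manager.bind("a.xml", "<doc>hello</doc>")
manager.bind("b.xsl", "uppercase")

run_step(manager, fake_xslt, ["a.xml", "b.xsl"], "c.xml")
result = manager.lookup("c.xml")
```

Whether "c.xml" is ever serialized to a file would then be a separate decision made by the engine, not a side-effect of the step itself.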

> I have to say that I feel more comfortable if we make the distinction
> between a reference to a label defined previously in the pipeline and
> a URI:
> 
> <p:input name="source" href="a.xml" />
> <p:input name="source" label="a" />

I think you're saying that you'd prefer a distinction between documents 
that are purely intermediate (and hence only labelled) and those that 
are written to disk (and hence given a URI).

I like the idea of using URIs as labels for documents and for 
collections of documents, and am comfortable with the idea that this 
doesn't imply the documents are written to those URIs. I guess it's just 
a very familiar concept for me because of the way XSLT 2.0 uses URIs.

Cheers,

Jeni
-- 
Jeni Tennison
http://www.jenitennison.com
Received on Thursday, 13 April 2006 13:01:14 UTC