- From: Henry S. Thompson <ht@inf.ed.ac.uk>
- Date: Thu, 03 May 2007 13:32:24 +0100
- To: public-xml-processing-model-wg@w3.org
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
I took an action last week [1] to consider whether there was a way to
integrate the desired functionality of Norm's proposed 'tee' component [2]
more fully into the language.
Consider the first sample pipeline from our spec:
<p:pipeline name="fig1" xmlns:p="http://www.w3.org/2007/03/xproc">
  <p:input port="source" sequence="no"/>
  <p:input port="schemaDoc" sequence="yes"/>
  <p:output port="result" sequence="no"/>
  <p:xinclude name="s1">
    <p:input port="source">
      <p:pipe step="fig1" port="source"/>
    </p:input>
  </p:xinclude>
  <p:validate-xml-schema name="s2">
    <p:input port="schema">
      <p:pipe step="fig1" port="schemaDoc"/>
    </p:input>
  </p:validate-xml-schema>
</p:pipeline>
Suppose I want to see the intermediate document, that is, the output
of the xinclude.
Norm's proposal would mean adding the following step in the middle:
<p:tee>
  <p:option name="href" value="inter.xml"/>
</p:tee>
[I note in passing that as proposed p:tee doesn't handle document
sequences, and it's not obvious how putting it inside a p:for-each
would help. . .]
[A further note -- it seems likely that, as defined, p:tee would be
sub-optimal inside any kind of iteration (for-each or viewport),
because, presumably, each document passing through the inner pipeline
would overwrite the previous one.]
So, alternative proposals. . .
1) Since I at least normally think of journalling as something to capture
the _output_ of a step, we could add an optional element to the
content model for steps:
<p:journal port="..." href="..."/>
This would give us, for the sample pipeline above:
<p:xinclude name="s1">
  <p:input port="source">
    <p:pipe step="fig1" port="source"/>
  </p:input>
  <p:journal port="result" href="inter.xml"/>
</p:xinclude>
The issues wrt sequences still arise, but if we allowed
p:journal at the start of a p:for-each or p:viewport, we could at
least in principle see even what's happening at the beginning:
<p:for-each...>
  ...
  <p:journal port="current" href="inter.xml"/>
or at the end:
<p:for-each...>
  <p:output port="result">
    ...
  </p:output>
  ...
  <p:journal port="result" href="inter.xml"/>
2) Alternatively, we could say that journalling is associated with
pipes, and simply add an optional 'journal' attribute to p:pipe,
e.g.
<p:validate-xml-schema name="s2">
  <p:input port="source">
    <p:pipe step="s1" port="result" journal="inter.xml"/>
  </p:input>
  <p:input port="schema">
    <p:pipe step="fig1" port="schemaDoc"/>
  </p:input>
</p:validate-xml-schema>
As well as adding an explicit p:input for the 'source' port, this would
require the preceding step to be named, if it wasn't already.
- ----------
On balance, I prefer (1), because it's lower overhead syntactically.
Whichever way we go, I think we need to bite the sequence and
iteration bullets -- I propose that we say that the semantics of
journalling include the requirement that implementations avoid
over-writing the target if at all possible, at least within a single
pipeline evaluation episode. The way they do this is implementation
defined (and perhaps platform-dependent) -- if they have a versioning
filesystem available, they can use it. Otherwise, a recommended
approach might be to call the first output e.g. inter.xml, the second
inter_2.xml, the third inter_3.xml, and so on. Or we could refer to
the widely available facility of generating unique 'temporary'
filenames with a fixed component. . .
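To make the two naming approaches above concrete, here is a minimal
Python sketch. It is not part of the proposal: the function names are
hypothetical, and it assumes the journal target is a plain filename in
a local directory. Note that it checks the filesystem rather than
keeping a per-episode counter, so it also avoids overwriting files
left by earlier runs, which is more than the text above requires.

```python
import os
import tempfile


def numbered_name(href: str, n: int) -> str:
    """Return href itself for n == 1, else href with _n before the extension."""
    if n == 1:
        return href
    stem, ext = os.path.splitext(href)
    return f"{stem}_{n}{ext}"


def next_free_name(href: str, directory: str) -> str:
    """Return the first numbered variant of href not yet present in directory."""
    n = 1
    while os.path.exists(os.path.join(directory, numbered_name(href, n))):
        n += 1
    return numbered_name(href, n)


def unique_temp_name(href: str, directory: str) -> str:
    """Create an empty, uniquely named file keeping href's stem and extension.

    This is the 'temporary filename with a fixed component' alternative.
    """
    stem, ext = os.path.splitext(href)
    fd, path = tempfile.mkstemp(prefix=stem + "_", suffix=ext, dir=directory)
    os.close(fd)  # the caller reopens the path to write the document
    return path
```

With this sketch, successive journal writes to "inter.xml" would land
in inter.xml, inter_2.xml, inter_3.xml, and so on, as long as each
file is created before the next name is requested.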
There's an even worse problem which is shared with 'store' -- what if
anything do we say about what happens if multiple pipeline evaluations
are happening at the same time?
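This does not settle what the spec should say, but as an illustration
of one thing an implementation could do, the Python sketch below
claims each journal file with an exclusive create, so that two
evaluations running at once can never write to the same name. The
function name is hypothetical and a local filesystem is assumed; the
numbering of files from different evaluations would still interleave.

```python
import os


def claim_journal_file(href: str, directory: str):
    """Atomically create and open the first free numbered variant of href."""
    stem, ext = os.path.splitext(href)
    n = 1
    while True:
        name = href if n == 1 else f"{stem}_{n}{ext}"
        try:
            # Mode "x" fails if the file already exists, so the existence
            # check and the creation happen as a single step.
            return open(os.path.join(directory, name), "x", encoding="utf-8")
        except FileExistsError:
            n += 1
```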
An alternative approach would be to document p:for-each and p:viewport
as always binding a parameter/option whose name is p:i_[stepname] to
the index of the document passing through their subpipe, and
furthermore specifying that the 'href' attribute of p:journal is
treated as an attribute value template. Then you could write e.g.
<p:journal port="current" href="inter_{$p:i_chapters}.xml"/>
Having such a binding convention might be a good idea in any case.
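As a rough illustration of how the href template would expand under
such a convention, here is a small Python sketch. It handles only
simple {$name} references rather than full attribute value templates,
the function names are hypothetical, and counting from 1 is an
assumption (the proposal above does not say where the index starts).

```python
import re

# Matches a simple variable reference such as {$p:i_chapters}.
_AVT_VAR = re.compile(r"\{\$([^{}\s]+)\}")


def expand_href(template: str, bindings: dict) -> str:
    """Replace each {$name} in template with the bound value of name."""
    return _AVT_VAR.sub(lambda m: str(bindings[m.group(1)]), template)


def journal_names(template: str, step_name: str, count: int) -> list:
    """Names produced as documents 1..count pass through the named step."""
    var = f"p:i_{step_name}"  # the proposed p:i_[stepname] binding
    return [expand_href(template, {var: i}) for i in range(1, count + 1)]
```

For a p:for-each named "chapters" that sees three documents,
journal_names("inter_{$p:i_chapters}.xml", "chapters", 3) yields
inter_1.xml, inter_2.xml and inter_3.xml.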
ht
[1] http://www.w3.org/2007/04/26-xproc-minutes.html#action01
[2] http://lists.w3.org/Archives/Public/public-xml-processing-model-wg/2007Apr/0138.html
- --
Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
Half-time member of W3C Team
2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)
iD8DBQFGOdZYkjnJixAXWBoRAm4kAJ9iX2TerrXa2GH0GgVDt7rVt22EFQCcDWoz
3EoLhUz48Ir60A37PZZ4tag=
=sYXo
-----END PGP SIGNATURE-----
Received on Thursday, 3 May 2007 12:32:26 UTC