Media type for pipeline docs and names for pipeline elts


Further to an action from last week [1], here's some discussion of this.

Wrt media type, we have essentially two choices:
 1) Recommend that pipelines be served as application/xml;
 2) Register a new hybrid media type application/xproc+xml and
    recommend that _that_ be used.
Option (2) strongly implies that we should recommend a file extension
for pipeline docs _other_ than .xml, so that server configuration is

Pros and cons

(1) is simpler, in that we don't have to _do_ anything (except
considerably trim Appendix C [2]).  It's less flexible, in that it
only allows shorthand (id-based) pointers, which in turn require
pipeline authors to have actually _used_ xml:id, or XPointer
element-scheme pointers (tumblers, as they used to be called), e.g. for
the p:validate-with-xml-schema step in the first example pipeline of
our spec.
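For readers who haven't met element-scheme pointers, here is a rough Python sketch of how a generic processor might resolve one such as element(/1/3/1) against a parsed pipeline. It is an illustration only: the function name and the sample pipeline are hypothetical, and the sketch recognises only xml:id as an ID (real XPointer processors also honour DTD- or schema-declared IDs). ElementTree's default parser drops comments and processing instructions, so only element children are counted, as the element() scheme requires.

```python
import xml.etree.ElementTree as ET

P = "{http://www.w3.org/ns/xproc}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical sample pipeline; the p:choose carries an xml:id so that a
# shorthand start point can be exercised as well.
SAMPLE = """<p:pipeline xmlns:p="http://www.w3.org/ns/xproc">
  <p:input port="source"/>
  <p:validate-with-xml-schema/>
  <p:choose xml:id="ch">
    <p:when test="true()"><p:identity/></p:when>
    <p:otherwise><p:identity/></p:otherwise>
  </p:choose>
</p:pipeline>"""


def resolve_element_pointer(root, pointer):
    """Resolve an XPointer element() pointer such as 'element(/1/3/1)' or
    'element(ch/2)' against the document element `root`.

    Returns the selected element, or None if nothing matches.
    """
    if not (pointer.startswith("element(") and pointer.endswith(")")):
        raise ValueError("not an element() pointer: " + pointer)
    body = pointer[len("element("):-1]
    start, _, rest = body.partition("/")
    steps = [int(s) for s in rest.split("/")] if rest else []

    if start:
        # Shorthand start point: this sketch only looks at xml:id.
        current = next((e for e in root.iter() if e.get(XML_ID) == start), None)
        if current is None:
            return None
    else:
        # Leading '/': the first step selects the document element,
        # which is always child number 1 of the document.
        if not steps or steps[0] != 1:
            return None
        current, steps = root, steps[1:]

    # Each remaining number is a 1-based index among element children.
    for n in steps:
        children = list(current)
        if not 1 <= n <= len(children):
            return None
        current = children[n - 1]
    return current
```

With `root = ET.fromstring(SAMPLE)`, `element(/1/2)` selects the p:validate-with-xml-schema step (the p:input declaration is child 1) and `element(ch/2)` selects the p:otherwise. The first case shows why tumblers are awkward for authors: the number depends on every preceding sibling element, steps or not.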

(2) takes a bit more work, in that we have to do the IETF registration
dance, but that's a well-trodden path, and Appendix C [2] already has
the necessary shape.  It gives universal access to all steps (and
containers, if we do the necessary work to fix up default naming in
line with the revised atomic/compound/multi-container ontology).

My real worry about (2) is that it seems to me very unlikely that any
generic XML processor will ever _implement_ it.  Pointing into XProc
documents as a special thing to do is going to be very rare, and
therefore very unlikely to get client-side support from any generic
tools.  And XProc-specific tools have no obvious _need_ for pointers
into pipeline documents.  If this analysis is correct, it would be
misleading to go with (2) and convey the implication that users would
actually get value from using the kind of pointers it suggests.

So my personal preference is to fall back to (1), bearing in mind
that if it turns out there is a real application and demand, we can
always register an XPointer scheme, e.g. xproc(), and define its
contents to be similar to what we have in C.2 today.

Having said that, section 2.1.1, _Step names_ [3], gives three reasons
for assigning names to every step, and choosing option (1) removes
only one of them.  So I think the section should stay, but be revised
as follows to take account of the new ontology and make it more robust:

  If the pipeline author does not provide an explicit name for a
  step or non-step wrapper, the processor manufactures a default
  name for it. All default names are of the form *!1.m.n...* where *m* is the
  position of the step's highest ancestor within the pipeline document
  or library which contains it, *n* is the position of the
  next-highest ancestor, and so on, including both steps and non-step
  wrappers.
  For example, consider the pipeline in Example 3, *A validate and
  transform pipeline*. The p:pipeline step has no name, so it gets the
  default name *!1*; the p:choose gets the name *!1.2*; the first
  p:when gets the name *!1.2.1*, etc. If the p:choose had had a name,
  it would not have received a default name, but its first p:when
  would still be named *!1.2.1*.
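The proposed default-naming rule can be sketched in Python as follows. This is a hypothetical illustration, not proposed spec text: the set of elements skipped when counting positions (port and option declarations, bindings, documentation) is an assumption of this sketch, as is reading the leading 1 as the position of the top-level pipeline within its document or library. The sample pipeline only mimics the shape of Example 3.

```python
import xml.etree.ElementTree as ET

P = "{http://www.w3.org/ns/xproc}"

# Assumption of this sketch: these children are neither steps nor non-step
# wrappers, so they are skipped when counting positions.
NOT_COUNTED = {P + local for local in (
    "input", "output", "option", "with-option", "with-param", "pipe",
    "inline", "document", "empty", "documentation", "pipeinfo",
    "xpath-context",
)}

# Hypothetical pipeline shaped like the validate-and-transform example;
# {choose_name} lets us try the p:choose with and without an explicit name.
PIPELINE = """<p:pipeline xmlns:p="http://www.w3.org/ns/xproc">
  <p:input port="source"/>
  <p:validate-with-xml-schema/>
  <p:choose{choose_name}>
    <p:when test="true()"><p:identity/></p:when>
    <p:otherwise><p:identity/></p:otherwise>
  </p:choose>
</p:pipeline>"""


def default_names(top, position=1):
    """Return (local-name, name) pairs, in document order, for `top` and
    every step or non-step wrapper inside it.

    `position` is the position of `top` within its pipeline document or
    library. An explicit name attribute is used as-is; otherwise the name is
    '!' plus the dotted positions from `top` down to the element itself.
    Children are numbered the same way whether or not their parent is named.
    """
    result = []

    def walk(elem, path):
        local = elem.tag.split("}")[-1]
        result.append((local, elem.get("name") or "!" + ".".join(map(str, path))))
        counted = [c for c in elem if c.tag not in NOT_COUNTED]
        for i, child in enumerate(counted, start=1):
            walk(child, path + [i])

    walk(top, [position])
    return result
```

On the unnamed version this yields *!1* for the p:pipeline, *!1.2* for the p:choose and *!1.2.1* for the first p:when; giving the p:choose a name changes only its own entry. Because every component is a sibling position, adding a step at the start of the p:otherwise renames only that branch's contents (its p:identity goes from *!1.2.2.1* to *!1.2.2.2*), and every name outside the branch stays put.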

I prefer this nesting approach to the existing linear one because it
is a bit more robust wrt minor changes in a pipeline.  It is also easy
to work out by hand, using any kind of tree widget that expands and
hides subtrees.  But the choice between the two is separable from the
rest of this discussion.


-- 
 Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
                     Half-time member of W3C Team
    2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
            Fax: (44) 131 650-4587, e-mail:
[mail really from me _always_ has this .sig -- mail without it is forged spam]


Received on Thursday, 27 March 2008 11:48:46 UTC