- From: Jeni Tennison <jeni@jenitennison.com>
- Date: Mon, 08 May 2006 09:55:28 +0100
- To: public-xml-processing-model-wg@w3.org
Hi,

Norm Walsh wrote:
> On the whole, I favor the more explicit options, but maybe there's
> something to be said for implicit names. Implicit steps and implicit
> I/O require less typing but they require the reader to know even more
> to make sense of the pipeline.

I think there are two sets of syntax-level options we have to explore:

  1. directed vs. generic syntax (e.g. implicit/explicit steps)
  2. defaulting attributes, elements, steps (e.g. input/output names)

This mail discusses the first.

To summarise, I'd be happy with either syntax as long as elements (rather than attributes) were used to represent inputs/outputs/parameters. I also hope we don't end up with a situation where there are options that let different users or different components use different syntaxes.

There's a real continuum between directed and generic syntax. From completely generic syntax:

  <p:step name="p:xslt">
    <p:input name="source" ref="..."/>
    <p:input name="stylesheet" ref="..."/>
    <p:output name="result"/>
  </p:step>

through using the name of the element to indicate the component:

  <p:xslt>
    <p:input name="source" ref="..."/>
    <p:input name="stylesheet" ref="..."/>
    <p:output name="result"/>
  </p:xslt>

through using directed syntax elements for the inputs/outputs/parameters:

  <p:xslt>
    <p:source ref="..."/>
    <p:stylesheet ref="..."/>
    <p:result/>
  </p:xslt>

to using directed syntax attributes for the inputs/outputs/parameters:

  <p:xslt source="..." stylesheet="..."/>

There's obviously also the possibility of allowing more than one of these syntaxes (in the way that RDF does), but I think we should avoid that if at all possible. There's also the possibility of having different components use different XML structures for their steps. Again, I think we should try to avoid that if we can, as it would raise the barrier to learning how to put pipelines together.

I find it very hard to decide between generic and directed syntax.
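The generic form and the directed-element form carry the same information about a step; what changes is where the port names live. A rough Python sketch (standard library only) that rewrites the directed form into the generic one makes the dependency visible: it only works because it is handed a table saying which child elements of p:xslt are inputs and which are outputs. The namespace URI and the PORTS table are hypothetical placeholders, not agreed syntax.

```python
import xml.etree.ElementTree as ET

# Placeholder URI for the "p:" prefix (an assumption; no namespace is agreed).
P = "http://example.org/ns/xproc"

# Hypothetical port declarations for each component. The directed syntax
# relies on a table like this to tell inputs from outputs; the generic
# syntax spells the distinction out with p:input and p:output.
PORTS = {
    "xslt": {"inputs": {"source", "stylesheet"}, "outputs": {"result"}},
}


def q(local):
    """Return the {namespace}local name that ElementTree uses for p:local."""
    return "{" + P + "}" + local


def local_name(tag):
    """Strip the {namespace} prefix from an ElementTree tag."""
    return tag.split("}", 1)[1]


def directed_to_generic(step):
    """Rewrite <p:xslt><p:source .../>...</p:xslt> as a generic <p:step>."""
    component = local_name(step.tag)
    ports = PORTS[component]
    generic = ET.Element(q("step"), {"name": "p:" + component})
    for child in step:
        port = local_name(child.tag)
        if port in ports["inputs"]:
            kind = "input"
        elif port in ports["outputs"]:
            kind = "output"
        else:
            raise ValueError(f"{component} has no port named {port!r}")
        # Keep the port's attributes (e.g. ref) and any inline document.
        port_elem = ET.SubElement(generic, q(kind), {"name": port, **child.attrib})
        port_elem.extend(list(child))
    return generic


directed = ET.fromstring(
    f'<p:xslt xmlns:p="{P}">'
    '<p:source ref="doc.xml"/><p:stylesheet ref="style.xsl"/><p:result/>'
    "</p:xslt>"
)
generic = directed_to_generic(directed)
print([(local_name(c.tag), c.get("name"), c.get("ref")) for c in generic])
# [('input', 'source', 'doc.xml'), ('input', 'stylesheet', 'style.xsl'), ('output', 'result', None)]
```

Going the other way (generic to directed) needs no such table, which is one way of seeing why the generic form is easier to process without knowing every component in advance.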
I've tried to look at these options from the standpoint of the usual criteria I try to apply when designing markup languages:

  0. information capture
  1. human understandability
  2. ease of processing
  3. maintainability/extensibility
  4. size

Fundamentally, the syntax needs to be able to do what we need it to be able to do. For example, I think we will want to allow users to embed documents within the input definitions, e.g.

  <p:xslt>
    <p:source href="document.xml" />
    <p:stylesheet>
      <xsl:stylesheet ...>
        ...
      </xsl:stylesheet>
    </p:stylesheet>
    <p:result />
  </p:xslt>

We may also want to provide other meta-information about the inputs and outputs (such as the schemas they comply with), which would be impossible if they were represented by attributes. I think that rules out using directed syntax attributes for the inputs/outputs/parameters, but it doesn't rule out using directed syntax child elements.

The more generic options are obviously longer and arguably less immediately understandable. (I think they make it easier to understand the plumbing behind the step, i.e. what the component is and what its inputs and outputs are, but harder to grok what the step is actually doing in the pipeline.)

I'm imagining that users will want to write their pipeline definitions by hand in their favourite XML-editing software, and to use schemas to provide auto-completion. Directed syntax is better for schema validation because it's a lot easier to hang content models off element names than off attribute values (the old problem of XML Schema not supporting co-occurrence constraints). In a directed syntax, the schema declaration for the p:xslt element would mean users could get prompted for a source and a stylesheet rather than having to look up the component definition to work out what names have been given to the inputs, parameters and outputs.
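The co-occurrence point can be made concrete with a small Python sketch (standard library only). In procedural code both checks below are equally easy; what matters is what each one dispatches on. The directed check picks the set of required inputs by element name, which is what W3C XML Schema 1.0 lets you attach a content model to; the generic check has to pick it by the value of the name attribute, which XML Schema 1.0 cannot express. The namespace URI and the REQUIRED_INPUTS table are hypothetical.

```python
import xml.etree.ElementTree as ET

P = "http://example.org/ns/xproc"  # placeholder URI for "p:" (assumption)

# Hypothetical content models: the inputs each component requires.
REQUIRED_INPUTS = {"xslt": {"source", "stylesheet"}}


def local_name(tag):
    """Strip the {namespace} prefix from an ElementTree tag."""
    return tag.split("}", 1)[1]


def missing_inputs_directed(step):
    # Directed syntax: the element name (p:xslt) selects the content model,
    # which a grammar-based schema language can declare per element.
    required = REQUIRED_INPUTS[local_name(step.tag)]
    present = {local_name(child.tag) for child in step}
    return required - present


def missing_inputs_generic(step):
    # Generic syntax: every step is a p:step, so the content model depends on
    # the *value* of its name attribute, i.e. a co-occurrence constraint.
    # (Simplification: the "p:" prefix is stripped rather than resolved.)
    required = REQUIRED_INPUTS[step.get("name").split(":", 1)[1]]
    present = {child.get("name") for child in step if local_name(child.tag) == "input"}
    return required - present


directed = ET.fromstring(f'<p:xslt xmlns:p="{P}"><p:source ref="a.xml"/></p:xslt>')
generic = ET.fromstring(
    f'<p:step xmlns:p="{P}" name="p:xslt"><p:input name="source" ref="a.xml"/></p:step>'
)
print(missing_inputs_directed(directed))  # {'stylesheet'}
print(missing_inputs_generic(generic))  # {'stylesheet'}
```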
On the other hand, I'm also imagining that pipeline engines will make available their own, implementation-defined, components as well as the standard ones that we define. Put these assumptions together, and with a directed syntax every pipeline engine would effectively use a different schema (because each supports different components). I think that would rapidly become difficult for users to handle.

[Note: I know that XSLT extension elements are essentially the same thing and XSLT users have been able to manage. One big difference is that XSLT is inherently un-validatable using most schema technologies, so XML editors provide built-in auto-completion assistance rather than using schemas. We want XProc to be validatable using XML Schema & RELAX NG. Another is that XSLT extension elements are relatively thin on the ground compared to the built-in XSLT elements, and I'm not sure whether we're going to have a similar situation here: what's the proportion of built-in components vs. engine-specific components going to be?]

In addition, I'm imagining that users will want to write their own reusable pipelines which they reference as components. I think it will prove difficult for those pipelines to be referenced using directed syntax. We could end up with a definition like:

  <p:pipeline name="my:process">
    ...
  </p:pipeline>

in the same file as a reference like:

  <my:process ... />

Schema validation goes out the window. Of course, we could treat user-defined pipelines as different from other components, and use something like:

  <p:call-pipeline name="my:process">
    ...
  </p:call-pipeline>

but if we're going to have that kind of generic syntax for user-defined pipeline components, perhaps it would be simpler to unify and use it for all components.

Another downside of directed elements is that they make it harder to define a language that can be extended with documentation, test cases, engine-specific annotations, etc.
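Here is a minimal Python sketch of how an engine might cope with such extensions in a generic syntax, by dropping everything outside the XProc namespace before processing. One wrinkle it has to handle: inline documents, such as an embedded xsl:stylesheet, are also outside the XProc namespace but must not be dropped, so the sketch assumes the content of a p:input is kept verbatim. The URIs for p: and for the doc: documentation vocabulary are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumptions, not agreed values).
P = "http://example.org/ns/xproc"
DOC = "http://example.org/ns/docs"  # hypothetical documentation vocabulary
XSL = "http://www.w3.org/1999/XSL/Transform"


def in_xproc(elem):
    return elem.tag.startswith("{" + P + "}")


def strip_foreign(elem):
    """Copy elem, dropping child elements outside the XProc namespace.

    Assumption: the content of a p:input is an inline document and is kept
    verbatim, since it is data for the step rather than pipeline markup.
    """
    copy = ET.Element(elem.tag, elem.attrib)
    copy.text = elem.text
    if elem.tag == "{" + P + "}input":
        copy.extend(list(elem))  # opaque inline document: keep as-is
        return copy
    for child in elem:
        if in_xproc(child):
            copy.append(strip_foreign(child))
    return copy


pipeline = ET.fromstring(f"""
<p:step xmlns:p="{P}" xmlns:doc="{DOC}" xmlns:xsl="{XSL}" name="p:xslt">
  <doc:note>Formats the report.</doc:note>
  <p:input name="source" ref="report.xml"/>
  <p:input name="stylesheet">
    <xsl:stylesheet version="1.0"/>
  </p:input>
  <p:output name="result"/>
</p:step>""".strip())
kept = strip_foreign(pipeline)
print([child.tag.split("}")[1] for child in kept])
# ['input', 'input', 'output']
```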
A rule like "if it's not in the XProc namespace, the pipeline engine can ignore it" is easier to understand than having (as XSLT does) attributes to indicate which namespaces actually should be understood by the pipeline engine.

As I say, I'm undecided.

Cheers,

Jeni
--
Jeni Tennison
http://www.jenitennison.com
Received on Monday, 8 May 2006 08:55:47 UTC