RE: Comments on Editor's Draft 9 January 2008 from Toman_Vojtech@emc.com on 2008-02-11 (public-xml-processing-model-comments@w3.org from February 2008)

From: <Toman_Vojtech@emc.com>
Date: Mon, 11 Feb 2008 03:37:39 -0500
To: <public-xml-processing-model-comments@w3.org>
Message-ID: <6E216CCE0679B5489A61125D0EFEC78709BD4677@CORPUSMX10A.corp.emc.com>
> / Toman_Vojtech@emc.com was heard to say:
> |> / Toman_Vojtech@emc.com was heard to say:
> |> | 1. Section 4.7.1 (options). From the schema (rng) it looks like the
> |> | shortcut form can only be used for atomic steps and "other" compound
> |> | steps. Why isn't it possible to use the shortcut form also on built-in
> |> | compound steps (such as for-each or group) which can specify options
> |> | in the "long" form?
> |> 
> |> I think the shortcut form only makes sense on atomic steps where
> |> there's a declaration for the option. On compound steps, allowing the
> |> shortcut form would be both a declaration and a binding and so
> |> there'd be no way to tell if there was a typo or something.
> |
> | But isn't it the same also with the long option form? I mean, it is 
> | also a declaration and a binding.
> 
> Yes. We could make
> 
>   <p:group px:some-random-attribute="value" ...>
> 
> be the same as
> 
>   <p:group ...>
>     <p:option name="px:some-random-attribute" value="value"/>
> 
> but it would never be possible to detect errors because there's no
> declaration against which to compare the short form. Given that we
> don't think options on compound steps are going to be very common,
> we're being conservative and not allowing the short form.
> 
> |> | 5. Section 4.1 (p:pipeline): "All p:pipeline pipelines have an implicit
> |> | primary input port named "source" and an implicit primary output port
> |> | named "result". Any input or output ports that the p:pipeline declares
> |> | explicitly are in addition to those ports and may not be declared
> |> | primary."
> |> |
> |> | So, is it allowed to explicitly specify the implicit input/output
> |> | ports inside p:pipeline? If so, is it possible to redefine their
> |> | properties (primary, sequence)? Is the following permitted?
> |> |
> |> | <p:pipeline>
> |> |   <p:input port="source" sequence="false"/>
> |> |   <p:output port="result" primary="false"/>
> |> |   <p:output port="result2" primary="true"/>
> |> |   ...
> |> | </p:pipeline>
> |> 
> |> No. The implicit declarations of source/result cannot be repeated or
> |> changed. Of course, you can use p:declare-step if you want to have
> |> different values.
> |
> | Now I am really confused. Does the specification mention this
> | possibility (declaring a pipeline with different primary input/output
> | properties)?
> 
> I think so. The intent is that if you use the p:declare-step 
> form, then you can declare any inputs and outputs you like 
> for a pipeline. If you use the shortcut form, p:pipeline, 
> then you get exactly one input and exactly one output. You 
> can add additional non-primary inputs and outputs, but you 
> can't change the primary "source" and "result".
> 
> Maybe it would be cleaner if we didn't allow you to add others.
> 
> Cleanest of all would be to remove the p:pipeline altogether, 
> but I think that would be awkward.
> 
> | I also thought it was not possible to declare different names than 
> | "source" and "result" for primary pipeline input/output ports in 
> | p:declare-step (the "source" and "result" strings seem to be quite 
> | hard-coded in section 4.1 - p:pipeline), but after reading section 
> | 5.8.2 (declaring pipelines) again, I am not that sure any more. It 
> | looks to me now that I am free to declare any primary input/output 
> | ports on a pipeline...
> 
> If you use the declare-step form, you're free to do anything you like.
> 
> The constraints on source/result only apply to the syntactic 
> sugar "p:pipeline" form.
> 
> Is that any clearer?


Yes, it is. Thanks for responding to this. I guess I am fine with the
way it works now, except there is still one thing I am not that
happy with: Because the pipeline must have a "main" (or top-level)
p:pipeline, the specification is effectively forcing the pipeline
author/user to use a certain "interface" (the "source" and "result"
ports, plus maybe some additional ports). Internally, you can declare
any sub-pipelines you want, but on the top level, you don't have this
freedom.

While I can see the reasoning behind fixing the "source" and "result"
ports, I think it can make certain things difficult to achieve (unless
you want to use ugly workarounds in your pipelines). I understand that
the most common use case is: take XML document(s) - do something -
output result XML document(s), but there may be other (even though
possibly weird) scenarios, such as the following:

A pipeline that takes one XML document on its input and saves it using
the p:store step (the pipeline has no primary output):

<p:pipeline>

  <!-- TODO: add some code that checks that
       we got exactly one XML document -->
  <p:store/>

  <!-- p:store has no primary output, so
       make sure the "result" output port
       is bound to something... -->
  <p:identity>
    <p:input port="source">
      <p:empty/>
    </p:input>
  </p:identity>
</p:pipeline>

Notice that I still have to provide some data to the "result" output
port of the pipeline, even though the result data is not important at
all for the calling application.
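
For comparison, here is a minimal sketch of how the same pipeline might
look if a top-level p:declare-step were allowed. This is only an
assumption for illustration, not something the current draft is known to
permit; as in the example above, namespace declarations and the p:store
options are left out:

```xml
<p:declare-step>
  <!-- Declare only what the pipeline needs: one primary
       input that accepts a single document, and no
       output ports at all. -->
  <p:input port="source" primary="true" sequence="false"/>

  <!-- p:store reads from the pipeline's primary input;
       since no "result" port is declared, nothing has
       to be bound to it. -->
  <p:store/>
</p:declare-step>
```

With no "result" port declared, the dummy p:identity step would no
longer be needed.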


Regards,
Vojtech

--
Vojtech Toman
Principal Software Engineer
EMC Corporation

Aert van Nesstraat 45
3012 CA Rotterdam
The Netherlands

Toman_Vojtech@emc.com
Received on Monday, 11 February 2008 08:33:41 UTC