Re: The Scope of Step Names

Jeni Tennison wrote:
> Hi Alex,
> Alex Milowski wrote:
>> Norman Walsh wrote:
>>> / Alex Milowski <> was heard to say:
>>> | Erik Bruchez wrote:
>>> |>
>>> |> Alex Milowski wrote:
>>> |>
>>> |>  > 1. Step must be able to refer to other steps that are
>>> |>  >    siblings (preceding and following) otherwise you
>>> |>  >    can't connect steps at all.
>>> |>
>>> |> "Preceding siblings" would be enough IMO.
>>> |
>>> | I don't think we want to limit to preceding siblings.  If a user
>>> | wants to structure their pipeline "logically" from their perspective,
>>> | such a limitation would get in the way.  I can't see how it is
>>> | any issue for an implementer.
>>> |
>>> | Similarly, if a user can't easily determine "before" or just wants
>>> | to quickly insert a step into their pipeline, they shouldn't have
>>> | to figure out what "preceding sibling" means just to do that.
>>> If we imagine that many (perhaps most) authors will eventually rely
>>> on defaulting at least sometimes, the order of steps will be very
>>> important. I don't see any benefit in saying that sometimes it isn't.
>>> And "before" is pretty easy to determine.
>> I absolutely do not think "before" is easy in all instances.  In addition,
>> considering we have no defaulting story, I don't think a "yet-to-be-determined
>> defaulting story" should be involved in making this decision.
> Do you have an example of a situation where "before" is not easy (for a 
> user) to define?

Anything that is a meet in the flow graph (e.g. "aggregate documents")
would have the issue that it must come completely after all the steps it
aggregates.  As such, there is a particular spot after which the
user can insert the step.  My belief is that users won't always get
that right and this restriction is completely arbitrary.
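
As a rough illustration of that "particular spot", here is a minimal
Python sketch.  It assumes a rule where a step may only refer to
preceding siblings; the helper name and the step names are hypothetical
and not part of any proposal.

```python
def earliest_valid_position(document_order, sources):
    """Return the smallest index at which a step reading from `sources`
    could be inserted if it may only refer to preceding siblings."""
    # The new step has to come after the last of the steps it reads from.
    positions = [document_order.index(name) for name in sources]
    return max(positions) + 1


# Hypothetical pipeline, listed in document order.
order = ["load-a", "clean-a", "load-b", "clean-b", "load-c", "report"]

# A hypothetical aggregation step that reads three of the outputs.
sources = ["clean-a", "clean-b", "load-c"]

print(earliest_valid_position(order, sources))
```

With 30 or 40 steps and ten sources, the author has to do this
bookkeeping by hand for every aggregation they insert.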

> I agree with Erik and Norm that references should only be allowed to 
> preceding siblings (and children of a container).
> First, I don't follow your argument that users will not want to have to 
> locate the place for a step to be inserted. They will have to locate 
> that point in order to change the <p:input> of the next step. For 
> example, if they start with:
>  <p:step name="foo" ...>
>    <p:input port="doc" ...>
>  </p:step>
>  <p:step name="bar" ...>
>    <p:input port="doc" step="foo" source="result" />
>  </p:step>
> and they want to insert a step called 'baz' in between the 'foo' and 
> 'bar' steps, they have to locate the 'bar' step to change the <p:input>. 
> Having located that point, scrolling up two lines in order to insert the 
> step does not seem particularly burdensome.

Yes, with two steps, that is not really a burden.  Try one with 30 or
40 steps where the inserted step is an aggregation of 10 of the outputs.

Also, that kind of partial ordering may be a big burden for
auto-generated pipelines where the graph is iterated by vertex and not
by following the flow.

Further, if we allow groups to do what they were intended to do, we'd
have a bigger problem where you have a collection of ancestor steps in
the "flow" that all need the same calculated parameter.  You then
group them into a 'group' step container to calculate the parameter
once as an optimization.  Those steps may not have a transferable
concept of "before" when you cut-n-paste them into the group.  Without
this restriction, I can guarantee you that the resulting pipeline
will be correct.

> Second, I agree that implementations will have no problems 
> understanding the flow, whatever the order of the steps. But I also 
> agree with Erik that *users* will have a *huge* problem understanding 
> the flow of a pipeline if the components aren't specified in the order 
> that they should execute.

Not necessarily.  Some people may prefer to have sections where they
group like steps together.  We are dictating order when we have 
no technical reason to have such a restriction.

> Finally, all signs at the moment indicate that we're going to end up 
> with a situation where components can be non-functional and have 
> side-effects. That being the case, the only way we can get any kind of 
> consistent behaviour between implementations is to say something like
>   The result of executing a pipeline must be as if each
>   component were executed once, in the order specified.
> So I think the order of components in the document is very important 
> even without a defaulting story.

I don't follow.  Regardless of the document order, the input/output
references form a partial ordering on all the steps and step containers.
That ordering is in no way affected by document order.
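
A minimal Python sketch of that point, using the 'foo', 'baz' and 'bar'
steps from the example quoted above.  The list-of-pairs representation
and the function name are assumptions made for illustration only; the
sketch shows just that the order derived from the references does not
change when the steps are listed differently.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def execution_order(steps):
    """Derive an execution order from input references alone.

    `steps` is a list of (name, [steps whose output it reads]) pairs,
    given in document order.
    """
    # Map each step to the set of steps it depends on.
    graph = {name: set(inputs) for name, inputs in steps}
    return list(TopologicalSorter(graph).static_order())


# The same three steps, written in flow order ...
flow_order = [("foo", []), ("baz", ["foo"]), ("bar", ["baz"])]

# ... and with 'bar' written first in the document.
shuffled = [("bar", ["baz"]), ("foo", []), ("baz", ["foo"])]

print(execution_order(flow_order))
print(execution_order(shuffled))
```

Both calls yield the same order, because only the references between
steps are consulted.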

This restriction feels to me like saying:

"You can't have a step called 'transform' because all steps are like a
  transform and that would be confusing to your users if you were allowed
  to do so.  As such, we, the working group, have decided what is best
  for you and disallowed steps being called 'transform'."

--Alex Milowski

Received on Wednesday, 4 October 2006 15:31:27 UTC