Re: Replacing hasDataFrom with hasDataTo from Daniel Elenius on 2004-11-27 (public-sws-ig@w3.org from November 2004)

From: Daniel Elenius <elenius@csl.sri.com>
Date: Fri, 26 Nov 2004 22:28:02 -0800
To: Drew McDermott <drew.mcdermott@yale.edu>, public-sws-ig@w3.org
Message-ID: <41A81E72.7040908@csl.sri.com>
Drew McDermott wrote:

>>Some further thoughts about data flow declarations.
>>
>>In a previous post, I suggested removing the Produce construct, and 
>>instead adding the producedBinding to the Perform class. Regardless of 
>>whether we do this or not, there is still a disturbing asymmetry -- some 
>>(most) data flow declarations are declared at the target of the data 
>>flow (using hasDataFrom), but some are declared at the source (using 
>>producedBinding).
>>    
>>
>
>It's not asymmetrical.  We always declare where a datum comes from,
>not where it goes.  It's true that in some subjective sense some
>dataflow declarations are "closer to" the variable declarations for
>their targets, but they're not exactly "at" those declarations.
>  
>
hasDataFrom is a property of the data target, and producedBinding a 
property of the data source. I don't see how you can claim that they are 
not asymmetrical. It's a mixed push/pull model. hasDataFrom is pull, 
producedBinding is push, no?
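
A minimal sketch of the distinction, in Python rather than OWL-S syntax. 
The Flow class, the two helper functions, and the "Step1"/"Step2" names are 
all made up for illustration. The point is only that the same flow can be 
written down at either end:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    source: str  # "perform.param" that yields the value
    target: str  # "perform.param" that receives the value


def from_pull(declared_on_target: str, has_data_from: str) -> Flow:
    """hasDataFrom style: the declaration sits on the target and names the source."""
    return Flow(source=has_data_from, target=declared_on_target)


def from_push(declared_on_source: str, to_param: str) -> Flow:
    """producedBinding style: the declaration sits on the source and names the target."""
    return Flow(source=declared_on_source, target=to_param)


# Hypothetical performs and parameters, not taken from any OWL-S example.
pulled = from_pull("Step2.input", has_data_from="Step1.output")
pushed = from_push("Step1.output", to_param="Step2.input")
```

Both calls describe the same edge; what differs is which end carries the 
declaration.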

BTW, I just noticed something else that looks like a bug. 
producedBinding takes an OutputProperty, which in turn has a toParam and 
a valueSource/valueFunction/etc. The toParam is where the data from the 
Produce goes, right? And the toParam takes an Output. So, we can give an 
Output where the data goes. But we can't say which Perform it goes to, 
so if we have several different Performs using the same Process, we have 
an ambiguous description. We don't know where to send the data.
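
To make the ambiguity concrete, here is a rough sketch in Python. The 
Perform class, the "BookFlight" process, and the "confirmation" output are 
hypothetical names, and the lookup only models the problem as described 
above, not the actual OWL-S semantics:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Perform:
    name: str     # hypothetical perform identifier
    process: str  # the Process this perform invokes


def candidate_targets(performs, process, output):
    """Return every (perform, output) pair that a binding naming only
    the process's output could refer to."""
    return [(p.name, output) for p in performs if p.process == process]


# Two performs of the same (made-up) process.
performs = [
    Perform("perform1", "BookFlight"),
    Perform("perform2", "BookFlight"),
]
targets = candidate_targets(performs, "BookFlight", "confirmation")
```

With two performs of the same process, the lookup yields two candidates, 
so the binding does not identify a unique destination.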

>One way or another you have to indicate that the flow depends on which
>branch of a conditional execution takes.  It seems simplest to do it
>on the branch itself.  
>
Yes.

>Note that a Produce can appear anywhere on that
>branch.  As I've indicated before, it's not tied to where the value is
>first available.  
>
>  
>
This could still be done by having the producedBinding at a 
ControlConstruct. This still allows all the flexibility you have with 
Produce. For example, let's say we have a composite process which has an 
If-Then-Else. The then construct is just a Perform, and the else 
construct is a Sequence of two Performs. We want to produce a data 
binding to an output of the composite process from the else branch. We 
can choose to put the producedBinding on the Sequence, on the first 
Perform of the Sequence, or on the second Perform of the Sequence.

 

>>Why don't we change this to always declare data flow bindings at the 
>>source rather than at the target? We could replace hasDataFrom with 
>>hasDataTo (or some better name), which would refer to
>>a) a parameter of the current perform (an Input if the current perform 
>>is a composite process binding inputs to its child processes, otherwise 
>>an output) as the source for the data flow, and
>>b) a ValueOf referring to one of the children performs, and one of its 
>>Inputs (or Outputs if it is a binding to an output on the parent 
>>composite process) as the target for the data flow.
>>
>>This has several advantages:
>>    
>>
>
>We discussed the alternatives to a pure "consumer-pull" notation and
>rejected them.  I wish we'd written our reasons down!
>
>  
>
Me too!

>>1) We no longer need the Produce construct.
>>    
>>
>
>I don't see how you avoid it.  _Something_ has to do the job it does.
>
>  
>
See above.

>>2) We no longer need to declare some data flow declarations at the 
>>source and some at the target.
>>    
>>
>
>See above.
>
>  
>
I didn't understand your answer above.

>>3) It seems more natural -- an OWL-S execution engine would know what to 
>>do with outputs as soon as it encounters them, since the data flow 
>>declaration for a binding is in the same perform as where the output is 
>>generated.
>>    
>>
>
>As I've indicated in prior messages, the values going to outputs are
>_arbitrary_ expressions.  This is the biggest obstacle, I think, to a
>"source-push" notation: Different pieces of the source are in general
>pushed from more than one place.
>
>Also, an execution engine is not constrained to discovering how a
>process works by uncovering pieces of it one by one.  I would expect
>it to look at all the annotations and see how the data flow before it
>started doing anything else.  
>
>Actually, my Opt system does use a notation that is a bit closer to
>"source-push" than "consumer-pull."  (See the Opt Manual
>(http://www.cs.yale.edu/homes/dvm/papers/opt-manual.pdf), section 7.)
>Opt has the concept of a "link," which is a communication device
>between different parts of a plan.  A step can "push" an output into a
>link.  A later step can use that link as an input.  There is a special
>link 'result' for the output of the current plan.  In the case where
>an expression is more complex than a single step output, one needs a
>special 'collect-value' step which would access all the links
>containing the pieces of the expression, compute it, and stick the
>result into the link that needs it.  'collect-value' plays the role of
>'produce'.
>
>  
>
That sounds good to me.
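
Going only by the description quoted above (this is not Opt's actual 
notation, and the Link class and collect_value function are invented for 
this sketch), the link idea might look roughly like this in Python:

```python
class Link:
    """A named slot that one step pushes into and a later step reads from."""

    def __init__(self, name):
        self.name = name
        self.value = None

    def push(self, value):
        self.value = value

    def read(self):
        return self.value


def collect_value(fn, inputs, target):
    """Read the input links, compute an expression, push it into `target`."""
    target.push(fn(*(link.read() for link in inputs)))


# Hypothetical step outputs feeding the special 'result' link.
price = Link("price")
tax = Link("tax")
result = Link("result")

price.push(100)
tax.push(8)
collect_value(lambda p, t: p + t, [price, tax], result)
```

Here collect_value does the job of Produce: it assembles an expression 
from several links and sends it on.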

>A link is almost the same as a variable, except that it can be set
>only once.  One issue that we should keep in the back of our mind is
>how we generalize all this to handling loops.  Rather than have a link
>get a new value every time a plan iterates, I think we ought to steal
>an idea from the Clean functional-programming language and have links
>hold a stream of values which gets added to on every iteration.
>
>  
>
I'm not sure I understood that last point.
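
One possible reading, sketched in Python: the link keeps every value pushed 
into it, one per iteration, instead of being overwritten. The StreamLink 
class and the loop are hypothetical, and this may not be what was intended:

```python
class StreamLink:
    """A link that accumulates a value per iteration instead of overwriting."""

    def __init__(self, name):
        self.name = name
        self.values = []  # one entry per loop iteration, oldest first

    def push(self, value):
        self.values.append(value)


total = StreamLink("total")
for item in (3, 4, 5):
    # Each pass through the loop adds to the stream.
    total.push(item * 2)
```

After the loop, total.values holds one entry per iteration in order.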

>Well, this message is way too long.  Sorry.  It gets worse.
>
>  
>
No problem at all :)

>>4) The withOutput in the Result of a composite process is used in some 
>>of the existing OWL-S examples (BravoAirProcess.owl and 
>>CongoProcess.owl) to declare data flow from children performs to a 
>>parent composite process. However, doing this inside the Result bundle 
>>does not strike me as a natural place to do this. Doing this means that, 
>>if several Results of the same composite process need the same data flow 
>>binding, they all have to declare it. This is also no longer necessary 
>>if we declare data flow at the data source, as I suggest. 
>>    
>>
>
>'Produce' also makes it unnecessary.
>
>  
>
>>The Result 
>>bundles could then just refer to the outputs of the composite process, 
>>since any data would have to come from there anyway. In other words, 
>>data flow is decoupled from Results, but Results can still refer to 
>>data, and relate outputs to their associated Effects.
>>    
>>
>
>Yep, that's the way it works now.
>
>Remember that BravoAir and CongoBuy were written a while ago.
>Translating them into the surface notation is an enlightening
>exercise.  Below is my first pass through BravoAir, with my note in at
>least one place that seems screwed up to me, partly because the output
>parameters are set in a Result of a composite process:
>
>  
>
This is very enlightening!

[snip]

>Actually, there are a lot of other problems in this example, including
>the fact that at several points it uses the name of a process as the
>name of a _perform_ of that process.  Some of those points I fixed by
>tossing tags in, and others I didn't.  I should probably finish this
>and get it right.  (Then translate it automatically back into XML/RDF
>using my parser.  Then have it validated.  Oh well, too much work.)
>
>  
>
Yes, that would be nice :)

>                                             -- Drew
>
>
Received on Saturday, 27 November 2004 06:28:08 UTC