Re: [OWL-S]: simple process coordination from Bijan Parsia on 2003-10-07 (www-ws@w3.org from October 2003)

From: Bijan Parsia <bparsia@isr.umd.edu>
Date: Tue, 7 Oct 2003 05:52:50 -0400
To: David Martin <martin@AI.SRI.COM>
Cc: "'www-ws@w3.org'" <www-ws@w3.org>
Message-Id: <032778DA-F8AC-11D7-994E-0003936A0B26@isr.umd.edu>
On Tuesday, September 23, 2003, at 09:11 PM, David Martin wrote:

> OK, let's press forward on this issue of representing process 
> coordination.  Let's concretize the discussion using a very simple 
> interaction (part of Monika's scenario; see "Model of Concurrency in 
> DAML-S"):
>
> Suppose we want to specify 2 composite processes, A and B.  Let's say 
> they start at about the same time and execute concurrently for awhile, 
> and then at some point A needs to get some information from B.  I'll 
> assume that this is done by having A invoke some atomic subprocess of 
> B which I'll call, say, B2.  My goal here is just to get clear about 
> how we want to specify this basic interaction between A and B.
>
> Let's also stipulate that B2 has an input called B2-in1, and an output 
> called B2-out1, which are the I/O that A cares about.  (But these 
> aren't necessarily the only inputs/outputs that B2 has.)
>
> Currently, in DAML-S/OWL-S, we are relying on our dataflow constructs 
> to indicate the relationship between (some part of) A and B2.  My 
> immediate goal is to get clear about the details of what appears in 
> the spec of process A with respect to this relationship.  I can 
> imagine 2 answers; in one, B2 is actually mentioned in A, in the 
> other, it isn't.  Let's look at each in turn.
>
> Note: for simplicity, let's assume that composite process A will run 
> in a single enactment engine, and composite process B will also run in 
> a (different) single enactment engine.

I mentioned before that this is somewhat ambiguous, in that, presumably, 
process A is running in engine A *only in the sense that* all 
synchronization and data routing is through engine A. (Actually, as I 
wrote in my other message, "engine" is ambiguous between, in Erlang 
terms, a process and a node.) That is, I'm assuming that each 
AtomicProcess is a Web Service and that Web Service could be living on 
the same machine as engine A, but typically is not. Does this make a 
difference to your example? If we *can* bind all the AtomicProcess 
invocations to intraengine "procedure invocations" then I think we're 
done already, right?

>   Let's also assume that process A is described in namespaceA, and 
> process B in namespaceB.
>
> Answer 1
> --------
>     The spec of process A mentions B2 (let's say, as part of a 
> sequence). (I'll show this very informally, and I will not bother to 
> show the definitions of the atomic processes.)
>     Process A:
>     <Sequence>
>         <AtomicProcess rdf:about=#A1>
>         <AtomicProcess rdf:about=namespaceB:B2>
>         <AtomicProcess rdf:about=#A3>
>     </Sequence>

If they *really* have to sync, then I would imagine that a series of 
splits and joins would be the more standard (and more complicated) 
approach. Or rather, *isn't* that the approach for modeling a pair of 
communicating processes (at least explicitly)?

> Then, we use a dataflow declaration to indicate that some output of A1 
> flows into B2-in1, and B2-out1 flows into some input of A3:
>
>     namespaceA:A1-out1 => namespaceB:B2-in1
>     namespaceB:B2-out1 => namespaceA:A3-in1
>
> (I'm just giving an indication of the dataflow here, not trying to 
> reproduce all the details of our current approach.)
>
> This seems clear enough, I guess.  The idea is that these dataflow 
> relationships imply the obvious things about timing.

Or you get a runtime failure. We haven't yet specified how dataflow 
works in this case.

>  Such as, B2 must wait until it receives its input (from A1) before it 
> executes, and similarly, A3 must wait until B2 has completed.

But this seems a reasonable way to do it to me.
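
To make that timing reading concrete, here is a minimal Python sketch, 
assuming dataflow edges are read purely as ordering constraints. It is 
not OWL-S syntax or any engine's API: the `dataflow` table and the 
`execution_order` helper are hypothetical, and only the process names 
come from the example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Dataflow edges from the example, reduced to process level:
#   A1-out1 => B2-in1   and   B2-out1 => A3-in1
# (hypothetical table; the real declarations name individual parameters)
dataflow = [("A1", "B2"), ("B2", "A3")]


def execution_order(edges):
    """Return one execution order consistent with the dataflow edges."""
    sorter = TopologicalSorter()
    for producer, consumer in edges:
        # The consumer may not run before the producer has delivered output.
        sorter.add(consumer, producer)
    return list(sorter.static_order())


order = execution_order(dataflow)
print(order)
```

Under this reading no explicit Sequence is needed to get B2 between A1 
and A3; the ordering falls out of the two dataflow edges alone.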

> I assume that B2 will also appear in the process spec of B, something 
> like this:
>     Process B:
>     <Sequence>
>         <AtomicProcess rdf:about=#B1>
>         <AtomicProcess rdf:about=#B2>
>         <AtomicProcess rdf:about=#B3>
>     </Sequence>
> (Let's ignore for the moment what's going on with the inputs and 
> outputs of B1 and B3.)
>
> So what bothers me about this?  Well, there are several things, but 
> here's what bothers me the most:
>
> Complaint 1. It seems ambiguous as to whether B2 is supposed to be 
> executed once or twice - even after looking at the dataflow.

Presumably, this #B2 is a process occurrence? Thus there is only one of 
them, and thus (absent a loop) only one invocation.

That's not as satisfying as one might like.

> What I have in mind is that it will only execute once as part of the 
> execution of process B.  But couldn't one also interpret this as 
> running process B2 *both* as part of process A and as part of process 
> B?

I assume that it wouldn't be that way if you used a split-join style 
representation. Hmm. There's some sense of there being a "host" process, 
something beyond the mereological sum of the AtomicProcesses. We 
sometimes call that the "execution engine". But this means that a 
composite process has an "executability" or "invocability" beyond that 
of its AtomicProcesses. Do we want to go there? That means that 
CompositeProcesses can't be choreographies (I think).

> The (hand-waving) answer I think I've been hearing the most is that 
> the grounding makes this sort of thing clear.  OK, I *think* we could 
> get that to work out, by grounding B2 to a solicit/response operation 
> in process A, and to a request/response operation in process B - but 
> then we are really talking about 2 *different* atomic processes,

Hmm. Perhaps? That seems to map into at least some of your intuitions 
about how to represent things above. I'll confess that I didn't find 
the occurrence of B2 in A to have been the obvious choice for how to 
model the situation, though I can see the motivation.

>  so they'd have to be named and declared differently.  From the WSDL 
> 1.1 point of view, I suppose, that's fine, since it maps nicely - but 
> from my OWL-S point of view, it seems really unfortunate to have to 
> declare a distinct atomic process to represent what is easily thought 
> of as an "invocation statement" within Process A.

I *think* that there's one message sequence (which message is in or out 
depends on your perspective).

>   Another issue, about which I'm not clear, is: can we still adopt 
> this approach with WSDL 1.2?  And, I suspect there are other 
> gaps/problems that none of us has yet grappled with.
>
> Note that even if we ignore the "ambiguity" concern, or find a 
> different solution for it, we still have to come to grips with the 
> grounding issues.

This we agree on.

> Answer 2
> --------
>     The spec of process A doesn't mention B2:
>     Process A:
>     <Sequence>
>         <AtomicProcess rdf:about=#A1>
>         <AtomicProcess rdf:about=#A3>
>     </Sequence>
>
> and, again, we use a dataflow declaration, just as above, to indicate 
> that some output of A1 flows into B2-in1, and B2-out1 flows into some 
> input of A3:
>
>     namespaceA:A1-out1 => namespaceB:B2-in1
>     namespaceB:B2-out1 => namespaceA:A3-in1
>
> In this approach, the process spec of B remains the same as above (in 
> Answer 1).

I guess this is my preferred approach. There's still something weird 
about it, perhaps.

> OK, this solves the "ambiguity" concern I mentioned above.  And, 
> offhand, I can't see that there are any representational gaps here.  
> So why do I hate it so much???
>
> Complaint 2: It must be because it (the control flow) is so 
> ridiculously - and unnecessarily - hard to read and think about!

I, obviously, in some sense, don't agree. There is an oddness in that, in 
the dataflow languages I'm familiar with (Prograph, mostly), the control 
effects of dataflow are more central (i.e., you wouldn't bother having 
sequence, because you can mostly get that from dataflow; if you need 
explicit sequencing, it's because you don't *want* something to 
execute before something else (due to side effect issues, say) even if 
it has all its inputs filled; but that's the uncommon case).

I guess it falls on me to do the split-join version as well. In 
extremely fake syntax:

<split-join>
	<split>
		<A1>
		<B1>
	</split>
	<join>
		<sequence>
			<b2>
			<split>
				<A3>
				<sequence>
					<B2>
					<B3>
				</sequence>
			</split>
		</sequence>
	</join>
</split-join>

I don't imagine anyone will care for this :) Though I'd argue it 
expresses (pace errors) the example, and in a reasonable way. Dataflow 
would remain untouched.
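
For anyone who would rather see the control flow run than read fake 
markup, here is a minimal Python sketch of one reading of the scenario: 
A1 and B1 run concurrently, they join, B2 runs once, and then A3 and B3 
run concurrently. It does not reproduce the exact nesting of the 
fragment above, and `step`, `split_join`, and `sequence` are 
hypothetical helpers, not OWL-S constructs.

```python
import threading

log = []
log_lock = threading.Lock()


def step(name):
    """Stand-in for an atomic process: just record that it ran."""
    with log_lock:
        log.append(name)


def split_join(*branches):
    """Start every branch concurrently, then wait for all of them (the join)."""
    threads = [threading.Thread(target=branch) for branch in branches]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def sequence(*steps):
    """Run the given steps one after another."""
    for s in steps:
        s()


def run():
    sequence(
        lambda: split_join(lambda: step("A1"), lambda: step("B1")),
        lambda: step("B2"),  # assumed to execute exactly once
        lambda: split_join(lambda: step("A3"), lambda: step("B3")),
    )


run()
print(log)
```

The order within each concurrent pair is not fixed, but B2 always lands 
after both A1 and B1 and before both A3 and B3.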

If we give up on dataflow determining control flow (in some cases) then 
I think we should consider giving that up altogether. (I'm 
suspecting that I introduced that by my dataflow inspired reading of 
what might have been merely intended as a parameter binding mechanism). 
Similarly, if we are going to have several new constructs for 
representing communicating composite processes, should we still have 
split-join?

> ------- Conclusion
>
> These are the kinds of considerations that lead me to want to have 
> constructs for "invoke" and (let's call it) "accept".  (In the 
> example, process A "invokes B2", and process B "accepts" the 
> invocation, and that could easily be made explicit.)  Why shouldn't 
> these distinctions be captured in control flow, as well as in 
> dataflow?

If redundantly so, then I'm sort of against it, though, of course, they 
could be derived. I, naturally, tend to think of them as "one-way" 
derived, i.e., from the bottom/grounding up. But, I suppose you could 
think of them as constraints imposed from the process model down. Why 
you would want to express those constraints, as such, escapes me. I 
guess if you *know* you want to outsource some functionality, it would 
make sense.

> I think I know the answer that may be forthcoming: because we want an 
> abstract process specification (for some purposes at least) which 
> doesn't actually commit to whether B2 is being invoked, or accepted, 
> or just run as a subprogram, or whatever.  So we just mention B2, as 
> something that needs to get done at this point in the composite 
> process, and the "details" become clear via dataflow and grounding.

Yes. I'll take that position.

>  My rejoinder is twofold:
>
> First, someone who advocates this answer needs to show how my 
> complaints about either Approach 1 or Approach 2 can be addressed (or 
> argue that they aren't important complaints),

Well, I agree that approach 1 is probably broken, and as I feel like 
it's pretty unnatural, I don't want to spend effort fixing it.

But your objection to 2 isn't yet substantiated. I don't think I find 
it at all hard to read or think about. The syntax is unfortunate, in 
some ways, but I'm much less persuaded by these arguments.

If you want the control flow manifested with control constructs, a 
reasonable modeling desire, then I offer my split-join example. Which, 
I think, shows that your simplifying assumption reveals something 
that *I'd* want to have manifest: that "CompositeProcesses" have some 
sort of extra status.

And yet, CompositeProcesses aren't supposed to be "directly" invocable. 
I have trouble reconciling all this.

> or put forward some other approach that addresses them, and I don't 
> think that's yet been done.

I want to know more about complaint 2. What's specifically hard?

> Second, once you've specified your dataflow, at that point you've 
> already committed to how B2 is being used (invoked, accepted, run as a 
> subprogram, or whatever).  At that point, at least, why not make the 
> control flow more comprehensible?

Well, I can see thinking that approach 2 is perverse, in some sense, 
though if I model things that way I'm likely to find dataflow style 
controlling fairly natural, so perfectly comprehensible.

> With constructs for "invoke" and "accept", it seems to me, things may 
> become much nicer.  But I need to substantiate this claim, and will 
> try to do so in another message.

One thing that isn't clear to me is whether they have to be done for 
every, or almost every, AtomicProcess step (and what you do about 
SimpleProcesses).

<sequence>
	<invoke resource=A1/>
	<accept resource=B2/>
	<invoke resource=A2/>
</sequence>
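
As a point of comparison, here is a minimal Python sketch of how an 
invoke/accept pairing between the two composite processes might behave 
at run time, assuming each composite process runs in its own thread and 
the two queues stand in for the grounding. The `invoke` and `accept` 
functions, the queues, and the values passed are all hypothetical; 
nothing here is proposed syntax.

```python
import queue
import threading

requests = queue.Queue()  # carries B2-in1 from process A to process B
replies = queue.Queue()   # carries B2-out1 from process B back to process A
trace = {}


def invoke(value):
    """A's side: send the input, then block until the output comes back."""
    requests.put(value)
    return replies.get(timeout=5)


def accept(handler):
    """B's side: block until an invocation arrives, run it, send the result."""
    value = requests.get(timeout=5)
    replies.put(handler(value))


def b2(b2_in1):
    """Stand-in for atomic process B2."""
    return f"B2({b2_in1})"


def process_a():
    a1_out1 = "a1-result"      # A1 produces its output
    b2_out1 = invoke(a1_out1)  # A invokes B2 and waits
    trace["A3-in1"] = b2_out1  # A3 consumes B2-out1


def process_b():
    # B1 would run here
    accept(b2)                 # B accepts the invocation of B2
    # B3 would run here


threads = [threading.Thread(target=process_a),
           threading.Thread(target=process_b)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(trace)
```

In this sketch only the one cross-process step needs an invoke/accept 
pair; A1, A3, B1, and B3 are plain local steps, which is one possible 
answer to the "every step?" question but not the only one.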

I'll follow this up in reply to your later message.

Cheers,
Bijan Parsia.
Received on Tuesday, 7 October 2003 05:55:16 UTC