- From: Drew McDermott <drew.mcdermott@yale.edu>
- Date: Sun, 7 Sep 2003 12:50:41 -0400 (EDT)
- To: www-ws@w3.org
- Message-Id: <200309071650.h87Gof523119@pantheon-po04.its.yale.edu>
I volunteered to produce a surface syntax and formal semantics for the revamped DAML-S in which processes are individuals (not classes). Here is the result, as a raw LaTeX file and a postscript version. Comments are more than welcome. This version no doubt contains several bugs and omissions, some which I disarmingly admit to in the text. Let me know about any others. -- -- Drew McDermott Yale Computer Science Department \documentclass[11pt]{article} \usepackage{helvet} \usepackage{alltt} \def\la{\langle} \def\ra{\rangle} \def\inp{\ensuremath\downarrow\!} \def\outp{\ensuremath\uparrow\!} \def\inoutp{\ensuremath\downarrow\!\uparrow\!} \def\thisproc{\texttt{@}} \def\env{\mbox{\it env}} \def\omap{\mbox{\it outmap}} \def\muni{\,\backslash\! +} \newcommand\itm[1]{\mbox{\textit{#1}}} \newcommand\rmm[1]{\mbox{\textrm{#1}}} \newcommand\ttm[1]{\mbox{\texttt{#1}}} \def\noteme#1{}%{[[ #1]]} \def\notecoauth#1{[[#1]]} \def\notereader#1{[[#1]]} \begin{document} \begin{center} Surface Syntax and Formal Semantics for DAML-S-PAI \\ ** DRAFT 0.1 ** \\ Drew McDermott \notereader{and the DAML-S coalition, if they don't expel me} \\ September 7, 2003 \\ Copyright 2003 by the DAML-S Coalition and its vast team of lawyers \\ \end{center} \section{Goals} \label{sec:intro} This is a proposal for a surface syntax and formal semantics of a ``processes as instances'' (PAI) notation for DAML-S. The idea is to describe a process as an object, typically consisting of pieces which are themselves processes. The goals of this exercise are to \begin{enumerate} \item Provide readable surface syntax \ldots \item \ldots That is easily translatable to RDF; \item Provide precise answers to questions such as ``Does a process step proceed if some of its inputs are unavailable?'' \end{enumerate} \section{Syntax and Informal Semantics} The syntax we propose is somewhat Lisp-based, but not entirely. The main reason to go this route is that Lisp's concrete syntax is essentially isomorphic to its abstract syntax, so you can view this as an abstract description of something with more infix operators if you like. \subsection{Informal Syntax} \label{sec:surface} The key concept in DAML-S is the \emph{process}, which is an activity carried out by an agent, typically a web service or a client. A \emph{process definition} is a description of a process. Processes come in several flavors, atomic, simple, and various sorts of composite, distinguished by their control constructs (conditional, choice, parallel, loop, etc.). We will assume that a process starts with its control construct, or the reserved words {\tt atomic} or {\tt simple}. So an if-then-else might look like \begin{center} \texttt{(if\_then\_else \ldots)} \end{center} Every process can have input and output parameters, described using fields \texttt{:args} and \texttt{:results}. Input and output parameters may have optional types, the OWL classes they belong to. So a simple sequence might be described thus: \begin{alltt} (sequence :args (a - Integer) \textit{---steps---} :results (b - String)) \end{alltt} However, once we get into the formal part of this paper, we will ignore type declarations in the interest of clarity. Besides \texttt{:args} and \texttt{:results}, there can be a \texttt{:locals} declaration. Processes can also have preconditions, which must be true before the process can be started: \begin{alltt} (sequence :precondition (exists (x - Credit-card) (and (credit-card-of x customer) (not (maxed-out x)))) ...) \end{alltt} \notereader{In this paper we neglect conditional outputs and effects, although we believe they present no special problems.} There is a crucial distinction between a process and a \emph{process occurrence}. The distinction is obvious in a case like this: \begin{alltt} (sequence (toggle\_the\_switch) (toggle\_the\_switch)) \end{alltt} \noindent which contains two occurrences of the process \texttt{toggle\_the\_switch}. In keeping with RDF style, the description of a process may accompany one of its occurrences, or may be placed elsewhere. In the example above, \texttt{toggle\_the\_switch} must obviously be defined somewhere else. Instead, we could have written this: \begin{alltt} (sequence (atomic\_process :name toggle\_the\_switch) (toggle\_the\_switch)) \end{alltt} Processes don't have to have names. If we want to give a name to a process occurrence, we use the \texttt{tag} construct: \begin{alltt} (tag-scope (tag1 tag2) (sequence (tag tog1 (simple\_process :name toggle\_the\_switch)) (tag tog2 (toggle\_the\_switch)))) \end{alltt} The \texttt{tag-scope} construct is necessarily to indicate the scope of the names. One can imagine other ways of doing this (such as having the scope be the innermost iteration). We can use \texttt{tag} names to describe dataflows between steps. Suppose we have a process for authorizing uses of credit card. Because it must communicate with some computer in a central location, it sometimes times out if traffic to that computer is heavy. So the process has three possible outputs: \texttt{authorized}, \texttt{not-authorized}, and \texttt{timeout}. The process used by a retailer might be to try the subprocess one or two times and then give the customer the benefit of the doubt: \begin{alltt} (def\_type CC\_check\_res (Constant "authorized" "not-authorized" "timeout")) (def\_type CC\_acc\_status (Constant "accepted" "not-accepted")) (simple\_process :name check\_auth :args (cc - Credit\_card\_data) :results (res - CC\_check\_res)) \end{alltt} \begin{alltt} (sequence :results (final\_res - CC\_acc\_status) (check\_auth cc <= cc res => (ch1res(\(\inp\) check1))) (tag check1 (if\_then\_else :args (ch1res - CC\_check\_res) :cond (ch1res = "timeout") :then (sequence :results (ch2res - CC\_acc\_status) (check\_auth cc <= cc res => (ch2res(\(\inp\) check2))) (tag check2 (if\_then\_else :args (ch2res - CC\_check\_res) :results (res - CC\_acc\_status) :cond (ch2res = "not-authorized") :then (value "not-accepted" => final\_res) :else (value "accepted" => final\_res)))) :else (if\_then\_else :cond (ch1res = "authorized") :then (value "accepted" => final\_res) :else (value "not-accepted" => final\_res))))) \end{alltt} \subsection{Formal Syntax} \label{sec:formsyn} In this section we introduce a somewhat simpler syntax that is easier to explain the semantics of. In the next section we explain how to derive the simple syntax from the user-friendly syntax. \begin{alltt} <defn> ::= (atomic\_process <identifier>) | (process <interface-decl> <statement>) <statement> ::= (call <identifier>) | (sequence <statement>*) | (parallel <statement>*) | (choice <statement>*) | (if\_then\_else :cond <proposition> :then <statement> [:else <statement>]) | (with\_precondition <proposition> <statement>) | (with\_params [:args (<identifier>*)] [:locals (<identifier>*)] [:results (<identifer>*)] <statement>) | (tag\_bind (<tag-binding>) <statement>) | (with\_dataflow <statement> <term> => <channel term>) \end{alltt} In a \texttt{with\_params} statement we will allow the parameter declarations to be moved around, often putting the \texttt{:results} after the \texttt{<statement>}. \begin{alltt} <tag-binding> ::= (<identifier> <tag-id>+) <in-or-out> ::= \(\inp\) | \(\outp\) <term> ::= <channel term> | \emph{insert term-language grammar here} <channel term> ::= <channel-spec> | <identifier> <channel-spec> ::= (<identifier> (<in-or-out> <tag-id>+)) <tag-id> ::= <identifier> | <natural number> \end{alltt} %%%% | (split+join <statement>*) %%%% | (unordered <statement>*) %%%% <input-flow> ::= (<identifier> <= <flow designator>) %%%% <output-flow> ::= (<identifier> => <flow designator>) Most of the changes modularize mechanisms such as input/output declarations, input-output flows, and tag bindings. The first main change is to do away with the \texttt{tag} and \texttt{tag\_scope} constructs and replace them with \texttt{tag\_bind}. What was written informally as \begin{alltt} (tag\_scope (foo) (sequence \(s_1\) (tag foo \(s_2\)) \(s_3\))) \end{alltt} \noindent is now considered syntactic sugar for \begin{alltt} (tag\_bind ((foo 2)) (sequence \(s_1\) \(s_2\) \(s_3\)))) \end{alltt} \noindent The idea is that sequence steps are normally tagged 1, 2, \ldots. Symbolic tags can be introduced by explaining the mapping of the symbols to the built-in tags. For conditionals the built-in tags are \texttt{:then} and \texttt{:else}. For loops (which we are neglecting complely), the built-in tags are natural numbers, corresponding to iterations. The \texttt{tag\_bind} construct can be used to refer to steps that are more deeply nested than in the previous example. For instance, in \begin{alltt} (tag\_bind ((foo :else 2)) (if\_then\_else :cond C :then A :else (sequence D E F))) \end{alltt} \texttt{foo} tags step \texttt{E}, the second step of the \texttt{:else} step of the construct. The sugared syntax allowed \texttt{:args}, \texttt{:locals}, and \texttt{:results} to be declared for all constructs. We now provide an explicit \texttt{with\_params} statement. Similarly for preconditions. The last major change is that dataflows are specified in only one place, using a \texttt{with\_dataflow} statement. The destination for the flow can be specified in one of two ways: as an identifier (denoting an input or output channel) bound in the scope of the \texttt{with\_dataflow} statement, or as a designator for a flow into or out of one of steps of the \texttt{<statement>} the \texttt{with\_dataflow} modifies. The identifier \thisproc{} is reserved for this process itself. For example, the flow from the second occurrence of \texttt{check\_auth} to statement \texttt{check2} would be moved up to the sequence level, and written: \begin{alltt} (with\_dataflow (tag\_bind ((check2 2)) (with\_params :results (ch2res - CC\_acc\_status) (sequence (call :name check\_auth) (with\_params :args (ch2res - CC\_check\_res) :results (res - CC\_acc\_status) (if\_then\_else ...))))) (cc (\(\outp\)1)) => (ch2res (\(\inp\)check2))) \end{alltt} (The flow from into the \texttt{cc} input of the \texttt{call} is left as an exercise for the reader.) Other, lesser, changes are: \begin{enumerate} \item To require calls of defined processes to be labeled explicitly using \texttt{(call \textit{proc} \ldots)}. \item To require parentheses around \texttt{<input flow>}s and \texttt{<output flow>}s. \end{enumerate} One thing that is absent from the formal treatment is any notion of typing of variables. That absence can't be made up by any kind of sugaring. Eliminating types makes the formal semantics simpler, so from now on forget that variables might have types. \subsection{Informal Syntax as Syntactic Sugar for the Formal Kind} Procedure for ``desugaring'' the concise notation in order to arrive at its equivalent in the formal syntax: \begin{enumerate} \item Replace every \texttt{tag\_scope} with the equivalent \texttt{tag\_bind}, and replace $\ttm{(tag\ } g\ e\ttm{)}$ with $e$; \item If a statement contains an \texttt{:args}, \texttt{:locals}, or \texttt{:results} declaration, tranform it into a \texttt{with\_params} statement, with the parameter declarations pulled out in the obvious way. \item Every dataflow expression of the form $c \ttm{<=} v$ or $v\ttm{=>}c$ is deleted. A \texttt{with\_dataflow} expression is created and wrapped around all the \texttt{tag\_bind} expressions that bind a tag referred to in $c$ or $v$. \end{enumerate} \notereader{These rules are not quite right. There are some sugared process expressions that apparently can't be desugared. The problem is that the \texttt{tag\_bind} mechanism is a bit broken. It doesn't really bind anything, in the sense that you can't refer to the new tags \emph{inside} the scope of the \texttt{tag\_bind}.} \section{Formal Semantics} \subsection{Technicalities} To explain the formal semantics, we need to talk about what the universe looks like. It certainly includes a set of objects $U$, and in a full treatment $U$ ought to be broken down into classes corresponding to the types of the variables, but, as explained in section~\ref{sec:formsyn}, we are simplifying variable types away. In addition to objects, we also need a set of \emph{situations}. Classically these are taken to be snapshots of the world, that is, assignments of truth values to all (timeless) propositions. We adopt that position, with the twist that situations are not defined by their truth assignments; two distinct situations can assign exactly the same values to all propositions. We assume there is a partial order on situations. $\sigma_1 \preceq \sigma_2$ means that $\sigma_1$ is ``in $\sigma_2$'s past.'' Whether there is a unique future for any situation, or time \emph{branches} into the future, is an issue that needn't concern us here.\footnote{The Reiter School treats situations as sequences of actions (i.e., choices made by an agent). That approach could probably be made to work for us, but for now we'll just be neutral about exactly what situations are metaphysically.} %%%% We use the symbol $\Sigma$ to refer to the set of all situations. To handle dataflow, we need a few extra object types. An input or output will be bound to an object of type \emph{channel}. A channel is essentially a storage location. %%%% A channel can hold a value from $U$, or a special %%%% value $\bullet$, not an element of $U$, which means ``unassigned.'' A \emph{channel map} is a set of ordered pairs $\{\la \itm{ch}, \itm{val} \ra\}$, where $\itm{val}\in U$. %%%% \cup \{\bullet\} A channel map specifies the current contents of a set of channels. A channel with no value does not appear in the channel map, so $\itm{channel map}(c) = \perp$, the ``undefined value.'' We assume the existence of an infinite number of channels, so we can always find one we haven't used yet. An \emph{environment} is a mapping from identifiers to values. Values are from the set $U \cup \itm{channels}$. A special case is an environment that maps arg/result parameters to channels, which we will call a \emph{parameter map}. The denotation of a DAML-S process expression is (modulo technicalities) a set of pairs $\la x_0,x_1\ra$, where $x_0$ is a ``starting point'' and $x_1$ is a ``finishing point,'' the intuition being that one way for the process to proceed starting in $x_0$ is to wind up in $x_1$. $x_0$ and $x_1$ are \emph{valuated situations}, triples of the form $\la p,c,s \ra$, where $s$ is a situation, $c$ is a channel map, and $p$ is a parameter map. $s$ represents the state of the outside world, $c$ represents the contents of channels, and $p$ associates arg/results parameters of the process with channels. Note that $c$ is a ``global'' channel map, and can mention channels other than those mentioned in $p$. To refer to the pieces of a valuated situation, we use $\itm{pm},\itm{cm}, \itm{sit}$, so that valuated situated $s = \la\itm{pm}(s), \itm{cm}(s), \itm{sit}(s)\ra$. A pair of valuated situations $\itm{vs}_1,\itm{vs}_2$, where $\itm{sit}(\itm{vs}_1)\preceq \itm{sit}(\itm{vs}_2)$, is called a \emph{valuated situation interval}, or VSI. So the denotation of a process is (almost) a set of valuated situation intervals, or, equivalently, a relation on valuated situations. $\la \itm{vs}_1,\itm{vs}_2\ra$ is in the set if $\itm{vs}_2$ is one place the process can finish given that it starts in $\itm{vs}_1$. We can refer to the pieces $\itm{vs}_1,\itm{vs}_2$ of a VSI $\itm{vsi}$ as $\itm{begin}(\itm{vsi}),\itm{end}(\itm{vsi})$.\footnote{ One way to think about nondeterminism would be to allow a process to end in more than one valuated situation. But there other ways to think about branching time, and we don't want to commit to a particular scheme.} The presence of tags compels us to impose some structure on the VSI a process term denotes. A tag refers to a piece of a process execution trace. So we introduce the notion of a \textit{tag tree}, which records the bindings of tags to valuated situation intervals. Each node of the tree is labeled with a VSI, and has a finite number of subtrees, the links to which are labeled with tags. If $m$ is such a tree, then $\itm{piece}(m, t_1\, t_2\, \ldots \,t_k)$, where each $t_i$ is a \texttt{<tag-id>} is the VSI obtained by getting piece $t_1$ of $m$, then piece $t_2$ of that subtree, and so forth. $\itm{piece}(m)$ is the VSI associated with the whole tag tree. For clarity, one can also write this as $\thisproc m$. The labels below the top node of a tag tree $m$ are denoted by $\itm{labels}(m)$. If the sequence of tag-ids contains a tag corresponding to no subtree, the value of the \textit{piece} expression is $\perp$. The symbol $\emptyset$ is used for a null tree, associated with no VSIs. The notation $m+ t \mapsto m'$ means the tree $m$ augmented with a new branch labeled $t$ that takes you to tag tree $m'$, which thereby becomes a subtree of the new tree. %%%% A single VSI $v$ stands for %%%% the tag tree whose sole node is labeled $v$. A leaf tag tree whose sole node is labeled $v$ is written $\emptyset + \thisproc \mapsto v$. If there is already a subtree or VSI associated with a label $l$ in $m$, then $m+ l\mapsto x$ is $m$, but with $\itm{piece}(m,l)$ reset to $x$. A key requirement for a useful tag tree is that the VSIs labeling its nodes must be \emph{comparable.} That is, if $\la s_1,s_2\ra$ is a VSI from somewhere in a tag tree and $\la s_3, s_4\ra$ is a VSI from somewhere else, then the four points are ordered by $\preceq$ somehow; that is, there is a permutation $i_1,i_2,i_3,i_4$ of $1,2,3,4$ so that $\itm{sit}(s_{i_1}) \preceq \itm{sit}(s_{i_2}) \preceq \itm{sit}(s_{i_3}) \preceq \itm{sit}(s_{i_4})$. A set of tag trees may describe alternative timelines (or time branches, or possible worlds, \ldots), but each tag tree is drawn from a single timeline. Another important requirement is that a tag tree be \emph{inertial}, which means ``internally'' and ``externally'' inertial. A set of valuated situation intervals is \emph{internally inertial} if it has the following two properties: \begin{enumerate} \item \emph{Parsimony:} No channel appears in a channel map unless some parameter is or was bound to it at some point. \item \emph{Conservation:} Nothing changes the contents of channels except dataflows to the input parameters denoting them. \end{enumerate} Both properties are expressed as constraints on consecutive VSIs, defined thus: $\itm{vsi}_1$ and $\itm{vsi}_2$ are \emph{consecutive} in a set of VSIs $S$ if and only if \begin{tabbing} \hspace{2ex}\=$\itm{vsi}_1\in S \rmm{\ and\ } \itm{vsi}_2 \in S$ \+\\ and $\itm{sit}(\itm{end}(\itm{vsi}_1)) \prec \itm{sit}(\itm{begin}(\itm{vsi}_2)) $\\ $\rmm{and there }$\=$\rmm{is no\ } \itm{vsi'} \in S \rmm{\ such that\ } $\+\\ $\itm{sit}(\itm{end}(\itm{vsi}_1)) \prec \itm{sit}(\itm{end}(\itm{vsi}')) \prec \itm{sit}(\itm{begin}(\itm{vsi}_2))$ \-\-\\ \end{tabbing} One might wonder about the possibility of there being an infinite sequence of situations between $\itm{vsi}_1$ and $\itm{vsi}_2$. In this document, we consider only finite tag trees, so the issue won't come up, but when the formalism is extended to iterations, we will certainly need an ``anti-Zeno'' rule to prevent such pathological cases from arising. Parsimony is then formalized as follows: If $c$ is a channel in $\itm{cm}(\itm{begin}(v))$ for some VSI $v$ in our set, then either $c$ appears in $\itm{cm}(\itm{end}(v'))$ for some $v'$ such that $v'$ and $v$ are consecutive, or there is a parameter $p$ such that $(\itm{pm}(\itm{begin}(v)))(p) = c$. Conservation is expressed thus: If a set of VSIs contains two consecutive elements $\itm{vsi}_1$ and $\itm{vsi}_2$, then $\itm{cm}(\itm{begin}(\itm{vsi}_2))$ differs from $\itm{cm}(\itm{end}(\itm{vsi}_1))$ only on channels denoted by input parameters to $\itm{vsi}_2$. More technically, \begin{tabbing} \hspace{2ex}\=If \=$\itm{vsi}_1 \rmm{\ and\ } \itm{vsi}_2 \rmm{\ are consecutive}$ \+\+\\ $\rmm{then if }$\=$\rmm{there is a }$\=$\rmm{channel\ } c \rmm{\ such that\ } $ \+\+\\ $(\itm{cm}(\itm{end}(\itm{vsi}_1))(c) \neq (\itm{cm}(\itm{begin}(\itm{vsi}_2))(c) $\-\\ $\rmm{then there is a parameter\ } p \rmm{\ such that\ } (\itm{pm}(\itm{begin}(\itm{vsi}_2)))(p) = c $\end{tabbing} A set of VSIs is \emph{externally inertial} if, whenever two of its elements, $\itm{vsi}_1$ and $\itm{vsi}_2$, are consecutive, then the only changes that occur between $\itm{sit}(\itm{end}(\itm{vsi}_1))$ and $\itm{sit}(\itm{begin}(\itm{vsi}_2))$ are those allowed by the ``physics'' of the world. In the present paper we will say nothing about physics, except to note that if the world obeyed the laws of ``classical planning'' [<bibref>] then \emph{no} changes would occur between the end of $\itm{vsi}_1$ and the beginning of $\itm{vsi}_2$, and the two situations would assign the same truth values to all propositions. Of course, the world of web services is definitely nonclassical, and the physics could get very intricate, what with autonomous processes, randomness, and the presence of other agents, some more cooperative than others. Having defined all these concepts in terms of sets of VSIs, we extend them to a tag tree by applying them to the set of all VSIs labeling nodes of the tree. The formal semantics makes use in a few places of sets of ordered pairs, a.k.a functions, so we introduce some common terminology to talk about them. Typically our functions are finite, hence partial. So $F(x)$ is either the $y$ corresponding to $x$ in the list of ordered pairs, or $\perp$ if there's no such pair. We often want to avoid $\perp$, because it indicates that something meaningless or impossible has occurred; $F(\perp) = \perp$ \noteme{almost always}, so $\perp$ will propagate out to make a given syntactically correct DAML-S expression meaningless. If $F$ is a function, then $F\muni F'$ means $F \backslash \mbox{\it overlap}(F,F') \cup F'$, where \textit{overlap}$(F,F')$ is the set of elements of $F$ whose arguments are in the domain of $F'$. In other words, $F\muni F'$ is $F \cup F'$ with bindings from $F'$ replacing those in $F$ that give different results for a given argument: $(F \muni F')(x) = \itm{\ if\ } x\in \itm{dom}(F') \itm{\ then\ } F'(x) \itm{\ else\ } F(x)$. %%%% Another way to combine functions is the operator $F_1\uplus F_2$. %%%% This is equivalent to $F_1\cup F_2$, if for all $x \in \itm{dom}(F_1) \cap %%%% \itm{dom}(F_2)$, $F_1(x) = F_2(x)$; i.e., $F_1$ and $F_2$ on all %%%% points they share in common. %%%% Otherwise, $F_1\uplus F_2 = \perp$. There one more gimmick we need here for technical reasons. %%%% , in addition to the aforementioned $\perp$ and %%%% $\bullet$. We use the symbol ``$\diamond$'' to refer to the ``nowhere channel,'' a mathematical \texttt{/dev/null}. It's not an error to write to this channel, but it is an error to try to read from it. \subsection{The Semantics} We are finally ready to proceed. We will define the function ${\cal M}(e,m,\env)$, which gives the meaning of expression $e$ with respect to tag tree $m$ and environment \env. Because some expressions denote channels, every expression can be thought of as having a c-value and a d-value, which correspond to the ``l-value'' and ``r-value'' from programming-language semantics. So ${\cal M}(e,m,\env) = \la {\cal C}(e,m,\env), {\cal D}(e,m,\env)\ra$. $\cal D$ gives the \emph{d-value} (denotation) of an expression, and $\cal C$ gives its \emph{c-value}, the channel, if any, that the d-value came from. For many expressions, such as \texttt{5} and \texttt{x+y}, the c-value is undefined, so it will be $\perp$ or $\diamond$, as appropriate. In cases where it is $\perp$, we will give the value of $\cal D$ for an expression and not bother with spelling out $\cal C$ explicitly. Our main focus in this paper is the denotations of process expressions \textit{exp}, so what we are ultimately interested in is the set of valuated situation intervals $$\itm{image}(R, {\cal D}(\itm{exp},\emptyset,E_0))$$ where $E_0$ is the built-in or ``basis'' environment; $\itm{image}(f,S)$ means $\{f(x) | x \in S\}$; and $R(m)= \thisproc m$, the VSI labeling the root node of $m$. A full treatment of the semantics would specify the meanings of all expressions, including the long-awaited formalism for propositions (conditions and effects). But we're going to focus here on the process sublanguage, so the long-awaited formalism will have to be awaited a little longer. \notereader{Actually, I expect no substantive problems to arise in extending $\cal M$ to a logical language. In particular, it should be possible to refer to the entities, such as channels, bound in a process, simply by using their names.} We start by giving the semantics of channel terms. \begin{tabbing} \hspace{2ex} \= ${\cal M}(\itm{id}(\inp t_1 \ldots t_k), m, \env) = H(\itm{begin}(\itm{piece}( m, t_1 \ldots t_k)), \itm{id})$ \+\\ \= ${\cal M}(\itm{id}(\outp t_1 \ldots t_k), m, \env) = H(\itm{end}(\itm{piece}( m, t_1 \ldots t_k)), \itm{id})$ \-\\ where $H(\itm{vs}, \itm{id}) = \rmm{\ If\ }$\= $\itm{vs}=\perp \rmm{\ then\ } \la \diamond,\perp \ra $ \+\\ $\rmm{else\ Let\ }$\=$c = (\itm{pm}(\itm{vs}))(\itm{id})$ \+\\ $\rmm{in if\ } $\= $c \rmm{\ is a channel then \ } \la c, (\itm{cm}(\itm{vs}))(c)\ra$ \+\\ $\rmm{else \ } \la \diamond, \perp \ra$ \-\-\-\\ If $\env(\itm{id})$ is a channel $c$, ${\cal M}(\itm{id}, m, \env) = \la c, (\itm{cm}(\itm{end}(\thisproc m)))(c)\ra$ \\ For all other $e$, \+\\ ${\cal C}(e, m, \env) = \perp$ \end{tabbing} so that we can define ${\cal M}(e,m,\env)$ just by specifying ${\cal D}(e,m,\env)$. We construct an interpretation of a set of composite processes in the usual way. We start with an assignment of a set of VSIs to each atomic-process name. We'll just assume that $E_0$, the initial environment, includes these assignments. The definition of $\cal D$ is a pile of special cases, one for each sort of composite process. \begin{tabbing} %%% named \= If $a$ is a named process (especially an atomic one), then \\ ${\cal D}(\mbox{\tt (call\ } a\mbox{\tt)}, m_0, \env) = \env(a) = \{ \emptyset + \thisproc \mapsto v | v \in \env(a)\}$ \+\\ \\ %%% sequence ${\cal D}($\texttt{(seq}\=\texttt{uence $p_1$ $p_2$ $\ldots$ $p_n$)}, $m_0$, $\env)$ \+\\ $=\{\mbox{\rm tag }$\= $\mbox{\rm tree\ } m |\{1, \ldots,n\}\subseteq\mbox{\it labels}(m)$ \textrm{and}\+\\ $\mbox{\rm for some\ }$\=$v_1, \ldots, v_n, \rmm{\ where\ } v_i\in {\cal D}(p_i,m_0,\env)$ \+\\ \=$\mbox{\rm for all\ } j, 1 \leq j \leq n, \itm{piece}(m,j) = v_j$ \\ $\mbox{\rm and for all\ } j, 1 \leq j < n, \itm{sit}(\mbox{\it end}(v_j)) \preceq \itm{sit}(\mbox{\it begin}(v_{j+1}))$ \\ $\rmm{and\ } v_1,\ldots,v_n \rmm{\ are an inertial sequence}$ \\ $\mbox{\rm and\ }$\= $\thisproc m = \la $\=$ \itm{begin}(v_1), \itm{end}(v_n))\}$\-\-\-\\ \end{tabbing} Note that we don't have to mention comparability of the $v_i$ because it's implied by the requirement that they are linearly ordered. \begin{tabbing} %%% parallel ${\cal D}(\ttm{(par}$\=$\ttm{allel\ } s_1 \ldots s_k\ttm{)}, m_0, \env)$ \+\\ $= \rmm{\ Let\ }$\= $M = \{$\=$\emptyset + (1 \mapsto v_1) + \ldots + (k \mapsto v_k) $\+\+\\ $| \rmm{\ so}$\=$\rmm{me\ } v_1 \in {\cal D}(s_1,m_0,\env), \ldots, v_k \in {\cal D}(s_k, m_0, \env) \}$ \+\\ $\rmm{such that the\ } v_i \rmm{\ are comparable and inertial}\}$ \-\-\\ in $\itm{image}(\itm{combine}, M) $ \\ where $\itm{com}$\=$\itm{bine}(m)$ \+\\ $ = \{$\=$m + \thisproc \mapsto \la v_b, v_e \ra $\+\\ $| \ $\=$\rmm{Let\ } S_b = \{v | $\= $v= \itm{begin}(\itm{piece}(m,i)) $ \+\+\\ $ \rmm{\ for some\ } i \in [1,k]\}$ \-\\ $\rmm{and\ } S_e = \{v | $\= $v= \itm{end}(\itm{piece}(m,i)) $ \+\\ $\rmm{\ for some\ } i \in [1,k]\}$ \-\\ $\rmm{in\ } v_b = \itm{\ earliest}(S_b) \rmm{\ and\ } v_e = \itm{\ latest}(S_e)\}$ \-\-\-\-\-\\ \\ %%% choice ${\cal D}(\ttm{(cho}$\=$\ttm{ice\ } c_1, \ldots, c_k\ttm{)}, m, \env) $ \+\\ $= \cup_{j=1}^k {\cal D}(c_j, m, \env)$ \-\\ \\ %%% if_then_else ${\cal D}($\texttt{(if\_}\=\texttt{then\_else } \texttt{:cond} $t$ \texttt{\ :then } $Y$ \texttt{:else } $N$\texttt{)}$, m_0, \env)$ \+\\ $= $\=$C(\ttm{true}, Y, \ttm{iftrue}) \cup C(\ttm{false}, N, \ttm{iffalse}) $\+\\ where $C(\tau, e, l) = \{$\=$\emptyset + (\thisproc \mapsto m) + (l \mapsto m)$ \+\\ $|$ for \=some $m\in {\cal D}(e,m_0,\env)$ \+\\ $\rmm{such that\ } {\cal D}(t, m, \env) = \tau\}$ \-\-\-\-\\ \\ ${\cal D}(\ttm{(wit}$\=$\ttm{h\_precondition\ } p\ s\ttm{)},m_0,\env) $ \+\\ $=\{m \in {\cal D}(p,m_0,\env)\, |\, p \rmm{\ is true in\ } \itm{sit}(\itm{begin}(\thisproc m)) \}$ \-\\ \end{tabbing} That concludes our definitions of the basic control constructs. Now we handle the various notations for binding parameters and passing values to them. \begin{tabbing} %%% with_params ${\cal D}$\=$(\ttm{(with\_params\ } q\ \ttm{:args\ } A \ttm{\ :locals\ } V \ttm{\ :results\ } R\ttm{)}, m_0, \env)$ \+\\ $=$\ Let \=$E_A = \{\la a,c\ra | a \in A \mbox{\it\ and\ } c \mbox{\it\ a new channel}\}$ \+\\ $E_V = \{\la v,c\ra | v \in V \mbox{\it\ and\ } c \mbox{\it\ a new channel}\}$ \\ $E_R = \{\la r,c\ra | r \in R \mbox{\it\ and\ } c \mbox{\it\ a new channel}\}$ \-\\ then let\ \= $M = {\cal D}(p,m_0, \env\muni E_P \muni E_V \muni E_R$) \+\\ in \=$\itm{image}(A,M)$ \+\\ whe\=re for all $m\in M$ \+\\ $A(m) = $\=$\ m + \thisproc \mapsto \la \itm{vs}_b, \itm{vs}_e \ra$ \+\\ $\rmm{where\ }$\=$\rmm{let\ } \la\itm{vs}'_b,\itm{vs}'_e\ra = \thisproc m $\+\\ \=$\rmm{in\ }\itm{vs}_b= \la \itm{pm}(\itm{vs}'_b) \muni E_P, \itm{cm}(\itm{vs}'_b), \itm{sit}(\itm{vs}'_b)\ra $ \\ $\rmm{and\ }\itm{vs}_e = \la \itm{pm}(\itm{vs}'_e) \muni E_R, \itm{cm}(\itm{vs}'_e), \itm{sit}(\itm{vs}'_e)\ra $ \-\-\-\-\-\-\\ \\ %%% with_dataflow ${\cal D}(\ttm{(with\_}$\=$\ttm{dataflow\ } p\ r \ttm{=>} c\ttm{)}, m_0, \env)$ \+\\ $= \{$\=$m\in{\cal D}(p,m_0,\env)$ \+\\ $|\ $\=${\cal M}(c,m,\env) = \la x, v_c\ra, x\neq \perp, \rmm{\ and\ } {\cal D}(r,m,\env) = v_r \neq \perp$ \+\\ $\rmm{\ and\ } v_r = v_c\}$ \-\-\-\\ %%%% \= Let $v = \itm{r-value}(r,m,\env) $ \+\\ %%%% and $\la c, q\ra = \itm{cv-value}(c,m,\env)$ \\ %%%% in $c \neq \perp$ and $v=q\}$ \-\-\-\\ \\ %%% tag_bind ${\cal D}(\texttt{(tag\_}$\=$\texttt{bind ((}\mbox{\it id}\ \mbox{\it tag}\texttt{))\ } P\texttt{)}, m_0, \env)$ \+\\ $= \itm{image}(f,{\cal D}(P, m_0, \env))$ \\ where \=$f(m) = m + \itm{id} \mapsto \itm{piece}(m, \itm{tag})$ \end{tabbing} \subsection{Some clarifying remarks} \label{sec:clarify} In case the meaning of all this isn't intuitively obvious, let us make a few remarks. A key feature of the the \texttt{with\_params} construct is that it binds channel identifiers in two places: the environment, and the parameter maps for the process. That allows us to refer to them in two different ways. Within the process, we can refer to them by name, because they're bound in the environment. After leaving the process, but within the scope of a \texttt{with\_dataflow} wrapping it, we can refer to the same channels by their associations with the parameter maps for the process's steps. E.g., in \begin{alltt} (with\_params :args (w) (with\_dataflow (with\_dataflow (sequence (call foo) (with\_params :args (x) (call baz) :results (y)) (with\_params :args (z) (call shazam))) w => (x(\(\inp\)2))) (y(\(\outp\)2)) => (z(\(\inp\)3)))) \end{alltt} we send the d-value of \texttt{w} to the channel \texttt{x} of (i.e., bound around the call to) \texttt{baz}, and we send the contents of the channel \texttt{y} of \texttt{baz} to the input channel \texttt{z} of \texttt{shazam}. The d-value of \texttt{y(\(\outp\)2)} is obtained in two stages: letting $\itm{evs}$ be $\itm{end}$ valuated situation of step 2, we first retrieve the channel for \texttt{y} from the parameter map of \textit{evs}, then we retrieve the contents of that channel from the channel map of \textit{evs}. Similarly, letting \textit{bvs} be the \textit{begin} valuated situation of step 3, we get the \texttt{z} channel from the parameter map of \textit{bvs} and we store the new association in the channel map of \textit{bvs} Actually, it may not be clear how anything gets ``stored.'' The definition of \textit{with\_dataflow} simply requires that the d-value of one expression be found in the c-value of another, as if by magic. What actually happens is that the semantics assigns a large number of possible denotations to a process term. In some of them, the c-value (a channel) happens to contain the correct value, and the others are weeded out. Nothing in the example tells us how the channel for \texttt{y} gets set or what to; that presumably occurs inside step 2. If it never does get set, then the channel map will give us the value $\perp$ for the channel. According to the definition of $\ttm{with\_dataflow}$, a tag map with this feature will not pass muster. Another tricky bit is that we have to make sure we always use the right channel map to retrieve parameter values from. Typically it's the latest one available. Internal inertiality guarantees that channel contents accumulate as time passes, except for dataflows. \section{Relationship to ``Deep'' Syntax (RDF)} The surface syntax we have been working with was designed with an eye toward transformation into RDF. A process would be described as an object with various properties. E.g. $\ttm{(if\_then\_else :cond\ } c \ttm{\ :then\ } a \ttm{\ :else} b\ttm{)}$ would get turned into \begin{alltt} <if_then_else> <cond> \textit{Representation of c} </cond> <then> \textit{Representation of a} </then> <else> \textit{Representation of b} </else> </if_then_else> \end{alltt} \texttt{:name} declarations would get turned into \texttt{ID} attributes. Parameter and precondition declarations would be handled as subelements of the process elements they belong to. The representation of dataflows is closer to the formal syntax than the informal version. That is, the dataflow is itself an object with properties (\texttt{from-expression}, \texttt{to-param}, \texttt{to-step}), which would be a subelement of the appropriate process element. That is, $\ttm{(with\_dataflow\ } \ttm{(sequence} \ldots \ttm{)} \ e \ttm{=> (}p \ttm{(}\inp s\ttm{)))}$ would come out as \begin{alltt} <sequence> \textit{\ldots steps \ldots} <dataflow> <value> \textit{Representation of e} </value> <to-step direction="arg" param="p"> \textit{Representation of s} </to-step> </dataflow> </sequence> \end{alltt} \section{Comments, conclusions, future directions} \label{sec:conclusions} I have a nagging feeling that I've produced something that's too close to an ordinary programming language, and lost some of the original DAML-S aim of making simple/atomic processes intrinsically refer to communication with outside agents. In this version, if you want to send or receive a message, you just call a primitive action that sends it or receives it. The notion of ``channel'' involves purely internal communication channels. To handle true interactions with other agents, we need to introduce a type \textit{port} that sends and receives values to the outside world. Part of the problem is that DAML-S has never been quite clear on the distinction between ports and channels. That's why there have been so many debates on questions like: Who is responsible for supplying inputs to process steps? A related perennial question is, Who is responsible for making sure a precondition of a step is true? In this version, it should be obvious that the process interpreter, or some more sophisticated planner, is responsible. We said in section~\ref{sec:intro} that one of our goals was to produce crisp answers to questions such as ``Does a process step proceed if some of its inputs are unavailable?'' The answer is ``Yes, but No.'' Theoretically a process interpreter is allowed to be clairvoyant. In executing a statement $\ttm{(with\_dataflow\ } p \ v \ttm{=>}c\ttm{)}$, if it guesses the value of $v$ correctly then it can put it in $c$ even before it finds out the actual value. Here's an example: \begin{alltt} (tag\_scope (produce consume) (parallel (sequence (gen1) (tag produce (gen2 :results (yummie)))) (sequence (tag consume (eat1 goodie <= (yummie(\(\outp\)produce)))) (eat2)))) \end{alltt} If the interpreter gets to the step \texttt{consume} before the step \texttt{produce}, it is allowed to predict the value of \texttt{yummie} and proceed. If this value is a boolean, and in most situations it's \texttt{true} 999 times out of a 1000, the interpreter might just pass \texttt{true} as the \texttt{goodie} argument to \texttt{eat1}. However, if an interpreter never resorts to such sophisticated maneuvers, then what it will do when it encounters a step with an unavailable argument is wait for it to become available. Deadlocks can arise easily, simply by making two steps produce inputs for each other. Another interesting question: ``Can a channel be set more than once?'' The answer is No. The reason, as explained in section~\ref{sec:clarify}, is that channels never actually get set. Situation sequences in which a channel has the wrong value just get weeded out. If you discard all the scenarios in which the channel doesn't have value $x_1$, and all those in which it doesn't have value $x_2\neq x_1$, you will have discarded all scenarios, and be left with a meaningless process. Here's a little exercise for your (by now well honed) intuitions: What does this expression denote: \begin{alltt} (tag\_scope cycle (sequence (tag cycle (with\_params :args (x) :results (y) (no-op) x+1 => y x <= y)))) \end{alltt} Answer: Nothing, for a variety of reasons. One thing it does \emph{not} denote is an infinite loop that keeps making \texttt{x} and \texttt{y} bigger. There is a lot left to do. We haven't discussed conditional effects and values. They should not be hard to incorporate. DAML-S actually has no \texttt{parallel} construct; instead it has \texttt{split}, \texttt{unordered}, and \texttt{split+join}. The differences between these are obscure, so a vanilla \texttt{parallel} was put in their place. Someone who understands the semantics of the original constructs should have a go at defining them formally. After section~\ref{sec:surface} we neglect defining and naming new processes. One reason is that we haven't given any thought to the semantics of recursive processes. Presumably we just need to wrap a big fixed-point operator around everything, but the matter needs a bit of thought. Iteration was left out of consideration. The only reason for this omission is that it would require generalizing channels a bit. A loop requires the idea of an \emph{accumulator}. At most once per iteration a value is sent to the accumulator, and combined with the value that's already there. Probably the best way to model accumulators is as channels that contain a history list of the values accumulated to date. That way, we can use the same tricks as before to avoid side effects on channels. If we actually want side effects on channels, the best way to do it is probably to introduce an artificial class of ``assignment events'' that bring the side effect about. We have to decide exactly at what time such an event occurs, and the answer is probably ``Right before each step that consumes a channel's contents.'' \end{document}
Attachments
- application/postscript attachment: formal-sem.ps
Received on Sunday, 7 September 2003 12:51:05 UTC