Revised DAML-S-PAI syntax & semantics from Drew McDermott on 2003-09-09 (www-ws@w3.org from September 2003)

From: Drew McDermott <drew.mcdermott@yale.edu>
Date: Tue, 9 Sep 2003 00:27:20 -0400 (EDT)
To: www-ws@w3.org
Message-Id: <200309090427.h894RK417440@pantheon-po01.its.yale.edu>
This version supersedes the version I posted yesterday.  This one is a
lot cleaner.

-- 
                                             -- Drew McDermott
                                                Yale Computer Science Department


\documentclass[11pt]{article}
\usepackage{helvet}
\usepackage{alltt}

\def\la{\langle}
\def\ra{\rangle}
\def\inp{\ensuremath\downarrow\!}
\def\outp{\ensuremath\uparrow\!}
\def\inoutp{\ensuremath\downarrow\!\uparrow\!}
\def\thisproc{\texttt{@}}
\def\env{\mbox{\it env}}
\def\omap{\mbox{\it outmap}}
\def\muni{\,\backslash\! +}
\newcommand\itm[1]{\mbox{\textit{#1}}}
\newcommand\rmm[1]{\mbox{\textrm{#1}}}
\newcommand\ttm[1]{\mbox{\texttt{#1}}}

\def\noteme#1{}%{}
\def\notecoauth#1{[[#1]]}
\def\notereader#1{[[#1]]}


\begin{document}

\begin{center}
Surface Syntax and Formal Semantics for DAML-S-PAI \\
**  DRAFT 0.2 ** \\
Drew McDermott \notereader{and the DAML-S coalition, if they don't expel me} \\
September 8, 2003 \\
Copyright 2003 by the DAML-S Coalition and its vast team of lawyers \\
\end{center}

\section{Goals}
\label{sec:intro}

This is a proposal for a surface syntax and formal semantics of a
``processes as instances'' (PAI) notation for DAML-S.  The idea is to
describe a process as an object, typically consisting of pieces which
are themselves processes.  

The goals of this exercise are to
\begin{enumerate}
\item Provide readable surface syntax \ldots

\item \ldots That is easily translatable to RDF;

\item Provide precise answers to questions such as ``Does a process
  step proceed if some of its inputs are unavailable?''
\end{enumerate}


\section{Syntax and Informal Semantics}

The syntax we propose is somewhat Lisp-based, but not entirely.  The
main reason to go this route is that Lisp's concrete syntax is
essentially isomorphic to its abstract syntax, so you can view this as
an abstract description of something with more infix operators if you like.

\subsection{Informal Syntax}
\label{sec:surface}

The key concept in DAML-S is the \emph{process}, which is an
activity carried out by an agent, typically a web service or a client.

A \emph{process definition} is a description of a process.  Processes
come in several flavors, atomic, simple, and various sorts of
composite, distinguished by their control constructs (conditional,
choice, parallel, loop, etc.).

We will assume that a process starts with its control construct, or the
reserved words {\tt atomic} or {\tt simple}.  So an if-then-else might
look like
\begin{center}
\texttt{(if\_then\_else \ldots)}
\end{center}

Every process can have input and output parameters, described using fields
\texttt{:args} and \texttt{:results}.  Input and output parameters
may have optional types, the OWL classes they belong to.  So a simple
sequence might be described 
thus:
\begin{alltt}
    (sequence :args (a - Integer) 
       \textit{---steps---}
       :results (b - String))
\end{alltt}
However, once we get into the formal part of this paper, we will
ignore type declarations in the interest of clarity.

Besides \texttt{:args} and \texttt{:results}, there can be
a \texttt{:locals} declaration.

Processes can also have preconditions, which must be true before the
process can be started:
\begin{alltt}
    (sequence :precondition (exists (x - Credit-card)
                               (and (credit-card-of x customer)
                                    (not (maxed-out x))))
       ...)
\end{alltt}

\notereader{In this paper we neglect conditional outputs and effects,
  although we believe they present no special problems.}

There is a crucial distinction between a process and a \emph{process
occurrence}.  The distinction is obvious in a case like this:

\begin{alltt}
   (sequence
      (toggle\_the\_switch)
      (toggle\_the\_switch))
\end{alltt}

\noindent which contains two occurrences of the process
      \texttt{toggle\_the\_switch}. 
In keeping with RDF style, the description of a process may accompany
one of its occurrences, or may be placed elsewhere.  In the example
above, \texttt{toggle\_the\_switch} must obviously be defined somewhere
else.  Instead, we could have written this:
\begin{alltt}
   (sequence
      (atomic\_process :name toggle\_the\_switch)
      (toggle\_the\_switch))
\end{alltt}
Processes don't have to have names.

If we want to give a name to a process occurrence, we use the
\texttt{tag} construct:
\begin{alltt}
   (tag-scope (tag1 tag2)
      (sequence
         (tag tog1 (simple\_process :name toggle\_the\_switch))
         (tag tog2 (toggle\_the\_switch))))
\end{alltt}
The \texttt{tag-scope} construct is necessarily to indicate the scope
of the names.  One can imagine other ways of doing this (such as
having the scope be the innermost iteration).

We can use \texttt{tag} names to describe dataflows between steps.  
Suppose we have a process for authorizing uses of  credit card.
Because it must communicate with some computer in a central location,
it sometimes times out if traffic to that computer is heavy.  So the
process has three possible outputs: \texttt{authorized},
\texttt{not-authorized}, and \texttt{timeout}.  The process used by a
retailer might be to try the subprocess one or two times and then
give the customer the benefit of the doubt:
\begin{alltt}
   (def\_type CC\_check\_res (Constant "authorized"
                                    "not-authorized"
                                    "timeout"))

   (def\_type CC\_acc\_status (Constant "accepted" "not-accepted"))

   (simple\_process :name check\_auth
                   :args (cc - Credit\_card\_data)
                   :results (res - CC\_check\_res))
\end{alltt}

\begin{alltt}
   (sequence :results (final\_res - CC\_acc\_status)
       (check\_auth cc <= cc 
                    res => (ch1res(\(\inp\) check1)))
       (tag check1
          (if\_then\_else :args (ch1res - CC\_check\_res)
             :cond (ch1res = "timeout")
             :then
                (sequence :results (ch2res - CC\_acc\_status)
                   (check\_auth cc <= cc
                               res => (ch2res(\(\inp\) check2)))
                   (tag check2 
                      (if\_then\_else :args (ch2res - CC\_check\_res)
                                   :results (res - CC\_acc\_status)
                         :cond (ch2res = "not-authorized")
                         :then (value "not-accepted" => final\_res)
                         :else (value "accepted" => final\_res))))
             :else
                (if\_then\_else
                   :cond (ch1res = "authorized")
                   :then (value "accepted" => final\_res)
                   :else (value "not-accepted" 
                                   => final\_res)))))
\end{alltt}

\subsection{Formal Syntax}
\label{sec:formsyn}

In this section we introduce a somewhat simpler syntax that is easier to
explain the semantics of.  In the next section we explain how to
derive the simple syntax from the user-friendly syntax.  

\begin{alltt}
<defn> ::= (atomic\_process <identifier>)
         | (process <identifier> <statement>)

<statement> ::= (call <identifier>)
               | (sequence <statement>*)
               | (parallel <statement>*)
               | (choice <statement>*)
               | (if\_then\_else
                    :cond <proposition>
                    :then <statement>
                    [:else <statement>])
               | (with\_precondition <proposition> <statement>)
               | (with\_params [:args (<identifier>*)]
                              [:locals (<identifier>*)]
                              [:results (<identifer>*)]
                    <statement>)
               | (tag\_bind (<tag-binding>) <statement>)
               | (with\_dataflow <statement> <term> => <channel term>)
\end{alltt}
In a \texttt{with\_params} statement we will allow the parameter
declarations to be moved around, often putting the \texttt{:results}
after the \texttt{<statement>}.

\begin{alltt}
<tag-binding> ::= (<identifier> <tag-id>+)

<term> ::= <channel term> | \emph{insert term-language grammar here}

<channel term> ::= <channel-spec> | <identifier>

<channel-spec> ::= (<identifier> <channel-id>)

<channel-id> ::= (<in-or-out> <tag-id>+)

<in-or-out> ::= \(\inp\) | \(\outp\)

<tag-id> ::= <identifier> | <natural number>
\end{alltt}
%%%%                | (split+join <statement>*)
%%%%                | (unordered <statement>*)


%%%% <input-flow> ::= (<identifier> <= <flow designator>)
%%%% <output-flow> ::= (<identifier> => <flow designator>)

Most of the changes modularize mechanisms such as input/output
declarations, input-output flows, and tag bindings.

The first main change is to do away with the \texttt{tag} and
 \texttt{tag\_scope} constructs and
 replace them with \texttt{tag\_bind}.  What was written informally as
\begin{alltt}
   (tag\_scope (foo)
      (sequence \(s_1\) (tag foo \(s_2\)) \(s_3\)))
\end{alltt}

\noindent is now considered syntactic sugar for 

\begin{alltt}
   (tag\_bind ((foo 2))
      (sequence \(s_1\) \(s_2\) \(s_3\))))
\end{alltt}

\noindent The idea is that sequence steps are normally tagged 1, 2,
\ldots.  Symbolic tags can be introduced by explaining the mapping of
the symbols to the built-in tags.  For conditionals the built-in tags
are \texttt{:then} and \texttt{:else}.  For loops (which we are
neglecting complely), the built-in tags are natural numbers, 
corresponding to iterations.

The \texttt{tag\_bind} construct can be used to refer to steps that
are more deeply nested than in the previous example.  For instance,
in
\begin{alltt}
    (tag\_bind ((foo :else 2))
       (if\_then\_else
          :cond C
          :then A
          :else (sequence D E F)))
\end{alltt}
\texttt{foo} tags step \texttt{E}, the second step of the
\texttt{:else} step of the construct.  

The sugared syntax allowed \texttt{:args}, \texttt{:locals}, and
\texttt{:results} to 
be declared for all constructs.  We now provide an explicit
\texttt{with\_params} statement.  Similarly for preconditions.

The last major change is that dataflows are specified in only one
place, using a \texttt{with\_dataflow} statement.  The 
destination for the flow can be specified in one of two ways: as
an identifier (denoting an input or output channel) bound in the scope of the
\texttt{with\_dataflow} statement, or as a designator for a flow into
or out of one of steps of the \texttt{<statement>} the
\texttt{with\_dataflow} modifies.  The identifier \thisproc{} is
reserved for this process itself.  

For example, the flow from the second
occurrence of \texttt{check\_auth} to statement \texttt{check2} would
be moved up to the sequence level, and written:
\begin{alltt}
    (tag\_bind ((check2 2))
       (with\_params :results (ch2res - CC\_acc\_status)
          (with\_dataflow
             (sequence 
                (call :name check\_auth)
                (with\_params :args (ch2res - CC\_check\_res)
                             :results (res - CC\_acc\_status)
                   (if\_then\_else
                      ...)))
          (cc (\(\outp\)1)) => (ch2res (\(\inp\)check2)))))
\end{alltt}
(The flow from into the \texttt{cc} input of the \texttt{call} is left
as an exercise for the reader.)  

Other, lesser, changes are:
\begin{enumerate}
\item To require calls of defined processes to be labeled explicitly
  using \texttt{(call \textit{proc} \ldots)}.

\item To require parentheses around \texttt{<input flow>}s and
  \texttt{<output flow>}s.
\end{enumerate}

One thing that is absent from the formal treatment is any notion of
typing of variables.  That absence can't be made up by any kind of
sugaring.  Eliminating types makes the formal semantics simpler, so
from now on forget that variables might have types.

\subsection{Informal Syntax as Syntactic Sugar for the Formal Kind}

Procedure for ``desugaring'' the concise notation in order to arrive
at its equivalent in the formal syntax:

\begin{enumerate}
\item Replace every \texttt{tag\_scope} with the equivalent
  \texttt{tag\_bind}, and replace $\ttm{(tag\ } g\ e\ttm{)}$ with $e$;

\item If a statement contains an \texttt{:args}, \texttt{:locals}, or
  \texttt{:results} declaration, tranform it into a
  \texttt{with\_params} statement, with the parameter declarations
  pulled out in the obvious way.

\item Every dataflow expression of the form $c \ttm{<=} v$ or
  $v\ttm{=>}c$ is deleted.  A \texttt{with\_dataflow} expression is
  created and wrapped around the narrowest scope in which all the
  identifiers in $c$ and $v$ are bound.  (This includes the
  identifiers bound in  \texttt{tag\_bind}s.)
\end{enumerate}

\section{Formal Semantics}

\subsection{Technicalities}

To explain the formal semantics, we need to talk about what the
universe looks like.  It certainly includes a set of objects $U$, and
in a full treatment $U$ ought to be broken down into classes
corresponding to the types  of the variables, but, as explained in
section~\ref{sec:formsyn}, 
we are simplifying variable types away.

In addition to objects, we also need a set of  \emph{situations}.
Classically these 
are taken to be snapshots of the world, that is, assignments of truth
values to all (timeless) propositions.  We adopt that position, with
the twist that situations are not defined by their truth assignments;
two distinct situations can assign exactly the same values to all
propositions.  We assume there is a
partial order on situations.  $\sigma_1 \preceq \sigma_2$ means that
$\sigma_1$ is ``in $\sigma_2$'s past.''  Whether there is a unique
future for any situation, or time \emph{branches} into the future, is
an issue that needn't concern us here.\footnote{The Reiter School
  treats 
situations as sequences of actions (i.e., choices made by an
agent).  That approach could probably be made to work for us, but for
now we'll just be neutral about exactly what situations are
metaphysically.} 
%%%% We use the symbol $\Sigma$ to refer to the set of all situations.

To handle dataflow, we need a few extra object types.  An input or
output will be bound to an object of 
type \emph{channel}. 
A channel is essentially a
storage location.  
%%%% A channel can hold a value  from $U$, or a special
%%%% value $\bullet$, not an element of $U$, which means ``unassigned.''
A \emph{channel map} is a set of ordered pairs $\{\la \itm{ch},
\itm{val} \ra\}$, where $\itm{val}\in U$.
%%%% \cup \{\bullet\}
A channel
map specifies the current contents of a set of channels.  A channel
with no value does not appear in the channel map, so $\itm{channel
  map}(c) = \perp$, the ``undefined value.''
We assume the existence
of an infinite number of channels, so we can always find one we
haven't used yet.  

The denotation of a DAML-S process expression is (modulo
technicalities) a set of pairs $\la
x_0,x_1\ra$, where $x_0$ is a ``starting point'' and $x_1$ is a
``finishing point,'' the intuition being that one way for the process
to proceed starting in $x_0$ is to wind up in $x_1$.  $x_0$ and $x_1$
are \emph{valuated situations}, triples of the form $\la p,c,s \ra$,
where $s$ is a situation, $c$ is a channel map, and $p$ is a
parameter map.  $s$ represents the state of the outside
world, $c$ represents the contents of channels, and $p$ associates 
arg/results parameters of the process with channels.  Note that $c$ is a
``global'' channel map, and can mention channels other than those
mentioned in $p$.  
To refer to the pieces of a valuated situation, we use
$\itm{pm},\itm{cm}, \itm{sit}$, so that valuated situated $s =
\la\itm{pm}(s), \itm{cm}(s), \itm{sit}(s)\ra$.

A pair of valuated situations $\itm{vs}_1,\itm{vs}_2$, where
$\itm{sit}(\itm{vs}_1)\preceq \itm{sit}(\itm{vs}_2)$,
is called a
\emph{valuated situation interval}, or VSI.
So the denotation of a process is (almost) a set of valuated situation
intervals, or, equivalently, a relation on valuated situations.
$\la \itm{vs}_1,\itm{vs}_2\ra$ is in the set if $\itm{vs}_2$ is one
place the process can finish given that it starts in
$\itm{vs}_1$.  We can refer to the pieces $\itm{vs}_1,\itm{vs}_2$ of a
VSI $\itm{vsi}$ as
$\itm{begin}(\itm{vsi}),\itm{end}(\itm{vsi})$.\footnote{
One way to think about nondeterminism would be
  to allow a process to end in more than one valuated situation.
But there other ways to think about branching time, and we don't want
to commit to a particular scheme.}

The presence of tags compels us to impose some structure on the VSI a
process term denotes.  A tag refers to a
piece of a process execution trace.  So we introduce the notion of a
\textit{tag tree}, which records the bindings of tags to valuated
situation intervals.  Each node of the tree is labeled with a VSI, and
has a finite 
number of subtrees, the links to which are labeled with 
tags.  If $m$ is such a tree, then $\itm{piece}(m, t_1\, t_2\, \ldots 
\,t_k)$, where each $t_i$ is a \texttt{<tag-id>}, is the VSI obtained by
getting piece $t_1$ of $m$, then piece $t_2$ of that subtree, and so forth.
$\itm{piece}(m)$ is the VSI associated with the whole tag tree.
For clarity, one can also write this as $\thisproc m$.
The labels below the top node of a tag tree $m$ are denoted 
by $\itm{labels}(m)$. 
If the sequence of tag-ids contains a tag corresponding to no subtree,
the value of the \textit{piece} expression is $\perp$.
The symbol $\emptyset$ is used for a null tree, associated with no
VSIs.  The notation $m+ t \mapsto m'$ means the tree $m$ augmented
with a new branch labeled $t$ that takes you to tag tree $m'$, which
thereby becomes a subtree of the new tree.
%%%% A single VSI $v$ stands for 
%%%% the tag tree whose sole node is labeled $v$.
A leaf tag tree whose sole node is labeled $v$ is written $\emptyset
 + \thisproc \mapsto v$.  If there is already a subtree or VSI
 associated with a label $l$ in $m$, then $m+ l\mapsto x$ is $m$, but
 with $\itm{piece}(m,l)$ reset to $x$.

A key requirement for a useful tag tree is that the VSIs labeling its nodes must 
be \emph{comparable.}  That is, if $\la s_1,s_2\ra$ is a VSI  from somewhere in a
tag tree and $\la s_3, s_4\ra$ is a VSI from somewhere else, then the
four points are ordered by $\preceq$ somehow; that is, there is a
permutation $i_1,i_2,i_3,i_4$ of $1,2,3,4$ so that $\itm{sit}(s_{i_1})
\preceq \itm{sit}(s_{i_2}) \preceq \itm{sit}(s_{i_3}) \preceq \itm{sit}(s_{i_4})$.
A set of tag trees may describe alternative timelines (or time
branches, or possible
worlds, \ldots), but each tag tree is drawn from a single timeline.

Another important requirement is that a tag tree be \emph{inertial},
which means ``internally'' and ``externally'' inertial.

A set of valuated situation intervals is \emph{internally inertial} if
it has the following  two properties:
\begin{enumerate}
\item \emph{Parsimony:} No channel appears in a channel map unless some parameter is or
  was bound to it at some point.

\item \emph{Conservation:} Nothing changes the contents of channels except
      dataflows to the input parameters denoting them.
\end{enumerate} 
Both properties are expressed as constraints on consecutive VSIs, defined
thus: $\itm{vsi}_1$ and
$\itm{vsi}_2$ are \emph{consecutive} in a set of VSIs $S$ if and only if
\begin{tabbing}
\hspace{2ex}\=$\itm{vsi}_1\in S \rmm{\ and\ } \itm{vsi}_2 \in S$ \+\\
      and $\itm{sit}(\itm{end}(\itm{vsi}_1)) \prec \itm{sit}(\itm{begin}(\itm{vsi}_2)) $\\
      $\rmm{and there }$\=$\rmm{is no\ } \itm{vsi'} \in S \rmm{\ such that\ } $\+\\
                        $\itm{sit}(\itm{end}(\itm{vsi}_1)) 
                        \prec \itm{sit}(\itm{end}(\itm{vsi}')) 
                        \prec \itm{sit}(\itm{begin}(\itm{vsi}_2))$ \-\-\\
\end{tabbing}
One might wonder about the possibility of there being an infinite
      sequence of situations between $\itm{vsi}_1$ and $\itm{vsi}_2$.
      In this document, we consider only finite tag trees, so the
      issue won't come up, but when the formalism is extended to
      iterations, we will certainly need an ``anti-Zeno'' rule to
      prevent such pathological cases from arising.  

Parsimony is then  formalized as follows: If $c$ is a channel in
$\itm{cm}(\itm{begin}(v))$ for some VSI $v$ in our set, then
either $c$ appears in $\itm{cm}(\itm{end}(v'))$ for some $v'$ such
that $v'$ and $v$ are consecutive, or there is a parameter $p$ such
that $(\itm{pm}(\itm{begin}(v)))(p) = c$.  

Conservation is expressed thus:
If a set of VSIs contains two consecutive elements $\itm{vsi}_1$ and 
$\itm{vsi}_2$, then 
$\itm{cm}(\itm{begin}(\itm{vsi}_2))$ differs from
$\itm{cm}(\itm{end}(\itm{vsi}_1))$ only on channels denoted by input
parameters to $\itm{vsi}_2$.  More technically, 
\begin{tabbing}
\hspace{2ex}\=If \=$\itm{vsi}_1 \rmm{\ and\ } \itm{vsi}_2 \rmm{\ are consecutive}$ \+\+\\
            $\rmm{then if }$\=$\rmm{there is a }$\=$\rmm{channel\ } c \rmm{\ such that\ } $ \+\+\\
                                               $(\itm{cm}(\itm{end}(\itm{vsi}_1))(c)
                                               \neq (\itm{cm}(\itm{begin}(\itm{vsi}_2))(c) $\-\\
                          $\rmm{then there is a parameter\ } p \rmm{\ such that\ }
                               (\itm{pm}(\itm{begin}(\itm{vsi}_2)))(p) = c
$\end{tabbing}

A set of VSIs is \emph{externally inertial} if, whenever 
two of its
elements, $\itm{vsi}_1$ and $\itm{vsi}_2$, are consecutive, then the 
only changes that occur between
$\itm{sit}(\itm{end}(\itm{vsi}_1))$ and
$\itm{sit}(\itm{begin}(\itm{vsi}_2))$ are those allowed by the
``physics'' of the world.  In the present paper we will say nothing
about physics, except to note that if the world obeyed the laws of
``classical planning'' [<bibref>] then \emph{no} changes would occur
between the end of $\itm{vsi}_1$ and the beginning of $\itm{vsi}_2$,
and the two situations would assign the same truth values to all propositions.  
Of course, the world of web services is  definitely nonclassical, and
the physics could get very intricate, what with autonomous processes,
randomness, and the presence of other agents, some more cooperative
than others.

Having defined all these concepts in terms of sets of
      VSIs, we extend them to a tag tree by applying them to the set of
all VSIs labeling nodes of the tree.

The meanings of expressions are always with respect to a
variable-binding environment, as is traditional.
An \emph{environment} is a mapping from variables to values.
``Variables'' are of two kinds: 
identifiers and \emph{piece designators}, defined as lists of 
\texttt{<tag-id>}'s as defined in section~\ref{sec:formsyn}.
Values
are  from the set  $U \cup \itm{channels} \cup \itm{VSIs} \cup
\emph{piece designators}$.  
(Note that piece designators can serve as both variables and values;
when a piece designator is the ``variable,'' the value will always be
a VSI; when it's the value, the variable will always be an identifier,
in fact, a tag.)

We single out two special cases of environments:
A \emph{parameter map}
contains only bindings of  arg/result parameters to channels.  
A \emph{tag map}'s bindings whose variables are piece designators and
whose values are VSIs.

The formal semantics makes use in a few places of sets of
ordered pairs, a.k.a functions, so we introduce
some common terminology 
to talk about them.  Typically our functions are finite, hence
partial.  So $F(x)$ is either the $y$ corresponding to $x$ in the list
of ordered pairs, or $\perp$ if there's no such pair.  We often want to
avoid $\perp$, because it indicates that something meaningless or
impossible has occurred; $F(\perp) = \perp$ \noteme{almost always}, so 
$\perp$ will propagate out to make a given
syntactically correct DAML-S expression meaningless.
If $F$ is a function,  then $F\muni F'$
means $F \backslash \mbox{\it overlap}(F,F') \cup F'$, where
\textit{overlap}$(F,F')$ is the set of elements of $F$ whose
arguments are in the domain of $F'$.  In other words, $F\muni F'$
is $F \cup F'$ with bindings from $F'$ replacing those in $F$ that
give different results for a given argument: $(F \muni F')(x) =
\itm{\ if\ } x\in \itm{dom}(F') \itm{\ then\ } F'(x) \itm{\ else\ }
F(x)$. 

%%%% Another way to combine functions is the operator $F_1\uplus F_2$.
%%%% This is equivalent to $F_1\cup F_2$, if for all $x \in \itm{dom}(F_1) \cap
%%%% \itm{dom}(F_2)$, $F_1(x) = F_2(x)$; i.e., $F_1$ and $F_2$ on all
%%%% points they share in common.
%%%% Otherwise, $F_1\uplus F_2 = \perp$.

There one more gimmick we need here
for technical reasons.
%%%% , in addition to the aforementioned $\perp$ and
%%%% $\bullet$.  
We use the symbol ``$\diamond$'' to refer to the ``nowhere
channel,'' a mathematical \texttt{/dev/null}.  It's not an error to
write to this channel, but it is an error to try to read from it.

\subsection{The Semantics}

We are finally ready to proceed.  We will define the function ${\cal
M}(e,\env)$, which gives the meaning of expression $e$ with respect to
environment \env.  Because some expressions denote
channels, every expression can be thought of as having a c-value and a
d-value, which correspond to the ``l-value'' and ``r-value'' from
programming-language semantics.  So ${\cal M}(e,\env) = \la {\cal
  C}(e,\env), {\cal D}(e,\env)\ra$.  
$\cal D$ gives the \emph{d-value} (denotation) of an expression, and
$\cal C$ gives its \emph{c-value},
the channel, if any, that the d-value came from.  
For many expressions, such as \texttt{5} and \texttt{x+y}, the c-value
is undefined, so it will be $\perp$ or $\diamond$, as appropriate.  In
cases where it is $\perp$, we will give the value of $\cal D$ for an
expression and not bother with spelling out $\cal C$ explicitly.
Our main focus in this paper is the denotations of process
expressions \textit{exp}, so what we are ultimately interested in is
the set of valuated situation intervals
$$\itm{image}(R, {\cal D}(\itm{exp},\emptyset,E_0))$$
where $E_0$ is the
built-in or ``basis'' environment; $\itm{image}(f,S)$
means $\{f(x) | x \in S\}$; and $R(m)= \thisproc m$, the VSI labeling
the root node of $m$.  

A full treatment of the semantics would specify the meanings of all
expressions, including the long-awaited formalism for propositions
(conditions and effects).  But we're going to focus here on the
process sublanguage, so the long-awaited formalism will have to be
awaited a little longer. \notereader{Actually, I expect no substantive
  problems to arise in extending $\cal M$ to a logical language.  In
  particular, it should be possible to refer to the entities, such as
  channels,  bound in a process, simply by using their names.}

We start by giving the semantics of channel ids, of the form
$\ttm{(}\inp t_1 \ldots t_k\ttm{)}$ or $\ttm{(}\outp t_1 \ldots
t_k\ttm{)}$.  Recall that an environment can bind \emph{piece
  designators} to VSIs, where a piece designator is a list $\ttm{(}t_1
\ldots t_k\ttm{)}$, each $t_i$ being either an identifier or a natural
number.  Define the \emph{expansion} of a piece designator, written
$\itm{expansion}(\ttm{(} t_1 \ldots t_k\ttm{)}, \env)$, to be the
piece designator obtained by eliminating all bound identifiers from
the $t_j$.  That is, if $t_j$ is an identifier, and $\env(t_j)$ is a
piece designator $\ttm{(}t'_1 \ldots t'_{k'}\ttm{)}$, then we replace $t_j$ in the
list with its value, getting $\ttm{(}t_1 \ldots t_{j-1}\ t'_1 \ldots
t'_{k'}\  t_{j+1} \ldots t_k\ttm{)}$.  This process is repeated until
the only identifiers remaining are built-in tags such as
\texttt{:then}. 

Now we can formalize the semantics of channel ids thus:

\begin{tabbing}
\hspace{2ex}  \= ${\cal M}(\itm{id}(\inp t_1 \ldots t_k), \env)
          = H(\itm{begin}(\env(\itm{expansion}(\ttm{(}t_1 \ldots t_k\ttm{)}, \env)), \itm{id}))$ \+\\
  \= ${\cal M}(\itm{id}(\outp t_1 \ldots t_k), \env)
          = H(\itm{end}(\env(\itm{expansion}(\ttm{(}t_1 \ldots t_k\ttm{)}, \env)), \itm{id}))$\-\\
where $H(\itm{vs}, \itm{id}) = \rmm{\ If\ }$\= $\itm{vs}=\perp 
                                         \rmm{\ then\ } \la \diamond,\perp \ra $ \+\\
                                $\rmm{else\ Let\ }$\=$c = (\itm{pm}(\itm{vs}))(\itm{id})$ \+\\
                                $\rmm{in if\ } $\= $c \rmm{\ is a channel then \ } 
\la c, (\itm{cm}(\itm{vs}))(c)\ra$ \+\\
                                           $\rmm{else \ } \la \diamond, \perp \ra$ \-\-\-\\
If $\env(\itm{id})$ is a channel $c$, ${\cal M}(\itm{id}, \env) =
          \la c, (\itm{cm}(\itm{end}(\thisproc m)))(c)\ra$ \\
For all other $e$, \+\\
  ${\cal C}(e, \env) = \perp$
\end{tabbing}
so that we can define ${\cal M}(e,\env)$ just by specifying ${\cal D}(e,\env)$.

We construct an interpretation of a set of composite processes in the
usual way.  We start with an assignment of a set of VSIs 
to each atomic-process name.  We'll just assume that $E_0$, the
initial environment, includes these assignments.

The definition of $\cal D$ is a pile of special cases, one for each sort of
composite process.  

\begin{tabbing}
%%% named
   \=   If $a$ is a named process (especially an atomic one), then  \\
   ${\cal D}(\mbox{\tt (call\ } a\mbox{\tt)},  \env) = \env(a) 
          = \{ \emptyset + \thisproc \mapsto v | v \in \env(a)\}$ \+\\
\\
%%% sequence
   ${\cal D}($\texttt{(seq}\=\texttt{uence $p_1$ $p_2$ $\ldots$ $p_n$)},   $\env)$ \+\\
        $=\{\mbox{\rm tag }$\= $\mbox{\rm tree\ } m |\,\mbox{\it labels}(m)=\{1, \ldots,n\}$ \textrm{and}\+\\
                    $\mbox{\rm for some\ }$\=$v_1, \ldots, v_n, \rmm{\
   where\ } v_i\in {\cal D}(p_i,\env)$ \+\\
                                        \=$\mbox{\rm for all\ } j, 1 \leq j \leq n, \itm{piece}(m,j) = v_j$ \\
                                        $\mbox{\rm and for all\ } j, 1 \leq j < n, \itm{sit}(\mbox{\it end}(v_j)) \preceq \itm{sit}(\mbox{\it begin}(v_{j+1}))$ \\
                $\rmm{and\ } v_1,\ldots,v_n \rmm{\ are an inertial sequence}$ \\
                           $\mbox{\rm and\ }$\= 
                             $\thisproc m = \la $\=$ \itm{begin}(v_1), \itm{end}(v_n))\}$\-\-\-\\
\end{tabbing}
Note that we don't have to mention comparability of the $v_i$ because
it's implied by the requirement that they are linearly ordered.

\begin{tabbing}
%%% parallel
${\cal D}(\ttm{(par}$\=$\ttm{allel\ } s_1 \ldots s_k\ttm{)},  \env)$ \+\\
                  $= \rmm{\ Let\ }$\= $M = \{$\=$\emptyset + (1 \mapsto v_1) + \ldots + (k \mapsto v_k) $\+\+\\
                                         $| \rmm{\ so}$\=$\rmm{me\ } v_1 \in {\cal D}(s_1,\env), \ldots, v_k \in {\cal D}(s_k,  \env) 
\}$ \+\\
                                        $\rmm{such that the\ } v_i \rmm{\ are comparable and inertial}\}$ \-\-\\
     in $\itm{image}(\itm{combine}, M) $ \\
     where $\itm{com}$\=$\itm{bine}(m)$ \+\\
                       
                       $ = \{$\=$m + \thisproc \mapsto \la v_b, v_e \ra $\+\\
                                         $| \ $\=$\rmm{Let\ } S_b = \{v | $\= $v= \itm{begin}(\itm{piece}(m,i)) $ \+\+\\
                                                             $ \rmm{\ for some\ } i \in [1,k]\}$ \-\\
                                            $\rmm{and\ } S_e = \{v | $\= $v= \itm{end}(\itm{piece}(m,i)) $ \+\\
                                                                     $\rmm{\ for some\ } i \in [1,k]\}$ \-\\
                                            $\rmm{in\ } v_b = \itm{\ earliest}(S_b) \rmm{\ and\ } v_e = \itm{\ latest}(S_e)\}$ \-\-\-\-\-\\
\\
%%% choice
${\cal D}(\ttm{(cho}$\=$\ttm{ice\ } c_1, \ldots, c_k\ttm{)}, \env) $ \+\\
                   $= \cup_{j=1}^k {\cal D}(c_j, \env)$ \-\\
\\
%%% if_then_else
   ${\cal D}($\texttt{(if\_}\=\texttt{then\_else } \texttt{:cond} $t$ \texttt{\ :then } $Y$ \texttt{:else } $N$\texttt{)}$,  \env)$ \+\\
                            $= $\=$C(\ttm{true}, Y, \ttm{iftrue}) 
                                   \cup C(\ttm{false}, N, \ttm{iffalse}) $\+\\
                            where $C(\tau, e, l) = \{$\=$\emptyset + (\thisproc \mapsto m) + (l \mapsto m)$ \+\\
                                                        $|$   for \=some $m\in {\cal D}(e,\env)$ \+\\
                                                                  $\rmm{such that\ } {\cal D}(t, \env) = \tau\}$ \-\-\-\-\\
\\
${\cal D}(\ttm{(wit}$\=$\ttm{h\_precondition\ } p\ s\ttm{)},\env) $ \+\\
                  $=\{m \in {\cal D}(p,\env)\, |\, p \rmm{\ is true in\ } \itm{sit}(\itm{begin}(\thisproc m)) \}$ \-\\
\end{tabbing}

That concludes our definitions of the basic control constructs.  Now
we handle the various notations for binding parameters and passing
values to them.

\begin{tabbing}
%%% with_params
${\cal D}$\=$(\ttm{(with\_params\ } q\  \ttm{:args\ } A \ttm{\ :locals\ } V \ttm{\ :results\ } R\ttm{)},  \env)$ \+\\
         $=$\ Let \=$E_A = \{\la a,c\ra | a \in A \mbox{\it\ and\ } c \mbox{\it\ a new channel}\}$ \+\\
                    $E_V = \{\la v,c\ra | v \in V \mbox{\it\ and\ } c \mbox{\it\ a new channel}\}$ \\
                    $E_R = \{\la r,c\ra | r \in R \mbox{\it\ and\ } c \mbox{\it\ a new channel}\}$ \-\\
                            then let\ \= $M = {\cal D}(p, \env\muni E_P \muni E_V \muni E_R$) \+\\
                                 in \=$\itm{image}(A,M)$ \+\\
                                    whe\=re for all $m\in M$ \+\\
                                       $A(m) = $\=$\ m + \thisproc \mapsto \la \itm{vs}_b, \itm{vs}_e \ra$  \+\\
                                                $\rmm{where\ }$\=$\rmm{let\ } \la\itm{vs}'_b,\itm{vs}'_e\ra = \thisproc m $\+\\
                                       \=$\rmm{in\ }\itm{vs}_b= \la \itm{pm}(\itm{vs}'_b) \muni E_P, \itm{cm}(\itm{vs}'_b), \itm{sit}(\itm{vs}'_b)\ra $ \\
                                      $\rmm{and\ }\itm{vs}_e = \la \itm{pm}(\itm{vs}'_e) \muni E_R, \itm{cm}(\itm{vs}'_e), \itm{sit}(\itm{vs}'_e)\ra $ \-\-\-\-\-\-\\
\\
%%% with_dataflow
${\cal D}(\ttm{(with\_}$\=$\ttm{dataflow\ } p\  r \ttm{=>} c\ttm{)},  \env)$ \+\\
                $= \{$\=$m\in{\cal D}(p,\env)$ \+\\
                     $|\ $\=${\cal M}(c,\env) = \la x, v_c\ra, x\neq
                \perp, \rmm{\ and\ } {\cal D}(r,\env) = v_r \neq \perp$ \+\\
                         $\rmm{\ and\ } v_r = v_c\}$ \-\-\-\\
\\
%%% tag_bind
${\cal D}(\texttt{(tag\_}$\=$\ttm{bind((}\itm{id}, t_1\ldots t_k\ttm{))}\  P\ttm{)},  \env)$ \+\\
    $= {\cal D}$\=$(P, \env \muni \itm{tm})$ \+\\
               where $\itm{tm} \rmm{\ is the tag map\ } \{\la \itm{id}, \itm{expansion}(\ttm{(}\itm{id}, t_1\ldots t_k\ttm{)}, \env) \ra \}$ \-\-\\
\end{tabbing}

\subsection{Some clarifying remarks}
\label{sec:clarify}

In case the meaning of all this isn't intuitively obvious, let us make a
few remarks.  

A key feature of the the \texttt{with\_params} construct is that it
binds channel identifiers in two places: the environment, and the
parameter maps for the process.  That allows us to refer to them
in two different ways.  Within the process, we can refer to them by
name, because they're bound in the environment.  After leaving
the process, but within the scope of a \texttt{with\_dataflow}
wrapping it, we can
refer to the same channels by their associations with the parameter maps for
the process's steps.  E.g., in
\begin{alltt}
   (with\_params
      :args (w) 
      (with\_dataflow
          (with\_dataflow
              (sequence (call foo)
                        (with\_params 
                            :args (x) 
                            (call baz) 
                            :results (y))
                        (with\_params 
                            :args (z)
                            (call shazam)))
              w => (x(\(\inp\)2)))
          (y(\(\outp\)2)) => (z(\(\inp\)3))))
\end{alltt}
we send the d-value of \texttt{w} to the channel \texttt{x} of (i.e.,
bound around the call 
to) \texttt{baz}, and we send the contents of the channel \texttt{y}
of \texttt{baz} to the input channel \texttt{z} of \texttt{shazam}.  
The d-value of 
\texttt{y(\(\outp\)2)} is obtained in two stages: letting $\itm{evs}$
be $\itm{end}$ valuated situation of step 2,
we first retrieve 
the channel for
\texttt{y} from the parameter map of \textit{evs}, then we retrieve the
contents of that channel from the channel map of \textit{evs}.
Similarly, letting \textit{bvs} be the \textit{begin} valuated
situation of step 3, 
we get the  \texttt{z} channel from the parameter map of \textit{bvs}
and we store the new association in the channel map of \textit{bvs}

Actually, it may not be clear how anything gets ``stored.''  The
definition of \textit{with\_dataflow} simply requires that the d-value
of one expression be found in the c-value of another, as if by magic.
What actually happens is that the semantics assigns a large number of
possible denotations to a process term.  In some of them, the c-value
(a channel) happens to contain the correct value, and the others are
weeded out.

Nothing in the example tells us how the channel for \texttt{y}
gets set or what to;
that presumably occurs inside step 2.  If it never does get set, then
the channel map will give us the value $\perp$ for the channel.
According to the definition of $\ttm{with\_dataflow}$, a tag tree with
this feature will not pass muster.  

Another tricky bit is that we have to make sure we always use the
right channel map to retrieve parameter values from.  Typically it's
the latest one available.  Internal inertiality guarantees that 
channel contents accumulate as time passes, except for dataflows.


\section{Relationship to ``Deep'' Syntax (RDF)}

\notereader{The following is only suggestive.  In future rewrites it
  will be tied to the notation developed by David Martin and Massimo
  Paolucci.} 

The surface syntax we have been working with was designed with an eye
toward transformation into RDF.  A process would be described as an
object with various properties.  E.g. 
$\ttm{(if\_then\_else :cond\ } c \ttm{\ :then\ } a \ttm{\ :else} b\ttm{)}$ 
would get turned
into
\begin{alltt}
     <if_then_else>
        <cond>
          \textit{Representation of c}
        </cond>
        <then>
           \textit{Representation of a}
        </then>
        <else>
           \textit{Representation of b}
        </else>
     </if_then_else>
\end{alltt}
\texttt{:name} declarations would get turned into \texttt{ID}
attributes.  Parameter and precondition declarations would be handled
as subelements of the process elements they belong to.

The representation of dataflows is closer to the formal syntax than
the informal version.  That is, the dataflow is itself an object with
properties (\texttt{from-expression}, 
\texttt{to-param}, \texttt{to-step}), which would be a subelement of
the appropriate process element.  That is, $\ttm{(with\_dataflow\ }
\ttm{(sequence} \ldots \ttm{)}
\ e \ttm{=> (}p \ttm{(}\inp s\ttm{)))}$ would come out as
\begin{alltt}
     <sequence>
        \textit{\ldots steps \ldots}
        <dataflow>
           <value>
               \textit{Representation of e}
           </value>
           <to-step direction="arg" param="p">
               \textit{Representation of s}
           </to-step>
        </dataflow>
     </sequence>
\end{alltt}

\section{Comments, conclusions, future directions}
\label{sec:conclusions}

I have a nagging feeling that I've produced something that's too
close to an ordinary programming language, and lost some of the
original DAML-S aim of making simple/atomic processes 
intrinsically refer to communication with outside agents.  In this
version, if you want to send or receive a message, you just call a
primitive action that sends it or receives it.  
The notion of ``channel'' involves purely internal communication
channels.  To handle true interactions with other agents, we need to
introduce a type \textit{port} that sends and receives values to the
outside world.  

Part of the problem is that DAML-S has never been quite clear on the
distinction between ports and channels.  That's why there have been so
many debates on questions like: Who is responsible for supplying
inputs to process steps?  

A related perennial question is, Who is responsible for making sure a
precondition of a step is true?  In this version, it should be obvious
that the process interpreter, or some more sophisticated planner, is
responsible. 

We said in section~\ref{sec:intro} that one of our goals was to produce
crisp answers to questions such as 
``Does a process
  step proceed if some of its inputs are unavailable?''  The answer
  is ``Yes, but No.''  Theoretically a process interpreter is allowed
  to be clairvoyant.  In executing a statement $\ttm{(with\_dataflow\
  } p \ v \ttm{=>}c\ttm{)}$, if it guesses the value of $v$ correctly
  then it can put it in $c$ even before it finds out the actual
  value.  Here's an example:
\begin{alltt}
    (tag\_scope (produce consume)
       (parallel
          (sequence (gen1) 
                    (tag produce (gen2 :results (yummie))))
          (sequence (tag consume 
                        (eat1 goodie <= (yummie(\(\outp\)produce))))
                    (eat2))))
\end{alltt}
If the interpreter gets to the step \texttt{consume} before the step
\texttt{produce}, it is allowed to predict the value of
\texttt{yummie} and proceed.  If this value is a boolean, and in most
situations it's \texttt{true} 999 times out of a 1000, the interpreter
might just pass \texttt{true} as the \texttt{goodie} argument to
\texttt{eat1}. 

However, if an interpreter never resorts to such sophisticated
maneuvers, then what it will do when it encounters a step with an
unavailable argument is wait for it to become available.  Deadlocks
can arise easily, simply by making two steps produce inputs for each
other.  At the cost of complicating the definition of
\texttt{with\_dataflow} somewhat, we could probably make life easier
by just imposing the
constraint that step $A$ precede step $B$ if there is
a flow from an output $A$ to the input of
$B$.

Another interesting question: ``Can a channel be set more than once?''
The answer is No.  The reason, as explained in
section~\ref{sec:clarify}, is that channels never actually get set.
Situation sequences in which a channel has the wrong value just get
weeded out.  If you discard all the scenarios in which the channel
doesn't have value $x_1$, and all those in which it doesn't have value
$x_2\neq x_1$, you will have discarded all scenarios, and be left with
a meaningless process.

Here's a little exercise for your (by now well honed) intuitions:
What does this expression denote:
\begin{alltt}
           (tag\_scope cycle
              (sequence
                 (tag cycle 
                    (with\_params :args (x) :results (y)
                        (no-op)
                        x+1 => y
                        x <= y))))
\end{alltt}
Answer: Nothing, for a variety of reasons.  One thing it does
\emph{not} denote is an infinite loop that keeps making \texttt{x} and
\texttt{y} bigger. 

There is a lot left to do.  We haven't discussed conditional effects
and values.  They should not be hard to incorporate.

DAML-S actually has no \texttt{parallel} construct; instead it has
\texttt{split}, \texttt{unordered}, and \texttt{split+join}.   The
differences between these are obscure, so a vanilla \texttt{parallel}
was put in their place.  Someone who understands the semantics of the
original constructs should have a go at defining them formally.

After section~\ref{sec:surface} we neglect defining and naming new
processes.  One reason is that we haven't given any thought to the
semantics of recursive processes.  Presumably we just need to wrap a
big fixed-point operator around everything, but the matter needs a bit
of thought.

Iteration was left out of consideration.  The only reason for this
omission is that it would require generalizing channels a bit.  A loop
requires the idea of an \emph{accumulator}.  At most once per
iteration a value is sent to the accumulator, and combined with the
value that's already there.  Probably the best way to model
accumulators is as channels that contain a history list of the values
accumulated to date.  That way, we can use the same tricks as before
to avoid side effects on channels.

If we actually want side effects on channels, the best way to do it is
probably to introduce an artificial class of ``assignment events''
that bring the side effect about.  We have to decide exactly at what
time such an event occurs, and the answer is probably ``Right before
each step that consumes a channel's contents.''  

\noteme{Not clear that \itm{piece} is always used to get a VSI instead
  of a subtree.}

\end{document}
Attachments

application/postscript attachment: formal-sem.ps
Received on Tuesday, 9 September 2003 00:27:41 UTC