Re: A side-effect example from Norman Walsh on 2006-04-26 (public-xml-processing-model-wg@w3.org from April 2006)

From: Norman Walsh <Norman.Walsh@Sun.COM>
Date: Wed, 26 Apr 2006 07:26:46 -0400
To: public-xml-processing-model-wg@w3.org
Message-ID: <87fyk0bmrt.fsf@nwalsh.com>
/ Alex Milowski <alex@milowski.org> was heard to say:
| For me, it is unclear what is the input for each of these
| steps.

They have no inputs. That's the degenerate case of "the inputs are exactly
the same each time".

|> If the Timestamp component has no such annotation (or asserts that it
|> doesn't have side effects) then that could still be the result
|> (implementations aren't required to cache results). But this would be
|> an equally valid result:
|>   <doc>
|>   <t>43151</t>
|>   <t>43151</t>
|>   <t>43151</t>
|>   </doc>
|
| But the output *isn't* a side effect.  It is the output.  You have three
| separate steps, each of which produce some output, and as such, this
| doesn't make sense unless they run at exactly the same time.

That's not necessarily true. Let's try to come at this a different way.

I assert that there are four kinds of components in the world:

1. Side-effect free and functional. Such components have no detectable
   effect on the world except for the inputs they consume and the
   outputs they produce, and given the same inputs, they always produce
   the same outputs.

   <p:step name="p:load">
     <!-- loads the given input and returns it -->
     <p:input>
       <inline-document/>
     </p:input>
     <p:output label="constant-doc"/>
   </p:step>

2. Side-effect free and non-functional. Such components have no detectable
   effect on the world except for the inputs they consume and the
   outputs they produce, but given the same inputs, they may produce
   different outputs.

   <p:step name="p:timestamp">
     <!-- takes no input, returns the time of day -->
     <p:output label="time-of-day"/>
   </p:step>

3. With side-effects and functional. Such components may update
   databases, send launch commands to missiles controlled by web
   services, or otherwise "scribble on the walls" but given the same
   inputs, they always produce the same outputs.

   <p:step name="p:update-db">
     <!-- adds its input to the database and returns its input -->
     <p:input>
       <inline-document/>
     </p:input>
     <p:output label="output"/>
   </p:step>

4. With side-effects and non-functional. Such components may update
   databases, send launch commands to missiles controlled by web
   services, or otherwise "scribble on the walls", and given the same
   inputs, they may produce different outputs.

   <p:step name="p:log-system-temperature">
     <!-- writes system temperature to log file, returns system temp -->
     <p:output label="temp"/>
   </p:step>
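
As a rough illustration only, the four kinds can be mimicked with plain
Python functions. Everything here is an assumption made for the sketch:
the `world` list stands in for a database or log file, and the two
counters stand in for a real clock and a real temperature sensor so the
example stays deterministic.

```python
import itertools

# Hypothetical stand-ins, not part of any pipeline language:
world = []                        # "the world": a database, a log file, ...
_clock = itertools.count(43151)   # fake clock that advances on every read
_temps = itertools.count(40)      # fake temperature sensor


def load(doc):
    """Kind 1: side-effect free and functional."""
    return doc


def timestamp():
    """Kind 2: side-effect free, but non-functional (no inputs, new output)."""
    return next(_clock)


def update_db(doc):
    """Kind 3: has a side effect, yet same input gives same output."""
    world.append(doc)
    return doc


def log_system_temperature():
    """Kind 4: has a side effect and the output may differ each time."""
    temp = next(_temps)
    world.append(("temp", temp))
    return temp
```

Calling `load` or `update_db` twice with the same document returns equal
results; calling `timestamp` or `log_system_temperature` twice does not.
Only `update_db` and `log_system_temperature` change `world`.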

One consequence of side-effects is that they may influence order of
execution, that is, A may need to be executed before B if B relies on
side-effects of A. I think the "auxiliary inputs" proposal allows
pipeline authors to address that issue. The other consequences,
updates to databases, additions to log files, etc., I'm willing to
call "out of scope". If we have consensus on that, then we just need
to deal with kinds 1 and 2.

We could:

1. Assert that all pipeline components must be of kind 1 or 2.
2. Indicate into which category every component in the
   standard library falls. Presumably engines that allow extension
   would have to let extension authors indicate this as well.
3. Allow pipeline authors to identify whether an individual
   invocation of a component is functional.

We've had two telcons of discussion about how to expose this
information and we don't seem to be approaching consensus, so let's
try a new approach.

I propose that we say that all components are non-functional. That is,
a pipeline implementation must behave as if it evaluated a component
every time it occurs. "Must behave as if" is spec-ese for "implementations
that are clever enough to determine with certainty that a component is,
in fact, functional are free to cache the intermediate results because
by golly if it is, no one will be able to tell."
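
A minimal sketch of the "must behave as if" rule, assuming a toy engine
written in Python. The names `run_naive`, `run_caching`, and the
`known_functional` flag are hypothetical; a real engine would have to
establish that certainty by its own means.

```python
import itertools

_clock = itertools.count(43151)   # fake clock, for a deterministic example


def constant():
    """A step the engine can know is functional."""
    return "<inline-document/>"


def timestamp():
    """A step that is not functional: each evaluation gives a new value."""
    return next(_clock)


def run_naive(step, occurrences):
    """Baseline behaviour: evaluate the step every time it occurs."""
    return [step() for _ in range(occurrences)]


def run_caching(step, occurrences, known_functional):
    """Cache only when the engine is certain the step is functional."""
    if not known_functional:
        return run_naive(step, occurrences)
    result = step()               # evaluated once, reused for each occurrence
    return [result] * occurrences
```

For `constant`, `run_naive` and `run_caching(..., known_functional=True)`
return the same list, so no one can tell the cache is there. For
`timestamp` the engine has no such certainty and falls back to evaluating
every occurrence.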

Pipeline authors who are concerned about controlling the number of
times a component is executed can use the 'tee' component to split
output and thereby be confident that it will be evaluated only once.
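
To illustrate the idea (not the actual 'tee' component, whose interface
is not defined here), a hypothetical `tee` helper in Python can hand one
already-computed output to several consumers. The fake clock and the
`build_doc` helper are assumptions made for the sketch.

```python
import itertools

_clock = itertools.count(43151)   # fake clock, for a deterministic example


def timestamp():
    """Non-functional step: returns a new value on every evaluation."""
    return next(_clock)


def tee(value, n):
    """Give the same already-computed output to n consumers."""
    return [value] * n


def build_doc(times):
    """Wrap each time value in <t> and the whole list in <doc>."""
    return "<doc>" + "".join(f"<t>{t}</t>" for t in times) + "</doc>"


# The step is evaluated once and its output is split three ways.
once = build_doc(tee(timestamp(), 3))

# Without tee, the step is evaluated once per occurrence.
thrice = build_doc([timestamp() for _ in range(3)])
```

Here `once` holds three identical <t> values, while `thrice` holds three
different ones.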


                                        Be seeing you,
                                          norm

-- 
Norman Walsh
XML Standards Architect
Sun Microsystems, Inc.
Received on Wednesday, 26 April 2006 11:27:04 UTC