RE: An analysis of mustUnderstand and related issues from Noah_Mendelsohn@lotus.com on 2001-05-30 (xml-dist-app@w3.org from May 2001)

From: <Noah_Mendelsohn@lotus.com>
Date: Wed, 30 May 2001 17:17:53 -0400
To: xml-dist-app@w3.org
Message-ID: <OFB024A450.37837BE5-ON85256A5C.0074D6AD@lotus.com>
RESEND:  Correcting typo in "Step 3" below.  (I tried resending last week,
but for some reason it didn't seem to get into the W3C archive.  Sorry for
bothering everyone again.)

(long note warning --- discussion now on
 dist-app only. NRM)

On the teleconference yesterday, I took an action to comment on the
proposal that Henrik made in his note [1], and to analyze the relationship
of Henrik's proposal to the problem that Glen has raised (I think it was
Glen) relating to mustUnderstand.  In brief, the "Glen paradox" is that
(a) mustUnderstand is intended to allow safe extension of SOAP, but (b) it
is not clear whether such mustUnderstand extensions will be noticed and
rejected, if misunderstood, before other dependent processing at the same
actor is attempted.

Henrik's analysis and what the SOAP spec says
---------------------------------------------

Henrik suggests that SOAP V1.1 mandates, at a given actor: "Either the
whole message succeeds or the whole message fails."   I do not think the
SOAP specification makes this guarantee, but perhaps it comes close to
ensuring enough to solve Glen's problem.  The pertinent text is, as far as
I can tell [3]:

***
"A SOAP application receiving a SOAP message MUST process that message by
performing the following actions in the order listed below

1. Identify all parts of the SOAP message intended for that application
(see section 4.2.2)
2. Verify that all mandatory parts identified in step 1 are supported by
the application for this message (see section 4.2.3) and process them
accordingly. If this is not the case then discard the message (see section
4.4). The processor MAY ignore optional parts identified in step 1 without
affecting the outcome of the processing.
3. If the SOAP application is not the ultimate destination of the message
then remove all parts identified in step 1 before forwarding the message.

Processing a message or a part of a message requires that the SOAP
processor understands, among other things, the exchange pattern being used
(one way, request/response, multicast, etc.), the role of the recipient in
that pattern, the employment (if any) of RPC mechanisms such as the one
documented in section 7, the representation or encoding of data, as well
as other semantics necessary for correct processing."
***

I read this as somewhat ambiguous.  Step 1 seems to be vacuously
satisfied, except insofar as it prepares one for step 2.  It doesn't
really impose an order on anything.  Step 2 can be read as either: "Verify
that everything mandatory is supported and then start processing" or
"Verify/process, verify/process."  I infer that Henrik reads it as the
former.  I can say that interpretation never occurred to me, but it is
plausible.  At the very least, this should go on the clarification list.
See below for my analysis of the pros and cons and implications for the
Glen question.
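
To make the two readings of step 2 concrete, here is a rough sketch in
Python.  It is an illustration only, not anything taken from the SOAP
specification: the HeaderEntry type, the handler table, and the "debit"
and "newExtension" entry names are all hypothetical.  It shows that under
the "verify/process, verify/process" reading a side effect can occur
before a later misunderstood mandatory entry is noticed, which is the
Glen paradox.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


class MustUnderstandFault(Exception):
    """Raised when a mandatory header entry is not supported."""


@dataclass
class HeaderEntry:
    name: str                      # hypothetical entry name
    must_understand: bool = False


Handlers = Dict[str, Callable[[HeaderEntry], None]]


def process_check_first(entries: List[HeaderEntry], handlers: Handlers) -> None:
    """Reading A: verify every mandatory entry, then start processing."""
    for entry in entries:
        if entry.must_understand and entry.name not in handlers:
            raise MustUnderstandFault(entry.name)
    for entry in entries:
        if entry.name in handlers:
            handlers[entry.name](entry)


def process_interleaved(entries: List[HeaderEntry], handlers: Handlers) -> None:
    """Reading B: verify/process, verify/process, one entry at a time."""
    for entry in entries:
        if entry.name in handlers:
            handlers[entry.name](entry)
        elif entry.must_understand:
            raise MustUnderstandFault(entry.name)


def run(strategy: Callable[[List[HeaderEntry], Handlers], None]) -> List[str]:
    """Run one strategy and report which side effects actually happened."""
    side_effects: List[str] = []
    handlers: Handlers = {"debit": lambda entry: side_effects.append("debited")}
    entries = [HeaderEntry("debit", True), HeaderEntry("newExtension", True)]
    try:
        strategy(entries, handlers)
    except MustUnderstandFault:
        pass  # the message is rejected either way
    return side_effects
```

With these assumptions, run(process_check_first) leaves no side effects,
while run(process_interleaved) has already performed the debit by the
time the fault is raised.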

One other complication which I think needs clarification in the SOAP
specification is that, as Henrik and I have both noted, the same
processing software can have more than one actor name (e.g. "http://...." &
"next").  Any rules that refer to the responsibility of an actor must
clearly indicate whether each distinct actor name is handled separately,
or whether the actor is free to take on as many identities as it considers
appropriate, and to apply rules to all matched header entries in bulk.  In
the particular case above, if some header entries are explicitly addressed
to an intermediary, and others are addressed to 'next', must all be
checked for mustUnderstand before any processing is attempted?
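
The difference between the two treatments of actor names can be sketched
as follows.  This is a hedged illustration in Python, not spec text: the
HeaderEntry type, the example.org actor URI, and the entry names are
hypothetical.  Only the "next" URI is the one defined by SOAP 1.1.

```python
from typing import Dict, Iterable, List, NamedTuple, Set

# Actor URI that SOAP 1.1 defines for "the next application to process this".
NEXT = "http://schemas.xmlsoap.org/soap/actor/next"


class HeaderEntry(NamedTuple):
    name: str              # hypothetical entry name
    actor: str             # actor URI the entry is addressed to
    must_understand: bool


def check_in_bulk(entries: Iterable[HeaderEntry], roles: Set[str],
                  understood: Set[str]) -> List[str]:
    """Treat all of this node's actor names as one identity and return
    every mandatory entry it does not support."""
    return [e.name for e in entries
            if e.actor in roles and e.must_understand
            and e.name not in understood]


def check_per_actor(entries: List[HeaderEntry], roles: List[str],
                    understood: Set[str]) -> Dict[str, List[str]]:
    """Handle each actor name separately: one independent check per name."""
    return {role: check_in_bulk(entries, {role}, understood) for role in roles}


CACHE = "http://example.org/cache-manager"   # hypothetical actor URI
ENTRIES = [
    HeaderEntry("cacheControl", CACHE, True),
    HeaderEntry("trace", NEXT, True),
]
```

Under the per-actor treatment the check for the cache-manager name passes
on its own, so its entry could be processed before the unsupported
"trace" entry addressed to "next" is ever examined.  The bulk treatment
reports the failure up front.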

In any case, I don't think the SOAP specification comes close to
guaranteeing that "Either the whole message succeeds or the whole message
fails."  Even if I "understand" all the MU's there can be all sorts of
failures as I work my way through them.  I certainly don't see anything
above as requiring rollback of such partial processing, and I do not
believe that it would in general be practical in lightweight SOAP
implementations.

A proposal for next steps
-------------------------

The first question, of course, is whether I have correctly understood
Henrik's position and proposal, and have correctly interpreted the SOAP
specification.  If not, we have to sort that out.

If so, I would reword Henrik's summary as suggesting that "all
mustUnderstand failures are noticed before any processing is attempted". I
think that is what he intends, and I believe it would solve Glen's
problem.  I am concerned, however, that such a rule makes streaming
extremely difficult.  I think this says that you cannot safely do any
processing until you encounter at least the start of the Body.  I suppose
if we expect headers, collectively, to be short, that is not a big
problem.  It certainly changes my reading of the specification.
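
As a rough illustration of what that means for a streaming receiver, the
sketch below (Python, standard library only) reads a SOAP 1.1 envelope
incrementally and collects the mandatory header entries, stopping once
the Header has ended or the start of the Body is seen.  The sample
message and its example.org namespaces are hypothetical, and a real
implementation would also have to consider the actor attribute.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, List

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"   # SOAP 1.1 namespace
HEADER = f"{{{SOAP_ENV}}}Header"
BODY = f"{{{SOAP_ENV}}}Body"
MUST_UNDERSTAND = f"{{{SOAP_ENV}}}mustUnderstand"


def mandatory_entries(stream: BinaryIO) -> List[str]:
    """Return the tags of header entries marked mustUnderstand="1".

    Parsing stops as soon as the Header is complete or the Body starts,
    so the contents of the Body are never examined here.
    """
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        if event == "end" and elem.tag == HEADER:
            # All header entries are now fully parsed children of Header.
            return [child.tag for child in elem
                    if child.get(MUST_UNDERSTAND) == "1"]
        if event == "start" and elem.tag == BODY:
            return []   # no Header was present
    return []


# Hypothetical sample message for illustration.
MESSAGE = b"""<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Header>
    <t:Transaction xmlns:t="http://example.org/tx"
        SOAP-ENV:mustUnderstand="1">5</t:Transaction>
    <c:Hint xmlns:c="http://example.org/cache">x</c:Hint>
  </SOAP-ENV:Header>
  <SOAP-ENV:Body><m:Op xmlns:m="http://example.org/op"/></SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""

found = mandatory_entries(io.BytesIO(MESSAGE))
```

The cost is that every header entry has to be buffered before any of
them can be acted on, which is cheap only if headers are short.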

So, step 2 is to decide whether the requirement to inspect all headers
before doing any processing is indeed practical.  I think this is a
question that should be discussed with soapbuilders.

Step 3:  If we agree on this clarification (I'm not in favor yet, but it
is the next thing to explore), then we should try to convince ourselves
that it really does solve the "Glen Paradox".

Step 4 (or 2') is to straighten out the multiple names for an actor
question.  I strongly believe that any piece of software should be free to
act as any actor.  SOAP addressing should in no way attempt to guard
against malicious implementation at this level, I believe.  If you send me
a message with various headers and actors, I should be free to decide that
"I am a cache manager",  "I am a transaction manager", etc.  If you don't
trust me, then we need robust authentication technology, not a limited
rule about my not claiming to have two addresses.  That settles half the
question.  The second part is: with respect to each part of the SOAP spec
where processing responsibilities are assigned to intermediaries or the end
point, clarify the rules in the case where multiple identities are
involved.

Step 5:  Consider the overall requirements for atomicity and rollback.
Maybe I should be required to check mustUnderstand on everything before
starting; I am very nervous about having to roll back processing in the
case that a variety of other errors occur while I am working on the message.
Core SOAP should not require transactions IMHO.

I think that's a reasonable roadmap.  Henrik (and everyone else): do you
agree?

Some detailed comments on Henrik's Note
---------------------------------------

I think I've covered most of the highlights above.  Here are some detailed
comments on Henrik's note.

>> Please that the formulation with a grain of salt...
>>
>> The current model is that from a SOAP/1.1 processing point
>> of view, the order of header entries doesn't matter, neither
>> between actors or within actors. Either the whole message
>> succeeds or the whole message fails. As long as we have a
>> SOAP envelope as the "unit of communication" I think we have
>> to enforce this holistic processing view. In this sense, any
>> ordering mechanism that we might deploy can be seen as a
>> performance optimization albeit an important optimization.
>>
As discussed above, I think this goes beyond what the spec
says.  I can see a case that it might try to say:
"check all MU's before doing anything".

>> As the processing model doesn't allow for partial success,
>> applications may always have to perform some form for
>> compensation if the processing suddenly goes wrong in the
>> middle.
>>
Whoa... that seems really complicated, and as noted above
I don't see where the spec requires this. Nothing in
SOAP says anything regarding atomicity.  Indeed, we
seem to have been quite clear in other discussions
that anything resembling transactions is separate
from core SOAP.

>> Note that processing faults can be due to unhandled
>> mandatory blocks or because the 34th parameter 7 levels down
>> in a specific block was out of bounds.
>>
Yes.

>> Also note that even though the processing can start, it is
>> not clear that any follow-up message can be generated
>> because we can't change a success to a failure in the middle
>> in a clean way. The only mechanism is to effectively break
>> the transmission so that the whole message becomes invalid
>> but then the recipient doesn't know whether it was a
>> processing failure or a communications failure.
>>
I think you are implying a model in which response
messages are generated in a streaming mode,
and must somehow be labeled as broken when late
processing fails?  Ironically, here is where
I think SOAP does imply atomicity.  In cases where
a response is required, it discusses only fully formed
responses or faults that reflect the net status
of processing of the request.

>> Btw, an additional consequence of this model is that
>> boxcarring implicitly is deprecated because it severely
>> complicated the protocol and introduces unknown fault cases.
>>
Although I have quibbled with your premises, I agree
that boxcarring raises a variety of messy issues.
Like you, my intuition is to avoid it in the core
protocol.

>> The problem of enforcing ordering in SOAP itself is exactly
>> the problem that Jean-Jacques brings up: where should
>> intermediaries insert blocks?  In general it is desirable
>> that an intermediary does not have to know anything about
>> the blocks provided by other parties in the message path
>> before it can insert its own blocks into the message. That
>> would constrain the extensibility model in SOAP.
>>
>> A similar discussion on soapbuilders discuss the use of
>> references in the SOAP section 5 encoding and what happens
>> if one finds links that refer to previous entries.
>>
>> It is also related to the discussion of trailers - is it
>> possible to stick trailers in after the body so that one can
>> compute a signature over the body and stick the result in at
>> the end without having to buffer the contents?
>>
>> Noah also brings up the very good point that actors may be
>> addressed in a variety of ways using "next" and the specific
>> URI as the obvious example.
>>
>> Because of intermediaries, I question that overall ordering
>> of a message is useful. In order to support (potentially
>> multiple) partial ordering(s) in a message, however, I think
>> we need to provide a few simple rules for how such orderings
>> can be expressed:
>>
>> 0) Blocks are fundamentally unordered from a processing
>> point of view.
>>
I think I agree that this is the right building block.
Ideally, we will convince ourselves that any features relating
to ordering can be added as modules using mustUnderstand.
Pending demonstration of that, we must be prepared
for a fallback position that some ordering features
have to be built into the core protocol after all.
I agree that this is second choice.

>> 1) A SOAP intermediary MUST NOT reorder header entries. It
>> may add header entries anywhere in the message
>>
Hmmm.  I think it's a bit more subtle than that.
Certainly some intermediaries will at least remove
headers not even destined for that actor.  Example:
I presume an encrypting intermediary removes the
plain text headers and inserts something encrypted
instead.  Possibly, one encrypted header replaces
many others, and for a variety of downstream actors.

I think an analogous argument suggests that there
may be cases where a knowledgeable intermediary
should indeed reorder (perhaps for optimization
purposes?)  I can't prove that.

Anyway, my intuition would be to say something
more like:  "In general, SOAP intermediaries
MUST  preserve the order of header entries
as a message is relayed to a downstream
processor.  The exception to this rule is
when the semantics of header entry(s)
addressed to the intermediary
specifically sanction deletion,
modification, or reordering of
entries destined to other actors."

What do you think of this?
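
A minimal sketch of the rule as worded above, in Python.  Everything
here is hypothetical illustration: the HeaderEntry type, the example.org
actor URIs, and the idea of passing the sanctioned change in as a
function are assumptions, not part of any proposal on the list.

```python
from typing import Callable, List, NamedTuple, Optional, Set


class HeaderEntry(NamedTuple):
    name: str     # hypothetical entry name
    actor: str    # actor URI the entry is addressed to


Rewrite = Callable[[List[HeaderEntry]], List[HeaderEntry]]


def relay(entries: List[HeaderEntry], my_actors: Set[str],
          sanctioned_rewrite: Optional[Rewrite] = None) -> List[HeaderEntry]:
    """Build the header list to send downstream.

    Entries addressed to this intermediary are removed.  All others keep
    their original relative order unless an entry addressed to this
    intermediary sanctioned a rewrite (deletion, modification, reordering).
    """
    downstream = [e for e in entries if e.actor not in my_actors]
    if sanctioned_rewrite is not None:
        downstream = sanctioned_rewrite(downstream)
    return downstream


ME = "http://example.org/encryptor"      # hypothetical actor URIs
A = "http://example.org/a"
B = "http://example.org/b"
ENTRIES = [HeaderEntry("x", A), HeaderEntry("encrypt", ME), HeaderEntry("y", B)]


def encrypt_all(entries: List[HeaderEntry]) -> List[HeaderEntry]:
    """Stand-in for an encrypting intermediary: one entry replaces many."""
    return [HeaderEntry("encryptedHeaders", B)]
```

By default relay(ENTRIES, {ME}) yields "x" then "y" in their original
order; with encrypt_all supplied, a single entry replaces both.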

>> 2) Dependencies are indicated in one of two ways:
>>
>>   * Simple XML encapsulation in which the
>>     SOAP header is somewhat
>>     out of the picture as the encapsulation
>>     happens with a header entry
>>
>>   * By referring to other header entries using
>>     links. In this case we can say that such
>>     links SHOULD point *forward* in the message
>>
I am very nervous about these as the general
mechanisms.  If I understand what you mean by
encapsulation, any logical header entry
that must be ordered has to be "buried"
at a lower level of XML nesting,
where general-purpose SOAP tools will not recognize
it as a SOAP 1.1 header entry.  That doesn't
feel right.

The second proposal needs to be fleshed out,
but it seems to basically suggest that
lexical order is significant after all?
How would the link you suggest be
different than my "dependsOn" proposal?
I think that was indeed using links,
though without the lexical order.
Either way, we need to figure out
how mustUnderstand would cover
the semantics of such links.
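
For what it's worth, ordering derived from links rather than from
lexical position could look roughly like the sketch below.  This is
a hedged illustration in Python; the entry names and the shape of the
dependency table are hypothetical, and nothing here defines how a
"dependsOn" link would actually be written in a message.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def processing_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every entry comes after the entries it
    depends on, regardless of where each one appears in the message.

    graphlib raises CycleError if the links form a cycle.
    """
    return list(TopologicalSorter(depends_on).static_order())


# Hypothetical entries, listed in their lexical order in the message.
# Each maps to the set of entries that must be processed before it.
LINKS: Dict[str, Set[str]] = {
    "logging": {"transaction"},
    "transaction": {"authentication"},
    "authentication": set(),
}

order = processing_order(LINKS)
```

Here the processing order comes out as authentication, transaction,
logging, the reverse of the lexical order, which is why such links would
need no rule about pointing forward.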

>> 3) It is allowed to introduce trailers after the body but
>> only if a header points to them indicating that the trailer
>> will follow and what it will contain (we don't define
>> this). The SOAP/1.1 schema actually allows this but the
>> description of how to deal with trailers is very vague.
>>
>> Hope this makes at least a bit of sense...
>>

I think so, thank you.  You'll have to tell me
whether in fact I made sense of it. :-)

>> Henrik Frystyk Nielsen mailto:henrikn@microsoft.com
>>
>>

[1] http://lists.w3.org/Archives/Public/xml-dist-app/2001May/0284.html
[2] http://lists.w3.org/Archives/Public/xml-dist-app/2001May/0310.html
[3] http://www.w3.org/TR/SOAP/#_Toc478383491

------------------------------------------------------------------------
Noah Mendelsohn                                    Voice: 1-617-693-4036
Lotus Development Corp.                            Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------------
Received on Wednesday, 30 May 2001 17:22:06 UTC