RE: Issue #170: "Referencing Data missing from the message" from Williams, Stuart on 2002-01-07 (xml-dist-app@w3.org from January 2002)

From: Williams, Stuart <skw@hplb.hpl.hp.com>
Date: Mon, 7 Jan 2002 10:56:02 -0000
To: "'Jacek Kopecky'" <jacek@systinet.com>, "Williams, Stuart" <skw@hplb.hpl.hp.com>
Cc: Henrik Frystyk Nielsen <henrikn@microsoft.com>, noah_mendelsohn@us.ibm.com, xml-dist-app@w3.org
Message-ID: <5E13A1874524D411A876006008CD059F19286E@0-mail-1.hpl.hp.com>

Hi Jacek,

>  Stuart,
>  you and others talk about the sender relying (or not) on getting
> a fault in case a reference in the graph is unreachable. I don't
> think that the sender cares at all because if it did it would
> assume it _can_produce_ an unreachable reference. Why should it
> do so?

Well... you seem very concerned that a fault always be 'generated'.
Presumably that strong requirement to generate a fault arises from a need
for that fault to have some effect. 

If the sender is 'uninterested' in the fault then the potential interested
parties reduce to a bunch of intermediaries (that may be invisible to the
generator of the fault) and the node that generated the fault.

I guess it's plausible that an intermediary that has behaved badly and has
removed an essential piece of a message might learn something useful from a
fault and mend its ways - but I suspect for some time to come that may
require operator and maybe developer intervention :-)

If the sole party interested in the generated fault is the node where the
fault was generated then that's entirely a local matter and not one that
requires the mandatory generation of a fault.

Basically, I think that if you are going to mandate the generation of a fault
in ALL cases where the referenced data is missing I would expect you to be
strongly interested in what becomes of that fault. For example, in an RPC
situation, it feels something like a dangling-pointer in traversing a
data-structure that you might want to model as an exception in the client
programming language environment (triggered by receipt of the fault).
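To make the dangling-pointer analogy concrete, here is a minimal sketch of how a receiver might detect references to missing data and surface them as an exception. It assumes, purely for illustration, that references are unqualified `ref` attributes pointing at unqualified `id` attributes within the same message; the exception name and both function names are hypothetical, not anything the draft spec defines.

```python
import xml.etree.ElementTree as ET


class DanglingReferenceError(Exception):
    """Hypothetical exception for a reference whose target is missing."""


def find_dangling_refs(xml_text):
    """Return the values of all 'ref' attributes with no matching 'id'."""
    root = ET.fromstring(xml_text)
    # Collect every id present anywhere in the message.
    ids = {el.get("id") for el in root.iter() if el.get("id") is not None}
    # A reference is dangling when its target id is not in the message.
    return [
        el.get("ref")
        for el in root.iter()
        if el.get("ref") is not None and el.get("ref") not in ids
    ]


def check_references(xml_text):
    """Raise if any reference in the message cannot be resolved."""
    missing = find_dangling_refs(xml_text)
    if missing:
        raise DanglingReferenceError(
            "unresolved references: " + ", ".join(missing)
        )
```

Whether such an exception is raised at the receiving node only, or is carried back to the sender as a fault and re-raised there, is exactly the question of who cares about the fault.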

However, you say... "I don't think that the sender cares at all..." which
leaves me wondering what entity you think actually does care about the fate
of any fault generated due to references to missing data and why you feel so
strongly that this is a MUST fault rather than a MAY fault.

>  In case we allow external references, the external reference may
> be unreachable at the time when (or from the node where) it's
> being dereferenced. But you yourself demonstrate that you can
> easily imagine us using IDREFs for references.

I also acknowledged that there are others that feel strongly in favour of
URIRefs and that I would like to better understand why they feel so strongly
on the matter.

>  In case we use IDREFs, missing data either mean a problem in the
> sender (if it generates a broken IDREF) or it can be that an
> intermediary stripped out a part of the message.
>  In the latter case, as I wrote before, I think that the
> intermediary better understand the data and fill NULLs (instead
> of references) at the appropriate places so that no link remains
> broken.

If an intermediary is broken in this way... what behaviour would you expect
of it upon receipt of the proposed fault? Take itself out of service and
phone for an upgrade? ;-) Crumbs... I find myself wondering what a
standardised MIB for a SOAP node might look like and what SNMP Traps it
might generate.

>  In any case, I think a broken IDREF is a critical error
> condition, something that never occurs in correct applications.
> Therefore I suggest a MUST, just like MustUnderstand and
> DTDNotSupported and VersionMismatch faults.

Well... maybe there are some issues with MUST fault on some of these as well!

"MUST fault" to me implies that we have a pretty clear idea of what effect
the fault is intended to have, and that at some system level we will be relying
on fault generation in such circumstances. Otherwise, why MUST? MUST gives a
guarantee that someone/thing will come to rely on. If a MUST guarantees
nothing, it has no business being a MUST.

>  As for lazy parsing, it is not common (in an explicit form) in
> any RPC-like system that I know of. In a hidden form it is an
> optimization which should not change the semantics. 

Agreed.

> This case
> suggests that if we mandate a fault in case of a broken reference
> we remove the possibility of such optimization.

>  On the other hand, is a fault mandated for a case where there
> are two env:Body elements in a message? If so, we disallow
> streaming as well because before we can start processing the
> Body, we must be sure that there is no other Body. Same case for
> non-well-formed SOAP messages, too.
>  So if we expect a fault for two-Bodied messages, we should as
> well expect a fault for broken references, IMHO. There is no
> wording about the former expectation in the spec, though, but I
> think that faulting (or other kind of failing) is implied when a
> SOAP node receives something that is not a SOAP message (like the
> two Bodies case or a non-well-formed document case).

Not sure a fault arises in all these cases. Some binding-specific means to
report the failure (e.g. HTTP status codes - 400 Bad Request) may also be
used.

>  If this does not prevent streaming, I think a MUST for faulting
> in case of a broken reference does not prevent lazy parsing.

It doesn't seem so black and white to me. With respect to streaming we have
the careful wording of the SOAP processing model, Part 1 section 2.6 [1]:
"...SOAP faults and application-level side effects are equivalent to those
that would be obtained by direct implementation of the following rules in
the order shown below." On the surface, both streaming and lazy parsing
allow you to 'skip' bits of the message that you (think you) are not
interested in. However, our MUST/MAY choices do affect what we *have* to be
interested in, in order to process the message.
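The difference can be sketched as follows. This is an illustration under stated assumptions, not a reading of the spec: references are taken to be unqualified `ref` attributes pointing at `id` attributes, and the `Message` class, its `eager` flag and the `BrokenReference` exception are all hypothetical. With `eager=True` the receiver has to look at every reference before doing anything else; with `eager=False` a broken reference is noticed only if the application actually follows it.

```python
import xml.etree.ElementTree as ET


class BrokenReference(Exception):
    """Hypothetical error raised when a reference has no target."""


class Message:
    def __init__(self, xml_text, eager):
        self.root = ET.fromstring(xml_text)
        # Index of id -> element, used to resolve references.
        self.by_id = {
            el.get("id"): el
            for el in self.root.iter()
            if el.get("id") is not None
        }
        if eager:
            # Eager checking: every reference is resolved up front,
            # whether or not the application will ever follow it.
            for el in self.root.iter():
                ref = el.get("ref")
                if ref is not None:
                    self.resolve(ref)

    def resolve(self, ref):
        """Follow a reference; fail only at the point of use."""
        try:
            return self.by_id[ref]
        except KeyError:
            raise BrokenReference(ref) from None
```

Note that this sketch still parses the whole message; only the reference check is deferred. A truly lazy or streaming parser would defer more than that.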

I also think that, despite the increasing length of this thread, the thing
that closes this issue is the choice of MUST or MAY with respect to the
generation of a fault when referenced data is missing from a message.

My preference remains MAY... and I am prepared to believe that we may have
been overzealous with MUSTs elsewhere in the (draft) spec.

>  Best regards,
> 
>                    Jacek Kopecky
> 
>                    Senior Architect, Systinet (formerly Idoox)
>                    http://www.systinet.com/
> 

Best regards,

Stuart

[1] http://www.w3.org/TR/soap12-part1/#procsoapmsgs
Received on Monday, 7 January 2002 05:56:23 UTC