Re: final decision on well-formedness checking

Mark Baker writes:

>> There are certainly lots of good reasons to do it, but 
>> as written, I believe the spec requires well formedness 
>> checking, since it normatively refers to the XML Rec, 
>> and XML documents must be well formed.

Exactly.  Let's be a little more specific.  SOAP normatively depends on 
the Infoset, which is really at a level above well-formedness.  I think the 
real question is in the HTTP binding [1], which describes the transmitted 
data as being application/soap+xml, which in turn describes its content as 
being "identical" to that of RFC 3023's application/xml [2].  That in turn 
effectively indicates that it is carrying what the XML Recommendation calls 
"entities", which in practice, for our purposes, means angle-bracket 
notation <...> for the SOAP envelope.

As Mark says, XML has nothing to say about documents that are not well 
formed.  My view is that any XML software, SOAP or otherwise, that makes 
decisions based on the head of the input (i.e., before the whole document 
is checked) is doing so speculatively per the XML Recommendation.  Should 
the document prove to be other than well formed, we can conclude 
retroactively that whatever we thought we were doing, it wasn't XML.
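To make the "speculative" point concrete, here is a minimal Python sketch (not taken from any SOAP toolkit) using the standard library's incremental `xml.etree.ElementTree.XMLPullParser`. It forms a provisional judgment as soon as the env:Header start tag arrives, but it only knows the input was XML once `close()` succeeds. The function name `classify` and the chunked input are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://www.w3.org/2003/05/soap-envelope"
HEADER_TAG = f"{{{SOAP_ENV}}}Header"


def classify(chunks):
    """Return (provisional_decision, is_xml) for a chunked input.

    provisional_decision is what we concluded from the head of the input;
    is_xml is only known once the whole entity has been parsed.
    """
    parser = ET.XMLPullParser(events=("start",))
    decision = None
    try:
        for chunk in chunks:
            parser.feed(chunk)
            # read_events() re-raises any syntax error hit during feed().
            for _event, elem in parser.read_events():
                if decision is None and elem.tag == HEADER_TAG:
                    decision = "has-header"  # speculative: made before the end
        parser.close()  # raises ParseError if the entity is truncated
    except ET.ParseError:
        # Retroactively, whatever we thought we were doing, it wasn't XML.
        return decision, False
    return decision, True
```

Note that the provisional decision survives in both outcomes; what changes is whether we are entitled to call the input XML at all.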

There's one more interesting and rather subtly crafted piece of the SOAP 
Proposed Recommendation that might or might not enter into this discussion.  It 
says [3]:

"A message may contain or result in multiple errors during processing. 
Except where the order of detection is specifically indicated (as in 2.4 
Understanding SOAP Header Blocks), a SOAP node is at liberty to reflect 
any single fault from the set of possible faults prescribed for the errors 
encountered. The selection of a fault need not be predicated on the 
application of the "MUST", "SHOULD" or "MAY" keywords to the generation of 
the fault, with the exception that if one or more of the prescribed faults 
is qualified with the "MUST" keyword, then any one fault from the set of 
possible faults MUST be generated."

This is a bit wordy, but it's important and sometimes pertinent to the 
question Sanjiva asks.  It basically says:  it would be a big nuisance if 
everyone had to do their processing in the same order just to make sure 
that all processors reflect the same fault in the case where an input 
message has, say, 5 errors.  You can reflect whichever error you happen to 
notice first.  On the other hand, if any of the errors is mandatory, then 
you MUST reflect some error (in other words, if one of the errors was 
mandatory, then you can't declare success, though you may reflect one of 
the other errors instead).
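The fault-selection rule can be sketched in a few lines of Python. This illustrates my reading rather than any text in the Recommendation: `DetectedError`, `choose_fault` and the `fault_on_optional` switch are hypothetical names, and the fault codes used with them are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectedError:
    fault_code: str  # placeholder fault name, e.g. "env:Sender"
    keyword: str     # "MUST", "SHOULD" or "MAY" qualifying its prescribed fault


def choose_fault(errors, fault_on_optional=True):
    """Return the fault to reflect, or None to proceed without faulting.

    errors is listed in the order this node happened to notice them; no
    particular processing order is required.
    """
    if not errors:
        return None
    mandatory = any(e.keyword == "MUST" for e in errors)
    if mandatory or fault_on_optional:
        # Any single fault from the set will do, even one whose own keyword
        # is SHOULD or MAY, so reflect whatever was noticed first.
        return errors[0].fault_code
    # Only SHOULD/MAY faults were prescribed and this node chose not to fault.
    return None
```

For example, if a MAY-qualified error is noticed before a MUST-qualified one, `choose_fault(errs, fault_on_optional=False)` still reflects the first: success is ruled out, but which fault gets reflected is not.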

Putting that all together, my reading is that the proposed recommendation 
works as follows:

*  A SOAP node receiving a non-well-formed entity through the HTTP binding 
MUST fault. 

* If it happens to notice some other error, a bad header perhaps, before 
it notices the well-formedness error, that's OK.  It can fault on that. (I 
understand this isn't the case Sanjiva cares most about, but it is 
important.) You don't have to run to the end of the document to prove 
well-formedness and THEN reflect the header error.

* If well-formedness is your only error, then my reading of the spec is 
that you must fault with an env:Sender fault.  I see the Recommendation as 
silent on whether you might, if you were an intermediary, already have 
started to stream the message to a further hop.  Surely you should not 
deliver a message that looks good to the next hop, because then you would 
have generated both a fault and a relayed message, which clearly violates 
the specification.  What you might be able to get away with is forcing some 
binding-level error (such as dropping the TCP connection for the relayed 
message) before it completes. 
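Here is one way an intermediary might stream while still honoring the well-formedness obligation, sketched in Python under stated assumptions: `send`, `abort` and `finish` are hypothetical binding-level callbacks (abort standing in for, say, dropping the outbound TCP connection), and `finish` is assumed to emit whatever end-of-message framing lets the next hop treat the relayed message as complete.

```python
import xml.etree.ElementTree as ET


class RelayAborted(Exception):
    """The inbound message was not well formed; the outbound hop was aborted."""


def relay(chunks, send, abort, finish):
    """Stream chunks to the next hop while still checking well-formedness.

    send(chunk), abort() and finish() are hypothetical binding callbacks:
    abort() might drop the outbound TCP connection, and finish() sends
    whatever end-of-message framing lets the next hop treat the relayed
    message as complete.
    """
    parser = ET.XMLPullParser(events=("end",))
    try:
        for chunk in chunks:
            parser.feed(chunk)
            for _event, _elem in parser.read_events():
                pass  # a real node would process headers targeted at it here
            send(chunk)  # forwarded only after the parser consumed it cleanly
        parser.close()  # catches truncation at the very end of the input
    except ET.ParseError as err:
        abort()  # make sure the relayed message never completes
        raise RelayAborted(str(err)) from err
    finish()  # reached only if the whole entity proved well formed
```

The design choice is that forwarding bytes early is tolerable as long as the step that lets the next hop consider the message complete is deferred until `close()` has succeeded.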

Bottom line, I think you are in principle responsible for checking well 
formedness, unless another error is encountered first.  I do think you can 
start to stream through an intermediary, but in the case that the message 
proves not to be well formed the spec says you should (a) reflect an 
env:Sender fault and, if you're receiving a request, respond with HTTP 
status code 400 and (b) ensure that you do not successfully complete 
relaying the message.
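For the receiving-a-request case, the response might look like the following minimal Python sketch. It hand-builds a SOAP 1.2 fault whose code value is env:Sender and pairs it with HTTP status 400, following the SOAP 1.2 HTTP binding's mapping of env:Sender faults; `sender_fault_response` is a hypothetical helper, and a real node would presumably use its SOAP stack's own fault machinery.

```python
from xml.sax.saxutils import escape

SOAP_ENV = "http://www.w3.org/2003/05/soap-envelope"


def sender_fault_response(reason):
    """Return (status, headers, body) for an env:Sender fault over HTTP."""
    body = (
        f'<env:Envelope xmlns:env="{SOAP_ENV}">'
        "<env:Body><env:Fault>"
        "<env:Code><env:Value>env:Sender</env:Value></env:Code>"
        "<env:Reason>"
        # escape() keeps a reason such as "<x>" from breaking the envelope.
        f'<env:Text xml:lang="en">{escape(reason)}</env:Text>'
        "</env:Reason>"
        "</env:Fault></env:Body></env:Envelope>"
    )
    headers = {"Content-Type": "application/soap+xml; charset=utf-8"}
    # The SOAP 1.2 HTTP binding maps env:Sender faults to 400 Bad Request.
    return 400, headers, body.encode("utf-8")
```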

The only other latitude I can see is as follows:  I think you might be 
able to say "Look, I'm going to decide here that I wasn't a SOAP 
intermediary after all.  For messages of this sort, I'm going to claim to 
be some other sort of software that aids in the routing of messages.  I do 
some not-strictly-soap compliant checking of headers and pass the message 
on."  With that rationalization, I would think you can build software that 
does what you want.  Nothing in the SOAP recommendation can force your 
software to behave as a SOAP processor for every message you receive.  On 
the other hand, you must take responsibility for not conforming to the 
recommendation in the places where you don't.

Sorry for the long response, but this is about how I see it.



[1] http://www.w3.org/TR/2003/PR-soap12-part2-20030507/#soapinhttp
[2] http://www.ietf.org/rfc/rfc3023.txt
[3] http://www.w3.org/TR/2003/PR-soap12-part1-20030507/#procsoapmsgs
------------------------------------------------------------------
Noah Mendelsohn                              Voice: 1-617-693-4036
IBM Corporation                                Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------
"Sanjiva Weerawarana" <sanjiva@watson.ibm.com>
Sent by: xml-dist-app-request@w3.org
05/08/2003 01:16 AM

 
        To:     "Martin Gudgin" <mgudgin@microsoft.com>, <xml-dist-app@w3.org>
        cc:     (bcc: Noah Mendelsohn/Cambridge/IBM)
        Subject:        Re: final decision on well-formedness checking



"Martin Gudgin" <mgudgin@microsoft.com> writes:
> Just to make sure I understand, you are advocating something like:
> 
> 1. Intermediary receives SOAP message
> 
> 2. Intermediary begins to parse stream 
> 
> 3. Intermediary gets to end of soap:Header. Everything is
> well-formed up to this point and intermediary has processed all headers
> targeted at it.
> 
> 4. Intermediary stops doing XML parsing and just streams the rest
> of the message (the soap:Body and its descendants, plus the closing
> </soap:Envelope>) to the next node.
> 
> Is that roughly what you're looking for?

Yep, that would do great.

If something were indeed wrong within the soap:Body, say, something
else would break later down the route, and I'd like that to be "ok".

Sanjiva.

Received on Thursday, 8 May 2003 16:59:13 UTC