RE: Issues with Packaging Application Payloads from Henrik Frystyk Nielsen on 2000-10-18 (xml-dist-app@w3.org from October 2000)

From: Henrik Frystyk Nielsen <frystyk@microsoft.com>
Date: Wed, 18 Oct 2000 14:57:10 -0700
To: "Joe Lapp" <jlapp@webmethods.com>, <xml-dist-app@w3.org>
Message-ID: <00de01c0394e$6942ed40$fb4c1fac@redmond.corp.microsoft.com>

Hi Joe,

Just so that you know, your original posting to the SOAP mailing list is
recorded on the SOAP issues list as item #21 at

    http://msdn.microsoft.com/xml/general/soapspec_issues.asp

pointing to

    http://discuss.develop.com/archives/wa.exe?A2=ind0008&L=soap&F=&S=&P=41789

This also led to quite a bit of discussion; the full thread is at

    http://discuss.develop.com/archives/wa.exe?A1=ind0008&L=soap#46

People might want to catch up on this thread as well.

Henrik

-----Original Message-----
From: Joe Lapp [mailto:jlapp@webmethods.com]
Sent: Tuesday, October 17, 2000 16:27
To: xml-dist-app@w3.org
Subject: Issues with Packaging Application Payloads


On August 18th I posted to the SOAP discussion list a set of issues that
I had with how SOAP packages application payloads.  Most of these issues
apply to the use of an XML envelope, so this group will face these
questions in the development of W3C XML Protocol.  Since the group is in
the requirements phase, I thought it best to make sure the issues are
known, so that they can help feed requirements.

By presenting this I don't mean to give any statement of webMethods'
position on either SOAP or XML Protocol.  webMethods has a habit of
supporting whatever protocols our clients need.  My job as engineer is
to make what it means to support a protocol as painless as possible to
both webMethods developers and those clients of ours who are inclined to
use the protocol.

You'll find that 8/18/2000 post duplicated below, in the <REPOST> tag.
Afterwards, I address some of the common objections.

<REPOST>

I'm partly encountering and partly anticipating a number of issues
related to how SOAP packages application payloads in XML documents.
These issues primarily apply to the use of SOAP with application-level
payloads and do not surface when SOAP is used strictly for RPC. I'm
providing the issues here to make them public and available for
discussion, but I think it would be most effective to resolve the issues
within a standards body that is generous enough to bring SOAP under its
wing.

You'll find the issues listed below. Please feel free to provide
corrections to any erroneous understanding I may have and suggestions
for how to deal with some of these situations. Here you go:

<ISSUES>

(1) Infrastructure header data and application payload data may be put
in the same XML document (the SOAP Envelope). When applications
unintentionally dump non-wellformed XML payloads into this document, the
entire document is non-wellformed. A robust server must protect itself
from client errors and cannot trust clients to deliver only wellformed
documents. Should an application or protocol choose to put payloads in
the XML envelope, it seems that this would create a number of problems
related to error handling:

(1a) Countless XML tools out there use parse-trees (instead of events)
and may not be capable of representing just the wellformed portion of a
document. Middleware based on such tools will not be able to act on the
SOAP headers that would otherwise apply, even for some content errors
that are not wellformedness errors. Likewise, recipient applications
will not be able to engage in the error management required by those
applications for application-level errors.

(1b) Even event-based XML tools suffer from not being able to reliably
deliver application payloads that are wellformed but which follow a
non-wellformed payload (within the body of the same XML envelope).
Errors that may be recoverable at the application level are not given
the opportunity to recover. Errors that may be ignored at the
application level will not be ignored. For example, should the
application payloads be semantically independent, as with an
application-level batching mechanism, SOAP would create dependencies
between them.

Issues (1a) and (1b) are a direct result of the XML 1.0 Specification's
requirement that wellformedness errors be fatal errors within an XML
processor. In particular, the specification says that "Once a fatal
error is detected, however, the processor must not continue normal
processing (i.e., it must not continue to pass character data and
information about the document's logical structure to the application in
the normal way)." To relate this to SOAP, whenever an application
payload is placed into the envelope body, an assertion is made that the
payload constitutes wellformed XML, and the SOAP envelope ends up
creating dependencies among data that did not exist prior to enveloping
the data. This violates the clear separation of layers that most
protocols have.
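As a rough illustration of (1a) and (1b), here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The envelope contents, the `urn:example:trace` header block, and the `<cmd>` batch elements are hypothetical. A tree-based parse of an envelope containing one malformed payload fails as a whole, so even the well-formed header is unreachable; an incremental (event-based) parse delivers the payloads that precede the fatal error, but none of those that follow it.

```python
import xml.etree.ElementTree as ET

SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def header_blocks(envelope: str):
    """Tree-based (1a): return the Header's child tags, or None if the
    document as a whole is not well-formed."""
    try:
        root = ET.fromstring(envelope)  # builds the entire tree or raises
    except ET.ParseError:
        return None  # the well-formed header is lost along with the body
    header = root.find(f"{{{SOAP_ENV_NS}}}Header")
    return [child.tag for child in header] if header is not None else []


def deliver_batch(envelope: str, chunk_size: int = 16):
    """Event-based (1b): feed the envelope in chunks and collect the text of
    each completed <cmd> element until the first fatal error.
    Returns (delivered, error), where error is None on success."""
    parser = ET.XMLPullParser(events=("end",))
    delivered = []

    def drain():
        # read_events() yields queued events, then re-raises a queued ParseError
        for _event, elem in parser.read_events():
            if elem.tag == "cmd":
                delivered.append(elem.text)

    try:
        for i in range(0, len(envelope), chunk_size):
            parser.feed(envelope[i:i + chunk_size])
            drain()
        parser.close()
        drain()
    except ET.ParseError as err:
        return delivered, err  # commands after the error are never delivered
    return delivered, None
```

With a body of `<cmd>a</cmd><cmd>b</cmdd><cmd>c</cmd>`, `deliver_batch` delivers only `a` plus the error, even though `c` is well-formed on its own, and `header_blocks` returns `None` for the same envelope.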

(2) Performance is another issue that seems to surface when
infrastructure and application data are put into the same XML document.
Consider SOAP's requirement that when a header targeted for a given
actor reaches that actor, the header must be consumed (usually meaning
removed) from the document before the document may proceed. If the SOAP
header is 4K in size and the SOAP body has a 1MB payload, you wouldn't
want to parse the document, remove one element from the beginning of the
document, and then regenerate the 1+MB document to forward to the next
destination. Yet most tools will require this approach -- even most
event-based tools. It is possible to create a more specialized parser
that will provide a portion of the document as XML and still allow
concatenating the remainder as text without going through the parsing
process, but that's kind of a tall order for the everyday parser we hope
will work with SOAP.
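The "more specialized parser" described above can be approximated with Python's standard xml.parsers.expat bindings, which expose the byte offset of the current event. This is a minimal sketch under a simplifying assumption: the whole SOAP Header is addressed to the current actor and can be removed wholesale (real header processing would remove only the blocks targeted at that actor). Parsing is abandoned once the Header closes, and the remaining bytes are spliced through unchanged instead of being re-serialized.

```python
import xml.parsers.expat

SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# With namespace_separator=" ", expat reports names as "<namespace URI> <local name>".
HEADER_NAME = SOAP_ENV_NS + " Header"


class _HeaderSeen(Exception):
    """Raised from a handler to abort parsing once the Header has closed."""


def strip_header(envelope: bytes) -> bytes:
    """Remove the Envelope's Header element and pass the rest through as raw bytes.

    Parsing is abandoned at the Header's end tag; the bytes after it are copied
    verbatim, so they are neither re-serialized nor checked for well-formedness.
    """
    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    depth = 0
    span = {}

    def on_start(name, attrs):
        nonlocal depth
        depth += 1
        if depth == 2 and name == HEADER_NAME:
            span["start"] = parser.CurrentByteIndex  # offset of '<' in the start tag

    def on_end(name):
        nonlocal depth
        if depth == 2 and name == HEADER_NAME:
            tag_start = parser.CurrentByteIndex  # offset of '<' in the end tag
            span["end"] = envelope.index(b">", tag_start) + 1
            raise _HeaderSeen
        depth -= 1

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    try:
        parser.Parse(envelope, True)
    except _HeaderSeen:
        pass
    if "end" not in span:
        return envelope  # no Header: forward unchanged
    return envelope[:span["start"]] + envelope[span["end"]:]
```

The trade-off is visible in the same sketch: because the body is never parsed, a malformed payload is forwarded untouched, which is exactly the robustness concern raised in (1).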

(3) SOAP does not sufficiently address the issues associated with mixing
MIME-based packaging and XML-based packaging. I have identified four
distinct locations in a SOAP message where application XML-based data
could be placed: in the envelope body, in the envelope header, as an
immediate child of the envelope root, and as a MIME part following the
envelope. Section 4.3.1 of SOAP 1.1 asserts a "semantic" equivalence
between payloads and headers under certain circumstances, which is what
allows an application payload to appear in the header. Since these two
payload areas are semantically equivalent at the SOAP level, SOAP need
not provide a distinction between these payloads at the application
level. However, the question remains as to whether SOAP provides
applications with the ability to distinguish among the remaining payload
locations.

Should SOAP allow the application to know or select the envelope in
which each payload is packaged? Is it reasonable to expose this
packaging detail at the application level? Can an application
distinguish between MIME headers and XML attributes (e.g. encodingStyle)
when examining or specifying per-payload properties, or would
applications also require an awareness of these distinctions?

The specification "SOAP Messages with Attachments," by John J. Barton
(Hewlett Packard Labs) and Satish Thatte (Microsoft), does an excellent
job of specifying how one puts MIME attachments in a SOAP message along
with some of the interactions between payloads, but it does not address
these sorts of envelope-transparency issues. (This specification was
posted to http://discuss.develop.com/soap.html on July 7, 2000, but the
attachment does not seem to be available from the archive.)

(4) Many RFCs and standards have been created to specify how one uses
MIME headers to package MIME parts for a particular purpose. Some
specify document types, some specify character encodings, some specify
that the data is encrypted, some specify the presence of signatures or
certificates needed to interpret the message -- in general any
information needed for middleware software to communicate
payload-specific handling. If we put data in the body portion of the
envelope, we forsake all the benefits available through tools that
implement these standards. Is it reasonable to put payloads in one place
only when those standards aren't needed and in the other when they are
needed? SOAP adds the encodingStyle attribute to each payload. Won't we
also sometimes need that with a MIME part?

</ISSUES>

Okay, assuming that my issues are substantially correct (which may be a
completely false assumption), what benefits are there to allowing an XML
packaging mechanism at all in SOAP for application payloads? Why not
just define SOAP in a packaging-independent way and provide a binding
for MIME (or its HTTP variant) for now, until the W3C produces a
well-thought-out specification for XML packaging? The only answer I can
think of is a good one, but not one that stands up to the requirements
of high-end B2B ecommerce: that life is easier for developers if they
don't have to learn and work with yet one more technology -- MIME.
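For comparison with the MIME binding suggested here, a minimal Python sketch using the standard library's email package: each payload travels in its own part of a multipart/related message with its own Content-Type and charset, so each can be parsed, accepted, or rejected independently. The layout only loosely follows "SOAP Messages with Attachments" (for example, it omits the `start` parameter), and the Content-ID values and payloads are hypothetical.

```python
import xml.etree.ElementTree as ET
from email import message_from_bytes
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def package(parts):
    """parts: list of (content_id, xml_text); the first part is the envelope.
    Returns a multipart/related message with one MIME part per payload."""
    msg = MIMEMultipart("related", type="text/xml")
    for cid, xml_text in parts:
        part = MIMEText(xml_text, "xml", "utf-8")  # Content-Type: text/xml; charset="utf-8"
        part["Content-ID"] = f"<{cid}>"
        msg.attach(part)
    return msg


def unpack(raw: bytes):
    """Parse every part on its own; one malformed part does not affect the others."""
    results = {}
    for part in message_from_bytes(raw).get_payload():
        cid = part["Content-ID"].strip("<>")
        try:
            ET.fromstring(part.get_payload(decode=True))  # decode the transfer encoding
            results[cid] = "ok"
        except ET.ParseError:
            results[cid] = "malformed"
    return results
```

Packaging an envelope with three orders, the second of which ends in `</ordr>`, `unpack` reports only that part as malformed while the envelope and the other two orders still parse; this is the independence that (1b) says an XML envelope takes away.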

Joe Lapp
Principal Architect
webMethods, Inc.

P.S. This email was substantially critical of SOAP, or at least of
SOAP's packaging of application payloads in XML, so I want to end by
saying that SOAP 1.1 is the most impressive XML specification I've had
the pleasure to read. I have referred many people to it to use as a
model for their own XML specifications (even including some
Microsofties). It is very easy to read, very easy to understand, and
very succinct. I'm impressed with the extensibility mechanism created
for the SOAP header -- I'm especially impressed that the designers
understood that the X in XML does not by itself give them extensibility.
I'm most impressed with how little documentation is required to specify
a working messaging protocol. ebXML and RosettaNet RNIF have much to
learn from the SOAP and BizTalk specifications.

</REPOST>

Okay, let's look at some of the objections I have received:

(A) I say that these issues apply more to non-RPC uses of the SOAP
protocol than to RPC uses, and one objection I've heard is that the SOAP
envelope doesn't distinguish between the two, so that if the issues
don't exist for one mode they shouldn't exist for the other.  I have a
few responses to this "objection":

First, the issues stand by themselves, so evaluate for yourself whether
or not they apply to RPC.  The next two responses just explain why I
bothered to make this assertion.

Second, the SOAP spec defines the RPC behavior but not the other
application-specific behavior.  To be SOAP-compliant requires
conformance with the SOAP spec, so by definition, RPC should behave well
between two SOAP-compliant nodes.  If the nodes aren't SOAP compliant,
then you wouldn't necessarily expect communication anyway.

Finally, the RPC semantics and marshalling can be made independent of
the application either by using an introspective language like Java or
by using a single tool for generating the stubs of all (or most)
applications.  There will be far more applications than infrastructure
software and stub-generators, so it will be easier to ensure that the
infrastructure and stub-generators behave well, but near impossible to
ensure that all applications behave well.

(B) Another objection: The software infrastructure will be responsible
for transmitting wellformed XML.  Most of the XML will therefore be
wellformed, and these issues will not arise frequently enough to bother
addressing them.

Response: Note that this is only an objection to point (1) and makes no
statement regarding points (2) through (4).

I believe that this objection makes a false assumption.  First, I doubt
that everybody will be using the same robust implementations of SOAP (or
XML Protocol).  There will always be the hacker types and those who
think they have value-add to give.  But even if we grant this scenario
-- that everybody uses proven implementations -- we still have another
problem.  Should an application ever create the XML that needs to be
delivered, the protocol stack would have to reparse that XML on the
client-side before sending it, just to provide the robustness guarantee.
That's another performance issue.  I seriously doubt that all protocol
clients are going to enforce the reliability of the XML.

(C) Another objection: We will have a better world if the server
infrastructure enforces wellformedness on behalf of the applications.
This reduces the work of the applications, and it pushes error detection
and error handling closer to its source.

Response: I agree in principle with this objection, but I disagree that
XML allows us to do this properly, and I disagree that all protocol
stacks should be required to provide this enforcement.  XML does not
allow us to do this properly because the first wellformedness error is
officially required to be a fatal error, by the XML 1.0 spec itself.
There is no standard handling for fatal errors -- no standard way to
identify the kind of error or even a requirement that the kind of error
be identifiable.  When such an error occurs, many XML parsers will force
the recipient to just reject the message.  The recipient may not be able
to log the message headers, even if they are wellformed (as I assume), or
inform the target application, should the target application need to
engage in error management.  If the message contains a batch of
independent commands (I don't mean independent messages), the
application won't be able to handle the wellformed ones independently.

(D) Another objection: An XML envelope gives us the advantage of
extensibility that we wouldn't otherwise have.

Response: I see this "extensibility" as extensibility of the headers,
not of the payloads.  MIME is pretty flexible with its payloads,
allowing hierarchy and arbitrary content.  The issue is attributing the
message as a whole and attributing the individual payloads.  One could
address the extensibility of the message headers by having an XML
document represent the headers.  This is done in ebXML and RNIF
(RosettaNet).  One could address the extensibility of the payload
headers through an XML manifest or an optional XML attachment that may
prefix any given payload.  I'm sure other possibilities exist as well.

(E) Another objection: The W3C decided not to do XML packaging, but XML
packaging confers enough benefits that it ought to be addressed to some
extent, even if only minimally.  XML Protocol is the right place for
that.  Benefits include the ability to apply XML tools to the packaged
message; MIME tools do not exist in such diversity.

Response: The tools argument works for me to some degree, but in my mind
the negatives outweigh the positives.  XML packaging (of the sort being
discussed for a protocol) needs a lot of thought and can't be considered
without first understanding the negatives.

Also, I don't find it relevant to our discussion that the W3C decided to
abandon XML packaging.  I don't really know what that working group
meant by "XML packaging."  The issue is that SOAP (and perhaps XML
Protocol) is defining an XML envelope for packaging application
payloads, even if only the XML payloads.  The term 'packaging' may be
overloaded, so let's focus on the semantics rather than the words.

And since, as so far specified, the XML envelope can only handle the XML
payloads, it seems that we aren't simply making message packages more
amenable for use by XML tools.  We are now requiring that both MIME
tools and XML tools be present.  This is more onerous than requiring the
presence and use of just one tool set (or API set).

======

Okay, enough of that.  If you got this far, thank you very much for
giving me your time.  I'll live with whatever the working group comes up
with; I just wanted to make sure that the issues are heard and known and
that the decisions made are fully educated ones.

Joe Lapp
Principal Architect
webMethods, Inc.

P.S. Randy Waldrop is our official working group member and will be
formally representing webMethods' interests.  I don't plan on
participating in this discussion (too much else to do), and Randy is
free to defend or attack or ignore these issues as he pleases.  This
group has a huge amount of expertise, and I trust that you will make
appropriate use of these points, whatever use that may be.

Received on Wednesday, 18 October 2000 17:57:33 UTC