RE: Issues with Packaging Application Payloads

Thanks for the thoughtful mail.  I have some ideas on some of the points you
raise, though I don't suggest that these are exhaustive.
 
1.    You raise the point that an application might assemble what it hopes
is a valid XML/SOAP message but inadvertently include some character data
that renders the document invalid.  This is true, but I don't think the
problem is limited to SOAP or even to XML.  If one has any sort of an
envelope or syntactic structuring, whether XML, MIME etc., then the
divisions between parts will need to be marked somehow.  If the marking is
by way of separator or terminator sequences (as opposed to length prefixes,
which have other problems) then it is always possible that an application
might include data that conflicts with the structural markers.
    From the point of view of a protocol spec, I think the best one can say
is that the application should not do this.  Similarly, the HTTP
specification does not discuss how to deal with messages that are not HTTP,
only those that are HTTP.
    One could certainly design APIs that make it easier for applications to
ensure that they have not accidentally created invalid messages.  This is not
per se part of the protocol.  Similarly, servers will always want to ensure
that they do not fail even if handed invalid would-be XML, or stuff that is
not even remotely XML.  I believe that SOAP 1.1 provides enough description
of Fault reporting to report errors.
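
As a minimal sketch of the kind of API mentioned above (illustrative only,
and not part of SOAP 1.1): a helper that escapes character data before
placing it in an envelope.  The name build_envelope and the bare
Envelope/Body element names are assumptions made for this example; real
SOAP markup is namespace-qualified.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_envelope(text: str) -> str:
    """Wrap application character data in a toy envelope (hypothetical helper)."""
    # escape() replaces &, < and > so the data cannot collide with
    # XML's structural markers.
    return "<Envelope><Body>" + escape(text) + "</Body></Envelope>"


payload = "price < 10 & quantity > 2"
envelope = build_envelope(payload)

# The escaped envelope parses, and the parser hands back the original data.
recovered = ET.fromstring(envelope).find("Body").text

# The same data pasted in unescaped is rejected by the parser.
try:
    ET.fromstring("<Envelope><Body>" + payload + "</Body></Envelope>")
    raw_accepted = True
except ET.ParseError:
    raw_accepted = False
```

The escaping lives in the library rather than in the protocol, which is the
separation argued for above.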
 
2.    It has indeed been one of the design decisions of XML to use
"Draconian" error handling, to stop processing if the parsed document is not
XML.  This was a conscious decision that rated interoperability and validity
as more important than authoring convenience.  SOAP continues that approach,
believing that it has merits and is consistent with the goals of XML
generally.  Naturally, that does mean that some documents that are "almost
valid" will be rejected.  
 
3.    If I understand you, the performance problem you cite is more an issue
of the suitability of a DOM-based message handler for large messages.
Whether SOAP, or some other use of XML, or even if using MIME or something
else, if an application processes large messages by first parsing them into
a tree or other buffer and then examining the contents, that will be more
expensive than some more streamlined techniques.
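
To make the contrast concrete, here is a small sketch of one "more
streamlined" technique: reading the header entries incrementally and
stopping as soon as the body begins.  The function name read_header_names
and the unqualified Envelope/Header/Body names are assumptions for
illustration, not anything SOAP defines.

```python
import io
import xml.etree.ElementTree as ET


def read_header_names(stream):
    """Return the tags of the Header's children, stopping when Body starts."""
    names = []
    depth = 0
    in_header = False
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        if event == "start":
            depth += 1
            if depth == 2 and elem.tag == "Header":
                in_header = True
            elif depth == 2 and elem.tag == "Body":
                # Stop consuming input here; later chunks of a large
                # body are never read from the stream.
                break
            elif in_header and depth == 3:
                names.append(elem.tag)
        else:
            if depth == 2 and elem.tag == "Header":
                in_header = False
            depth -= 1
    return names


doc = (b"<Envelope><Header><Route/><Priority>high</Priority></Header>"
       b"<Body><big>data</big></Body></Envelope>")
names = read_header_names(io.BytesIO(doc))
```

A handler that first builds a full tree pays for the whole body before it
can look at the first header entry; the loop above does not.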
    
I think that one of the themes running through the above comments is that
I'm trying to separate the issues that relate to the protocol from other
issues that relate to good application or API design.  The protocol spec is
more like a spec for XML than a spec for a browser.  There are better and
poorer browsers, and many differences of taste, and I expect that there will
be better and poorer application support libraries.
 
By the way, thank you for the kind compliments on SOAP. 
 
--Andrew Layman

-----Original Message-----
From: Joe Lapp [mailto:jlapp@webmethods.com]
Sent: Tuesday, October 17, 2000 4:27 PM
To: xml-dist-app@w3.org
Subject: Issues with Packaging Application Payloads


On August 18th I posted to the SOAP discussion list a set of issues that I
had with how SOAP packages application payloads.  Most of these issues apply
to the use of an XML envelope, so this group will face these questions in
the development of W3C XML Protocol.  Since the group is in the requirements
phase, I thought it best to make sure the issues are known, so that they can
help feed requirements.

By presenting this I don't mean to give any statement of webMethods'
position on either SOAP or XML Protocol.  webMethods has a habit of
supporting whatever protocols our clients need.  My job as engineer is to
make what it means to support a protocol as painless as possible to both
webMethods developers and those clients of ours who are inclined to use the
protocol.

You'll find that 8/18/2000 post duplicated below, in the <REPOST> tag.
Afterwards, I address some of the common objections.

<REPOST>

I'm partly encountering and partly anticipating a number of issues related
to how SOAP packages application payloads in XML documents. These issues
primarily apply to the use of SOAP with application-level payloads and do
not surface when SOAP is used strictly for RPC. I'm providing the issues
here to make them public and available for discussion, but I think it would
be most effective to resolve the issues within a standards body that is
generous enough to bring SOAP under its wing.

You'll find the issues listed below. Please feel free to provide corrections
to any erroneous understanding I may have and suggestions for how to deal
with some of these situations. Here you go:

<ISSUES>

(1) Infrastructure header data and application payload data may be put in
the same XML document (the SOAP Envelope). When applications unintentionally
dump non-wellformed XML payloads into this document, the entire document is
non-wellformed. A robust server must protect itself from client errors and
cannot trust clients to deliver only wellformed documents. Should an
application or protocol choose to put payloads in the XML envelope, it seems
that this would create a number of problems related to error handling:

(1a) Countless XML tools out there use parse-trees (instead of events) and
may not be capable of representing just the wellformed portion of a
document. Middleware based on such tools will not be able to act on the SOAP
headers that would otherwise apply, even for some content errors that are
not wellformedness errors. Likewise, recipient applications will not be able
to engage in the error management required by those applications for
application-level errors.

(1b) Even event-based XML tools suffer from not being able to reliably
deliver application payloads that are wellformed but which follow a
non-wellformed payload (within the body of the same XML envelope). Errors
that may be recoverable at the application level are not given the
opportunity to recover. Errors that may be ignored at the application level
will not be ignored. For example, should the application payloads be
semantically independent, as with an application-level batching mechanism,
SOAP would create dependencies between them. 

Issues (1a) and (1b) are a direct result of the XML 1.0 Specification's
requirement that wellformedness errors be fatal errors within an XML
processor. In particular, the specification says that "Once a fatal error is
detected, however, the processor must not continue normal processing (i.e.,
it must not continue to pass character data and information about the
document's logical structure to the application in the normal way)." To
relate this to SOAP, whenever an application payload is placed into the
envelope body, an assertion is made that the payload constitutes wellformed
XML, and the SOAP envelope ends up creating dependencies among data that did
not exist prior to enveloping the data. This violates the clear separation
of layers that most protocols have.
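
The dependency described in (1b) can be sketched with an event-based
parser.  This is an illustration under assumptions: the Envelope/Body/order
element names and the PayloadCollector class are hypothetical, and the batch
is a toy stand-in for independent application payloads.

```python
import xml.sax


class PayloadCollector(xml.sax.ContentHandler):
    """Record the id of each payload element that closes under Body."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.current_id = None
        self.delivered = []

    def startElement(self, name, attrs):
        self.depth += 1
        if self.depth == 3:          # Envelope=1, Body=2, payload=3
            self.current_id = attrs.get("id")

    def endElement(self, name):
        if self.depth == 3:
            self.delivered.append(self.current_id)
        self.depth -= 1


batch = (
    "<Envelope><Body>"
    "<order id='1'><item>ok</item></order>"
    "<order id='2'><item>broken</order>"      # mismatched tag
    "<order id='3'><item>ok</item></order>"
    "</Body></Envelope>"
)

collector = PayloadCollector()
try:
    xml.sax.parseString(batch.encode("utf-8"), collector)
    fatal = None
except xml.sax.SAXParseException as exc:
    fatal = exc
```

Order 1 is delivered before the fatal error; order 3 is well-formed on its
own but is never reported, because the parser stops at the error in order 2.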

(2) Performance is another issue that seems to surface when infrastructure
and application data are put into the same XML document. Consider SOAP's
requirement that when a header targeted for a given actor reaches that
actor, the header must be consumed (usually meaning removed) from the
document before the document may proceed. If the SOAP header is 4K in size
and the SOAP body has a 1MB payload, you wouldn't want to parse the
document, remove one element from the beginning of the document, and then
regenerate the 1+MB document to forward to the next destination. Yet most
tools will require this approach -- even most event-based tools. It is
possible to create a more specialized parser that will provide a portion of
the document as XML and still allow concatenating the remainder as text
without going through the parsing process, but that's kind of a tall order
for the everyday parser we hope will work with SOAP.
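
A rough sketch of the "more specialized" approach described in (2): parse
only until the body starts, then splice the remainder through as text.  It
is a simplification under assumptions: drop_header is a hypothetical name,
the whole Header element is removed rather than a single entry targeted at
one actor, and the document is assumed to be held in memory as bytes.

```python
import xml.parsers.expat


class _BodyReached(Exception):
    """Raised from the handler to abandon the parse once Body starts."""


def drop_header(doc: bytes) -> bytes:
    parser = xml.parsers.expat.ParserCreate()
    state = {"depth": 0, "header_at": None, "body_at": None}

    def start(name, attrs):
        state["depth"] += 1
        local = name.rsplit(":", 1)[-1]      # strip any namespace prefix
        if state["depth"] == 2 and local == "Header":
            # Byte offset of the "<" that opens the Header element.
            state["header_at"] = parser.CurrentByteIndex
        elif state["depth"] == 2 and local == "Body":
            state["body_at"] = parser.CurrentByteIndex
            raise _BodyReached               # do not parse the payload

    def end(name):
        state["depth"] -= 1

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    try:
        parser.Parse(doc, True)
    except _BodyReached:
        pass
    if state["header_at"] is None or state["body_at"] is None:
        return doc                           # nothing to remove
    # Everything from Body onward is copied as text, unparsed.
    return doc[:state["header_at"]] + doc[state["body_at"]:]


message = (b'<s:Envelope xmlns:s="urn:example"><s:Header><route>a</route>'
           b'</s:Header><s:Body><big>payload</big></s:Body></s:Envelope>')
forwarded = drop_header(message)
```

The cost of the parse is then proportional to the header rather than to the
payload, but it relies on byte offsets that many parser APIs do not expose.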

(3) SOAP does not sufficiently address the issues associated with mixing
MIME-based packaging and XML-based packaging. I have identified four
distinct locations in a SOAP message where application XML-based data could
be placed: in the envelope body, in the envelope header, as an immediate
child of the envelope root, and as a MIME part following the envelope.
Section 4.3.1 of SOAP 1.1 asserts a "semantic" equivalence between payloads
and headers under certain circumstances, which is what allows an application
payload to appear in the header. Since these two payload areas are
semantically equivalent at the SOAP level, SOAP need not provide a
distinction between these payloads at the application level. However, the
question remains as to whether SOAP provides applications with the ability
to distinguish among the remaining payload locations. 

Should SOAP allow the application to know or select the envelope in which
each payload is packaged? Is it reasonable to expose this packaging detail
at the application level? Can an application distinguish between MIME
headers and XML attributes (e.g. encodingStyle) when examining or specifying
per-payload properties, or would applications also require an awareness of
these distinctions?

The specification "SOAP Messages with Attachments," by John J. Barton
(Hewlett Packard Labs) and Satish Thatte (Microsoft), does an excellent job
of specifying how one puts MIME attachments in a SOAP message along with
some of the interactions between payloads, but it does not address these
sorts of envelope-transparency issues. (This specification was posted to
http://discuss.develop.com/soap.html on July 7, 2000, but the attachment
does not seem to be available from the archive.)

(4) Many RFCs and standards have been created to specify how one uses MIME
headers to package MIME parts for a particular purpose. Some specify
document types, some specify character encodings, some specify that the data
is encrypted, some specify the presence of signatures or certificates needed
to interpret the message -- in general any information needed for middleware
software to communicate payload-specific handling. If we put data in the
body portion of the envelope, we forsake all the benefits available through
tools that implement these standards. Is it reasonable to put payloads in
one place only when those standards aren't needed and in the other when they
are needed? SOAP adds the encodingStyle attribute to each payload. Won't we
also sometimes need that with a MIME part?
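
For comparison, a short sketch of MIME-based packaging in which each payload
carries its own headers.  The envelope content, the cid:payload-1 reference
and the Content-ID value are hypothetical; this is not a rendering of the
"SOAP Messages with Attachments" format, only an illustration of per-part
headers.

```python
from email import message_from_bytes
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

envelope_xml = ("<Envelope><Header/><Body>"
                "<ref href='cid:payload-1'/></Body></Envelope>")
# The attached payload is opaque to the envelope's XML parser,
# so it need not be well-formed for the envelope to be read.
payload = b"<order><item>broken</order>"

message = MIMEMultipart("related")
message.attach(MIMEText(envelope_xml, "xml", "utf-8"))

part = MIMEApplication(payload, "octet-stream")
part.add_header("Content-ID", "<payload-1>")
message.attach(part)

# Round-trip through the wire format and read the parts back.
parsed = message_from_bytes(message.as_bytes())
env_part, payload_part = parsed.get_payload()
```

Content type, character set and transfer encoding are stated per part in
MIME headers here, which is the information existing MIME tooling already
knows how to act on.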

</ISSUES>

Okay, assuming that my issues are substantially correct (which may be a
completely false assumption), what benefits are there to allowing an XML
packaging mechanism at all in SOAP for application payloads? Why not just
define SOAP in a packaging-independent way and provide a binding for MIME
(or its HTTP variant) for now, until the W3C produces a well-thought-out
specification for XML packaging? The only answer I can think of is a good
one, but not one that stands up to the requirements of high-end B2B
ecommerce: that life is easier for developers if they don't have to learn
and work with yet one more technology -- MIME.

Joe Lapp 
Principal Architect 
webMethods, Inc.

P.S. This email was substantially critical of SOAP, or at least of SOAP's
packaging of application payloads in XML, so I want to end by saying that
SOAP 1.1 is the most impressive XML specification I've had the pleasure to
read. I have referred many people to it to use as a model for their own XML
specifications (even including some Microsofties). It is very easy to read,
very easy to understand, and very succinct. I'm impressed with the
extensibility mechanism created for the SOAP header -- I'm especially
impressed that the designers understood that the X in XML does not by itself
give them extensibility. I'm most impressed with how little documentation is
required to specify a working messaging protocol. ebXML and RosettaNet RNIF
have much to learn from the SOAP and BizTalk specifications.

</REPOST>

Okay, let's look at some of the objections I have received:

(A) I say that these issues apply more to non-RPC uses of the SOAP protocol
than to RPC uses, and one objection I've heard is that the SOAP envelope
doesn't distinguish between the two, so that if the issues don't exist for
one mode they shouldn't exist for the other.  I have a few responses to this
"objection":

First, the issues stand by themselves, so evaluate for yourself whether or
not they apply to RPC.  The next two responses just explain why I bothered
to make this assertion.

Second, the SOAP spec defines the RPC behavior but not the other
application-specific behavior.  To be SOAP-compliant requires conformance
with the SOAP spec, so by definition, RPC should behave well between two
SOAP-compliant nodes.  If the nodes aren't SOAP compliant, then you wouldn't
necessarily expect communication anyway.

Finally, the RPC semantics and marshalling can be made independent of the
application either by using an introspective language like Java or by using
a single tool for generating the stubs of all (or most) applications.  There
will be far more applications than infrastructure software and
stub-generators, so it will be easier to ensure that the infrastructure and
stub-generators behave well, but near impossible to ensure that all
applications behave well.

(B) Another objection: The software infrastructure will be responsible for
transmitting wellformed XML.  Most of the XML will therefore be wellformed,
and these issues will not arise frequently enough to bother addressing them.

Response: Note that this is only an objection to point (1) and makes no
statement regarding points (2) through (4).

I believe that this objection makes a false assumption.  First, I doubt that
everybody will be using the same robust implementations of SOAP (or XML
Protocol).  There will always be the hacker types and those who think they
have value-add to give.  But even if we grant this scenario -- that
everybody uses proven implementations -- we still have another problem.
Should an application ever create the XML that needs to be delivered, the
protocol stack would have to reparse that XML on the client-side before
sending it, just to provide the robustness guarantee.  That's another
performance issue.  I seriously doubt that all protocol clients are going to
enforce the reliability of the XML.
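
A sketch of the client-side check described above, to show where the extra
parse comes from.  The names is_well_formed and send, and the transmit
callback, are hypothetical.  A fragment carrying its own XML declaration
would pass this check yet still break the envelope; that case is ignored
here.

```python
import xml.parsers.expat


def is_well_formed(fragment: bytes) -> bool:
    """Parse the application's XML once, purely to verify it."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(fragment, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True


def send(payload: bytes, transmit) -> None:
    # This is the additional pass over the payload: the XML is parsed
    # here and then parsed again by the recipient.
    if not is_well_formed(payload):
        raise ValueError("payload is not well-formed XML")
    transmit(b"<Envelope><Body>" + payload + b"</Body></Envelope>")


sent = []
send(b"<a>1</a>", sent.append)
try:
    send(b"<a>", sent.append)
    rejected = False
except ValueError:
    rejected = True
```

Every payload is scanned in full before it leaves the client, whatever its
size.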

(C) Another objection: We will have a better world if the server
infrastructure enforces wellformedness on behalf of the applications.  This
reduces the work of the applications, and it pushes error detection and
error handling closer to its source.

Response: I agree in principle with this objection, but I disagree that XML
allows us to do this properly, and I disagree that all protocol stacks
should be required to provide this enforcement.  XML does not allow us to do
this properly because the first wellformedness error is officially required
to be a fatal error, by the XML 1.0 spec itself.  There is no standard
handling for fatal errors -- no standard way to identify the kind of error
or even a requirement that the kind of error be identifiable.  When such an
error occurs, many XML parsers will force the recipient to just reject the
message.  The recipient may not be able to log the message headers, even if
they are wellformed (as I assume) or inform the target application, should
the target application need to engage in error management.  If the message
contains a batch of independent commands (I don't mean independent
messages), the application won't be able to handle the wellformed ones
independently.

(D) Another objection: An XML envelope gives us the advantage of
extensibility that we wouldn't otherwise have.

Response: I see this "extensibility" as extensibility of the headers, not of
the payloads.  MIME is pretty flexible with its payloads, allowing hierarchy
and arbitrary content.  The issue is attributing the message as a whole and
attributing the individual payloads.  One could address the extensibility of
the message headers by having an XML document represent the headers.  This
is done in ebXML and RNIF (RosettaNet).  One could address the extensibility
of the payload headers through an XML manifest or an optional XML attachment
that may prefix any given payload.  I'm sure other possibilities exist as
well.

(E) Another objection: The W3C decided not to do XML packaging, but XML
packaging confers enough benefits that it ought to be addressed to some
extent, even if only minimally.  XML Protocol is the right place for that.
Benefits include the ability to apply XML tools to the packaged message;
MIME tools do not exist in such diversity.

Response: The tools argument works for me to some degree, but in my mind the
negatives outweigh the positives.  XML packaging (of the sort being
discussed for a protocol) needs a lot of thought and can't be considered
without first understanding the negatives.

Also, I don't find it relevant to our discussion that the W3C decided to
abandon XML packaging.  I don't really know what that working group meant by
"XML packaging."  The issue is that SOAP (and perhaps XML Protocol) is
defining an XML envelope for packaging application payloads, even if only
the XML payloads.  The term 'packaging' may be overloaded, so let's focus on
the semantics rather than the words.

And since, as so far specified, the XML envelope can only handle the XML
payloads, it seems that we aren't simply making message packages more
amenable for use by XML tools.  We are now requiring that both MIME tools
and XML tools be present.  This is more onerous than requiring the presence
and use of just one tool set (or API set).

======

Okay, enough of that.  If you got this far, thank you very much for giving
me your time.  I'll live with whatever the working group comes up with; I
just wanted to make sure that the issues are heard and known and that the
decisions made are fully educated ones.

Joe Lapp
Principal Architect
webMethods, Inc.

P.S. Randy Waldrop is our official working group member and will be formally
representing webMethods' interests.  I don't plan on participating in this
discussion (too much else to do), and Randy is free to defend or attack or
ignore these issues as he pleases.  This group has a huge amount of
expertise, and I trust that you will make appropriate use of these points,
whatever use that may be. 

Received on Tuesday, 17 October 2000 21:03:28 UTC