RE: Issues with Packaging Application Payloads from James Snell on 2000-10-18 (xml-dist-app@w3.org from October 2000)

From: James Snell <jsnell@lemoorenet.com>
Date: Tue, 17 Oct 2000 22:09:55 -0700
To: <xml-dist-app@w3.org>
Message-ID: <FNEFKGCOPGCFLMCBIIIBIEFECDAA.jsnell@lemoorenet.com>
Andrew brings up some very valid points here... points that I hope will be
addressed in the upcoming xp requirements document.  Where exactly do we
draw the lines between the packaging protocol and the API that utilizes that
protocol?

One particular example of this issue with regard to SOAP is the SOAPAction
HTTP Header.  SOAP declares that it is part of the packaging, but does not
declare exactly how to use it.

Encoding XML is another issue, obviously... SOAP declares that the envelope
can be used to encapsulate arbitrary namespace qualified XML but does not
declare exactly how to go about making sure that that "arbitrary XML" is
valid in any way.

A third issue has to do with the actor attribute and message paths,
particularly in a request/response interaction.  The SOAP specification
states only that the actor attribute is used to determine who gets to play
with the header next, but goes into absolutely no detail about exactly how
to pass the messages back and forth, or whether or not a request/response
interaction must return along the same path it was sent (i.e. request:
a->b->c, response: c->b->a).

Several questions must be answered:

1. In a PACKAGING protocol, is it appropriate to define extensions to
TRANSPORT protocols that may be used to carry it if the packaging protocol
is supposed to be transport protocol agnostic?  If so, when is it
appropriate to do so?

2. In a PACKAGING protocol specification, is it appropriate to dictate that
the implementation API is responsible for ensuring validity of the package,
or should there be built-in controls in the packaging itself?

3. In a PACKAGING protocol, to what extent is it appropriate to specify
special use case scenarios (such as the actor and message paths) within the
core protocol specification?  Additionally, to what extent is the protocol
specification responsible for defining the implementation of that case
scenario? In other words: Is it right that the SOAP specification does not
go into additional detail about how to implement message paths?


Another question that I must ask:  is it the intention of this working group
to not only define the PACKAGING structure of the XML Protocol but also a
standard API for implementing that PACKAGING as has been done with XML and
the DOM?  Or is this working group only going to focus on the XML Packaging?

- James


  -----Original Message-----
  From: xml-dist-app-request@w3.org [mailto:xml-dist-app-request@w3.org] On
Behalf Of Andrew Layman
  Sent: Tuesday, October 17, 2000 6:03 PM
  To: xml-dist-app@w3.org
  Subject: RE: Issues with Packaging Application Payloads


  Thanks for the thoughtful mail.  I have some ideas on some of the points
you raise, though I don't suggest that these are exhaustive.

  1.    You raise the point that an application might assemble what it hopes
is a valid XML/SOAP message but inadvertently include some character data
that renders the document invalid.  This is true, but I don't think the
problem is limited to SOAP or even to XML.  If one has any sort of an
envelope or syntactic structuring, whether XML, MIME etc., then the
divisions between parts will need to be marked somehow.  If the marking is
by way of separator or terminator sequences (as opposed to length prefixes,
which have other problems) then it is always possible that an application
might include data that conflicts with the structural markers.
      From the point of view of a protocol spec, I think the best one can
say is that the application should not do this.  Similarly, the HTTP
specification does not discuss how to deal with messages that are not HTTP,
only those that are HTTP.
      One could certainly design APIs that make it easier for applications
to ensure that they have not accidentally created invalid messages.  This is
not per se part of the protocol.  Similarly, servers will always want to
ensure that they do not fail even if handed invalid would-be XML, or stuff
that is not even remotely XML.  I believe that SOAP 1.1 provides enough
description of Fault reporting to report errors.
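
To make the API point concrete, here is a minimal sketch (not part of the
original exchange) of a helper that hands character data to an XML library
instead of splicing it into markup by hand. The function name
`build_envelope` and the `note` payload element are hypothetical; the
namespace URI is the SOAP 1.1 envelope namespace.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def build_envelope(payload_tag, payload_text):
    """Build an envelope whose body holds one element with text content.

    The text is passed to the library as data, so characters such as
    '<' and '&' are escaped during serialization rather than being
    interpreted as markup.
    """
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    payload = ET.SubElement(body, payload_tag)  # hypothetical payload element
    payload.text = payload_text
    return ET.tostring(envelope, encoding="unicode")


# Text that would break a hand-assembled document.
message = build_envelope("note", "price < 10 & rising")

# The serialized message parses back and the text survives unchanged.
round_trip_text = ET.fromstring(message).find(".//note").text
```

This is a property of the support library, not of the protocol, which is the
distinction being drawn above.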

  2.    It has indeed been one of the design decisions of XML to use
"Draconian" error handling, to stop processing if the parsed document is not
XML.  This was a conscious decision that rated interoperability and validity
as more important than authoring convenience.  SOAP continues that approach,
believing that it has merits and is consistent with the goals of XML
generally.  Naturally, that does mean that some documents that are "almost
valid" will be rejected.

  3.    If I understand you, the performance problem you cite is more an
issue of the suitability of a DOM-based message handler for large messages.
Whether SOAP, or some other use of XML, or even if using MIME or something
else, if an application processes large messages by first parsing them into
a tree or other buffer and then examining the contents, that will be more
expensive than some more streamlined techniques.
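
A rough sketch of that contrast, assuming Python's standard
`xml.etree.ElementTree` module as a stand-in for "a tree-based handler" and
"a more streamlined technique". The `batch`/`item` element names and the
function name are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def count_items_streaming(xml_bytes, tag):
    """Count elements named `tag` while discarding content as it is seen."""
    count = 0
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag == tag:
            count += 1
        elem.clear()  # release this element's text and children once processed
    return count


# Hypothetical large message: many small items under one root.
big = b"<batch>" + b"<item>x</item>" * 1000 + b"</batch>"

# Tree-based: the entire document is materialized before it is examined.
tree_count = len(ET.fromstring(big).findall("item"))

# Event-based: elements are examined as their end tags are reached.
stream_count = count_items_streaming(big, "item")
```

Both approaches read the same bytes on the wire; only the handler differs.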

  I think that one of the themes running through the above comments is that
I'm trying to separate the issues that relate to the protocol from other
issues that relate to good application or API design.  The protocol spec is
more like a spec for XML than a spec for a browser.  There are better and
poorer browsers, and many differences of taste, and I expect that there will
be better and poorer application support libraries.

  By the way, thank you for the kind compliments on SOAP.

  --Andrew Layman
    -----Original Message-----
    From: Joe Lapp [mailto:jlapp@webmethods.com]
    Sent: Tuesday, October 17, 2000 4:27 PM
    To: xml-dist-app@w3.org
    Subject: Issues with Packaging Application Payloads


    On August 18th I posted to the SOAP discussion list a set of issues that
I had with how SOAP packages application payloads.  Most of these issues
apply to the use of an XML envelope, so this group will face these questions
in the development of W3C XML Protocol.  Since the group is in the
requirements phase, I thought it best to make sure the issues are known, so
that they can help feed requirements.

    By presenting this I don't mean to give any statement of webMethods'
position on either SOAP or XML Protocol.  webMethods has a habit of
supporting whatever protocols our clients need.  My job as engineer is to
make what it means to support a protocol as painless as possible to both
webMethods developers and those clients of ours who are inclined to use the
protocol.

    You'll find that 8/18/2000 post duplicated below, in the <REPOST> tag.
Afterwards, I address some of the common objections.

    <REPOST>

    I'm partly encountering and partly anticipating a number of issues
related to how SOAP packages application payloads in XML documents. These
issues primarily apply to the use of SOAP with application-level payloads
and do not surface when SOAP is used strictly for RPC. I'm providing the
issues here to make them public and available for discussion, but I think it
would be most effective to resolve the issues within a standards body that
is generous enough to bring SOAP under its wing.

    You'll find the issues listed below. Please feel free to provide
corrections to any erroneous understanding I may have and suggestions for
how to deal with some of these situations. Here you go:

    <ISSUES>

    (1) Infrastructure header data and application payload data may be put
in the same XML document (the SOAP Envelope). When applications
unintentionally dump non-wellformed XML payloads into this document, the
entire document is non-wellformed. A robust server must protect itself from
client errors and cannot trust clients to deliver only wellformed documents.
Should an application or protocol choose to put payloads in the XML
envelope, it seems that this would create a number of problems related to
error handling:

    (1a) Countless XML tools out there use parse-trees (instead of events)
and may not be capable of representing just the wellformed portion of a
document. Middleware based on such tools will not be able to act on the SOAP
headers that would otherwise apply, even for some content errors that are
not wellformedness errors. Likewise, recipient applications will not be able
to engage in the error management required by those applications for
application-level errors.

    (1b) Even event-based XML tools suffer from not being able to reliably
deliver application payloads that are wellformed but which follow a
non-wellformed payload (within the body of the same XML envelope). Errors
that may be recoverable at the application level are not given the
opportunity to recover. Errors that may be ignored at the application level
will not be ignored. For example, should the application payloads be
semantically independent, as with an application-level batching mechanism,
SOAP would create dependencies between them.

    Issues (1a) and (1b) are a direct result of the XML 1.0 Specification's
requirement that wellformedness errors be fatal errors within an XML
processor. In particular, the specification says that "Once a fatal error is
detected, however, the processor must not continue normal processing (i.e.,
it must not continue to pass character data and information about the
document's logical structure to the application in the normal way)." To
relate this to SOAP, whenever an application payload is placed into the
envelope body, an assertion is made that the payload constitutes wellformed
XML, and the SOAP envelope ends up creating dependencies among data that did
not exist prior to enveloping the data. This violates the clear separation
of layers that most protocols have.
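
The effect described in (1a) and (1b) can be sketched with Python's standard
library, used here purely as an example toolset. The `urn:example:...`
namespaces and the `routing`/`order` elements are hypothetical; the header is
well-formed and only the body payload has a mismatched tag.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace

# Well-formed header, but the application payload closes the wrong element.
envelope = (
    f'<e:Envelope xmlns:e="{SOAP_ENV}">'
    '<e:Header><routing xmlns="urn:example:routing">node-b</routing></e:Header>'
    '<e:Body><order><id>1</order></e:Body>'
    '</e:Envelope>'
)

# Tree-based parse: the whole document is rejected, header included.
try:
    ET.fromstring(envelope)
    tree_parse_ok = True
except ET.ParseError:
    tree_parse_ok = False

# Event-based parse: elements completed before the error are still reported,
# but nothing after the error is delivered.
parser = ET.XMLPullParser(events=("end",))
seen = []
try:
    parser.feed(envelope)
    for _event, elem in parser.read_events():
        seen.append(elem.tag)
except ET.ParseError:
    pass  # fatal error: processing stops here
```

In this sketch the tree-based caller never sees the header, while the
event-based caller sees the header but would not see any payload that
followed the broken one.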

    (2) Performance is another issue that seems to surface when
infrastructure and application data are put into the same XML document.
Consider SOAP's requirement that when a header targeted for a given actor
reaches that actor, the header must be consumed (usually meaning removed)
from the document before the document may proceed. If the SOAP header is 4K
in size and the SOAP body has a 1MB payload, you wouldn't want to parse the
document, remove one element from the beginning of the document, and then
regenerate the 1+MB document to forward to the next destination. Yet most
tools will require this approach -- even most event-based tools. It is
possible to create a more specialized parser that will provide a portion of
the document as XML and still allow concatenating the remainder as text
without going through the parsing process, but that's kind of a tall order
for the everyday parser we hope will work with SOAP.
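
A minimal sketch of the parse, remove, regenerate cycle described in (2),
assuming a tree-based toolkit (Python's `xml.etree.ElementTree`). The
function name, the `hop`/`data` elements, and the `urn:example:...` actor
URIs are hypothetical; the `actor` attribute and namespace are from SOAP 1.1.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
ACTOR_ATTR = f"{{{SOAP_ENV}}}actor"


def consume_headers(envelope_xml, my_actor):
    """Remove header entries addressed to `my_actor` and re-serialize.

    The whole document, body included, is parsed and written out again
    even though only the header changes.
    """
    root = ET.fromstring(envelope_xml)
    header = root.find(f"{{{SOAP_ENV}}}Header")
    if header is not None:
        for entry in list(header):  # copy, since we mutate while iterating
            if entry.get(ACTOR_ATTR) == my_actor:
                header.remove(entry)
    return ET.tostring(root, encoding="unicode")


incoming = (
    f'<e:Envelope xmlns:e="{SOAP_ENV}">'
    '<e:Header>'
    '<hop xmlns="urn:example:hop" e:actor="urn:example:node-b">log</hop>'
    '<hop xmlns="urn:example:hop" e:actor="urn:example:node-c">audit</hop>'
    '</e:Header>'
    '<e:Body><data xmlns="urn:example:data">' + "x" * 1000 + '</data></e:Body>'
    '</e:Envelope>'
)

# Node b strips its own header entry and forwards the rest.
forwarded = consume_headers(incoming, "urn:example:node-b")
```

With a 1MB body in place of the 1000-character one, the cost of this
round trip is dominated by data the intermediary never needed to look at.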

    (3) SOAP does not sufficiently address the issues associated with mixing
MIME-based packaging and XML-based packaging. I have identified four
distinct locations in a SOAP message where application XML-based data could
be placed: in the envelope body, in the envelope header, as an immediate
child of the envelope root, and as a MIME part following the envelope.
Section 4.3.1 of SOAP 1.1 asserts a "semantic" equivalence between payloads
and headers under certain circumstances, which is what allows an application
payload to appear in the header. Since these two payload areas are
semantically equivalent at the SOAP level, SOAP need not provide a
distinction between these payloads at the application level. However, the
question remains as to whether SOAP provides applications with the ability
to distinguish among the remaining payload locations.

    Should SOAP allow the application to know or select the envelope in
which each payload is packaged? Is it reasonable to expose this packaging
detail at the application level? Can an application distinguish between MIME
headers and XML attributes (e.g. encodingStyle) when examining or specifying
per-payload properties, or would applications also require an awareness of
these distinctions?

    The specification "SOAP Messages with Attachments," by John J. Barton
(Hewlett Packard Labs) and Satish Thatte (Microsoft), does an excellent job
of specifying how one puts MIME attachments in a SOAP message along with
some of the interactions between payloads, but it does not address these
sorts of envelope-transparency issues. (This specification was posted to
http://discuss.develop.com/soap.html on July 7, 2000, but the attachment does
not seem to be available from the archive.)

    (4) Many RFCs and standards have been created to specify how one uses
MIME headers to package MIME parts for a particular purpose. Some specify
document types, some specify character encodings, some specify that the data
is encrypted, some specify the presence of signatures or certificates needed
to interpret the message -- in general any information needed for middleware
software to communicate payload-specific handling. If we put data in the
body portion of the envelope, we forsake all the benefits available through
tools that implement these standards. Is it reasonable to put payloads in
one place only when those standards aren't needed and in the other when they
are needed? SOAP adds the encodingStyle attribute to each payload. Won't we
also sometimes need that with a MIME part?

    </ISSUES>

    Okay, assuming that my issues are substantially correct (which may be a
completely false assumption), what benefits are there to allowing an XML
packaging mechanism at all in SOAP for application payloads? Why not just
define SOAP in a packaging-independent way and provide a binding for MIME
(or its HTTP variant) for now, until the W3C produces a well-thought-out
specification for XML packaging? The only answer I can think of is a good
one, but not one that stands up to the requirements of high-end B2B
ecommerce: that life is easier for developers if they don't have to learn
and work with yet one more technology -- MIME.

    Joe Lapp
    Principal Architect
    webMethods, Inc.

    P.S. This email was substantially critical of SOAP, or at least of
SOAP's packaging of application payloads in XML, so I want to end by saying
that SOAP 1.1 is the most impressive XML specification I've had the pleasure
to read. I have referred many people to it to use as a model for their own
XML specifications (even including some Microsofties). It is very easy to
read, very easy to understand, and very succinct. I'm impressed with the
extensibility mechanism created for the SOAP header -- I'm especially
impressed that the designers understood that the X in XML does not by itself
give them extensibility. I'm most impressed with how little documentation is
required to specify a working messaging protocol. ebXML and RosettaNet RNIF
have much to learn from the SOAP and BizTalk specifications.

    </REPOST>

    Okay, let's look at some of the objections I have received:

    (A) I say that these issues apply more to non-RPC uses of the SOAP
protocol than to RPC uses, and one objection I've heard is that the SOAP
envelope doesn't distinguish between the two, so that if the issues don't
exist for one mode they shouldn't exist for the other.  I have a few
responses to this "objection":

    First, the issues stand by themselves, so evaluate for yourself whether
or not they apply to RPC.  The next two responses just explain why I
bothered to make this assertion.

    Second, the SOAP spec defines the RPC behavior but not the other
application-specific behavior.  To be SOAP-compliant requires conformance
with the SOAP spec, so by definition, RPC should behave well between two
SOAP-compliant nodes.  If the nodes aren't SOAP compliant, then you wouldn't
necessarily expect communication anyway.

    Finally, the RPC semantics and marshalling can be made independent of
the application either by using an introspective language like Java or by
using a single tool for generating the stubs of all (or most) applications.
There will be far more applications than infrastructure software and
stub-generators, so it will be easier to ensure that the infrastructure and
stub-generators behave well, but near impossible to ensure that all
applications behave well.

    (B) Another objection: The software infrastructure will be responsible
for transmitting wellformed XML.  Most of the XML will therefore be
wellformed, and these issues will not arise frequently enough to bother
addressing them.

    Response: Note that this is only an objection to point (1) and makes no
statement regarding points (2) through (4).

    I believe that this objection makes a false assumption.  First, I doubt
that everybody will be using the same robust implementations of SOAP (or XML
Protocol).  There will always be the hacker types and those who think they
have value-add to give.  But even if we grant this scenario -- that
everybody uses proven implementations -- we still have another problem.
Should an application ever create the XML that needs to be delivered, the
protocol stack would have to reparse that XML on the client-side before
sending it, just to provide the robustness guarantee.  That's another
performance issue.  I seriously doubt that all protocol clients are going to
enforce the reliability of the XML.
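
The client-side check in question might look like the sketch below, which
assumes the Expat bindings in Python's standard library. The function name
`is_well_formed` is hypothetical. The point is only that the guarantee costs
one full extra parse of every application-built payload before it is sent.

```python
import xml.parsers.expat


def is_well_formed(xml_text):
    """Return True if `xml_text` parses as a complete XML document."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(xml_text, True)  # True marks this as the final chunk
        return True
    except xml.parsers.expat.ExpatError:
        return False


good_payload_ok = is_well_formed("<order><id>1</id></order>")
bad_payload_ok = is_well_formed("<order><id>1</order>")
```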

    (C) Another objection: We will have a better world if the server
infrastructure enforces wellformedness on behalf of the applications.  This
reduces the work of the applications, and it pushes error detection and
error handling closer to its source.

    Response: I agree in principle with this objection, but I disagree that
XML allows us to do this properly, and I disagree that all protocol stacks
should be required to provide this enforcement.  XML does not allow us to do
this properly because the first wellformedness error is officially required
to be a fatal error, by the XML 1.0 spec itself.  There is no standard
handling for fatal errors -- no standard way to identify the kind of error
or even a requirement that the kind of error be identifiable.  When such an
error occurs, many XML parsers will force the recipient to just reject the
message.  The recipient may not be able to log the message headers, even if
they are wellformed (as I assume) or inform the target application, should
the target application need to engage in error management.  If the message
contains a batch of independent commands (I don't mean independent
messages), the application won't be able to handle the wellformed ones
independently.

    (D) Another objection: An XML envelope gives us the advantage of
extensibility that we wouldn't otherwise have.

    Response: I see this "extensibility" as extensibility of the headers,
not of the payloads.  MIME is pretty flexible with its payloads, allowing
hierarchy and arbitrary content.  The issue is attributing the message as a
whole and attributing the individual payloads.  One could address the
extensibility of the message headers by having an XML document represent the
headers.  This is done in ebXML and RNIF (RosettaNet).  One could address
the extensibility of the payload headers through an XML manifest or an
optional XML attachment that may prefix any given payload.  I'm sure other
possibilities exist as well.
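
One way to picture this alternative is the sketch below, which assumes
Python's standard `email` package as the MIME toolkit. It is not how ebXML or
RNIF actually lay out their messages; the function names and the
`headers`/`cmd` elements are hypothetical. The message headers travel as
their own XML document, and each payload is a separate MIME part that is
parsed independently.

```python
from email import message_from_bytes
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
import xml.etree.ElementTree as ET


def package(header_xml, payloads):
    """Put an XML header document and each payload in its own MIME part."""
    msg = MIMEMultipart("related")
    for content in [header_xml, *payloads]:
        msg.attach(MIMEApplication(content, "xml"))
    return msg.as_bytes()


def unpack(raw):
    """Parse each part on its own; a bad part yields None, not a failure."""
    results = []
    for part in message_from_bytes(raw).get_payload():
        data = part.get_payload(decode=True)  # undo the transfer encoding
        try:
            results.append(ET.fromstring(data))
        except ET.ParseError:
            results.append(None)
    return results


raw = package(
    b"<headers><to>node-c</to></headers>",
    [b"<cmd>one</cmd>", b"<cmd>two</oops>", b"<cmd>three</cmd>"],
)
parts = unpack(raw)
```

Here the malformed second command does not prevent the recipient from
reading the headers or from handling the first and third commands.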

    (E) Another objection: The W3C decided not to do XML packaging, but XML
packaging confers enough benefits that it ought to be addressed to some
extent, even if only minimally.  XML Protocol is the right place for that.
Benefits include the ability to apply XML tools to the packaged message;
MIME tools do not exist in such diversity.

    Response: The tools argument works for me to some degree, but in my mind
the negatives outweigh the positives.  XML packaging (of the sort being
discussed for a protocol) needs a lot of thought and can't be considered
without first understanding the negatives.

    Also, I don't find it relevant to our discussion that the W3C decided to
abandon XML packaging.  I don't really know what that working group meant by
"XML packaging."  The issue is that SOAP (and perhaps XML Protocol) is
defining an XML envelope for packaging application payloads, even if only
the XML payloads.  The term 'packaging' may be overloaded, so let's focus on
the semantics rather than the words.

    And since, as so far specified, the XML envelope can only handle the XML
payloads, it seems that we aren't simply making message packages more
amenable for use by XML tools.  We are now requiring that both MIME tools
and XML tools be present.  This is more onerous than requiring the presence
and use of just one tool set (or API set).

    ======

    Okay, enough of that.  If you got this far, thank you very much for
giving me your time.  I'll live with whatever the working group comes up
with; I just wanted to make sure that the issues are heard and known and
that the decisions made are fully educated ones.

    Joe Lapp
    Principal Architect
    webMethods, Inc.

    P.S. Randy Waldrop is our official working group member and will be
formally representing webMethods' interests.  I don't plan on participating
in this discussion (too much else to do), and Randy is free to defend or
attack or ignore these issues as he pleases.  This group has a huge amount
of expertise, and I trust that you will make appropriate use of these
points, whatever use that may be.
Received on Wednesday, 18 October 2000 01:12:44 UTC