RE: Possibility of an XML Document Type

The two leading options for carrying XML documents appear to be (1) as
related resources using SOAP+Attachments (S+A) or (2) some variation on my
proposal to use a datatype.  In analyzing these alternatives, I will admit
to a bias: I think that carrying nested XML documents is a sufficiently
important scenario (e.g. to carry a schema along with a message) that we
want to be able to do it uniformly, and regardless of transport.  With that
unsubstantiated assumption:

I think the suitability of S+A depends in part on your model of "the
message".   If you are willing to believe that the typical message, for
purposes of general modeling, consists not just of an XML envelope, but of
a collection of linked resources, then I don't think you need the datatype.
I think you do then have to position S+A as a core SOAP function, and
require that all bindings support it.  Otherwise, the ability to carry such
messages will depend on the transport.  Furthermore, any software that
queues, stores, or manipulates such messages will be more than XML
software:  it will have to be software that understands the richer message
model, can dereference the links to the resources, and so on.  So far,
we've positioned S+A as an optional layer to be supported in the presumably
unusual cases where you have particularly demanding datatypes (X-Ray
images used to be mentioned quite often). We could make it core, and then
the need for the datatype goes way down.
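To make the "collection of linked resources" model concrete, here is a
rough sketch in Python using only the standard library's email package.
It is an illustration under assumptions, not the S+A specification: the
element names, the href attribute, and the Content-ID values are
hypothetical.  It shows the extra step that message-handling software has
to perform, namely resolving a cid: link from the envelope to a separate
MIME part.

```python
import xml.etree.ElementTree as ET
from email import message_from_string
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Hypothetical envelope that points at a nested document by Content-ID.
envelope_xml = '<Envelope><doc href="cid:schema1"/></Envelope>'
nested_xml = '<schema><element name="price"/></schema>'

# Build a multipart/related package: envelope first, attachment second.
package = MIMEMultipart("related", type="text/xml")
root_part = MIMEText(envelope_xml, "xml")
root_part.add_header("Content-ID", "<envelope>")
package.attach(root_part)
attachment = MIMEText(nested_xml, "xml")
attachment.add_header("Content-ID", "<schema1>")
package.attach(attachment)

wire = package.as_string()  # what a binding would have to carry


def resolve(parsed, href):
    """Return the bytes of the MIME part that a cid: reference points to."""
    content_id = "<" + href[len("cid:"):] + ">"
    for part in parsed.walk():
        if part.get("Content-ID") == content_id:
            return part.get_payload(decode=True)
    raise KeyError(href)


# A receiver needs MIME handling *and* XML handling to see the whole message.
parsed = message_from_string(wire)
envelope = ET.fromstring(resolve(parsed, "cid:envelope"))
href = envelope.find("doc").get("href")
nested = ET.fromstring(resolve(parsed, href))
```

The point of the sketch is the resolve step: anything that queues or
stores the message must keep both parts together and know how to follow
the link.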

By contrast, the main disadvantage of my proposed datatype is that it is at
best a suboptimal way to represent an XML document - roughly a third larger
once base64-encoded, and double parsing needed.  Its advantage is that the
result is a single XML envelope,
which can be manipulated by ordinary XML tools.  Certainly you would hope
that software aware of the datatype would know how to parse the nested
document, but that is not required.  I would argue that this is a less
structural change to SOAP than requiring all implementations and bindings
to support S+A, and that support for the datatype can be implementation
specific.  I agree with those who say that the
datatype could as well be specified by the schema WG as by us, but I think
we may want to briefly discuss the use of that datatype in SOAP messages.
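As a rough illustration of the trade-off, here is a minimal Python
sketch using only the standard library.  The element name "doc" and the
type label "xmlDocument" are hypothetical, chosen for the example; they
are not part of any proposal.  It shows the nested document carried as
base64 text inside a single envelope, the two parses needed to get at it,
and the size overhead of the encoding.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical nested document, e.g. a schema carried along with a message.
nested_xml = '<?xml version="1.0"?><schema><element name="price"/></schema>'


def wrap(document: str) -> str:
    """Build a single envelope carrying the document as base64 text."""
    env = ET.Element("Envelope")
    doc = ET.SubElement(env, "doc", {"type": "xmlDocument"})  # hypothetical label
    doc.text = base64.b64encode(document.encode("utf-8")).decode("ascii")
    return ET.tostring(env, encoding="unicode")


def unwrap(envelope_xml: str) -> ET.Element:
    """Parse the envelope, then decode and parse the nested document."""
    env = ET.fromstring(envelope_xml)                 # first parse
    payload = base64.b64decode(env.find("doc").text)
    return ET.fromstring(payload)                     # second parse


envelope_xml = wrap(nested_xml)
inner = unwrap(envelope_xml)

# Ratio of encoded size to original size for this particular document.
raw = nested_xml.encode("utf-8")
overhead = len(base64.b64encode(raw)) / len(raw)
```

Ordinary XML tools can store or forward envelope_xml without knowing
anything about the payload; only software that understands the type label
needs to call something like unwrap.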

On reflection, another datatype one might consider is one in which the XML
is escaped or CDATA'd, rather than base64'd.  I haven't thought through all
the implications of that, but I think it works.  Probably more compact, but
you still need the reparse (though, one could imagine a parser that
understood the escaped form directly, I suppose.)  The key point is that
the data is type labeled, so you know it can be parsed as XML.
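A comparable sketch of the escaped variant, again in Python with the
standard library and with hypothetical names ("doc", "escapedXml").  It
relies on the serializer escaping markup characters in text content, so
the nested document travels as character data and is reparsed on receipt.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested document containing its own entity reference.
nested_xml = '<note lang="en">fish &amp; chips</note>'

# Put the nested document in as plain text; the serializer escapes
# '<', '>' and '&' so the envelope stays well-formed.
env = ET.Element("Envelope")
doc = ET.SubElement(env, "doc", {"type": "escapedXml"})  # hypothetical label
doc.text = nested_xml
wire = ET.tostring(env, encoding="unicode")

# Receiver: the first parse undoes the escaping, the second parse reads
# the nested document itself.
recovered_text = ET.fromstring(wire).find("doc").text
inner = ET.fromstring(recovered_text)
```

The type label is what tells the receiver that recovered_text is worth
handing to a parser rather than treating as an ordinary string.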

Anyway, that's how the trade-off looks to me, and I think it's a tough one.
I think the architecturally best solution isn't (for good reason) likely to
happen soon: to rearchitect XML to be able to carry nested instances of XML
documents.  I always thought it was somewhat embarrassing to have a
hierarchical format that could not do this, but I suppose it was not
historically a requirement for SGML.  Lacking that, I'm still tempted to
push for the datatype as a solution that works uniformly over any
transport, and indeed extends beyond SOAP to other applications requiring
XML documents in XML.  No doubt, it is unsuited to large documents or the
most performance-critical situations.

------------------------------------------------------------------------
Noah Mendelsohn                                    Voice: 1-617-693-4036
Lotus Development Corp.                            Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------------

Received on Tuesday, 9 October 2001 19:40:31 UTC