Re: On subsetting XML...

Tim Bray writes:

> wrote:
> > I think the major practical implication is that SOAP messages are
> > limited in their ability to carry arbitrary XML document fragments as
> > Body or Header data.  XML is sufficiently broken in its ability to
> > carry such arbitrary XML (conflicting entity definitions and inability
> > to nest DOCTYPEs, for example) that we never could have achieved general
> > container support in any case.  As it happens, the semantics of a PI in a
> > SOAP body would be very questionable.  Does it apply only to the body, or
> > to the SOAP message as a whole?
> This line of argument is bogus. 

Fair enough :-)

> PIs are explicitly stated to be messages 
> to a particular application, which means 
> that the application should identify how it wants to be
> addressed and what messages it's interested in

So, we start by agreeing that SOAP itself chooses not
to define any specific interpretation for any PI that
would be addressed "to SOAP".

> , and can safely ignore anything that isn't intended
> for it (which I assume every SOAP implementation
> would). 

This is where I don't think the story is so simple.
Viewed "looking up" from the XML level, there are (at
least) two levels of "application" sharing the SOAP
envelope: (1) the SOAP processing model which in turn
defers interpretation of certain subtrees such as the
body to (2) an application of SOAP.  So, one question
is, if I see a SOAP Envelope which looks something like

    <?AAA BBB CCC?>
    <n1:yourHeader1 role="intermed1">...</n1:yourHeader1>
    <?XXX YYY ZZZ?>
    <n2:yourHeader2 role="intermed2">...</n2:yourHeader2>

what do I do at an intermediary where the first header
is processed?  The SOAP model says that when the
message is received at intermed1, n1:yourHeader1 will
be processed and, all other things being equal, that header
will be removed before the message is relayed to
intermed2.  Which, if any, of the PIs should be
removed?  Note that if none are removed, the resulting
message is:

    <?AAA BBB CCC?>
    <?XXX YYY ZZZ?>
    <n2:yourHeader2 role="intermed2">...</n2:yourHeader2>

Depending on the semantics of the PIs, it's not
impossible that a PI originally meant to modify yourHeader1
is now erroneously retained in the message, perhaps
affecting yourHeader2 or maybe even the body.  These 
are the sorts of questions we don't have to settle
if we just say:  no PIs.  In that sense, this is
a "Keep it Simple" decision.
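By contrast, the "no PIs" rule is trivial to enforce at any node.  A minimal Python sketch, assuming a receiver that simply refuses any message containing a PI; PIRejected and accept_message are hypothetical names, and mapping the error to an actual SOAP fault is left out.  Note that the XML declaration is not a PI, and minidom does not report it as one.

```python
from xml.dom import Node, minidom

class PIRejected(ValueError):
    """Hypothetical error a receiver might raise (and map to a SOAP fault)."""

def contains_pi(node):
    """True if any descendant of node is a processing instruction."""
    for child in node.childNodes:
        if child.nodeType == Node.PROCESSING_INSTRUCTION_NODE or contains_pi(child):
            return True
    return False

def accept_message(xml_text):
    """Parse the message, rejecting it outright if it contains any PI."""
    doc = minidom.parseString(xml_text)
    if contains_pi(doc):
        raise PIRejected("processing instructions are not allowed in this message")
    return doc

# Accepted: the <?xml ...?> declaration is not a processing instruction.
accept_message('<?xml version="1.0"?><e:Envelope xmlns:e="urn:example:env"><e:Body/></e:Envelope>')
```

With a check like this there is no relaying question to answer: no intermediary ever has to decide which PI goes with which header.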

So, I'm not sure that the semantics of
"ignoring what you don't understand" are
as simple when you consider a system that
manipulates content as it flows through
the system.
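For comparison, the "ignore what isn't addressed to you" rule is easy for a single, final consumer.  A minimal Python sketch, assuming an application that claims the PI target "myapp" (a hypothetical name, as are the namespaces): it picks out its own PIs and skips the rest.  What it cannot tell you is which of the skipped PIs an intermediary should delete or forward.

```python
from xml.dom import Node, minidom

# Hypothetical PI targets this application claims; all others are ignored.
KNOWN_TARGETS = {"myapp"}

def iter_pis(node):
    """Yield every processing instruction under node, in document order."""
    for child in node.childNodes:
        if child.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
            yield child
        yield from iter_pis(child)

def pis_for_me(xml_text, known=KNOWN_TARGETS):
    """Return (target, data) for the PIs addressed to this application."""
    doc = minidom.parseString(xml_text)
    return [(pi.target, pi.data) for pi in iter_pis(doc) if pi.target in known]

msg = """<env:Envelope xmlns:env="urn:example:env">
  <env:Body><?myapp cache=off?><?otherapp x?><m:op xmlns:m="urn:example"/></env:Body>
</env:Envelope>"""
print(pis_for_me(msg))  # [('myapp', 'cache=off')]
```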

> The XMLP arguments for not wanting DTDs were
> very convincing, the arguments on PIs entirely
> unconvincing.  Just my opinion of course. -Tim

I agree that it's not an entirely clear-cut call
and that the case for no DTDs is more compelling.
I just don't think the "no PI" call is so obviously
evil as some seem to imply.  Allowing them would
certainly add some complication to our processing
model.  It might or might not also have an impact
on description languages (should WSDL allow you
to control which PIs are acceptable in a message?).


Noah Mendelsohn                              Voice: 1-617-693-4036
IBM Corporation                                Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142

Received on Thursday, 16 January 2003 17:46:11 UTC