Re: On subsetting XML...

Paul Grosso asks:

>  what way can *always ignoring* the PIs be any
> different from the case where you never have any?
> 
> paul

I think that any time you allow constructs in a message
that supposedly have no semantic meaning, you risk
complicating the model.  This is true even of
seemingly innocent features such as whitespace, but in
the case of SOAP we believe that whitespace contributes
enough to readability, at low enough risk, to be worth
the pain.  PIs, I think, buy us no benefit and carry
higher risk.

To pick one concrete example of possible complication:
what does this do to digital signatures?  You propose
that PIs be allowed but "ignored" by SOAP receivers.
The current W3C canonicalizations retain PIs for
signing, but the signature rec acknowledges that other
canonicalizations may not.[1] So now we have to start
telling stories about which signatures hold and which
are broken when a SOAP intermediary node chooses to drop
the insignificant PI.  Implementations come under pressure
to store and retain the meaningless PIs after all, just
so that signatures won't break, or we have to go to the
trouble of promoting yet another canonicalization
(which we might do anyway, but this shouldn't be the
trigger).
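
The signature concern can be illustrated with a minimal Python sketch.
It assumes Python 3.8+ and uses the standard library's
xml.etree.ElementTree.canonicalize (C14N 2.0), which keeps PIs, with a
bare SHA-256 digest standing in for a real XML signature.  The message,
the "audit" PI, and the helper names are all hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET

# Hypothetical message; the "audit" PI is illustrative only.
SENT = "<Envelope><?audit trace-id=42?><Body><price>10</price></Body></Envelope>"


def drop_pis(xml_text):
    """Simulate an intermediary that re-serializes the message without PIs.

    ElementTree's default tree builder discards PIs while parsing.
    """
    return ET.tostring(ET.fromstring(xml_text), encoding="unicode")


def digest_keeping_pis(xml_text):
    """Digest over a canonical form that retains PIs."""
    canonical = ET.canonicalize(xml_text)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def digest_ignoring_pis(xml_text):
    """Digest over a hypothetical canonical form that strips PIs first."""
    return digest_keeping_pis(drop_pis(xml_text))


FORWARDED = drop_pis(SENT)

# With a PI-retaining canonicalization the two digests differ.
print(digest_keeping_pis(SENT) == digest_keeping_pis(FORWARDED))
# With a PI-stripping canonicalization the digest survives the intermediary.
print(digest_ignoring_pis(SENT) == digest_ignoring_pis(FORWARDED))
```

Which of the two outcomes a receiver sees depends entirely on the
canonicalization the signer chose, and that is the story we would
have to start telling.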

Even having to worry about such things is the
sort of complication that arises from allowing
constructs in your messages that have no
intended semantic value.  So, that's an example. 
One can also argue that, having allowed PIs, 
users will cheat and give significance to them
in spite of a prohibition in the rec.  Now
interop suffers, as some implementations drop
them, while others depend on them.  If they
aren't going to have their intended positive 
value, they should not be allowed IMO (I realize
this somewhat changes a position I just communicated
to Tim Bray in a private reply...sorry, but this
discussion is reminding me of some of the reasons
for my position.)

> 
> That is, how can SOAP require an XML subset that 
> forbids (i.e., does not include) PIs?

I don't want to be a broken record about this,
but the same question is being asked repeatedly.
SOAP does not "require an XML subset";  like
most XML applications it declines to use
certain XML features.  Contrast with:

<employee populationOfAntarctica="blue"/>

This is legal XML, but most personnel applications
I've seen are unlikely to define it as legal at the
application level.  A more likely employee description
vocabulary might look like:

<employee>
        <lastName>Smith</lastName>
        <firstName>Bob</firstName>
        <salary>$1,000,000</salary>
</employee>

which makes no use at all of attributes.  Is such an
application a misuse of XML?  Must we say that all
possible attributes MUST be allowed, but that they are
to be ignored if senseless?  Obviously not.  Yet that's
what we're being asked to say about PIs in SOAP.
So I claim that almost all XML applications use 
a subset of XML.  Editors, databases, parsers and
such are the exceptions.  SOAP is an application
that uses the appropriate features of XML, and 
prohibits the use of other constructions.
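
The same point can be put in code.  This is a minimal Python sketch,
assuming a hypothetical personnel application whose vocabulary is
exactly the employee example above: the parser accepts any well-formed
XML, and the application then rejects whatever its vocabulary does not
define.  ALLOWED_CHILDREN and check_employee are illustrative names.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical vocabulary: the child elements an <employee> may contain.
ALLOWED_CHILDREN = {"lastName", "firstName", "salary"}


def check_employee(xml_text: str) -> List[str]:
    """Return application-level errors; an empty list means acceptable."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    errors = []
    if root.tag != "employee":
        errors.append(f"unexpected root element <{root.tag}>")
    # This vocabulary uses no attributes at all, on any element.
    for elem in root.iter():
        for name in elem.attrib:
            errors.append(f"attribute {name!r} not allowed on <{elem.tag}>")
    # Only the known child elements may appear directly under the root.
    for child in root:
        if child.tag not in ALLOWED_CHILDREN:
            errors.append(f"unexpected element <{child.tag}>")
    return errors


GOOD = (
    "<employee><lastName>Smith</lastName>"
    "<firstName>Bob</firstName><salary>$1,000,000</salary></employee>"
)
print(check_employee(GOOD))
print(check_employee('<employee populationOfAntarctica="blue"/>'))
```

Both inputs are legal XML; only the first is legal for this application.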

Surely attributes are intended for general use in XML,
but are not allowed in all vocabularies.  I 
think you and others are implying that PIs are
somehow even more general purpose, but where in the XML
rec does it say that, or give guidance on their use?
They are just another construct for leaving structured
information in the markup.  There is a vague hint in
the XML rec that they may be "directed to an
application."  There is no description of what an
application is or how they are named. 

Within SOAP, a PI is seemingly no more useful than the
populationOfAntarctica attribute.  I don't think it
should be required that senders be allowed to put
either into outbound messages, or that receivers ignore
either if received.  Sorry to repeat myself on this,
but the question seems to be coming around repeatedly.
Thanks for your patience with this.
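
For concreteness, here is what prohibiting rather than ignoring might
look like at a receiver, as a minimal Python sketch.  It assumes
Python 3.8+ and the standard xml.etree.ElementTree.XMLParser target
interface; the class and function names are hypothetical, and nothing
here is taken from an actual SOAP implementation.

```python
import xml.etree.ElementTree as ET


class PIRecordingTarget:
    """Parser target that builds a tree and records any PIs it sees."""

    def __init__(self):
        self._builder = ET.TreeBuilder()
        self.pi_targets = []

    def start(self, tag, attrib):
        return self._builder.start(tag, attrib)

    def end(self, tag):
        return self._builder.end(tag)

    def data(self, text):
        return self._builder.data(text)

    def pi(self, target, text):
        # Remember the PI target name; the PI itself is not added to the tree.
        self.pi_targets.append(target)

    def close(self):
        return self._builder.close()


def parse_rejecting_pis(xml_text):
    """Parse a message, raising ValueError if it contains any PI."""
    target = PIRecordingTarget()
    parser = ET.XMLParser(target=target)
    parser.feed(xml_text)
    root = parser.close()  # returns the result of target.close()
    if target.pi_targets:
        raise ValueError(
            f"processing instructions not allowed: {target.pi_targets}"
        )
    return root
```

A receiver written this way never has to decide whether a PI was
significant, because a message that carries one is simply not accepted.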

Noah

[1] http://www.w3.org/TR/xmldsig-core/#sec-PI

------------------------------------------------------------------
Noah Mendelsohn                              Voice: 1-617-693-4036
IBM Corporation                                Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------

Received on Thursday, 16 January 2003 22:47:17 UTC