Re: [xmlProfiles-29] xml subsetting in IETF XMPP

From: <noah_mendelsohn@us.ibm.com>
Date: Wed, 2 Apr 2003 09:58:59 -0500
To: robin.berjon@expway.fr
Cc: tbray@textuality.com, www-tag@w3.org
Message-ID: <OFFFDC97AB.0130B990-ON85256CFC.0051B635@lotus.com>

Robin Berjon writes:

> Imho it only looks, walks, and quacks like a subset if
> sending some of the excluded tokens generates an error,
> ie if general-purpose XML has a chance of blowing up
> when it reaches the other side.

Just to set the record straight on SOAP:  no correct implementation of the 
SOAP HTTP binding will ever send a PI, DTD etc.  This is because the 
original message is modeled as an infoset and such SOAP infosets by 
definition do not contain PIs, DTDs, etc.  (just as they by definition 
don't contain zoo:animal attributes on the envelope element).

At the sending end:  it's assumed that your software allows you to 
faithfully send such an Infoset.  I suspect this is one source of Tim 
Bray's concern:  one can certainly imagine middleware software that would 
take the liberty to stick in DTDs or PIs that were in some sense not 
specifically suggested by a sending application.  I don't see the XML 
recommendation as weighing on such software one way or the other.  Such 
software would indeed be inappropriate at a SOAP sender:  you need 
software that lets you prepare an infoset and serialize it as XML 1.0.  If 
that means we've defined a subset, I suppose we have, but I'm not at this 
point convinced.
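
A minimal sketch of what "prepare an infoset and serialize it as XML 1.0" 
might look like at a sender, assuming Python's standard ElementTree library. 
The namespace URI and the serialize_envelope helper are hypothetical and 
purely illustrative; nothing here is taken from the SOAP specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace, for illustration only; not the real SOAP namespace.
ENV_NS = "urn:example:envelope"


def serialize_envelope(body_text: str) -> str:
    """Build a small element tree and serialize it as XML 1.0 text."""
    envelope = ET.Element(f"{{{ENV_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{ENV_NS}}}Body")
    body.text = body_text
    # encoding="unicode" returns a str and emits no XML declaration;
    # ElementTree adds no DOCTYPE or processing instructions of its own.
    return ET.tostring(envelope, encoding="unicode")


wire_form = serialize_envelope("hello")
```

The point of the sketch is only that the serializer writes what the 
application built and nothing more, which is the property asked of sending 
software above.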

At the receiving end:  unlike XMPP, SOAP considers PIs and DTDs as errors, 
because they are prima facie evidence that you are talking to a buggy 
sender.  Again, SOAP is silent on how you build the software to detect 
such errors.  You can use a general purpose parser and put above it a 
layer that checks for PIs and DTDs (and zoo:animal attributes), or you can 
build a special-purpose SOAP scanner.  In the former case, you must use a 
parser that accurately reflects (at least) the received Infoset and also 
the presence of any DTD information that is not reflected in the Infoset. 
These seem to be no more rigorous than the requirements for a parser used 
in an XML editor....indeed, some of those must accurately reflect single 
and double quotes, and other serialization details.  Again, I suspect 
Tim's preference would be that the presence of DTDs, PIs, etc. be viewed 
as details that need not in all cases be reflected by a parser...as with 
an editor, SOAP is an application of XML for which such a parser would be 
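
The "general purpose parser plus a checking layer" option described above 
can be sketched as follows, assuming Python's standard expat bindings, 
which report both DOCTYPE declarations and processing instructions to the 
caller.  The find_violations helper is hypothetical; it only collects 
findings and leaves open whether they become a fault.  The zoo:animal 
attribute check is omitted for brevity.

```python
import xml.parsers.expat


def find_violations(document: str) -> list:
    """Return a description of each DTD or PI seen in the document."""
    findings = []
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called once if the document carries a DOCTYPE declaration.
        findings.append(f"DTD: {name}")

    def on_pi(target, data):
        # Called for each processing instruction; the XML declaration
        # is not a PI and is not reported here.
        findings.append(f"PI: {target}")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.ProcessingInstructionHandler = on_pi
    parser.Parse(document, True)  # True marks this as the final chunk
    return findings


clean = find_violations('<?xml version="1.0"?><e>text</e>')
with_pi = find_violations('<?xml version="1.0"?><?render fast?><e/>')
with_dtd = find_violations('<!DOCTYPE e [<!ELEMENT e EMPTY>]><e/>')
```

A parser that silently dropped the DOCTYPE or the PIs before the 
application could see them would not be usable for this layer, which is 
the requirement stated in the paragraph above.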

> On the other hand if it is defined so that the
> receiving end MUST parse the XML correctly, but MUST
> ignore it (ie MUST NOT pass it on to the application so
> that no semantic value whatsoever can ever be attached
> to those tokens) then we have a usage convention. It
> reads general-purpose XML, it just doesn't extract the
> same information out of it. Given that we have no data
> model, a parser that exposes less data than another is
> not a subset parser.

Again, SOAP is different in this respect, for the reasons described above. 
All of that said, I see nothing that would break if we switched to the 
XMPP receiver rules, and quietly flushed buggy input, thereby defining it 
as meaningless but not erroneous.  We could also go with a SHOULD fault or 
MAY fault, which would allow discretion to detect it as an error.

I respect and understand the reasons that Tim believes we have, however 
unintentionally, defined a subset of XML.   To some degree it's a matter 
of terminology.  Not speaking officially for the WG, I would reiterate 
that I don't think we thought we were doing a subset.  I don't think we 
ever asked:  should others use this same subset?   As I suspect is the 
case with XMPP, we just used XML in a way that seemed appropriate to our 

BTW:  I think I've now made clear my understanding of what SOAP has done 
and how it compares to XMPP.  In the interest of avoiding list overload, I 
tentatively plan to remain quiet on this thread for the foreseeable future, 
unless new information shows up or specific questions are raised for which 
I might have the answer.  Thank you!

Noah Mendelsohn                              Voice: 1-617-693-4036
IBM Corporation                                Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
Received on Wednesday, 2 April 2003 10:06:36 UTC
