Re: [xmlProfiles-29] xml subsetting in IETF XMPP

On Wed, Apr 02, 2003 at 10:46:44AM +0200, Robin Berjon wrote:

> Tim Bray wrote:
> >Noah Mendelsohn/Cambridge/IBM wrote:
> >>(I still prefer the term "usage convention" to "subset"
> >
> >I don't.  Let's call a spade a spade.  SOAP/XMPP have created an 
> >incompatible subset of XML such that general-purpose XML generators 
> >cannot reliably be used to generate their messages, and general-purpose 
> >XML processors cannot reliably be used to receive them.  It looks like 
> >a subset, walks like a subset, quacks like a subset.
> 
> Imho it only looks, walks, and quacks like a subset if sending some of the 
> excluded tokens generates an error, ie if general-purpose XML has a chance 
> of blowing up when it reaches the other side.
> 
> On the other hand if it is defined so that the receiving end MUST parse the 
> XML correctly, but MUST ignore it (ie MUST NOT pass it on to the 
> application so that no semantic value whatsoever can ever be attached to 
> those tokens) then we have a usage convention. It reads general-purpose 
> XML, it just doesn't extract the same information out of it. Given that we 
> have no data model, a parser that exposes less data than another is not a 
> subset parser.
> 
> It is my understanding that this is the approach taken by the XMPP folks. 
> I'm not saying it's devoid of potential problems (notably wrt entity 
> handling) but those seem to be technicalities.

Perhaps it will help if I attempt to clarify how XMPP uses XML, since it
is rather non-standard. XMPP is a protocol for streaming XML elements.
The usual scenario is for a client to connect to a server over TCP and
establish a stream. The <stream/> element can be seen as the document
root, if you will. Once a stream is established, the server opens a 
stream back to the client as well. So we have two uni-directional
streams, one from client to server and one from server to client. Once
these streams are appropriately authenticated and secured (via SASL
and TLS), the client can send any number of XML elements over the 
stream. In XMPP these elements are limited to those defined by the 
jabber:client namespace (which is the default namespace on the 
stream): namely, the <message/>, <presence/>, and <iq/> elements 
(which along with appropriate extensions implement the functionality 
expected of an instant messaging application). So you may view an IM 
session as consisting of two "documents" which are built up over the 
life of the session:

<stream>
  <presence/>
  <message/>
  <iq/>
  ...
</stream>

However, viewing a stream as a document is merely a matter of
convenience for those who are accustomed to thinking in terms of XML
documents; in reality, XMPP does not deal with documents at all, but
rather with XML streams and what we call "XML stanzas" (i.e., any 
direct child of the stream root).
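
To make the stream/stanza distinction concrete, here is a minimal sketch
in Python of a receiver that never waits for a complete document and
instead hands each direct child of the stream root to the application as
soon as its end tag arrives. The function name iter_stanzas and the
sample chunks are assumptions made for illustration; they are not taken
from any XMPP implementation.

```python
import xml.etree.ElementTree as ET

def iter_stanzas(chunks):
    """Yield each direct child of the stream root as soon as it is complete.

    `chunks` is any iterable of text fragments, e.g. successive reads
    from a TCP socket (hypothetical input for this sketch).
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    depth = 0  # 0 = outside the stream root, 1 = inside it
    for chunk in chunks:
        parser.feed(chunk)
        for event, elem in parser.read_events():
            if event == "start":
                depth += 1
            else:
                depth -= 1
                # An element that closes back to depth 1 is a direct
                # child of the stream root, i.e. a stanza.
                if depth == 1:
                    yield elem

chunks = [
    "<stream xmlns='jabber:client'>",
    "<presence/>",
    "<message><body>hi</body></message>",
    "<iq type='get' id='1'/>",
]
for stanza in iter_stanzas(chunks):
    print(stanza.tag)
```

Note that the closing </stream> tag is never required for stanzas to be
delivered, which is the sense in which the "document" is only a
convenient way of looking at the session.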

Because XMPP is a simplified protocol for streaming XML elements, as
opposed to parsing complete documents, we have no need for internal 
or external DTD subsets (the structure is defined by the schema of 
the jabber:client namespace). Because we have no DTDs, there is no 
reason to support general or parameter entity references. Because we 
are not processing documents, there is no reason to support comments. 
Because XMPP contains its own mechanisms for interacting with 
applications (mainly the <iq/> element, which is a request-response 
mechanism for doing things like retrieving data, e.g. a contact list, 
from the server), there is no reason to support processing 
instructions. 

From the application-centric rather than document-centric perspective 
of XMPP, all of the above restrictions are reasonable. But they are 
reasonable only *in our context*. We are not trying to create an
official subset of XML or to force these restrictions on anyone else. 
However, we do require that an XMPP application MUST NOT inject into 
an XML stream any of the foregoing restricted XML, and that if an 
XMPP application receives such XML it MUST ignore that data (I have 
strengthened the text from SHOULD to MUST here, and that will be 
reflected in draft-ietf-xmpp-core-07).
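
As an illustration of the sending-side rule, the following Python sketch
uses a general-purpose parser (expat) to report which of the restricted
constructs appear in a piece of outgoing XML. The function name
restricted_constructs and the labels it returns are hypothetical; this is
one possible way to check the rule under those assumptions, not the
mechanism prescribed by the XMPP drafts.

```python
import xml.parsers.expat

def restricted_constructs(xml_text):
    """Return labels for the restricted constructs found in xml_text.

    Covers comments, processing instructions, and DTD declarations.
    The labels are illustrative only.
    """
    found = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CommentHandler = lambda data: found.append("comment")
    parser.ProcessingInstructionHandler = (
        lambda target, data: found.append("processing instruction")
    )
    parser.StartDoctypeDeclHandler = (
        lambda name, sysid, pubid, has_internal_subset: found.append("DTD")
    )
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return found

print(restricted_constructs("<message><body>hello</body></message>"))
print(restricted_constructs(
    "<message><!-- note --><?app do?><body>hello</body></message>"
))
```

On the receiving side the same handlers could simply do nothing, so that
the input is still parsed as general-purpose XML but the restricted
constructs are never passed on to the application.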

I hope this helps to clarify the ways in which XMPP uses XML.

If there are no objections, until such time as consensus is reached 
regarding the appropriate terminology (subset, profile, usage 
convention, conformance class), the XMPP documents will continue to
speak merely of "restrictions" regarding acceptable XML within the 
context of XMPP.

Peter

-- 
Peter Saint-Andre
Jabber Software Foundation
http://www.jabber.org/people/stpeter.php

Received on Wednesday, 2 April 2003 11:43:51 UTC