
Re: SOAP's prohibiting use of XML internal subset

From: <noah_mendelsohn@us.ibm.com>
Date: Mon, 25 Nov 2002 22:58:51 -0500
To: Paul Grosso <pgrosso@arbortext.com>
Cc: www-tag@w3.org, fallside@us.ibm.com
Message-ID: <OF5F13D1B3.6BE9227F-ON85256C7D.0013E7AB@lotus.com>

I'm curious, was this raised as a last call issue for SOAP?  I don't 
recall seeing it.  The prohibition of internal subsets has, as I recall, 
been in every SOAP working draft since day 1, and certainly was in the 
last call draft.  With respect, from a process point of view, I find it 
somewhat unfortunate that this issue is raised to the TAG before or 
instead of raising it through the normal workgroup feedback mechanisms.

The fact is that there are advantages as well as disadvantages to SOAP's 
decision to disallow the internal subset, and as one who has built SOAP 
implementations I can tell you that the performance implications of 
dealing with the internal subset would be significant for the sorts of 
applications and performance regimes that my employer (IBM) anticipates. 
General purpose XML processors are only sometimes the right design point 
for consumers of XML.  Try handling hundreds or thousands of messages per 
second while doing all the dynamic buffer management implied by parsing 
internal subsets and doing entity substitution and you will find that 
there is a real cost to allowing it.  There are also some denial of 
service attacks that are possible with entities, though presumably 
heuristics can be used to limit their impact. 

As has been observed, all the XML produced by the SOAP HTTP binding is 
fully XML conformant and is processable by standard processors if you 
like.  What a standard processor may not do is detect all uses of XML that 
are illegal SOAP, but you'll always have lots of SOAP-specific checking to 
do in any case.   I suspect that standard processors will in general be 
significantly slower than what you will find over time in highly tuned 
SOAP implementations.  In any case, the Protocols WG made a conscious 
decision to enable such optimizations.  The fact is that doing high 
performance message processing using a technology like XML (text based, 
variable offset) is in some ways a stretch.  I see the glass as half full: 
by making a few sensible compromises, SOAP ensures that every conformant 
SOAP message is legal XML, which I think is a big step forward from the 
binary alternatives.  I would only want to see internal subsets, etc. 
reintroduced if we can demonstrate that the result is in fact practical 
for its intended uses. 
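
As a rough illustration of the kind of SOAP-specific check mentioned above, 
the sketch below uses a standard processor (Python's expat binding) and 
rejects a message as soon as the parser reports a document type 
declaration.  The names reject_if_doctype and DoctypeNotAllowed are 
hypothetical, as are the sample messages; this is not taken from any 
actual SOAP implementation.

```python
import xml.parsers.expat


class DoctypeNotAllowed(ValueError):
    """Raised when a message carries a document type declaration."""


def reject_if_doctype(message: bytes) -> None:
    """Parse `message` and raise DoctypeNotAllowed if it has a DOCTYPE.

    Hypothetical helper: a receiver could run a check like this (or fold
    it into its main parse) before any SOAP-level processing.
    """
    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Abort the parse; the exception propagates out of Parse().
        raise DoctypeNotAllowed(
            "document type declaration not allowed (root name: %s)" % name
        )

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(message, True)


# Hypothetical sample messages (element names simplified, no namespaces).
plain = b"<Envelope><Body/></Envelope>"
with_subset = b'<!DOCTYPE Envelope [<!ENTITY e "x">]><Envelope>&e;</Envelope>'

reject_if_doctype(plain)  # passes silently
try:
    reject_if_doctype(with_subset)
except DoctypeNotAllowed as exc:
    print("rejected:", exc)
```

A tuned implementation would presumably not run a general-purpose parser 
at all; the point is only that the prohibition is cheap to enforce on top 
of one.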

I don't deny that this is an issue with (at least) two sides, but here I 
am concerned mainly about W3C process.  I do understand that there is 
potentially an architecture issue here as well, but I would think that the 
most useful input to the TAG would come as a summary of whatever emerged 
in a discussion between the protocols WG and those who might question its 
decisions regarding the internal subset.   As far as I know, that issue 
wasn't raised and the discussion didn't happen.  It is late in the SOAP 
review process, but if anything at all is to be done in reopening this 
issue, I think it should start with the protocols workgroup and not the 
TAG.  Thank you!
 
------------------------------------------------------------------
Noah Mendelsohn                              Voice: 1-617-693-4036
IBM Corporation                                Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------

Paul Grosso <pgrosso@arbortext.com>
Sent by: www-tag-request@w3.org
11/25/02 12:50 PM

 
        To:     www-tag@w3.org
        cc:     (bcc: Noah Mendelsohn/Cambridge/IBM)
        Subject:        SOAP's prohibiting use of XML internal subset

One of the design decisions/goals of the XML 1.0 Recommendation [1]
was to have as few optional features as possible [2].  XML 1.0 allows
an XML document to have a prolog that includes some declarations in
what is called the internal subset [3].

An important class of XML documents are those that are "standalone" [4].
In such documents, the only way to provide entity declarations [5] or
attribute defaults [6] is to put such declarations in the internal subset.
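
To make this concrete, here is a small sketch of a standalone document 
whose internal subset declares one character entity and one attribute 
default, parsed with Python's expat binding.  The document, the element 
name "note", the entity name "copy" and the helper parse_note are all 
invented for illustration.

```python
import xml.parsers.expat

# Hypothetical standalone document: the internal subset is the only place
# where the entity and the attribute default can be declared.
DOC = (
    '<?xml version="1.0" standalone="yes"?>\n'
    '<!DOCTYPE note [\n'
    '  <!ENTITY copy "&#169;">\n'
    '  <!ATTLIST note lang CDATA "en">\n'
    ']>\n'
    '<note>&copy; 2002</note>'
)


def parse_note(doc):
    """Return (attributes of each element by name, concatenated text)."""
    attrs_seen = {}
    chunks = []

    def on_start(name, attrs):
        # By default expat reports defaulted attributes alongside
        # the ones written in the start-tag.
        attrs_seen[name] = dict(attrs)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    # Character data may arrive in several pieces; collect and join.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(doc, True)
    return attrs_seen, "".join(chunks)


attrs, text = parse_note(DOC)
print(attrs["note"])  # includes the defaulted "lang" attribute
print(text)           # entity reference replaced by its declared value
```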

It is my understanding that the Last Call draft of SOAP 1.2 [7] makes
use of an XML format that does not permit any internal subset, despite
the fact that XML 1.0 does not define such a profile/subset of XML.  I 
wonder what the definition of such profiles by individual specifications 
will do for interoperability. 

For a case in point, the XML Core WG has been asked to address the 
issue of how to declare "character entities."  Our answer (see [8]) 
is that the way to declare such entities in XML is to use the 
internal subset, an integral part of XML 1.0 that must be supported 
by all compliant XML processors.  The fact that this solution doesn't 
work for SOAP has not overridden the XML Core WG's reluctance to consider 
development and endorsement of new XML syntax to support what is already 
supported in XML 1.0.  However, we recognize that the current situation 
means that the use of entities and attribute defaults is not available 
to SOAP users.

Is this an architectural issue that the TAG wishes to address?

I am writing this message in a personal capacity, as I have not discussed
this particular message with the XML Core WG (though at least parts of the
issue have been discussed in the WG, and there are clearly parts of the
issue that do touch on XML Core WG work).

I would be interested in hearing any comments the TAG might have on
this situation.

paul

[1] http://www.w3.org/TR/REC-xml
[2] http://www.w3.org/TR/REC-xml#sec-origin-goals point 5
[3] http://www.w3.org/TR/REC-xml#dt-doctype
[4] http://www.w3.org/TR/REC-xml#sec-rmd
[5] http://www.w3.org/TR/REC-xml#sec-entity-decl
[6] http://www.w3.org/TR/REC-xml#sec-attr-defaults
[7] http://www.w3.org/TR/soap12-part0/ and others
[8] http://www.w3.org/XML/Core/2002/10/charents-20021023
Received on Monday, 25 November 2002 23:01:42 GMT
