- From: <noah_mendelsohn@us.ibm.com>
- Date: Mon, 25 Nov 2002 22:58:51 -0500
- To: Paul Grosso <pgrosso@arbortext.com>
- Cc: www-tag@w3.org, fallside@us.ibm.com
I'm curious, was this raised as a last call issue for SOAP? I don't
recall seeing it. The prohibition of internal subsets has, as I recall,
been in every SOAP working draft since day 1, and certainly was in the
last call draft. With respect, from a process point of view, I find it
somewhat unfortunate that this issue is raised to the TAG before or
instead of raising it through the normal workgroup feedback mechanisms.
The fact is that there are advantages as well as disadvantages to SOAP's
decision to disallow the internal subset, and as one who has built SOAP
implementations I can tell you that the performance implications of
dealing with the internal subset would be significant for the sorts of
applications and performance regimes that my employer (IBM) anticipates.
General purpose XML processors are only sometimes the right design point
for consumers of XML. Try handling hundreds or thousands of messages per
second while doing all the dynamic buffer management implied by parsing
internal subsets and doing entity substitution and you will find that
there is a real cost to allowing it. There are also some denial of
service attacks that are possible with entities, though presumably
heuristics can be used to limit their impact.
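As a rough illustration of the entity-based denial of service concern mentioned above, the following Python sketch builds a small document whose internal subset declares nested entities and computes how large the fully expanded text would be. The function names, the entity names (e0, e1, ...) and the element name are all hypothetical choices made for this example; the document is only generated here, never parsed.

```python
def nested_entity_doc(levels, fanout):
    """Build a document whose internal subset nests entity references.

    Entity e0 is a single character; each later entity refers to the
    previous one `fanout` times. All names here are illustrative only.
    """
    decls = ['<!ENTITY e0 "x">']
    for i in range(1, levels + 1):
        refs = f"&e{i - 1};" * fanout
        decls.append(f'<!ENTITY e{i} "{refs}">')
    subset = "\n  ".join(decls)
    return f"<!DOCTYPE m [\n  {subset}\n]>\n<m>&e{levels};</m>"


def expanded_length(levels, fanout):
    """Number of characters the top-level entity expands to."""
    return fanout ** levels


if __name__ == "__main__":
    doc = nested_entity_doc(levels=9, fanout=10)
    print(len(doc), "characters on the wire")
    print(expanded_length(9, 10), "characters after full expansion")
```

A processor that expands entities naively would have to materialize the second figure from an input of the first size, which is why a receiver either needs expansion limits or, as SOAP chose, can decline to accept such declarations at all.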
As has been observed, all the XML produced by the SOAP HTTP binding is
fully XML conformant and is processable by standard processors if you
like. What a standard processor may not do is detect all uses of XML that
are illegal SOAP, but you'll always have lots of SOAP-specific checking to
do in any case. I suspect that standard processors will in general be
significantly slower than what you will find over time in highly tuned
SOAP implementations. In any case, the Protocols WG made a conscious
decision to enable such optimizations. The fact is that doing high
performance message processing using a technology like XML (text based,
variable offset) is in some ways a stretch. I see the glass as half full:
by making a few sensible compromises, SOAP ensures that every conformant
SOAP message is legal XML, which I think is a big step forward from the
binary alternatives. I would only want to see internal subsets, etc.
reintroduced if we can demonstrate that the result is in fact practical
for its intended uses.
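To make the "SOAP-specific checking" point concrete, here is a minimal Python sketch of a receiver that layers one such check on top of a general purpose parser: it uses the standard library's expat binding and aborts as soon as a document type declaration is seen. The function name and exception class are hypothetical, and rejecting every DOCTYPE (rather than only those with an internal subset) is an assumption made to keep the example short.

```python
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Raised when a message carries a document type declaration."""


def element_names_rejecting_doctype(data):
    """Parse `data` (bytes) and return its element names in document order.

    Any DOCTYPE aborts the parse before the internal subset or any
    entity reference is processed. This is a sketch of one
    application-level check, not a complete SOAP validator.
    """
    names = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # has_internal_subset is true when a [...] subset is present;
        # this sketch rejects the declaration either way.
        raise DoctypeForbidden(f"DOCTYPE not allowed (root {name!r})")

    def on_start(name, attrs):
        names.append(name)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start
    parser.Parse(data, True)
    return names
```

A message without a DOCTYPE parses normally, while one with a DOCTYPE raises DoctypeForbidden from the handler, so the cost of handling the subset is never paid.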
I don't deny that this is an issue with (at least) two sides, but here I
am concerned mainly about W3C process. I do understand that there is
potentially an architecture issue here as well, but I would think that the
most useful input to the TAG would come as a summary of whatever emerged
in a discussion between the protocols WG and those who might question its
decisions regarding the internal subset. As far as I know, that issue
wasn't raised and the discussion didn't happen. It is late in the SOAP
review process, but if anything at all is to be done in reopening this
issue, I think it should start with the protocols workgroup and not the
TAG. Thank you!
------------------------------------------------------------------
Noah Mendelsohn Voice: 1-617-693-4036
IBM Corporation Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------
Paul Grosso <pgrosso@arbortext.com>
Sent by: www-tag-request@w3.org
11/25/02 12:50 PM
To: www-tag@w3.org
cc: (bcc: Noah Mendelsohn/Cambridge/IBM)
Subject: SOAP's prohibiting use of XML internal subset
One of the design decisions/goals of the XML 1.0 Recommendation [1]
was to have as few optional features as possible [2]. XML 1.0 allows
an XML document to have a prolog that includes some declarations in
what is called the internal subset [3].
An important class of XML documents are those that are "standalone" [4].
In such documents, the only way to provide entity declarations [5] or
attribute defaults [6] is to put such declarations in the internal subset.
It is my understanding that the Last Call draft of SOAP 1.2 [7] makes
use of an XML format that does not permit any internal subset, despite
the fact that XML 1.0 does not define such a profile/subset of XML. I
wonder what the definition of such profiles by individual specifications
will do for interoperability.
For a case in point, the XML Core WG has been asked to address the
issue of how to declare "character entities." Our answer (see [8])
is that the way to declare such entities in XML is to use the
internal subset, an integral part of XML 1.0 that must be supported
by all compliant XML processors. The fact that this solution doesn't
work for SOAP has not overridden the XML Core WG's reluctance to consider
development and endorsement of new XML syntax to support what is already
supported in XML 1.0. However, we recognize that the current situation
means that the use of entities and attribute defaults is not available
to SOAP users.
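For readers less familiar with the mechanism being discussed, the following Python sketch shows a standalone document whose internal subset declares one character entity and one attribute default, and what a parser that honors the subset reports. It uses the standard library's expat binding; the document content, the element and attribute names, and the function name are invented for illustration.

```python
import xml.parsers.expat

# A standalone document: its only declarations live in the internal subset.
DOC = b"""<?xml version="1.0" standalone="yes"?>
<!DOCTYPE note [
  <!ENTITY eacute "&#233;">
  <!ATTLIST note lang CDATA "en">
]>
<note>caf&eacute;</note>"""


def text_and_attributes(data):
    """Return (character data, attributes) as reported by the parser."""
    chunks = []
    attributes = {}

    def on_start(name, attrs):
        # Includes attributes supplied by ATTLIST defaults.
        attributes.update(attrs)

    def on_text(chunk):
        # Character data may arrive in several pieces.
        chunks.append(chunk)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.CharacterDataHandler = on_text
    parser.Parse(data, True)
    return "".join(chunks), attributes


if __name__ == "__main__":
    print(text_and_attributes(DOC))
```

Here the entity reference is replaced by the declared character and the lang attribute appears with its default value, even though neither is written out in the element itself. A format that forbids the internal subset gives its users no equivalent way to obtain either effect.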
Is this an architectural issue that the TAG wishes to address?
I am writing this message in a personal capacity, as I have not discussed
this particular message with the XML Core WG (though at least parts of the
issue have been discussed in the WG, and there are clearly parts of the
issue that do touch on XML Core WG work).
I would be interested in hearing any comments the TAG might have on
this situation.
paul
[1] http://www.w3.org/TR/REC-xml
[2] http://www.w3.org/TR/REC-xml#sec-origin-goals point 5
[3] http://www.w3.org/TR/REC-xml#dt-doctype
[4] http://www.w3.org/TR/REC-xml#sec-rmd
[5] http://www.w3.org/TR/REC-xml#sec-entity-decl
[6] http://www.w3.org/TR/REC-xml#sec-attr-defaults
[7] http://www.w3.org/TR/soap12-part0/ and others
[8] http://www.w3.org/XML/Core/2002/10/charents-20021023
Received on Monday, 25 November 2002 23:01:42 UTC