- From: <noah_mendelsohn@us.ibm.com>
- Date: Mon, 25 Nov 2002 22:58:51 -0500
- To: Paul Grosso <pgrosso@arbortext.com>
- Cc: www-tag@w3.org, fallside@us.ibm.com
I'm curious, was this raised as a last call issue for SOAP? I don't recall seeing it. The prohibition of internal subsets has, as I recall, been in every SOAP working draft since day 1, and certainly was in the last call draft. With respect, from a process point of view, I find it somewhat unfortunate that this issue is raised to the TAG before or instead of raising it through the normal workgroup feedback mechanisms.

The fact is that there are advantages as well as disadvantages to SOAP's decision to disallow the internal subset, and as one who has built SOAP implementations I can tell you that the performance implications of dealing with the internal subset would be significant for the sorts of applications and performance regimes that my employer (IBM) anticipates. General purpose XML processors are only sometimes the right design point for consumers of XML. Try handling hundreds or thousands of messages per second while doing all the dynamic buffer management implied by parsing internal subsets and doing entity substitution and you will find that there is a real cost to allowing it. There are also some denial of service attacks that are possible with entities, though presumably heuristics can be used to limit their impact.

As has been observed, all the XML produced by the SOAP HTTP binding is fully XML conformant and is processable by standard processors if you like. What a standard processor may not do is detect all uses of XML that are illegal SOAP, but you'll always have lots of SOAP-specific checking to do in any case. I suspect that standard processors will in general be significantly slower than what you will find over time in highly tuned SOAP implementations. In any case, the Protocols WG made a conscious decision to enable such optimizations. The fact is that doing high performance message processing using a technology like XML (text based, variable offset) is in some ways a stretch.
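To make the "SOAP-specific checking" point concrete, here is a minimal sketch of how a receiver built on a standard processor might reject an internal subset before any entity substitution takes place. It uses Python's `xml.parsers.expat`; the names `InternalSubsetNotAllowed` and `check_no_internal_subset` are hypothetical, and a tuned SOAP implementation would likely do this inside its own scanner rather than through a general purpose parser.

```python
import xml.parsers.expat


class InternalSubsetNotAllowed(ValueError):
    """Hypothetical error type: the message declares an internal subset."""


def check_no_internal_subset(message: bytes) -> None:
    """Parse `message` and fail as soon as an internal subset is announced.

    Assumption: the policy being enforced is "no internal subset", as
    described in the message above. This is an illustrative check, not
    a complete SOAP validator.
    """
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # expat reports the start of the document type declaration
        # and tells us whether an internal subset is present.
        if has_internal_subset:
            raise InternalSubsetNotAllowed(
                "internal subset found in DOCTYPE for %r" % name
            )

    parser.StartDoctypeDeclHandler = on_doctype
    # An exception raised in a handler propagates out of Parse().
    parser.Parse(message, True)
```

Calling `check_no_internal_subset(b"<e/>")` returns normally, while a message such as `<!DOCTYPE e [<!ENTITY x "y">]><e>&x;</e>` raises `InternalSubsetNotAllowed` at the declaration, before the body is read.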
I see the glass as half full: by making a few sensible compromises, SOAP ensures that every conformant SOAP message is legal XML, which I think is a big step forward from the binary alternatives. I would only want to see internal subsets, etc. reintroduced if we can demonstrate that the result is in fact practical for its intended uses.

I don't deny that this is an issue with (at least) two sides, but here I am concerned mainly about W3C process. I do understand that there is potentially an architecture issue here as well, but I would think that the most useful input to the TAG would come as a summary of whatever emerged in a discussion between the protocols WG and those who might question its decisions regarding the internal subset. As far as I know, that issue wasn't raised and the discussion didn't happen. It is late in the SOAP review process, but if anything at all is to be done in reopening this issue, I think it should start with the protocols workgroup and not the TAG. Thank you!

------------------------------------------------------------------
Noah Mendelsohn                              Voice: 1-617-693-4036
IBM Corporation                                Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------

Paul Grosso <pgrosso@arbortext.com>
Sent by: www-tag-request@w3.org
11/25/02 12:50 PM

To: www-tag@w3.org
cc: (bcc: Noah Mendelsohn/Cambridge/IBM)
Subject: SOAP's prohibiting use of XML internal subset
Categories:

One of the design decisions/goals of the XML 1.0 Recommendation [1] was to have as few optional features as possible [2]. XML 1.0 allows an XML document to have a prolog that includes some declarations in what is called the internal subset [3]. An important class of XML documents are those that are "standalone" [4]. In such documents, the only way to provide entity declarations [5] or attribute defaults [6] is to put such declarations in the internal subset.
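As an illustration of what the internal subset provides in a standalone document, the sketch below parses a small document whose internal subset declares one entity and one attribute default. The document content (`memo`, `org`, `status`) and the helper `parse_memo` are invented for this example; Python's `xml.etree.ElementTree` is used only as a convenient standard processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical standalone document: everything the parser needs is
# declared in the internal subset between "[" and "]".
DOCUMENT = (
    '<?xml version="1.0" standalone="yes"?>'
    '<!DOCTYPE memo ['
    '  <!ENTITY org "Example Corp">'
    '  <!ATTLIST memo status CDATA "draft">'
    ']>'
    '<memo>From &org;</memo>'
)


def parse_memo(text):
    """Return the root element's text and attributes after parsing."""
    root = ET.fromstring(text)
    return root.text, dict(root.attrib)


if __name__ == "__main__":
    text, attrs = parse_memo(DOCUMENT)
    # The entity reference is replaced in the text, and the defaulted
    # attribute appears even though the start tag does not specify it.
    print(text, attrs)
```

A format that forbids the internal subset cannot carry either declaration, so the entity reference and the defaulted attribute would have to be written out in full in the message itself.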
It is my understanding that the Last Call draft of SOAP 1.2 [7] makes use of an XML format that does not permit any internal subset, despite the fact that XML 1.0 does not define such a profile/subset of XML. I wonder what the definition of such profiles by individual specifications will do for interoperability.

For a case in point, the XML Core WG has been asked to address the issue of how to declare "character entities." Our answer (see [8]) is that the way to declare such entities in XML is to use the internal subset, an integral part of XML 1.0 that must be supported by all compliant XML processors. The fact that this solution doesn't work for SOAP has not overridden the XML Core WG's reluctance to consider development and endorsement of new XML syntax to support what is already supported in XML 1.0. However, we recognize that the current situation means that the use of entities and attribute defaults is not available to SOAP users.

Is this an architectural issue that the TAG wishes to address?

I am writing this message in a personal capacity, as I have not discussed this particular message with the XML Core WG (though at least parts of the issue have been discussed in the WG, and there are clearly parts of the issue that do touch on XML Core WG work). I would be interested in hearing any comments the TAG might have on this situation.

paul

[1] http://www.w3.org/TR/REC-xml
[2] http://www.w3.org/TR/REC-xml#sec-origin-goals point 5
[3] http://www.w3.org/TR/REC-xml#dt-doctype
[4] http://www.w3.org/TR/REC-xml#sec-rmd
[5] http://www.w3.org/TR/REC-xml#sec-entity-decl
[6] http://www.w3.org/TR/REC-xml#sec-attr-defaults
[7] http://www.w3.org/TR/soap12-part0/ and others
[8] http://www.w3.org/XML/Core/2002/10/charents-20021023
Received on Monday, 25 November 2002 23:01:42 UTC