
Re: SOAP's prohibiting use of XML internal subset

From: Tim Bray <tbray@textuality.com>
Date: Mon, 25 Nov 2002 21:07:09 -0800
Message-ID: <3DE3017D.5060501@textuality.com>
To: noah_mendelsohn@us.ibm.com
Cc: Paul Grosso <pgrosso@arbortext.com>, www-tag@w3.org, fallside@us.ibm.com

noah_mendelsohn@us.ibm.com wrote:

> The fact is that there are advantages as well as disadvantages to SOAP's
> decision to disallow the internal subset, and as one who has built SOAP
> implementations I can tell you that the performance implications of
> dealing with the internal subset would be significant for the sorts of
> applications and performance regimes that my employer (IBM) anticipates.
> General purpose XML processors are only sometimes the right design point
> for consumers of XML.  Try handling hundreds or thousands of messages per
> second while doing all the dynamic buffer management implied by parsing
> internal subsets and doing entity substitution and you will find that
> there is a real cost to allowing it.  There are also some denial of
> service attacks that are possible with entities, though presumably
> heuristics can be used to limit their impact. 

I can certainly see this point; I'll check the SOAP specs - if what they 
say is "thou shalt not produce an internal subset", that seems more 
defensible than saying "consumers must throw messages on the floor if 
there's an internal subset".
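
To make the consumer-side reading concrete, here is a minimal sketch of a receiver that drops any message carrying an internal subset. It is an illustration only, not what the SOAP specs mandate: the function name parse_rejecting_internal_subset and the InternalSubsetError class are hypothetical, and Python's standard expat binding stands in for whatever parser a real SOAP stack would use.

```python
import xml.parsers.expat


class InternalSubsetError(ValueError):
    """Hypothetical error type: the message carried an internal subset."""


def parse_rejecting_internal_subset(data: bytes) -> list[str]:
    """Return element names in document order.

    Raises InternalSubsetError as soon as a DOCTYPE declaration with an
    internal subset is seen, before any entity declared there is expanded.
    """
    names: list[str] = []
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # expat reports a truthy flag when the DOCTYPE has a [...] part.
        if has_internal_subset:
            raise InternalSubsetError(f"internal subset in DOCTYPE {name!r}")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(data, True)  # exceptions raised in handlers propagate out
    return names
```

The "thou shalt not produce" reading needs no such check on the receiving side; it only constrains what a sender may emit.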

Also it's been pointed out that the use of the internal subset opens up 
the potential of the old "billion laughs" denial-of-service attack. 
Ouch.  If I were feeling self-aggrandizing I'd point once again at 
XML-SW, which, if it had any official standing, would be exactly what 
SOAP (among others) needs.
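
For readers who have not seen the attack, the sketch below builds the usual form of a "billion laughs" document and computes, by arithmetic only, how large it would become once a parser expands every entity. The entity names (lol0, lol1, ...), the default of nine levels with ten references each, and the helper names are illustrative assumptions; nothing here is parsed or expanded.

```python
def billion_laughs_doc(levels: int = 9, fanout: int = 10, seed: str = "lol") -> str:
    """Build a document whose internal subset nests entity references.

    Each entity lolN is defined as `fanout` references to lol(N-1), so the
    text stays small while its expansion grows exponentially.
    """
    lines = [
        '<?xml version="1.0"?>',
        "<!DOCTYPE lolz [",
        f'  <!ENTITY lol0 "{seed}">',
    ]
    for i in range(1, levels + 1):
        refs = f"&lol{i - 1};" * fanout
        lines.append(f'  <!ENTITY lol{i} "{refs}">')
    lines.append("]>")
    lines.append(f"<lolz>&lol{levels};</lolz>")
    return "\n".join(lines)


def expanded_length(levels: int = 9, fanout: int = 10, seed: str = "lol") -> int:
    """Characters produced if the top-level entity were fully expanded."""
    return len(seed) * fanout ** levels


if __name__ == "__main__":
    doc = billion_laughs_doc()
    print(len(doc), "characters on the wire")
    print(expanded_length(), "characters after expansion")
```

With these defaults the message is under a kilobyte, while full expansion would yield three billion characters, which is why a consumer that never accepts an internal subset is immune by construction.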

With respect to Paul's original point, I don't think the i18n 
char-entity issue is really material here; the content of a SOAP 
envelope is being generated by a machine, which can blast the binary 
code-points straight into the UTF-whatever stream or do numeric 
character references. -Tim
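
Both options mentioned above can be sketched in a few lines of Python. The function names are hypothetical; the calls used (xml.sax.saxutils.escape and the "xmlcharrefreplace" error handler of str.encode) are standard-library features. Neither path needs a named character entity, and so neither needs an internal subset.

```python
from xml.sax.saxutils import escape


def as_utf8(text: str) -> bytes:
    """Escape markup characters, then write the code points directly as UTF-8."""
    return escape(text).encode("utf-8")


def as_ascii_with_char_refs(text: str) -> bytes:
    """Escape markup characters, then emit every non-ASCII character as a
    decimal numeric character reference such as &#233;."""
    return escape(text).encode("ascii", "xmlcharrefreplace")


if __name__ == "__main__":
    sample = "café & crème"
    print(as_utf8(sample))
    print(as_ascii_with_char_refs(sample))
```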
Received on Tuesday, 26 November 2002 00:07:11 GMT
