
Re: SOAP's prohibiting use of XML internal subset

From: Chris Lilley <chris@w3.org>
Date: Fri, 29 Nov 2002 12:57:53 +0100
Message-ID: <1626797296.20021129125753@w3.org>
To: www-tag@w3.org, "Mark Nottingham" <mnot@mnot.net>
CC: "Tim Bray" <tbray@textuality.com>, "Paul Grosso" <pgrosso@arbortext.com>

On Monday, November 25, 2002, 11:15:06 PM, Mark wrote:

Tim Bray (I believe) wrote:
>> - That granted, forbidding an internal subset seems kind of dumb.
>> Speaking as an XML processor implementor, the extra code required is
hardly detectable and the performance gain not significant.
>> Furthermore, every XML processor in the world just silently does the
>> internal subset and it's going to cost *extra work* for SOAP
>> implementations to check that they haven't.  I.e. you can't use an
>> ordinary off-the-shelf non-validating XML processor.

MN> Perhaps the WG has a good reason for this prohibition; have they been
MN> asked?

I discussed this with Yves Lafon the other day. The argument that a
message in a protocol has to be self-standing is fairly compelling -
it can't just buffer up while some other resource is fetched (perhaps
using a different protocol).

I have seen similar arguments for SVG Tiny. Although it is deployed
in a bandwidth-challenged environment, bandwidth is not the major
issue, and an SVG Tiny file may well be larger than an equivalent SVG
Full file if it includes raster images (using data: URLs and base64
encoding), because an MMS message that uses SVG needs to be
stand-alone and convey all the resources it will need for display.
The latency of the protocol means that fetching a secondary resource
would give noticeable lag times (several seconds, rather than several
tens of milliseconds) and, in the case of phone-to-phone messaging
(in other words, P2P), there *is* no server that can be asked for any
secondary resources - it's a one-shot push, not a pull.

The other argument that I have heard is the twin security holes of

 a) an external parsed entity that deliberately is large, or on a
 server that deliberately operates at a few bytes a minute

 b)  a power-series entity expansion DoS attack:

<!ENTITY x "abcdefg">
<!ENTITY x2 "&x;&x;">
<!ENTITY x3 "&x2;&x2;">
<!ENTITY x4 "&x3;&x3;">
<!ENTITY x5 "&x4;&x4;">
<!ENTITY x6 "&x5;&x5;">
<!ENTITY x7 "&x6;&x6;">
<!ENTITY x8 "&x7;&x7;">
<!ENTITY x9 "&x8;&x8;">
<!ENTITY xa "&x9;&x9;">
<!ENTITY xb "&xa;&xa;">
<!ENTITY xc "&xb;&xb;">
<!ENTITY xd "&xc;&xc;">
<!ENTITY xe "&xd;&xd;">
<!ENTITY xf "&xe;&xe;">

x is 7 bytes long (in UTF-8; 14 bytes as a DOMString); xf is 112 KB
long (224 KB as a DOMString).

It would be fairly trivial to define an entity some terabytes in size
using this method.
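
As a rough check of the arithmetic: each definition after x doubles
the length, so xf is x repeated 2^14 times. The sketch below (the
function names are mine, purely for illustration) counts the literal
as written, 7 UTF-8 bytes, which gives 114688 bytes for xf; a base of
8 bytes would give exactly 128 KiB. It also counts how many doublings
it would take to pass a terabyte.

```python
def expanded_length(base_len: int, doublings: int) -> int:
    """Length of an entity after `doublings` levels of "&e;&e;" nesting."""
    return base_len * 2 ** doublings


def doublings_needed(base_len: int, target: int) -> int:
    """Smallest number of doubling levels whose expansion reaches `target`."""
    n = 0
    while expanded_length(base_len, n) < target:
        n += 1
    return n


# The literal from the declarations above, measured in UTF-8 bytes.
BASE = len("abcdefg".encode("utf-8"))

# Entity names in declaration order: x, x2 ... x9, xa ... xf.
NAMES = ["x"] + ["x%x" % i for i in range(2, 16)]

if __name__ == "__main__":
    for level, name in enumerate(NAMES):
        print(name, expanded_length(BASE, level))
    print("doublings to reach 10**12 bytes:", doublings_needed(BASE, 10**12))
```

So roughly 40 short declarations, well under two kilobytes of DTD,
are enough to ask for a terabyte of expanded text.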

But then I ask myself - does SOAP also prohibit the simpler and less
devious attack of just sending a very large message?
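
To make Tim's point about the extra check concrete, and to put the
plain size limit next to it, here is a minimal sketch using Python's
expat bindings. The names check_message, RejectedMessage and
MAX_MESSAGE_BYTES are hypothetical, and the size cap is my own
assumption for illustration, not something taken from the SOAP
drafts. The idea is only that a receiver can refuse a document type
declaration as soon as the parser reports it, before any entity is
expanded, and can refuse an oversized message before parsing at all.

```python
from xml.parsers import expat

# Hypothetical cap chosen for illustration; not taken from any SOAP spec.
MAX_MESSAGE_BYTES = 1_000_000


class RejectedMessage(Exception):
    """Raised when a message breaks one of the local rules below."""


def check_message(data: bytes, max_bytes: int = MAX_MESSAGE_BYTES) -> None:
    """Raise RejectedMessage for oversized messages or any DOCTYPE.

    Ill-formed XML still surfaces as expat.ExpatError from the parser.
    """
    # Plain size limit: applies whether or not entities are involved.
    if len(data) > max_bytes:
        raise RejectedMessage("message larger than %d bytes" % max_bytes)

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # expat calls this when it meets <!DOCTYPE ...>, which is before
        # any entity reference in the document body can be expanded.
        raise RejectedMessage("document type declaration not allowed")

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(data, True)  # True marks this as the final chunk
```

With this, check_message(b"<a>ok</a>") returns quietly, while a
message that starts with <!DOCTYPE a [ ... ]> raises RejectedMessage.
The DOCTYPE check is one handler on an off-the-shelf parser, which
supports the view that the extra code is small; the size check is
independent of XML altogether.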



-- 
 Chris                            mailto:chris@w3.org
Received on Friday, 29 November 2002 06:57:58 GMT
