Re: SOAP's prohibiting use of XML internal subset from Chris Lilley on 2002-11-29 (www-tag@w3.org from November 2002)

From: Chris Lilley <chris@w3.org>
Date: Fri, 29 Nov 2002 12:57:53 +0100
To: www-tag@w3.org, "Mark Nottingham" <mnot@mnot.net>
CC: "Tim Bray" <tbray@textuality.com>, "Paul Grosso" <pgrosso@arbortext.com>
Message-ID: <1626797296.20021129125753@w3.org>

On Monday, November 25, 2002, 11:15:06 PM, Mark wrote:

Tim Bray (I believe) wrote:
>> - That granted, forbidding an internal subset seems kind of dumb.
>> Speaking as an XML processor implementor, the extra code required is
>> hardly detectable and the performance gain not significant.
>> Furthermore, every XML processor in the world just silently does the
>> internal subset and it's going to cost *extra work* for SOAP
>> implementations to check that they haven't.  I.e. you can't use an
>> ordinary off-the-shelf non-validating XML processor.

MN> Perhaps the WG has a good reason for this prohibition; have they been
MN> asked?

I discussed this with Yves Lafon the other day. The argument that a
message in a protocol has to be self-standing is fairly compelling -
it can't just buffer up while some other resource is fetched (perhaps
using a different protocol).
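
For what it is worth, the extra check Tim mentions can be small. Here is
a minimal sketch, assuming Python's expat bindings; the function name
contains_doctype and the exception class are made up for illustration
and are not taken from any SOAP toolkit. It aborts the parse as soon as
a DOCTYPE declaration starts, before any internal subset is processed:

```python
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Hypothetical exception used to abort parsing at the DOCTYPE."""


def contains_doctype(data: bytes) -> bool:
    """Return True if the document carries a DOCTYPE declaration."""
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Raising here propagates out of parser.Parse() and stops the parse.
        raise DoctypeForbidden(name)

    parser.StartDoctypeDeclHandler = on_doctype
    try:
        parser.Parse(data, True)
    except DoctypeForbidden:
        return True
    return False


with_subset = b'<!DOCTYPE e [<!ENTITY x "abcdefg">]><e>&x;</e>'
without_subset = b"<e>abcdefg</e>"
print(contains_doctype(with_subset), contains_doctype(without_subset))
```

A receiver that wanted to enforce the prohibition would reject any
message for which this returns True.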

I have seen similar arguments for SVG Tiny. Although it is deployed
in a bandwidth-challenged environment, bandwidth is not the major
issue: an SVG Tiny file may well be larger than the equivalent SVG
Full file if it embeds raster images (using data: URLs and base64
encoding), because an MMS message that uses SVG needs to be
stand-alone and carry all the resources it will need for display. The
latency of the protocol means that fetching a secondary resource would
give noticeable lag (several seconds, rather than several tens of
milliseconds) and, in the case of phone-to-phone messaging (in other
words, P2P), there *is* no server that can be asked for any secondary
resources - it's a one-shot push, not a pull.

The other argument that I have heard is the twin security holes of

 a) an external parsed entity that is deliberately large, or that sits
 on a server that deliberately serves a few bytes a minute

 b)  a power-series entity expansion DoS attack:

<!ENTITY x "abcdefg">
<!ENTITY x2 "&x;&x;">
<!ENTITY x3 "&x2;&x2;">
<!ENTITY x4 "&x3;&x3;">
<!ENTITY x5 "&x4;&x4;">
<!ENTITY x6 "&x5;&x5;">
<!ENTITY x7 "&x6;&x6;">
<!ENTITY x8 "&x7;&x7;">
<!ENTITY x9 "&x8;&x8;">
<!ENTITY xa "&x9;&x9;">
<!ENTITY xb "&xa;&xa;">
<!ENTITY xc "&xb;&xb;">
<!ENTITY xd "&xc;&xc;">
<!ENTITY xe "&xd;&xd;">
<!ENTITY xf "&xe;&xe;">

x is 7 bytes long (in UTF-8; 14 bytes as a DOMString). Each later
entity doubles the one before it, so xf is 2^14 times as long: 112 KB
(224 KB as a DOMString).

It would be fairly trivial to have an entity that was some terabytes
in size using this method.
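
To put rough numbers on that, here is a small sketch of the arithmetic
in Python. The helper names are my own, and it assumes the same pattern
as the declarations above: a 7-character base entity, with every later
entity referencing the previous one twice.

```python
def expanded_length(base_len: int, doublings: int) -> int:
    """Characters in the entity reached after `doublings` doubling steps."""
    return base_len * 2 ** doublings


def doublings_needed(base_len: int, target: int) -> int:
    """Smallest number of doubling steps whose expansion reaches `target`."""
    steps = 0
    while expanded_length(base_len, steps) < target:
        steps += 1
    return steps


BASE = len("abcdefg")  # the replacement text of entity x

# xf is 14 doubling steps after x (x2 is the first step).
xf_chars = expanded_length(BASE, 14)
print(xf_chars, xf_chars // 1024)  # characters, and KB in UTF-8

# Steps needed before the expansion passes one terabyte (10**12 bytes).
print(doublings_needed(BASE, 10**12))
```

Under those assumptions xf comes to 114688 characters, and 38 doubling
declarations (a few dozen short lines of DTD) are enough to pass a
terabyte.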

But then I ask myself - does SOAP prohibit messages that are overly
large by the simpler and less devious method of just sending a very
large message?

-- 
 Chris                            mailto:chris@w3.org

Received on Friday, 29 November 2002 06:57:58 UTC