Re: XML-* [was: ... XML subsetting...] from Jeremy Dunck on 2002-12-05 (www-tag@w3.org from December 2002)

From: Jeremy Dunck <ralinon@hotmail.com>
Date: Thu, 05 Dec 2002 17:13:12 -0600
To: tbray@textuality.com, pgrosso@arbortext.com
Cc: www-tag@w3.org
Message-ID: <BAY1-F156BBsQwY6YHk00009c23@hotmail.com>
Tim Bray wrote:
>Paul Grosso wrote:
>>
>>I am not sure what I think (though I'm generally skeptical
>>about the need to do anything here).
<snip>
>>1.  Arguments about making XML easier to implement are
>>not effective.
>This seems correct, and would not in itself be a reason to do this. 
>Although being able to reduce the size/complexity of XML processors is 
>unambiguously a good thing.

Agreed.

>>2.  As far as user requirements, it is hard to see how removing
>>capabilities from XML can benefit users.  Users use what they
>>use, and they might be negatively impacted if you remove some
>>feature, but how can they be positively impacted by removal
>>of features?
>Users of SOAP would benefit because as XML 1.0 is specified they are open 
>to a severe and hard-to-resist denial-of-service attack via the old 
>billion-laughs scam.  In fact this benefit extends to anyone who wants to 
>provide a high-performance wire protocol using XML.

I may be naive here, but is there no way to avoid the billion-laughs 
attack without cutting out this (IMHO) very useful feature?

You're already talking about breaking compatibility, so hopefully I'm not 
being too radical in suggesting that heuristics, or even strict rules, 
could close off the DoS possibility.

1)  How about a rule that no entity can exceed n bytes?
  Pro) Easy to understand, little overhead.
  Con) Arbitrary.
2)  What about no entities with depth (that is, that reference other 
entities repeatedly) greater than n?
  Pro) More flexible than size limitation.
  Con) Less obvious why failures occur.  Overhead of counting depths.
3)  Do away with the ability for an entity to reference another entity.
  Pro) Effectively avoids the possibility altogether.
  Con) Hugely lessens the usefulness of an entity as a device for succinctly 
defining common constructs.
4)  No entities that grow by more than n % when other entities are 
dereferenced?
  Pro) Avoids arbitrary depth limitations.
  Con) Less obvious why failures occur.  Overhead of calculating 
percentages.
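
To make options 1 and 2 concrete, here is a minimal sketch in Python.  It 
is not how any real XML parser does it: the entity table is a plain dict, 
and the names expand, laughs, max_bytes and max_depth are all made up for 
illustration.

```python
import re

# Matches a general entity reference such as &lol1;
ENTITY_REF = re.compile(r"&([A-Za-z_][\w.-]*);")


class EntityLimitError(ValueError):
    """Raised when an entity breaks one of the (hypothetical) limits."""


def expand(name, entities, max_bytes=10_000, max_depth=8, depth=1):
    """Return the fully expanded replacement text of entity `name`.

    Option 2: refuse reference chains nested deeper than max_depth.
    Option 1: refuse any entity whose expansion exceeds max_bytes.
    """
    if depth > max_depth:
        raise EntityLimitError(f"entity nesting deeper than {max_depth}")

    def substitute(match):
        return expand(match.group(1), entities, max_bytes, max_depth, depth + 1)

    text = ENTITY_REF.sub(substitute, entities[name])
    if len(text.encode("utf-8")) > max_bytes:
        raise EntityLimitError(f"entity {name!r} expands past {max_bytes} bytes")
    return text


def laughs(levels, fanout=10):
    """Build a billion-laughs style table: each level repeats the one below."""
    entities = {"lol0": "lol"}
    for i in range(1, levels + 1):
        entities[f"lol{i}"] = f"&lol{i - 1};" * fanout
    return entities


entities = laughs(9)
print(len(expand("lol3", entities)))  # 3000 bytes, within the limit
try:
    expand("lol9", entities)          # would be 3 * 10**9 bytes unchecked
except EntityLimitError as err:
    print("rejected:", err)
```

Because every nested entity is checked as soon as it is expanded, the 
lol9 request fails once lol4 crosses the size limit and never builds the 
full string.  A growth check (option 4) could sit at the same point, 
comparing the expanded length with the length of the unexpanded text.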

Lastly, am I correct in my understanding that the DoS through entity 
expansion is only possible when external subsets are used, and when that 
referenced subset is compromised?  That is, how can the DoS happen if only 
trusted resources are used as external subsets?

If my understanding is correct, then aren't we -really- dealing with a 
conventional security issue, and does it make sense to remove a generally 
useful feature to avoid the more widely useful (if complicated and far-off) 
implementation of better trust methods?

>>3.  As far as adding features, the (only somewhat facetious)
>>     argument goes like this:
>>   a.  Any attempt to develop an XML 2.0 will realistically
>>       take at least two years from start to Rec, and that
<snip>
>>   b.  Any feature that can wait two+ years can't be needed so
>>       badly as to be crucial enough to consider it as a possible
>>       addition to XML 2.0.
>I think those are excellent arguments and not facetious at all.  So my 
>personal proposal (*not* speaking for the TAG) is that this project be 
>strictly forbidden from adding any new features whatsoever - one is too 
>many - beyond those that are in XML 1.1, namespaces 1.1, the infoset, and 
>(maybe) xml:base, and strongly encouraged to remove some, in particular 
>DTDs.

I'm obviously not as knowledgeable about XML features and evolution as 
others on this list.  However, it seems like one feature that would be very 
useful in XML 2.0 (and in other Recs from the W3C) is a standard way to 
declare which optional features are required for successful 
use/execution/processing of the document/resource/program/data.

Even if a cross-language standard for such a purpose could not reasonably 
be done, it would still be greatly beneficial within the evolution of a 
particular language.  This is along the lines of [Features].

We see time and again that optional features and processing lead to 
ambiguity, overhead and interop issues.  Then again, sometimes optional 
features (such as no external entities being loaded) have real value.  To 
solve this problem, it seems to me that it would be useful to have a 
language capability for describing the settings of these options at the 
time of processing, so that processing becomes unambiguous.

In instances where performance is more important than language features, 
this could be done at the root-element.  This parallels the suggestion in 
[Common XML].
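
As a rough sketch of the root-element variant (Python, standard library 
only): the "requires" attribute, the feature names and both function names 
below are hypothetical, since no such convention exists in XML today.  The 
point is only that a processor can read the declaration from the root 
start tag without building the rest of the tree.

```python
import io
import xml.etree.ElementTree as ET


def required_features(xml_bytes, attr="requires"):
    """Read a (hypothetical) space-separated feature list off the root element.

    Only the first "start" event is consumed, so no tree is built for
    anything after the root start tag.
    """
    for _event, root in ET.iterparse(io.BytesIO(xml_bytes), events=("start",)):
        return set(root.get(attr, "").split())
    return set()


def missing_features(xml_bytes, supported):
    """Return the declared features this processor does not support."""
    return sorted(required_features(xml_bytes) - set(supported))


doc = b'<order requires="namespaces external-entities"><item/></order>'
print(missing_features(doc, {"namespaces", "internal-entities"}))
# ['external-entities']
```

A processor could refuse the document, or fall back to a fuller parser, 
whenever the returned list is non-empty.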

In instances where performance is less important than language features, 
this could be done in the same way that namespacing is done.  That is, as 
namespaces are referenced, settings for optional features used within that 
namespace could be defined.

Then again, I could be way out in left field.  :)

<snip>
>s/major/any/, and no kidding. -Tim

Huh?

  Hopefully, I'm being useful...

  -Jeremy Dunck

[Features]
http://www.ietf.org/rfc/2533
updated by
http://www.ietf.org/rfc/2738

[Common XML]
http://simonstl.com/articles/cxmlspec.txt

Received on Thursday, 5 December 2002 18:16:01 UTC