Re: On subsetting XML...

From: Elliotte Rusty Harold <elharo@metalab.unc.edu>
Date: Thu, 16 Jan 2003 09:54:17 -0500
Message-Id: <p04330104ba4c7649a855@[]>
To: Norman Walsh <Norman.Walsh@Sun.COM>, www-tag@w3.org

At 5:39 PM -0500 1/15/03, Norman Walsh wrote:

>Profiling XML, providing more implementation options, will necessarily
>increase the possibility of interoperability problems and it would be
>best to avoid doing so. Profiles are a bad idea on general principles
>and are in direct conflict with one of the original goals of XML[1]: "the
>number of optional features in XML is to be kept to the absolute
>minimum, ideally zero."

This is correct.

>Unfortunately, a number of user communities have expressed a need to
>work with only a subset of XML. The TAG is concerned that if these
>needs are not addressed quickly (and centrally), a number of slightly
>different XML subsets will arise and if this trend continues, the
>stability of XML as the basis of a whole range of technologies could
>be jeopardized.

These communities are wrong. As an architectural principle, they 
should be told not to subset XML syntax. If they choose to do so 
anyway, the W3C should not bless this behavior. It is an 
interoperability disaster.

Part of the benefit of XML is the separation of underlying syntax 
from application level semantics. All profiling should be at the 
semantic level (tag names, attribute names, attribute values, element 
content, etc.). None of it should be on syntactic issues.

>One obvious place where such a subset has been deployed is in SOAP[2].
>SOAP forbids internal and external subsets and strongly discourages
>processing instructions.

The worse for SOAP, then.

>When asked, the XML Protocol WG listed these[3] among their reasons for
>  * Performance: processing internal subsets and buffer management for
>                 handling entity expansion would slow things down.

If XML is too slow, don't use XML.

>  * Simplicity:  if an external subset is referenced, it has to be
>                 available when the parser runs (if it's available
>                 to some but not all processors, different results
>                 are possible).

This is a non-issue. XML 1.0 explicitly allows parsers not to read 
the external DTD subset. Personally, I think this is probably a 
design flaw in XML 1.0, but it's not one we can or should fix now.
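
To illustrate what skipping the external subset looks like in practice, 
here is a minimal sketch using Python's standard xml.sax module. The 
helper name parse_without_external_subset, the TextCollector class, and 
the example.invalid URL are all hypothetical. It assumes the expat-based 
reader treats the external DTD subset like any other external entity, 
so turning off feature_external_ges keeps it from being fetched.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class TextCollector(ContentHandler):
    """Accumulates character data so the caller can inspect the result."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_without_external_subset(data):
    """Parse XML bytes without fetching external entities (hypothetical helper)."""
    parser = xml.sax.make_parser()
    # Assumption: with this feature off, the reader skips the external
    # DTD subset as well as external general entities.
    parser.setFeature(feature_external_ges, False)
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return "".join(handler.chunks)


# Hypothetical document whose external subset cannot be retrieved.
DOC = b'<!DOCTYPE r SYSTEM "http://example.invalid/missing.dtd"><r>ok</r>'
print(parse_without_external_subset(DOC))
```

The document is still well-formed XML 1.0 and still parses. The 
unreachable DTD only matters to a processor that chooses to read it.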

>  * Security:    entity expansion introduces the possibility of DoS
>                 attacks; other security issues might arise

I suspect these can be handled without breaking XML 1.0. Set a 
reasonable maximum size for parsed documents and throw an error when 
that size is exceeded.  There are probably other solutions to this 
theoretical problem.
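
One way to read "maximum size" is as a budget on the content the 
parser reports after entity expansion. The sketch below, again in 
Python's xml.sax, takes that reading as an assumption. The names 
DocumentTooLarge, SizeLimitedHandler, parse_with_limit, and SAMPLE are 
hypothetical, and the limit is whatever the caller considers reasonable.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class DocumentTooLarge(Exception):
    """Raised when expanded content exceeds the caller's budget."""


class SizeLimitedHandler(ContentHandler):
    """Counts character data and attribute values as the parser reports them."""

    def __init__(self, max_chars):
        super().__init__()
        self.max_chars = max_chars
        self.seen = 0

    def _charge(self, amount):
        # Abort the parse as soon as the running total passes the budget.
        self.seen += amount
        if self.seen > self.max_chars:
            raise DocumentTooLarge(f"more than {self.max_chars} characters")

    def startElement(self, name, attrs):
        self._charge(sum(len(value) for value in attrs.values()))

    def characters(self, content):
        self._charge(len(content))


def parse_with_limit(data, max_chars):
    """Parse XML bytes; return the number of characters counted."""
    handler = SizeLimitedHandler(max_chars)
    xml.sax.parseString(data, handler)
    return handler.seen


# A small nested-entity document: 10 characters, expanded 10 times, twice.
SAMPLE = (
    b'<!DOCTYPE r ['
    b'<!ENTITY a "xxxxxxxxxx">'
    b'<!ENTITY b "&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;">'
    b']><r>&b;&b;</r>'
)
```

Calling parse_with_limit(SAMPLE, 50) raises DocumentTooLarge partway 
through the expansion, while a generous limit lets the same document 
through. The internal subset is still processed as XML 1.0 requires. 
Only the amount of output is capped.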

| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
|           Processing XML with Java (Addison-Wesley, 2002)          |
|              http://www.cafeconleche.org/books/xmljava             |
| http://www.amazon.com/exec/obidos/ISBN%3D0201771861/cafeaulaitA  |
|  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/      |
|  Read Cafe con Leche for XML News: http://www.cafeconleche.org/    |
Received on Thursday, 16 January 2003 10:12:55 UTC
