RE: On subsetting XML...

From: David Orchard <dorchard@bea.com>
Date: Wed, 15 Jan 2003 15:24:07 -0800
To: <www-tag@w3.org>
Message-ID: <02cf01c2bced$34292d20$a80ba8c0@beasys.com>

Norm,

I don't think your document accurately reflects the consensus of the TAG on
the id issue.  There are at least 3 people on the TAG, if not more, that are
actively interested in discussing the id issue.  I understand your position,
and I understand that you prefixed your message with "some of my thoughts",
but I was hoping you would describe more clearly where the TAG actually is:
divided on the issue and still discussing it.

Cheers,
Dave


> -----Original Message-----
> From: www-tag-request@w3.org
> [mailto:www-tag-request@w3.org] On Behalf Of Norman Walsh
> Sent: Wednesday, January 15, 2003 2:39 PM
> To: www-tag@w3.org
> Subject: On subsetting XML...
>
>
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Several weeks ago, I took an action item to draft up some of my
> thoughts about what an XML subset should look like. The TAG has
> discussed these ideas a couple of times and, while it's probably an
> overstatement to say that we have consensus, we did agree that it was
> time to distribute these ideas more broadly and solicit more input.
>
> Comments, etc., most welcome, as always.
>
> Profiling XML, providing more implementation options, will necessarily
> increase the possibility of interoperability problems and it would be
> best to avoid doing so. Profiles are a bad idea on general principles
> and are in direct conflict with one of the original goals of XML[1]:
> "the number of optional features in XML is to be kept to the absolute
> minimum, ideally zero."
>
> Unfortunately, a number of user communities have expressed a need to
> work with only a subset of XML. The TAG is concerned that if these
> needs are not addressed quickly (and centrally), a number of slightly
> different XML subsets will arise and if this trend continues, the
> stability of XML as the basis of a whole range of technologies could
> be jeopardized.
>
> One way to avoid this problem is to produce a new recommendation that
> identifies a subset of XML for use in those environments where
> supporting all of XML is not practical.
>
> One obvious place where such a subset has been deployed is in SOAP[2].
> SOAP forbids internal and external subsets and strongly discourages
> processing instructions.
>
> When asked, the XML Protocol WG listed these[3] among their reasons
> for subsetting:
>
>  * Performance: processing internal subsets and buffer management for
>                 handling entity expansion would slow things down.
>  * Simplicity:  if an external subset is referenced, it has to be
>                 available when the parser runs (if it's available
>                 to some but not all processors, different results
>                 are possible).
>  * Security:    entity expansion introduces the possibility of DoS
>                 attacks; other security issues might arise
>
> Although it was explicitly not a goal of the XML Protocol WG to
> produce a subset of XML (independent of their own application needs),
> this seems like a good place to start.
>
> However, precisely how the subset is defined requires careful
> consideration as this is an exercise that should be conducted only
> once. The subset selected must be small enough so that no further
> subset will be required but also complete enough to be useful for a
> wide range of applications.
>
> One clear requirement of the subset is that it must exclude internal
> and external subsets (no <!DOCTYPE declaration is allowed). This
> requirement effectively removes DTDs from XML and consequently removes
> entities and notations.
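A minimal sketch of how a receiver might enforce the "no <!DOCTYPE" rule, assuming Python's standard xml.parsers.expat module; SubsetViolation and check_no_doctype are hypothetical names, not part of any specification. Expat reports the start of a document type declaration before it handles any declarations in the internal subset, so rejecting at that point also stops an entity-expansion attack before it begins.

```python
import xml.parsers.expat


class SubsetViolation(ValueError):
    """Hypothetical error for constructs outside the proposed subset."""


def check_no_doctype(data: bytes) -> None:
    """Parse `data`, rejecting any document type declaration."""
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Fires at <!DOCTYPE ...>, before internal-subset declarations are
        # handled; the exception propagates out of Parse().
        raise SubsetViolation(f"<!DOCTYPE {name}> is not allowed in the subset")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(data, True)  # True: this is the final (only) chunk


check_no_doctype(b"<doc><p>no DTD here</p></doc>")  # accepted

try:
    check_no_doctype(b'<!DOCTYPE doc [<!ENTITY a "x">]><doc>&a;</doc>')
except SubsetViolation as err:
    print(err)  # <!DOCTYPE doc> is not allowed in the subset
```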
>
> What remains are elements, attributes, namespace declarations,
> comments, processing instructions, and character data. While comments
> and processing instructions might conceivably be removed, they are
> sufficiently useful that we think they should remain. (Although the
> SOAP spec forbids senders from including processing instructions, it
> accepts that receivers might get them, so it's clear that removing
> processing instructions from the subset is not a requirement of the
> SOAP subset.)
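For comparison, a sketch that tallies the constructs said to remain once DTDs are gone, again assuming Python's xml.parsers.expat; inventory is a hypothetical helper. Namespace processing is switched on by passing namespace_separator, so namespace declarations arrive through their own handler rather than as ordinary attributes. Because expat may deliver character data in several pieces, the sketch counts characters rather than callbacks.

```python
import xml.parsers.expat
from collections import Counter


def inventory(data: bytes) -> Counter:
    """Tally the constructs left in a DOCTYPE-free document."""
    counts = Counter()
    # Supplying a namespace_separator turns on namespace processing.
    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")

    def start(name, attrs):
        counts["element"] += 1
        counts["attribute"] += len(attrs)  # xmlns declarations not included

    def ns_decl(prefix, uri):
        counts["namespace declaration"] += 1

    def comment(text):
        counts["comment"] += 1

    def pi(target, pi_data):
        counts["processing instruction"] += 1

    def chars(text):
        # Character data may be split across calls, so sum lengths.
        counts["character data (chars)"] += len(text)

    parser.StartElementHandler = start
    parser.StartNamespaceDeclHandler = ns_decl
    parser.CommentHandler = comment
    parser.ProcessingInstructionHandler = pi
    parser.CharacterDataHandler = chars
    parser.Parse(data, True)
    return counts


sample = (b'<?xml-stylesheet href="s.css" type="text/css"?><!-- note -->'
          b'<a:doc xmlns:a="urn:example" id="d1"><a:p>hi</a:p></a:doc>')
print(inventory(sample))
```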
>
> Some people have proposed that what is really needed is a "subset
> plus", that is a subset of XML with a new feature or two. The most
> often requested feature in this regard is support for xml:id. I feel
> strongly that it would be a mistake to introduce a single
> new feature, or a single change of any sort that would not be
> completely compatible with XML 1.1, in the work that subsets XML.
> (Support for xml:id or any other feature is an orthogonal issue and
> must not be conflated with the effort to define a subset, even if the
> subset makes a particular feature more necessary or desirable.)
>
> Along these lines, a number of people have suggested that the right
> approach to this problem is to define a new recommendation that
> combines the current suite of related recommendations (XML 1.1, XML
> Infoset, Namespaces in XML, and perhaps XML Base) into a single
> document. Tim Bray has demonstrated[4] one example of how this might
> appear.
>
> To the extent that this might be viewed as an editorial decision, one
> that may offer tangible benefits to XML users and implementors,
> particularly new users and implementors, but which makes no technical
> changes to the languages defined by (and definable by) XML, this seems
> not unreasonable.
>
> However, it's clear that performing this "unification" exercise on all
> of XML 1.1, without introducing any backwards incompatible changes,
> may be an extraordinarily large editorial task, out of proportion with
> the effort required simply to define the subset in some less invasive
> way.
>
> Conversely, defining only the subset in a unified document would be
> easier but would introduce issues of its own. Doing so would
> effectively split the XML specifications into two tracks: a unified
> "subset" track and a non-unified "full XML" track. This could be the
> source of considerable confusion and accidental divergence.
>
> Further consideration of these technical and editorial issues, and the
> eventual creation of a new recommendation, would seem to be within the
> scope of the XML Core WG's charter[5] which reads, in part, "[the] WG
> will also study the advisability of a version 2.0 of the XML
> specification and may undertake the preparation of such a
> specification, if deemed advisable."
>
> In short, it appears that a new recommendation-track document that
> defines a subset of XML 1.1 should be developed:
>
>   * The subset must be backwards compatible with XML 1.1.
>   * The subset must define a language that excludes DTD declarations.
>
> How the new recommendation is constructed we leave to the editorial
> discretion of the group that undertakes it.
>
> [1] http://www.w3.org/TR/REC-xml#sec-origin-goals
> [2] http://www.w3.org/2000/xp/Group/2/11/08/soap12-part1.html#soapenv
> [3] http://lists.w3.org/Archives/Public/www-tag/2002Dec/0119.html
> [4] http://www.textuality.com/xml/xmlSW.html
> [5] http://www.w3.org/2001/12/xmlbp/xml-core-wg-charter.html#deliverables

                                        Be seeing you,
                                          norm

- --
Norman.Walsh@Sun.COM    | There is a road from the eye to the heart
XML Standards Architect | that does not go through the intellect.--G.
Web Tech. and Standards | K. Chesterton
Sun Microsystems, Inc.  |
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: Processed by Mailcrypt 3.5.7 <http://mailcrypt.sourceforge.net/>

iD8DBQE+JeMOOyltUcwYWjsRAmsuAJ49pGFH6nPSmZvEXQNrVZGr37plygCeOUJD
zHXfJSzY/7YujRkXD5o+yzo=
=3UHP
-----END PGP SIGNATURE-----
Received on Wednesday, 15 January 2003 18:25:37 GMT