Implications of using XPointer for XInclude

[Apologies for the cross-post.  This discussion started on xml-dev, but
has clear relevance to www-xml-linking-comments for XPointer and
www-xml-xinclude-comments for XInclude.]

The use of XPointer [1] by XML Inclusions (XInclude) [2] has some
processing implications which substantially increase the cost
(development, CPU cycles, memory) of a conformant implementation of
XInclude.

Section 4.2 of the XInclude spec [3] states that:
>When parsing as XML, the fragment part of the URI reference is
>interpreted as an XPointer [XPointer], regardless of the media type of
>the resource. The XPointer indicates a subresource as the target for
>inclusion.

Section 4.2 then goes on at length regarding the results, legal and
illegal, of various kinds of XPointer processing and how they should or
should not be included in the document.  Multiple node responses and
ranges are explicitly legal.

While XInclude seems quite capable in a processing environment where
full XPointer support is provided, the nature of that environment and
some of the situations that environment will have to handle are worth
questioning.

Because "XPointer is built on top of the XML Path Language[4],"
XPointer includes all of XPath and then some.  Unlike the use of XPath
in W3C XML Schema [5], there is no restriction on the XPath expressions
or axes supported.

As a result, XPointers (and hence XInclude expressions) can include
XPaths which reference, for instance, the preceding or preceding-sibling
axes.  The use of these axes requires tree-building processing, as they
cannot be reliably processed in a stream-oriented environment.

Stream-processing has one substantial advantage over tree-building: a
considerably smaller memory footprint.  The 'classic' example is Jon
Bosak's Old Testament XML file [6]: 3.3 MB of structured text, which no
longer contains the chapter and verse information explicitly.  It is
both a large document and one from which people may reasonably choose to
cite passages.  As documents tend to grow when stored in object trees,
having to process this document _as a tree_ in order to extract
fragments from it could be a very substantial burden, even if the
document is stored locally.
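
To make the contrast concrete, here is a minimal sketch (Python, standard
library only) of stream-oriented extraction.  It assumes the fragment has
already been reduced to a child-axis-only path with positional predicates.
The names ChildPathHandler and stream_select, and the (name, position)
step format, are hypothetical and not part of any specification.  For
brevity it collects only the text of the selected element; a real XInclude
processor would have to copy the whole subtree.

```python
import io
import xml.sax


class ChildPathHandler(xml.sax.ContentHandler):
    """Collect the text of the element selected by a child-axis-only path.

    `steps` is a list of (element_name, 1-based position) tuples, a
    hypothetical stand-in for a path such as /tstmt/book[2]/chapter[2].
    """

    def __init__(self, steps):
        super().__init__()
        self.steps = steps
        self.counts = [{}]   # one dict per open element: child name -> count
        self.depth = 0       # number of currently open elements
        self.matched = 0     # how many leading steps the open path satisfies
        self.done = False    # set once the selected element has been closed
        self.text = []

    def startElement(self, name, attrs):
        self.depth += 1
        siblings = self.counts[-1]
        siblings[name] = siblings.get(name, 0) + 1
        self.counts.append({})
        # The new element extends the match only if every ancestor matched.
        if (not self.done and self.matched == self.depth - 1
                and self.depth <= len(self.steps)
                and self.steps[self.depth - 1] == (name, siblings[name])):
            self.matched = self.depth

    def characters(self, content):
        # Capture text only while inside the fully matched element.
        if not self.done and self.matched == len(self.steps):
            self.text.append(content)

    def endElement(self, name):
        self.counts.pop()
        if self.matched == self.depth:
            if self.depth == len(self.steps):
                self.done = True
            self.matched -= 1
        self.depth -= 1


def stream_select(xml_bytes, steps):
    """Return the text content of the element addressed by `steps`."""
    handler = ChildPathHandler(steps)
    xml.sax.parse(io.BytesIO(xml_bytes), handler)
    return "".join(handler.text)
```

The only state kept is one small counter dictionary per open element, so
memory use follows the nesting depth of the document rather than its
size.  Nothing comparable is possible for the preceding axis, where the
nodes wanted may already have gone past by the time the context node is
seen.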

Processing environments which implement XInclude fully, even if they are
themselves capable of working in a stream-based environment [7], are
going to have to deal with this potential for tree-building.  There are
a few possible strategies:

1) Implement XInclude on top of complete (or "nearly complete") XPointer
support and accept the tree-building expense.  (This appears to be the
current approach of libxml [8].)

2) Implement a subset of XPointer, based on something like the subset
defined by W3C XML Schema Structures [5], supporting only the child and/or
attribute axes and possibly though not necessarily the string-based
capabilities.  Using that subset, apply stream-processing to XIncluded
documents and include the portions needed without building trees.

3) Use a mixture of strategies 1 and 2, analyzing all XPointers to
determine which axes are used and only building the tree (or even a
subset of the tree) if necessary.  This reduces memory impact at the
cost of program complexity and redundancy.

4) Subset the specification so as to ignore fragment identifiers
(appears to be the current approach of [7]).
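
The analysis step in strategy 3 could be sketched roughly as follows, in
Python.  This is a crude lexical scan, not an XPath parser: it reports
explicit axis names plus the abbreviations "//", ".." and "@", and it
ignores XPointer extensions and functions that might themselves need a
tree.  The names STREAMABLE_AXES, axes_used and needs_tree are
hypothetical, and treating only the child and attribute axes as
streamable is simply the subset from strategy 2.

```python
import re

# Assumption: only the axes named in strategy 2 are handled by streaming.
STREAMABLE_AXES = {"child", "attribute"}


def axes_used(xpath):
    """Return the non-default axes an XPath 1.0 expression appears to use.

    Steps written without an axis use the default child axis, which is
    streamable, so they are not reported.
    """
    # Blank out string literals so text such as 'a::b' is not misread.
    stripped = re.sub(r'"[^"]*"|\'[^\']*\'', '""', xpath)
    axes = set(re.findall(r"([a-z][a-z-]*)\s*::", stripped))
    if "//" in stripped:
        axes.add("descendant-or-self")   # '//' abbreviates this axis
    if ".." in stripped:
        axes.add("parent")               # '..' abbreviates parent::node()
    if "@" in stripped:
        axes.add("attribute")            # '@' abbreviates attribute::
    return axes


def needs_tree(xpath):
    """True if any axis outside the streamable subset is present."""
    return not axes_used(xpath) <= STREAMABLE_AXES
```

A processor following strategy 3 would call something like needs_tree()
on each XPointer and fall back to tree-building only when it returns
true.  The check is deliberately conservative: axes that could in
principle be streamed with more effort are still sent down the tree path.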

While a 3K wrapper including a verse from Leviticus in the Old Testament
(via XInclude) may seem like something of an edge case, I have a
difficult time describing it as unreasonable or unlikely.  

These problems are not shared by FIXptr [9], which is effectively a
conservative version of strategy 2.  A similar approach could be built
on the XPath subset defined in W3C XML Schema Structures [5].

Similar issues regarding XInclude's use of XPointer appear to have been
rejected by the XLink WG [10] ("the WG was unwilling to give up XPointer
support"), but I would hope that processing considerations might reopen
that discussion.

[1] - http://www.w3.org/TR/xptr
[2] - http://www.w3.org/TR/xinclude
[3] - http://www.w3.org/TR/xinclude/#xml-included-items
[4] - http://www.w3.org/TR/xpath
[5] - http://www.w3.org/TR/xmlschema-1/#coss-identity-constraint
[6] - archived in http://metalab.unc.edu/bosak/xml/eg/rel200.zip
[7] - http://www.ibiblio.org/xml/XInclude/
[8] - http://xmlsoft.org/
[9] - http://lists.w3.org/Archives/Public/www-xml-linking-comments/2001AprJun/att-0074/01-NOTE-FIXptr-20010425.htm
[10] - http://lists.w3.org/Archives/Public/www-xml-xinclude-comments/2001Aug/0004.html

Received on Wednesday, 22 August 2001 08:49:07 UTC