- From: Simon St.Laurent <simonstl@simonstl.com>
- Date: 22 Aug 2001 08:50:44 -0400
- To: xml-dev@lists.xml.org
- Cc: www-xml-linking-comments@w3.org, www-xml-xinclude-comments@w3.org
[Apologies for the cross-post. This discussion started on xml-dev, but has clear relevance to www-xml-linking-comments for XPointer and www-xml-xinclude-comments for XInclude.]

The use of XPointer [1] by XML Inclusions (XInclude) [2] has some processing implications which substantially increase the cost (development, CPU cycles, memory) of a conformant implementation of XInclude.

Section 4.2 of the XInclude spec [3] states that:

>When parsing as XML, the fragment part of the URI reference is
>interpreted as an XPointer [XPointer], regardless of the media type of
>the resource. The XPointer indicates a subresource as the target for
>inclusion.

Section 4.2 then goes on at length regarding the results, legal and illegal, of various kinds of XPointer processing and how they should or should not be included in the document. Multiple node responses and ranges are explicitly legal.

While XInclude seems quite capable in a processing environment where full XPointer support is provided, the nature of that environment and some of the situations that environment will have to handle are worth questioning.

Because "XPointer is built on top of the XML Path Language [4]," XPointer includes all of XPath and then some. Unlike the use of XPath in W3C XML Schema [5], there is no restriction on the XPath expressions or axes supported. As a result, XPointers (and hence XInclude expressions) can include XPaths which reference, for instance, the preceding or preceding-sibling axes. The use of these axes requires tree-building processing, as they cannot be reliably processed in a stream-oriented environment.

Stream-processing has one substantial advantage over tree-building: a considerably smaller memory footprint. The 'classic' example of Jon Bosak's Old Testament XML file [6], which is 3.3 MB of structured text and no longer contains the chapter and verse information explicitly, is both a large document and one from which people may reasonably choose to cite passages.
As documents tend to grow when stored in object trees, having to process this document _as a tree_ in order to extract fragments from it could be a very substantial burden, even if the document is stored locally.

Processing environments which implement XInclude fully, even if they are themselves capable of working in a stream-based environment [7], are going to have to deal with this potential for tree-building. There are a few possible strategies:

1) Implement XInclude on top of complete (or "nearly complete") XPointer support and accept the tree-building expense. (This appears to be the current approach of libxml [8].)

2) Implement a subset of XPointer, based on something like the subset defined by W3C XML Schema Structures [5], supporting only the child and/or attribute axes and possibly, though not necessarily, the string-based capabilities. Using that subset, apply stream-processing to XIncluded documents and include the portions needed without building trees.

3) Use a mixture of strategies 1 and 2, analyzing all XPointers to determine which axes are used and only building the tree (or even a subset of the tree) if necessary. This reduces memory impact at the cost of program complexity and redundancy.

4) Subset the specification so as to ignore fragment identifiers. (This appears to be the current approach of [7].)

While a 3K wrapper including a verse from Leviticus in the Old Testament (via XInclude) may seem like something of an edge case, I have a difficult time describing it as unreasonable or unlikely.

These problems are not shared by FIXptr [9], which is effectively a conservative version of strategy 2. A similar approach could be built on the XPath subset defined in W3C XML Schema Structures [5].

Similar issues regarding XInclude's use of XPointer appear to have been rejected by the XLink WG [10] ("the WG was unwilling to give up XPointer support"), but I would hope that processing considerations might reopen that discussion.
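
To make the analysis step in strategy 3 a little more concrete, here is a minimal Python sketch of how a processor might decide whether a given XPointer forces tree-building. Everything in it is my own illustration: the names axes_used, needs_tree and STREAMABLE_AXES are hypothetical and come from no existing XInclude or XPointer implementation, and the choice of child, attribute and self as the "stream-safe" axes is an assumption modelled loosely on the W3C XML Schema subset [5]. It is a naive text scan, not an XPath parser, so it can be fooled by string literals; when that happens it errs toward building the tree.

```python
import re

# The thirteen axis names defined by XPath 1.0.
XPATH_AXES = {
    "ancestor", "ancestor-or-self", "attribute", "child", "descendant",
    "descendant-or-self", "following", "following-sibling", "namespace",
    "parent", "preceding", "preceding-sibling", "self",
}

# Assumption: only these axes are treated as safe for a single forward pass.
# A real processor might reasonably admit more (or fewer).
STREAMABLE_AXES = {"child", "attribute", "self"}

# Matches an explicit axis specifier such as "preceding-sibling::".
_EXPLICIT_AXIS = re.compile(r"([a-z-]+)\s*::")


def axes_used(xpointer):
    """Return the axes named or implied by abbreviations in the string.

    Implicit child steps (a bare name test) are not reported, since
    child is stream-safe under the assumption above.
    """
    axes = {a for a in _EXPLICIT_AXIS.findall(xpointer) if a in XPATH_AXES}
    if "//" in xpointer:   # abbreviation for /descendant-or-self::node()/
        axes.add("descendant-or-self")
    if ".." in xpointer:   # abbreviation for parent::node()
        axes.add("parent")
    if "@" in xpointer:    # abbreviation for attribute::
        axes.add("attribute")
    return axes


def needs_tree(xpointer):
    """True if any axis outside the stream-safe set appears."""
    return not axes_used(xpointer) <= STREAMABLE_AXES


if __name__ == "__main__":
    for xp in (
        "xpointer(/child::book/child::chapter/attribute::id)",
        "xpointer(//verse[1]/preceding-sibling::verse)",
    ):
        print(xp, "->", "tree" if needs_tree(xp) else "stream")
```

A processor following strategy 3 would run a check of this kind on each fragment identifier before fetching the target, and hand the document to a stream-based extractor only when needs_tree returns False.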

[1] - http://www.w3.org/TR/xptr
[2] - http://www.w3.org/TR/xinclude
[3] - http://www.w3.org/TR/xinclude/#xml-included-items
[4] - http://www.w3.org/TR/xpath
[5] - http://www.w3.org/TR/xmlschema-1/#coss-identity-constraint
[6] - archived in http://metalab.unc.edu/bosak/xml/eg/rel200.zip
[7] - http://www.ibiblio.org/xml/XInclude/
[8] - http://xmlsoft.org/
[9] - http://lists.w3.org/Archives/Public/www-xml-linking-comments/2001AprJun/att-0074/01-NOTE-FIXptr-20010425.htm
[10] - http://lists.w3.org/Archives/Public/www-xml-xinclude-comments/2001Aug/0004.html
Received on Wednesday, 22 August 2001 08:49:07 UTC