- From: Henry S. Thompson <ht@inf.ed.ac.uk>
- Date: Thu, 27 Nov 2008 12:41:33 +0000
- To: Toman_Vojtech@emc.com
- Cc: <public-xml-processing-model-comments@w3.org>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Toman_Vojtech writes:

> In the test suite, there is a test named 'preserve-base-uri-001'. It
> tests that the base URI property of the nodes is preserved after you
> remove the xml:base attributes (using p:delete).
> . . .
> My question is: Is this actually correct?

I've been pondering this for some time, and looking into the
background. Some observations first, in no particular order:

 1) XSLT 2.0 makes clear that adding an xml:base attribute to a
    result tree _changes_ the [base URI] property for the relevant
    element, and, other things being equal, its children.

 2) XProc does not say anything about the impact on [base URI] of
    using p:add-attribute to add an xml:base attribute.

 3) XSLT 2.0 does not say anything about the impact of deleting an
    xml:base attribute, but _all_ use of copy, copy-of, etc. removes
    the [base URI] property in any case, so unless an element node
    has an explicit xml:base attribute already (in which case (1)
    above applies), or you explicitly add one, you will lose
    [base URI] information. For example, running the standard copy
    stylesheet on a document composed from several distinct external
    entities will _lose_ their various base URIs. Indeed in the
    absence of xml:base attributes, either in the input or added,
    XSLT2 output has _no_ [base URI] properties at all!

 4) XProc does not say anything about the impact on [base URI] of
    deleting an xml:base attribute using p:delete, or renaming it,
    etc.

 5) XInclude mandates base URI fixup, i.e. the _adding_ of xml:base
    attributes to the result of XInclusion, so serialisation followed
    by parsing preserves [base URI].

 6) XProc discusses namespace fixup, but says nothing explicit about
    base URI fixup. This means the spec. might be construed to have
    an interop problem, wrt processors which serialise between each
    step and those which don't.
Consider the following pipeline:

  <p:pipeline xmlns:p="http://www.w3.org/ns/xproc">
   <p:identity/>
   <p:add-attribute match="/b" attribute-name="bu">
    <p:with-option name="attribute-value" select="p:base-uri(/b)"/>
   </p:add-attribute>
   <p:add-attribute match="/b/a" attribute-name="bu">
    <p:with-option name="attribute-value" select="p:base-uri(/b/a)"/>
   </p:add-attribute>
  </p:pipeline>

If you give this to Calabash with the following input:

  <!DOCTYPE b [
  <!ENTITY a SYSTEM "1.xml">
  ]>
  <b>
   &a;</b>

where 1.xml is

  <a/>

you get the following output:

  <b bu="file:...//2.xml">
   <a bu="file:...//1.xml"/>
  </b>

but that would not be the case without base URI fixup from an
implementation which serialised between steps.

But I think the spec. can actually be read as _requiring_ base URI
fixup. It says

  "Except where the semantics of a step explicitly require changes,
   processors are required to preserve the information in the
   documents and fragments they manipulate. In particular, the
   information corresponding to the [Infoset] properties
   . . . [base URI] *must* be preserved."

Now this language is not clear about whether it is talking about
preservation _across_ steps or _between_ steps, but I think it must
be read as covering _both_. It follows, just as for namespace fixup,
that "an implementation which does serialise between steps . . . must
perform such fixups". I think we need to make this clearer.

That still leaves the issues implied by points (2) and (4).

Wrt (2) I think we need to follow XSLT2 here and make it a
requirement that adding xml:base changes the [base URI] property
recursively. This comes for free for serialise-everywhere
processors, but not for others.

Wrt (4) I'm less clear, but the temptation is to go the same way,
because to do otherwise means that deleting xml:base can have no
effect for a serialise-everywhere processor, because any deleted
attributes will just be re-inserted at serialisation time.
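To make the serialise-everywhere case concrete, here is a rough
sketch in Python rather than XProc. It is only a toy model, not how
Calabash or any other processor works: the [base URI] property is
held in a side table, fixup() and round_trip() are hypothetical
helpers, the example.org URIs are made up, and xml:base values are
assumed to be absolute. It shows that [base URI] is lost across a
serialise/parse boundary without fixup, survives with it, and that a
deleted xml:base comes straight back at the next fixup.

```python
import xml.etree.ElementTree as ET

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
DOC_URI = "http://example.org/2.xml"     # hypothetical document URI
ENTITY_URI = "http://example.org/1.xml"  # hypothetical external entity URI


def build():
    """Build <b><a/></b> plus a side table modelling each node's [base URI]."""
    b = ET.Element("b")
    a = ET.SubElement(b, "a")
    return b, {b: DOC_URI, a: ENTITY_URI}


def fixup(elem, base_uris, inherited=None):
    """Set xml:base wherever [base URI] differs from the inherited value."""
    own = base_uris[elem]
    if own != inherited:
        elem.set(XML_BASE, own)
    for child in elem:
        fixup(child, base_uris, own)


def round_trip(root):
    """Serialise, re-parse, and recompute [base URI] from xml:base alone."""
    reparsed = ET.fromstring(ET.tostring(root))
    seen = {}

    def walk(elem, inherited):
        # Assumes absolute xml:base values; no relative resolution is done.
        own = elem.get(XML_BASE, inherited)
        seen[elem.tag] = own
        for child in elem:
            walk(child, own)

    walk(reparsed, None)
    return seen


# Case 1: serialise between steps with no fixup.
root, uris = build()
lost = round_trip(root)

# Case 2: fix up before serialising.
root, uris = build()
fixup(root, uris)
kept = round_trip(root)

# Case 3: delete xml:base from <a> (a stand-in for p:delete) while the
# [base URI] property is preserved; the next fixup puts it back.
root, uris = build()
fixup(root, uris)
del root[0].attrib[XML_BASE]
fixup(root, uris)
after_delete = round_trip(root)

print(lost, kept, after_delete, sep="\n")
```

Note that in this model fixup() also puts xml:base on the document
element, since the serialised form carries no other record of where
the document came from.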
This is all making me very uncomfortable, I have to say, because I
don't see how to avoid the following unsatisfactory net conclusion:
the actual serialised output of a whole pipeline must always have
xml:base attributes in it. This arises because serialise-everywhere
processors will have to add xml:base attributes to at least the
document element of their document _inputs_, and these will then
appear in their final output. Also, if the parallel with namespace
fixup goes through, note that we don't require ns fixup for
intermediate steps for non-serialise-everywhere processors, but we
_do_ require it of all processors on final output.

Norm, someone, save me from myself here, please!

ht
- -- 
Henry S. Thompson, School of Informatics, University of Edinburgh
Half-time member of W3C Team
10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 651-1426, e-mail: ht@inf.ed.ac.uk
URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)

iD8DBQFJLpV9kjnJixAXWBoRAk8gAJ9MROGUWLPNiqQbdCRJ3IC3AythogCeJJ9E
oQVnTXqtUhWwOMFR9qx/Jik=
=xn73
-----END PGP SIGNATURE-----
Received on Thursday, 27 November 2008 12:42:10 UTC