Re: Preserving base URI

From: Henry S. Thompson <ht@inf.ed.ac.uk>
Date: Thu, 27 Nov 2008 12:41:33 +0000
To: Toman_Vojtech@emc.com
Cc: <public-xml-processing-model-comments@w3.org>
Message-ID: <f5bk5apuyrm.fsf@hildegard.inf.ed.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Toman_Vojtech writes:

> In the test suite, there is a test named 'preserve-base-uri-001'. It
> tests that the base URI property of the nodes is preserved after you
> remove the xml:base attributes (using p:delete).
> . . .
> My question is: Is this actually correct?

I've been pondering this for some time, and looking into the
background.  Some observations first, in no particular order:

1) XSLT 2.0 makes clear that adding an xml:base attribute to a result
   tree _changes_ the [base URI] property for the relevant element,
   and, other things being equal, its children.

2) XProc does not say anything about the impact on [base URI] of using
   p:add-attribute to add an xml:base attribute.

3) XSLT 2.0 does not say anything about the impact of deleting an
   xml:base attribute, but _all_ uses of copy, copy-of, etc. remove the
   [base URI] property in any case, so unless an element node has an
   explicit xml:base attribute already (in which case (1) above
   applies), or you explicitly add one, you will lose [base URI]
   information.  For example, running the standard copy stylesheet on
   a document composed from several distinct external entities will
   _lose_ their various base URIs.  Indeed in the absence of xml:base
   attributes, either in the input or added, XSLT2 output has _no_
   [base URI] properties at all!

4) XProc does not say anything about the impact on [base URI] of
   deleting an xml:base attribute using p:delete, or renaming it, etc.

5) XInclude mandates base URI fixup, i.e. the _adding_ of xml:base
   attributes to the result of XInclusion, so serialisation followed
   by parsing preserves [base URI].

6) XProc discusses namespace fixup, but says nothing explicit about
   base URI fixup.

This means the spec. might be construed to have an interop problem
between processors which serialise between each step and those which
don't.  Consider the following pipeline:

<p:pipeline xmlns:p="http://www.w3.org/ns/xproc">
 <p:identity/>
 <p:add-attribute match="/b" attribute-name="bu">
  <p:with-option name="attribute-value" select="p:base-uri(/b)"/>
 </p:add-attribute>
 <p:add-attribute match="/b/a" attribute-name="bu">
  <p:with-option name="attribute-value" select="p:base-uri(/b/a)"/>
 </p:add-attribute>
</p:pipeline>

If you give this to Calabash with the following input:

<!DOCTYPE b [
<!ENTITY a SYSTEM "1.xml">
]>
<b>
&a;</b>

where 1.xml is

<a/>

you get the following output:

<b bu="file:...//2.xml">
<a bu="file:...//1.xml"/>
</b>

but an implementation which serialised between steps would not produce
this output unless it also performed base URI fixup.
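
To make that concrete, here is a sketch of what a
serialise-between-steps implementation would presumably have to write
out after the p:identity step in order to carry the two base URIs into
the next step.  The example.org URIs are hypothetical stand-ins, not
the actual file: URIs elided above:

<b xml:base="http://example.org/docs/2.xml">
<a xml:base="http://example.org/docs/1.xml"/>
</b>

Without those two attributes, reparsing would give both elements the
base URI of whatever the intermediate serialisation was read from.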

But I think the spec. can actually be read as _requiring_ base URI
fixup.  It says

  "Except where the semantics of a step explicitly require changes,
   processors are required to preserve the information in the
   documents and fragments they manipulate. In particular, the
   information corresponding to the [Infoset] properties . . .  [base
   URI] *must* be preserved."

Now this language is not clear about whether it is talking about
preservation _across_ a step (from its input to its output) or
_between_ steps (from one step's output to the next step's input), but
I think it must be read as covering _both_.  It follows, just as for
namespace fixup,
that "an implementation which does serialise between steps . . . must
perform such fixups".  I think we need to make this clearer.

That still leaves the issues implied by points (2) and (4).  Wrt (2) I
think we need to follow XSLT2 here and make it a requirement that
adding xml:base changes the [base URI] property recursively.  This
comes for free for serialise-everywhere processors, but not for
others.  Wrt (4) I'm less clear, but the temptation is to go the same
way, because to do otherwise means that deleting xml:base can have no
effect for a serialise-everywhere processor, because any deleted
attributes will just be re-inserted at serialisation time.
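
A minimal sketch of the case at issue in (4), modelled on the pipeline
above rather than taken from the test suite: delete every xml:base
attribute, then record what p:base-uri reports for the document
element.  The attribute name "bu" is just the illustrative name used
earlier:

<p:pipeline xmlns:p="http://www.w3.org/ns/xproc">
 <p:delete match="@xml:base"/>
 <p:add-attribute match="/*" attribute-name="bu">
  <p:with-option name="attribute-value" select="p:base-uri(/*)"/>
 </p:add-attribute>
</p:pipeline>

Whether bu should then show the value of the deleted xml:base or the
URI the document was loaded from is exactly what the spec. does not
currently say.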

This is all making me very uncomfortable, I have to say, because I
don't see how to avoid the following unsatisfactory net conclusion:
the actual serialised output of a whole pipeline must always have
xml:base attributes in it.  This arises because serialise-everywhere
processors will have to add xml:base attributes to at least the
document element of their document _inputs_, and these will then appear in
their final output.  Also, if the parallel with namespace fixup goes
through, note that we don't require ns fixup for intermediate steps
for non-serialise-everywhere processors, but we _do_ require it of all
processors on final output.

Norm, someone, save me from myself here, please!

ht
- -- 
       Henry S. Thompson, School of Informatics, University of Edinburgh
                         Half-time member of W3C Team
      10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
                Fax: (44) 131 651-1426, e-mail: ht@inf.ed.ac.uk
                       URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)

iD8DBQFJLpV9kjnJixAXWBoRAk8gAJ9MROGUWLPNiqQbdCRJ3IC3AythogCeJJ9E
oQVnTXqtUhWwOMFR9qx/Jik=
=xn73
-----END PGP SIGNATURE-----
Received on Thursday, 27 November 2008 12:42:10 UTC
