Re: Appling inheritance rule to xml:base, was Re: FINAL minutes for the XML from John Boyer on 2006-03-06 (public-xml-core-wg@w3.org from March 2006)

From: John Boyer <boyerj@ca.ibm.com>
Date: Mon, 6 Mar 2006 07:58:18 -0800
To: daniel@veillard.com
Cc: "Henry S. Thompson" <ht@inf.ed.ac.uk>, public-xml-core-wg@w3.org, public-xml-core-wg-request@w3.org
Message-ID: <OF46EEFA53.8AB8F323-ON88257129.0055B3A8-88257129.0057BCE7@ca.ibm.com>
Hi Daniel and Henry,

What I meant to clear up in the prior email was that the inheritance rule 
is a good thing that
*never* produces the wrong result when it is actually invoked.

So, hopefully this will assuage your concerns about retaining the rule for 
xml:base.

The problem is that the core team observed that there are cases involving 
the use of
relative URIs in xml:base in which the omission of an intervening xml:base 
causes the
expressed xml:base attributes in the portion of the document being 
retained to have
altered meaning.

The fact is that you simply cannot save a document author from himself 
when it comes
to omission.

I personally believe that although it is *possible* to use an xpath filter
to orphan an element, it should just never be done in practice because too
many semantics are typically associated with the ancestors of an element.
The loss of fragments of relative URI paths in xml:base is but one example
of this.

From the example of the prior email, it is easy to see that c as a child 
of a may mean 
something completely different than c as a child of b.  There is nothing 
that can be done 
to protect authors from this kind of information loss if they don't 
understand this aspect of 
their schema.

But again, it's not a security problem that arises *because* of the 
inheritance rule. 
It is an orthogonal security problem, and an extreme edge case, that 
authors could 
experience if they *express* an xml:base (non-inherited) on a node
*and* it is orphaned by a filter *and* the xml:base contains a relative 
URI.

While the inheritance rule has nothing to do with addressing this problem 
(whether it should
be addressed notwithstanding), the inheritance rule does remove a certain 
number of other
security issues, so there is certainly no harm in retaining it.

As to whether it should be addressed or not, the issue remains that had 
base not been
added to the xml namespace, we wouldn't even be having this discussion. In 
other words,
there are lots of relative URIs used in XML vocabularies (e.g. the src 
attribute), and even if 
we were to attempt a fix for xml:base, it would not protect the document 
author from loss of ancestor information.

Best regards,
John M. Boyer, Ph.D.
Senior Product Architect/Research Scientist
Co-Chair, W3C XForms Working Group
Workplace, Portal and Collaboration Software
IBM Victoria Software Lab
E-Mail: boyerj@ca.ibm.com  http://www.ibm.com/software/

Blog: http://www.ibm.com/developerworks/blogs/boyer





Daniel Veillard <daniel@veillard.com>
Sent by: public-xml-core-wg-request@w3.org
03/06/2006 01:36 AM
Please respond to: daniel

To: "Henry S. Thompson" <ht@inf.ed.ac.uk>
cc: John Boyer/CanWest/IBM@IBMCA, public-xml-core-wg@w3.org
Subject: Re: Appling inheritance rule to xml:base, was Re: FINAL  minutes for the XML







On Mon, Mar 06, 2006 at 02:39:48AM +0000, Henry S. Thompson wrote:
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> I wasn't at the f2f, for which apologies, but I find myself made
> uneasy by the proposal to retain 'inheritance' of xml:base.  As you

  Same here. It means silent breakage of the document at canonicalization
time; this must be avoided.

> say, this doesn't always give the 'right' results.  What I find
> frustrating is that it's easy to state a strategy which _would_ always
> give the 'right' answer, namely:
> 
>  "Use the name *EII* for an element information item to be
>   canonicalized, and *EIIC* for the element information item
>   corresponding to *EII* in the result of parsing the canonical
>   serialization of the node-set containing *EII*.
> 
>  "Synthesize an xml:base attribute for *EII* iff the *EIIC*'s [base
>   URI] would otherwise be different from *EII*'s [base URI]."
> 
> This has the advantage that not only does it correctly produce
> 
> <a xml:base="http://example.org">
>        <c xml:base="test"/>
> </a>
> 
> from 
> 
> <a xml:base="http://example.org">
>    <b xml:base="test">
>        <c/>
>    </b>
> </a>
> 
> when <b>...</b> is filtered out, but it will _also_ correctly produce
> 
> <a xml:base="http://example.org">
>        <c xml:base="http://example.org/test/test"/>
> </a>
> 
> 
> from
> 
> <a xml:base="http://example.org">
>    <b xml:base="test">
>        <c xml:base="test"/>
>    </b>
> </a>
> 
> when <b>...</b> is filtered out.
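
The quoted strategy (synthesize an xml:base only when the element's base URI
would otherwise change) can be sketched in a few lines. This is a minimal
Python sketch under stated assumptions, not a canonicalization
implementation: urllib.parse.urljoin stands in for the xml:base resolution,
the filter is hard-wired to drop <b> and keep <a> and <c>, and the helper
names base_uri and orphan_c are hypothetical. It synthesizes an absolute
value; a relative one that resolves to the same URI would serve equally.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

def base_uri(chain):
    """Base URI at the end of a root-to-element chain of elements."""
    base = ""
    for el in chain:
        ref = el.get(XML_BASE)
        if ref is not None:
            base = urljoin(base, ref)
    return base

def orphan_c(xml_text):
    """Drop <b>, keep <a> and <c>; add xml:base to <c> only if needed."""
    a = ET.fromstring(xml_text)
    b = a[0]
    c = b[0]
    before = base_uri([a, b, c])   # [base URI] of EII in the source
    after = base_uri([a, c])       # [base URI] of EIIC if nothing is done
    if before != after:
        c.set(XML_BASE, before)    # synthesized, absolute
    a.remove(b)
    a.append(c)
    return a

ex1 = orphan_c('<a xml:base="http://example.org">'
               '<b xml:base="test"><c/></b></a>')
ex2 = orphan_c('<a xml:base="http://example.org">'
               '<b xml:base="test"><c xml:base="test"/></b></a>')
print(ex1[0].get(XML_BASE))
print(ex2[0].get(XML_BASE))
```

With urljoin as the resolver, the first example gains a synthesized
xml:base on c, while in the second the base URI of c is the same with or
without b, so its xml:base="test" is left untouched.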

  Note: I'm not sure the examples really convey what they should.
        If in the example we used <b xml:base="test/a">, then the
        composition would lead to a test/test base on c, but as
        written I think the composition is still 'test' in the result.
        I assume John didn't try to apply the computation from
        RFC2396 [*] manually, or maybe the examples we had at the f2f
        were misleading or broken. If all xml:base values reference
        resources in the same "directory", the composition problem
        basically doesn't appear in practice.
[*] or later
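
The composition described in the note can be checked mechanically. A minimal
Python sketch, assuming urllib.parse.urljoin (which follows the later
RFC 3986 resolution rules) is an acceptable stand-in for the xml:base
computation; the helper name compose is hypothetical:

```python
from urllib.parse import urljoin

def compose(*refs):
    """Resolve a chain of xml:base values, outermost element first."""
    base = ""
    for ref in refs:
        base = urljoin(base, ref)
    return base

# a, b, c as written in the quoted example
as_written = compose("http://example.org", "test", "test")
# same chain, but with <b xml:base="test/a">
with_subdir = compose("http://example.org", "test/a", "test")

print(as_written)
print(with_subdir)
```

Under these assumptions the chain as written resolves to
http://example.org/test, and only the test/a variant yields
http://example.org/test/test.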

  Hum, assume you don't have a fixed base on a; then you force generating
a base depending on the document base, which means canonicalization of a
document suddenly depends on how you retrieved it (e.g. a file access would
end up with file:///localpath/test/test while a web access would give
http://example.org/test/test). I don't think that's acceptable either.
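
To make the retrieval dependence concrete, here is a small Python sketch. It
assumes no xml:base on a, the values test/a on b and test on c, and a
hypothetical document name doc.xml; urllib.parse.urljoin again stands in for
the resolution step.

```python
from urllib.parse import urljoin

def base_of_c(document_uri):
    """Base URI of <c> when <a> carries no xml:base of its own."""
    base_b = urljoin(document_uri, "test/a")   # <b xml:base="test/a">
    return urljoin(base_b, "test")             # <c xml:base="test">

from_file = base_of_c("file:///localpath/doc.xml")
from_web = base_of_c("http://example.org/doc.xml")

print(from_file)
print(from_web)
```

An absolute xml:base synthesized from either value would differ between the
two retrievals, so the canonical forms of the same document would no longer
match.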

> Can't we come up with a way to get this effect?

  I definitely prefer a solution leading to false negatives, i.e. we fail to
canonicalize in the same way, over a situation leading to false positives
where canonicalization produces a broken result.
  We already discussed in the past, especially with Richard, generating
relative xml:base when possible; maybe we need to formalize this and
put it as the algorithm to compute the canonicalized result.

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel@veillard.com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ |
Received on Monday, 6 March 2006 15:58:47 UTC