Trying to assess the depth of xml:id and c14n incompatibilities from Daniel Veillard on 2005-02-12 (public-xml-id@w3.org from February 2005)

From: Daniel Veillard <veillard@redhat.com>
Date: Sat, 12 Feb 2005 11:02:30 -0500
To: www-tag@w3.org
Cc: public-xml-core-wg@w3.org, public-xml-id@w3.org
Message-ID: <20050212160230.GA1718@redhat.com>
[P.S. posting from veillard@redhat.com which may differ from the 
 address I'm subscribed to for public-xml-core-wg or public-xml-id
 sorry for the potential bounces, Daniel ]

  In light of the large amount of mail started on this and the kind
of philosophical debate "what's the meaning of a namespace" that resulted
I would like to check that I correctly understood the scope of the
problem, and what is really affected in terms of functionality. I'm
doubly motivated to get this clear, as a listed author of the xml:id
draft (i.e. you can consider me tainted) and as maintainer of libxml2,
which has shipped with an implementation of both xml:id and c14n for
nearly a year.

  So to check I understand correctly, the "Canonical XML" spec requires
that if one wants to canonicalize a set of elements (which are siblings),
and if those elements have a common ancestor carrying an xml:id and
no closer ancestor (or self) carries an xml:id, then the xml:id attribute
and its value must be copied onto those elements. For example:

   <root xml:id="root">
     <data>
       <child1/>
       <child2>
         <sub/>
       </child2>
       <child3/>
     </data>
   </root>

 then to make a canonical serialization of the set of children of
data, they need to be serialized as

	       <child1 xml:id="root"/>
	       <child2 xml:id="root">
	           <sub/>
	       </child2>
	       <child3 xml:id="root"/>

(ignoring the text nodes before child1 and after child3 for this example).
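
  The copying step described above can be sketched in a few lines of
Python. This is only an illustration of the inheritance behaviour, not
a Canonical XML implementation and not the libxml2 code: the helper
names (inherited_xml_id, serialize_children_with_inherited_id) are
hypothetical, and the exact attribute formatting in the output is
whatever ElementTree produces rather than canonical form.

```python
import xml.etree.ElementTree as ET

# Expanded name ElementTree uses for the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

SOURCE = """<root xml:id="root">
  <data>
    <child1/>
    <child2>
      <sub/>
    </child2>
    <child3/>
  </data>
</root>"""


def inherited_xml_id(element, parent_of):
    """Return the xml:id of the nearest ancestor-or-self carrying one."""
    node = element
    while node is not None:
        value = node.get(XML_ID)
        if value is not None:
            return value
        node = parent_of.get(node)  # None once we walk past the root
    return None


def serialize_children_with_inherited_id(xml_text, parent_tag):
    """Copy the inherited xml:id onto each child of parent_tag, then serialize."""
    root = ET.fromstring(xml_text)
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    container = root.find(".//" + parent_tag)
    chunks = []
    for child in container:
        value = inherited_xml_id(child, parent_of)
        if value is not None:
            child.set(XML_ID, value)
        child.tail = None  # ignore the surrounding text nodes, as in the example
        chunks.append(ET.tostring(child, encoding="unicode"))
    return chunks


chunks = serialize_children_with_inherited_id(SOURCE, "data")
for chunk in chunks:
    print(chunk)
```

  Only the top-level elements of the serialized set receive the
attribute; sub keeps no xml:id because its parent child2 is part of
the output.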

  Now where is the problem exactly?
  From an XML-1.0 + Namespace point of view the serialized fragment obtained
that way is still perfectly okay (i.e. a well balanced chunk), the only
problems which may arise are:
   1/ layers implementing xml:id will raise an error, however this is
      not a fatal error (see http://www.w3.org/TR/xml-id/#errors)
      xml:id processors are just instructed to report the duplicate ID
      error to the application using it
   2/ XPath pointers to that fragment can be disrupted

  I think 1/ proves that the current option of making xml:id errors
non-fatal is the correct handling. For example, if an application with
xml:id support uses an existing digital signature library, then upon
checking the output it can detect the error and possibly correct it
as soon as possible.
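
  As a rough sketch of what "report, don't abort" could look like on the
application side, the following Python collects duplicate xml:id values
and hands them back to the caller instead of raising. The function name
find_duplicate_xml_ids and the wrapper element are assumptions made for
the illustration; it only looks at xml:id, not at DTD-declared IDs.

```python
import xml.etree.ElementTree as ET

# Expanded name ElementTree uses for the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def find_duplicate_xml_ids(xml_text):
    """Return {id value: [element tags]} for each xml:id used more than once."""
    root = ET.fromstring(xml_text)
    seen = {}
    for element in root.iter():  # document order
        value = element.get(XML_ID)
        if value is not None:
            seen.setdefault(value, []).append(element.tag)
    return {value: tags for value, tags in seen.items() if len(tags) > 1}


# Hypothetical wrapper element, added only so the multi-root chunk parses.
FRAGMENT = """<wrapper>
<child1 xml:id="root"/>
<child2 xml:id="root"><sub/></child2>
<child3 xml:id="root"/>
</wrapper>"""

duplicates = find_duplicate_xml_ids(FRAGMENT)
for value, tags in duplicates.items():
    # Report to the application; processing continues, the error is not fatal.
    print(f"duplicate xml:id {value!r} on elements: {', '.join(tags)}")
```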
  With respect to 2/ there are multiple cases:
    - I think the worst case is when the extra xml:id generated would
      override an existing ID, for example if child2 already held, before
      the transformation, an ID attribute of value "root". It is however
      clear that in that case the source document was in error w.r.t.
      xml:id, i.e. a false positive when looking for an ID in the fragment
      output can only be the result of an IDness problem in the initial
      document.
    - another case is when the output of the canonicalization process is
      included as a fragment in another document (which I expect, since
      well balanced chunks like the one generated above are not well formed
      as is, due to the presence of multiple roots). At inclusion time
      IDness consistency should be checked anyway: xml:id errors like that
      are just a special case of the ID errors which may result from such
      an inclusion, and duplicate ID detection in the fragment is just a
      special case of the detection needed for the full document.
    - the last problem I see is that the inherited xml:id on the serialized
      fragment simply generates extra IDs. Note that this does not break
      existing pointers inside the fragment or inside a regenerated
      document built around the fragment. By the definition of XPointer
      for bare names, this is a fallback to the XPath id() function which,
      as explained in the XPath spec, will point to the first element in
      document order holding an attribute of type ID with that value. The
      fact that there are possibly multiple xml:id generated is no more
      of a problem than having a single one; what xml:id actually provides
      is that the application will be alerted to the mismatch.

  So while I think the incompatibility between Canonical XML and xml:id
needs to be investigated, in practice the effect of this incompatibility
doesn't look to be worth blocking xml:id from going forward or
mandating an update to the Canonicalization spec (though people should
clearly prefer Exclusive XML Canonicalization which doesn't exhibit the
problem). It seems to me the drawback should be clearly documented in xml:id,
and xml:id implementors should make sure that they report duplicate ID errors,
especially in a context where canonicalization or digital signatures might
be used.
  
  Of course I may have missed more critical side-effects that those
extra xml:id in canonicalized output may generate, and if this is the
case reporting them will be a good idea :-)

   yours,

Daniel
-- 
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
veillard@redhat.com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
Received on Saturday, 12 February 2005 17:04:51 UTC