W3C home > Mailing lists > Public > xmlschema-dev@w3.org > September 2005

Re: key-keyref constraints: conflicts in node-table

From: Kasimier Buchcik <K.Buchcik@4commerce.de>
Date: Mon, 26 Sep 2005 20:47:57 +0200
To: Michael Kay <mike@saxonica.com>
Cc: XML-SCHEMA <xmlschema-dev@w3.org>
Message-Id: <1127760477.1272.141.camel@librax>


On Mon, 2005-09-26 at 14:58 +0100, Michael Kay wrote:
> I'm trying (once again) to understand the complexities of the "node-table"
> concept in defining the rules for key-keyref. (Schema Part 1, 3.11.5). In
> particular this provision:
> provided no two entries have the same .key-sequence. but distinct nodes.
> Potential conflicts are resolved by not including any conflicting entries
> which would have owed their inclusion to clause 1 above. Note that if all
> the conflicting entries arose under clause 1 above, this means no entry at
> all will appear for the offending .key-sequence..
> At present Saxon implements the rather simpler rule: (informally) it's an
> error if a keyref matches more than one node with the same key. The actual
> rule seems to be that it's an error if a keyref matches more than one key,
> unless one of these keys appears at the same level of the hierarchy as the
> keyref itself.
> I'm having great trouble constructing an example that illustrates this
> distinction. Was there some use-case that motivated using the more complex
> rule? 
> Here's an attempt: SECTIONs contain SECTIONs, DEFINITIONs, and TERMREFs.
> There's a key on SECTION specifying selector xpath="DEFINITION" field
> xpath="@term", and there's a keyref on SECTION specifying selector
> xpath="TERMREF" field xpath=".". This means that DEFINITIONs are required to
> be unique only if they are siblings: they can conflict with cousins or

I'm with you here, the uniqueness per qualified-nodeset is covered by
4.2.2; the uniqueness which is used for keyref resolution is evaluated
by the merging of node-tables - so at an other level.

> nephews. A TERMREF must match a DEFINITION anywhere (at any depth) in the
> immediately containing SECTION. It isn't allowed to match two different
> DEFINITIONs (of the same term), unless one of them is an immediate sibling
> of the TERMREF, in which case it doesn't matter how many other matching
> DEFINITIONs there are.
> Have I got that right? 

Here, I hope you didn't get it right - otherwise I would need (again)
to fix Libxml2's IDC code. I hope that this sibling DEFINITION is
meant to be included in the "conflict resolution" mechanism.

I understood the "conflict resolution" to be a mechanism to 
ensure that a keyref entry does resolve to a key/unique entry
which has a duplicate in the descendant-or-self axis (note the self
in the axis).

Let me use the term "scope element". Someone invented this term for the
element for which an IDC key/keyref/unique is defined; i.e. the context
element of the IDC xs:selector XPath expression.

A strategy which would follow this assumption could look like
(but only semantically, since this is not streamable):

1. Build the sum of all qualified node sets of all scope elements
  of the referenced IDC key/unique definition in the
  descendant-or-self axis, starting with a scope element
  of the keyref.

2. Remove all nodes with identical key-sequences from this set.

3. Find a match for the key-sequence of a qualified node of the keyref
  in this set.

4. If no match was found then we get an error, since either
  no matching key-sequence existed, or a duplicate existed and
  was removed.

Does this make sense?

A streaming implementation would "bubble" up the node-table entries.
There's a nice posting from Jeni Tennison about IDCs at:

However, after rereading the spec pieces you mentioned, I'm not sure
anymore if my interpretation was correct :-( So, teachers, see my hand

A test case for the lazy ones. The results of the following case
differ from processor to processor (just play with the commented-out
DEFINITIONs of the instance).

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:element name="SECTION">
	    <xsd:element ref="SECTION" minOccurs="0"/>
        <xsd:element name="DEFINITION" minOccurs="0" maxOccurs="5">
            <xsd:attribute name="term" type="xsd:string"/>
        <xsd:element name="TERMREF" type="xsd:string" minOccurs="0" maxOccurs="5"/>					

    <xsd:key name="defKey">
      <xsd:selector xpath="DEFINITION"/>
      <xsd:field xpath="@term"/>

    <xsd:keyref name="termRef" refer="defKey">
      <xsd:selector xpath="TERMREF"/>
      <xsd:field xpath="."/>

<SECTION xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	  <!--DEFINITION term="zappa"/-->
    <DEFINITION term="zappa"/>
  <!--DEFINITION term="zappa"/-->

Just one result:
Xerces-J is not happy with this instance, but gets happy if we
uncomment the first (in document order) DEFINITION and
comment-out the other DEFINITIONs. Dunno what's happening here.

I noticed that I have to lay hands on Libxml2's implementation
anyway, as its "bubbling" mechanism seems to interfere with
evaluation of uniqueness of IDC keys in such recursive
If we uncomment only the first two DEFINITIONs, I get:

Element 'DEFINITION': Duplicate key-sequence ['zappa'] in
key identity-constraint 'defKey'.

This is due to: when the 3rd SECTION is finished, it bubbles
up its node-table to the 2nd SECTION, thus we have 1 entry for
"zappa" there. Now we hit the 2nd DEFINITION, which wants to
add its "zappa" in the node-table of the 2nd SECTION as well,
and we get a uniqueness violation. It seems I cannot use the
node-table for evaluation of uniqueness that easily. Pity.


Received on Monday, 26 September 2005 18:52:37 UTC

This archive was generated by hypermail 2.3.1 : Wednesday, 7 January 2015 14:56:08 UTC