Re: key-keyref constraints: conflicts in node-table from Kasimier Buchcik on 2005-09-26 (xmlschema-dev@w3.org from September 2005)

From: Kasimier Buchcik <K.Buchcik@4commerce.de>
Date: Mon, 26 Sep 2005 20:47:57 +0200
To: Michael Kay <mike@saxonica.com>
Cc: XML-SCHEMA <xmlschema-dev@w3.org>
Message-Id: <1127760477.1272.141.camel@librax>
Hi,

On Mon, 2005-09-26 at 14:58 +0100, Michael Kay wrote:
> I'm trying (once again) to understand the complexities of the "node-table"
> concept in defining the rules for key-keyref. (Schema Part 1, 3.11.5). In
> particular this provision:
> 
> provided no two entries have the same .key-sequence. but distinct nodes.
> Potential conflicts are resolved by not including any conflicting entries
> which would have owed their inclusion to clause 1 above. Note that if all
> the conflicting entries arose under clause 1 above, this means no entry at
> all will appear for the offending .key-sequence..
> 
> At present Saxon implements the rather simpler rule: (informally) it's an
> error if a keyref matches more than one node with the same key. The actual
> rule seems to be that it's an error if a keyref matches more than one key,
> unless one of these keys appears at the same level of the hierarchy as the
> keyref itself.
> 
> I'm having great trouble constructing an example that illustrates this
> distinction. Was there some use-case that motivated using the more complex
> rule? 
> 
> Here's an attempt: SECTIONs contain SECTIONs, DEFINITIONs, and TERMREFs.
> There's a key on SECTION specifying selector xpath="DEFINITION" field
> xpath="@term", and there's a keyref on SECTION specifying selector
> xpath="TERMREF" field xpath=".". This means that DEFINITIONs are required to
> be unique only if they are siblings: they can conflict with cousins or

I'm with you here, the uniqueness per qualified-nodeset is covered by
4.2.2; the uniqueness which is used for keyref resolution is evaluated
by the merging of node-tables - so at an other level.

> nephews. A TERMREF must match a DEFINITION anywhere (at any depth) in the
> immediately containing SECTION. It isn't allowed to match two different
> DEFINITIONs (of the same term), unless one of them is an immediate sibling
> of the TERMREF, in which case it doesn't matter how many other matching
> DEFINITIONs there are.
> 
> Have I got that right? 

Here, I hope you didn't get it right - otherwise I would need (again)
to fix Libxml2's IDC code. I hope that this sibling DEFINITION is
meant to be included in the "conflict resolution" mechanism.

I understood the "conflict resolution" to be a mechanism to 
ensure that a keyref entry does resolve to a key/unique entry
which has a duplicate in the descendant-or-self axis (note the self
in the axis).

Let me use the term "scope element". Someone invented this term for the
element for which an IDC key/keyref/unique is defined; i.e. the context
element of the IDC xs:selector XPath expression.

A strategy which would follow this assumption could look like
(but only semantically, since this is not streamable):

1. Build the sum of all qualified node sets of all scope elements
  of the referenced IDC key/unique definition in the
  descendant-or-self axis, starting with a scope element
  of the keyref.

2. Remove all nodes with identical key-sequences from this set.

3. Find a match for the key-sequence of a qualified node of the keyref
  in this set.

4. If no match was found then we get an error, since either
  no matching key-sequence existed, or a duplicate existed and
  was removed.

Does this make sense?

A streaming implementation would "bubble" up the node-table entries.
There's a nice posting from Jeni Tennison about IDCs at:
http://lists.w3.org/Archives/Public/xmlschema-dev/2001Nov/0070.html.


However, after rereading the spec pieces you mentioned, I'm not sure
anymore if my interpretation was correct :-( So, teachers, see my hand
rising.

A test case for the lazy ones. The results of the following case
differ from processor to processor (just play with the commented-out
DEFINITIONs of the instance).

key.xsd
-------
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:element name="SECTION">
    <xsd:complexType>
	  <xsd:sequence>
	    <xsd:element ref="SECTION" minOccurs="0"/>
        <xsd:element name="DEFINITION" minOccurs="0" maxOccurs="5">
          <xsd:complexType>
            <xsd:attribute name="term" type="xsd:string"/>
          </xsd:complexType>
        </xsd:element>
        <xsd:element name="TERMREF" type="xsd:string" minOccurs="0" maxOccurs="5"/>					
      </xsd:sequence>
    </xsd:complexType>

    <xsd:key name="defKey">
      <xsd:selector xpath="DEFINITION"/>
      <xsd:field xpath="@term"/>
    </xsd:key>

    <xsd:keyref name="termRef" refer="defKey">
      <xsd:selector xpath="TERMREF"/>
      <xsd:field xpath="."/>
    </xsd:keyref>
  </xsd:element>
</xsd:schema>

key.xml
-------
<SECTION xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="key.xsd">
  <SECTION>
    <SECTION>
	  <!--DEFINITION term="zappa"/-->
    </SECTION>
    <DEFINITION term="zappa"/>
  </SECTION>
  <!--DEFINITION term="zappa"/-->
  <TERMREF>zappa</TERMREF>	
</SECTION>

Just one result:
Xerces-J is not happy with this instance, but gets happy if we
uncomment the first (in document order) DEFINITION and
comment-out the other DEFINITIONs. Dunno what's happening here.

I noticed that I have to lay hands on Libxml2's implementation
anyway, as its "bubbling" mechanism seems to interfere with
evaluation of uniqueness of IDC keys in such recursive
structures:
If we uncomment only the first two DEFINITIONs, I get:

Element 'DEFINITION': Duplicate key-sequence ['zappa'] in
key identity-constraint 'defKey'.

This is due to: when the 3rd SECTION is finished, it bubbles
up its node-table to the 2nd SECTION, thus we have 1 entry for
"zappa" there. Now we hit the 2nd DEFINITION, which wants to
add its "zappa" in the node-table of the 2nd SECTION as well,
and we get a uniqueness violation. It seems I cannot use the
node-table for evaluation of uniqueness that easily. Pity.

Regards,

Kasimier
Received on Monday, 26 September 2005 18:52:37 UTC