- From: Kasimier Buchcik <K.Buchcik@4commerce.de>
- Date: Mon, 26 Sep 2005 20:47:57 +0200
- To: Michael Kay <mike@saxonica.com>
- Cc: XML-SCHEMA <xmlschema-dev@w3.org>
Hi,

On Mon, 2005-09-26 at 14:58 +0100, Michael Kay wrote:
> I'm trying (once again) to understand the complexities of the "node-table"
> concept in defining the rules for key-keyref. (Schema Part 1, 3.11.5). In
> particular this provision:
>
> provided no two entries have the same ·key-sequence· but distinct nodes.
> Potential conflicts are resolved by not including any conflicting entries
> which would have owed their inclusion to clause 1 above. Note that if all
> the conflicting entries arose under clause 1 above, this means no entry at
> all will appear for the offending ·key-sequence·.
>
> At present Saxon implements the rather simpler rule: (informally) it's an
> error if a keyref matches more than one node with the same key. The actual
> rule seems to be that it's an error if a keyref matches more than one key,
> unless one of these keys appears at the same level of the hierarchy as the
> keyref itself.
>
> I'm having great trouble constructing an example that illustrates this
> distinction. Was there some use-case that motivated using the more complex
> rule?
>
> Here's an attempt: SECTIONs contain SECTIONs, DEFINITIONs, and TERMREFs.
> There's a key on SECTION specifying selector xpath="DEFINITION" field
> xpath="@term", and there's a keyref on SECTION specifying selector
> xpath="TERMREF" field xpath=".". This means that DEFINITIONs are required to
> be unique only if they are siblings: they can conflict with cousins or

I'm with you here: the uniqueness per qualified node set is covered by
4.2.2; the uniqueness which is used for keyref resolution is evaluated
by the merging of node-tables, so at another level.

> nephews. A TERMREF must match a DEFINITION anywhere (at any depth) in the
> immediately containing SECTION. It isn't allowed to match two different
> DEFINITIONs (of the same term), unless one of them is an immediate sibling
> of the TERMREF, in which case it doesn't matter how many other matching
> DEFINITIONs there are.
>
> Have I got that right?

Here, I hope you didn't get it right - otherwise I would need (again) to
fix Libxml2's IDC code. I hope that this sibling DEFINITION is meant to
be included in the "conflict resolution" mechanism. I understood the
"conflict resolution" to be a mechanism to ensure that a keyref entry
does not resolve to a key/unique entry which has a duplicate in the
descendant-or-self axis (note the self in the axis).

Let me use the term "scope element". Someone invented this term for the
element for which an IDC key/keyref/unique is defined; i.e. the context
element of the IDC xs:selector XPath expression.

A strategy which would follow this assumption could look like this (but
only semantically, since this is not streamable):

1. Build the union of all qualified node sets of all scope elements of
   the referenced IDC key/unique definition in the descendant-or-self
   axis, starting with a scope element of the keyref.
2. Remove from this set all nodes whose key-sequence also occurs on
   another node in the set.
3. Find a match for the key-sequence of a qualified node of the keyref
   in this set.
4. If no match was found then we get an error, since either no matching
   key-sequence existed, or a duplicate existed and was removed.

Does this make sense?

A streaming implementation would "bubble" up the node-table entries.
There's a nice posting from Jeni Tennison about IDCs at:
http://lists.w3.org/Archives/Public/xmlschema-dev/2001Nov/0070.html

However, after rereading the spec pieces you mentioned, I'm not sure
anymore if my interpretation was correct :-(
So, teachers, see my hand rising.

A test case for the lazy ones. The results of the following case differ
from processor to processor (just play with the commented-out
DEFINITIONs of the instance).
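Before the test case itself, here is a minimal Python sketch of the four steps listed above. It assumes a hypothetical `Scope` class standing in for a scope element, with the key-sequences of its qualified nodes stored as tuples. The names `Scope`, `collect_keys` and `unresolved_keyrefs` are made up for illustration and are not taken from Libxml2 or any other processor. The sketch follows the interpretation described above, which may or may not be what the spec intends.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Scope:
    """Hypothetical stand-in for a scope element (e.g. a SECTION)."""
    keys: list = field(default_factory=list)      # one key-sequence per key node
    keyrefs: list = field(default_factory=list)   # one key-sequence per keyref node
    children: list = field(default_factory=list)  # nested scope elements


def collect_keys(scope):
    """Step 1: gather key-sequences over the descendant-or-self axis."""
    found = list(scope.keys)
    for child in scope.children:
        found.extend(collect_keys(child))
    return found


def unresolved_keyrefs(scope):
    """Steps 2-4 for one keyref scope element: return keyrefs with no match."""
    counts = Counter(collect_keys(scope))
    # Step 2: drop every key-sequence that occurs on more than one node.
    candidates = {seq for seq, n in counts.items() if n == 1}
    # Steps 3 and 4: a keyref without a surviving candidate is an error.
    return [ref for ref in scope.keyrefs if ref not in candidates]


# Two nodes share the key-sequence ("zappa",), one at each level.
inner = Scope(keys=[("zappa",)])
outer = Scope(keys=[("zappa",)], keyrefs=[("zappa",)], children=[inner])
print(unresolved_keyrefs(outer))  # prints [('zappa',)]
```

Each entry in `keys` stands for one distinct node, so two equal tuples mean two nodes with the same key-sequence. Under this reading the duplicate is removed even though one of the two nodes sits at the same level as the keyref, which is exactly the point in question. Like the strategy itself, the sketch is not streamable.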
key.xsd
-------
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="SECTION">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="SECTION" minOccurs="0"/>
        <xsd:element name="DEFINITION" minOccurs="0" maxOccurs="5">
          <xsd:complexType>
            <xsd:attribute name="term" type="xsd:string"/>
          </xsd:complexType>
        </xsd:element>
        <xsd:element name="TERMREF" type="xsd:string" minOccurs="0" maxOccurs="5"/>
      </xsd:sequence>
    </xsd:complexType>
    <xsd:key name="defKey">
      <xsd:selector xpath="DEFINITION"/>
      <xsd:field xpath="@term"/>
    </xsd:key>
    <xsd:keyref name="termRef" refer="defKey">
      <xsd:selector xpath="TERMREF"/>
      <xsd:field xpath="."/>
    </xsd:keyref>
  </xsd:element>
</xsd:schema>

key.xml
-------
<SECTION xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="key.xsd">
  <SECTION>
    <SECTION>
      <!--DEFINITION term="zappa"/-->
    </SECTION>
    <DEFINITION term="zappa"/>
  </SECTION>
  <!--DEFINITION term="zappa"/-->
  <TERMREF>zappa</TERMREF>
</SECTION>

Just one result: Xerces-J is not happy with this instance, but gets
happy if we uncomment the first (in document order) DEFINITION and
comment-out the other DEFINITIONs. Dunno what's happening here.

I noticed that I have to lay hands on Libxml2's implementation anyway,
as its "bubbling" mechanism seems to interfere with evaluation of
uniqueness of IDC keys in such recursive structures:

If we uncomment only the first two DEFINITIONs, I get:

Element 'DEFINITION': Duplicate key-sequence ['zappa'] in key
identity-constraint 'defKey'.

This is due to: when the 3rd SECTION is finished, it bubbles up its
node-table to the 2nd SECTION, thus we have 1 entry for "zappa" there.
Now we hit the 2nd DEFINITION, which wants to add its "zappa" in the
node-table of the 2nd SECTION as well, and we get a uniqueness
violation. It seems I cannot use the node-table for evaluation of
uniqueness that easily. Pity.

Regards,

Kasimier
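The bubbling problem described above can be reproduced with a small Python sketch. This is only an illustration under simplifying assumptions, not Libxml2's actual code: a scope element is modelled as a list in document order, a nested list is a nested SECTION, and a string is the key-sequence of a DEFINITION. The function names `naive_table` and `split_tables` are hypothetical. Conflict resolution for keyref lookup is deliberately left out; only the uniqueness check is shown.

```python
def naive_table(content):
    """One table per scope, used for both uniqueness and bubbling."""
    table, errors = set(), []
    for item in content:                      # items in document order
        if isinstance(item, list):            # nested scope element
            child_table, child_errors = naive_table(item)
            errors += child_errors
            table |= child_table              # bubble up the child's entries
        elif item in table:                   # local key hits a bubbled entry
            errors.append(f"Duplicate key-sequence [{item!r}]")
        else:
            table.add(item)
    return table, errors


def split_tables(content):
    """Separate tables: `local` for uniqueness, `bubbled` for later lookup."""
    local, bubbled, errors = set(), set(), []
    for item in content:
        if isinstance(item, list):
            child_visible, child_errors = split_tables(item)
            errors += child_errors
            bubbled |= child_visible          # kept apart from local keys
        elif item in local:                   # only siblings can collide here
            errors.append(f"Duplicate key-sequence [{item!r}]")
        else:
            local.add(item)
    return local | bubbled, errors


# key.xml with only the first two DEFINITIONs uncommented:
# outer SECTION > 2nd SECTION > (3rd SECTION with "zappa", then "zappa").
doc = [[["zappa"], "zappa"]]
print(naive_table(doc)[1])   # prints ["Duplicate key-sequence ['zappa']"]
print(split_tables(doc)[1])  # prints []
```

With a single table, the entry bubbled up from the 3rd SECTION collides with the 2nd SECTION's own DEFINITION, although the two are not siblings. Checking uniqueness against the local entries only avoids the false report, and real sibling duplicates such as `["a", "a"]` are still caught.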
Received on Monday, 26 September 2005 18:52:37 UTC