- From: Kasimier Buchcik <K.Buchcik@4commerce.de>
- Date: Mon, 26 Sep 2005 20:47:57 +0200
- To: Michael Kay <mike@saxonica.com>
- Cc: XML-SCHEMA <xmlschema-dev@w3.org>
Hi,
On Mon, 2005-09-26 at 14:58 +0100, Michael Kay wrote:
> I'm trying (once again) to understand the complexities of the "node-table"
> concept in defining the rules for key-keyref. (Schema Part 1, 3.11.5). In
> particular this provision:
>
> provided no two entries have the same .key-sequence. but distinct nodes.
> Potential conflicts are resolved by not including any conflicting entries
> which would have owed their inclusion to clause 1 above. Note that if all
> the conflicting entries arose under clause 1 above, this means no entry at
> all will appear for the offending .key-sequence..
>
> At present Saxon implements the rather simpler rule: (informally) it's an
> error if a keyref matches more than one node with the same key. The actual
> rule seems to be that it's an error if a keyref matches more than one key,
> unless one of these keys appears at the same level of the hierarchy as the
> keyref itself.
>
> I'm having great trouble constructing an example that illustrates this
> distinction. Was there some use-case that motivated using the more complex
> rule?
>
> Here's an attempt: SECTIONs contain SECTIONs, DEFINITIONs, and TERMREFs.
> There's a key on SECTION specifying selector xpath="DEFINITION" field
> xpath="@term", and there's a keyref on SECTION specifying selector
> xpath="TERMREF" field xpath=".". This means that DEFINITIONs are required to
> be unique only if they are siblings: they can conflict with cousins or
I'm with you here, the uniqueness per qualified-nodeset is covered by
4.2.2; the uniqueness which is used for keyref resolution is evaluated
by the merging of node-tables - so at an other level.
> nephews. A TERMREF must match a DEFINITION anywhere (at any depth) in the
> immediately containing SECTION. It isn't allowed to match two different
> DEFINITIONs (of the same term), unless one of them is an immediate sibling
> of the TERMREF, in which case it doesn't matter how many other matching
> DEFINITIONs there are.
>
> Have I got that right?
Here, I hope you didn't get it right - otherwise I would need (again)
to fix Libxml2's IDC code. I hope that this sibling DEFINITION is
meant to be included in the "conflict resolution" mechanism.
I understood the "conflict resolution" to be a mechanism to
ensure that a keyref entry does resolve to a key/unique entry
which has a duplicate in the descendant-or-self axis (note the self
in the axis).
Let me use the term "scope element". Someone invented this term for the
element for which an IDC key/keyref/unique is defined; i.e. the context
element of the IDC xs:selector XPath expression.
A strategy which would follow this assumption could look like
(but only semantically, since this is not streamable):
1. Build the sum of all qualified node sets of all scope elements
of the referenced IDC key/unique definition in the
descendant-or-self axis, starting with a scope element
of the keyref.
2. Remove all nodes with identical key-sequences from this set.
3. Find a match for the key-sequence of a qualified node of the keyref
in this set.
4. If no match was found then we get an error, since either
no matching key-sequence existed, or a duplicate existed and
was removed.
Does this make sense?
A streaming implementation would "bubble" up the node-table entries.
There's a nice posting from Jeni Tennison about IDCs at:
http://lists.w3.org/Archives/Public/xmlschema-dev/2001Nov/0070.html.
However, after rereading the spec pieces you mentioned, I'm not sure
anymore if my interpretation was correct :-( So, teachers, see my hand
rising.
A test case for the lazy ones. The results of the following case
differ from processor to processor (just play with the commented-out
DEFINITIONs of the instance).
key.xsd
-------
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="SECTION">
<xsd:complexType>
<xsd:sequence>
<xsd:element ref="SECTION" minOccurs="0"/>
<xsd:element name="DEFINITION" minOccurs="0" maxOccurs="5">
<xsd:complexType>
<xsd:attribute name="term" type="xsd:string"/>
</xsd:complexType>
</xsd:element>
<xsd:element name="TERMREF" type="xsd:string" minOccurs="0" maxOccurs="5"/>
</xsd:sequence>
</xsd:complexType>
<xsd:key name="defKey">
<xsd:selector xpath="DEFINITION"/>
<xsd:field xpath="@term"/>
</xsd:key>
<xsd:keyref name="termRef" refer="defKey">
<xsd:selector xpath="TERMREF"/>
<xsd:field xpath="."/>
</xsd:keyref>
</xsd:element>
</xsd:schema>
key.xml
-------
<SECTION xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="key.xsd">
<SECTION>
<SECTION>
<!--DEFINITION term="zappa"/-->
</SECTION>
<DEFINITION term="zappa"/>
</SECTION>
<!--DEFINITION term="zappa"/-->
<TERMREF>zappa</TERMREF>
</SECTION>
Just one result:
Xerces-J is not happy with this instance, but gets happy if we
uncomment the first (in document order) DEFINITION and
comment-out the other DEFINITIONs. Dunno what's happening here.
I noticed that I have to lay hands on Libxml2's implementation
anyway, as its "bubbling" mechanism seems to interfere with
evaluation of uniqueness of IDC keys in such recursive
structures:
If we uncomment only the first two DEFINITIONs, I get:
Element 'DEFINITION': Duplicate key-sequence ['zappa'] in
key identity-constraint 'defKey'.
This is due to: when the 3rd SECTION is finished, it bubbles
up its node-table to the 2nd SECTION, thus we have 1 entry for
"zappa" there. Now we hit the 2nd DEFINITION, which wants to
add its "zappa" in the node-table of the 2nd SECTION as well,
and we get a uniqueness violation. It seems I cannot use the
node-table for evaluation of uniqueness that easily. Pity.
Regards,
Kasimier
Received on Monday, 26 September 2005 18:52:37 UTC