"RE: Including schemata with duplicate referents" from Michael Kay on 2004-11-05 (xmlschema-dev@w3.org from November 2004)

From: Michael Kay <mike@saxonica.com>
Date: Fri, 5 Nov 2004 17:39:52 -0000
To: "'Kasimier Buchcik'" <kbuchcik@4commerce.de>
Cc: <xmlschema-dev@w3.org>
Message-Id: <E1CQ845-0005b0-00@ukmail1.eechost.net>

> 
> Assuming non-chameleon includes and no copies:
> 
> - B1.doc, B2.doc, C1.doc, C2.doc are schema _documents_
> - C1.doc and C2.doc define the same set of components
> - B1.doc <includes> C1.doc
> - B2.doc <includes> C2.doc

Ah, I see. It boils down to what you mean by "the same set of components". I
think you either have to establish that the components in C1 and C2 really
are "the same", or you throw an error saying A can't have two different
components with the same name.

The spec is pretty fuzzy about the rules for deciding when two components
are identical. It says this at 3.4.6:

<quote>
The wording of clause 2.1 above appeals to a notion of component identity
which is only incompletely defined by this version of this specification. In
some cases, the wording of this specification does make clear the rules for
component identity. These cases include:

    * When they are both top-level components with the same component type,
namespace name, and local name;
    * When they are necessarily the same type definition (for example, when
the two types definitions in question are the type definitions associated
with two attribute or element declarations, which are discovered to be the
same declaration);
    * When they are the same by construction (for example, when an element's
type definition defaults to being the same type definition as that of its
substitution-group head or when a complex type definition inherits an
attribute declaration from its base type definition).

In other cases two conforming implementations may disagree as to whether
components are identical.
</quote>

The case in question is covered by the first bullet, which claims that the
spec defines clear rules for component identity in the case of top-level
components (a term whose meaning we can reasonably guess, though it is
nowhere defined). It's a shame it doesn't reference these rules, because I
can't find them.

It does have a definition of "equality" of components in 3.1.1, which might
be what it is referring to. But it's a curious definition: it seems to say
that two components are equal if they have the same name in the same symbol
space. Since it has just said, a couple of sentences earlier, that you can't
have two components with the same name in the same symbol space [it actually
says "copies of components", but I've no idea what "copies" means], this
simply seems to be saying that distinct components are never equal. That
still leaves the question as to what "distinct" means.

I've been working on the basis that components are identical if and only if
they come from the same place in the same schema document, and that if two
non-identical components have the same name (as in your example) then it's
an error. If someone can point me to something better, please do!
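
The working rule above can be sketched roughly as follows. This is Python and
purely illustrative: the Component class, its fields, add_component and
DuplicateComponentError are hypothetical names, not anything defined by the
spec or taken from a real schema processor. It assumes a component's "place"
can be modelled as a document URI plus a position within that document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    # Hypothetical model of a top-level schema component.
    kind: str          # symbol space, e.g. "element" or "type"
    namespace: str     # target namespace name
    local_name: str
    document_uri: str  # schema document the component was read from
    position: int      # assumed marker for where in that document it sits

    @property
    def name_key(self):
        # What the first bullet of the quoted text compares.
        return (self.kind, self.namespace, self.local_name)

    @property
    def origin(self):
        # What the working rule compares: same place, same document.
        return (self.document_uri, self.position)


class DuplicateComponentError(Exception):
    """Two non-identical components share a name in one symbol space."""


def add_component(schema, component):
    """Add a component to schema, a dict keyed by name_key."""
    existing = schema.get(component.name_key)
    if existing is None:
        schema[component.name_key] = component
    elif existing.origin != component.origin:
        # Same name, different place: treated as an error.
        raise DuplicateComponentError(
            f"{component.name_key} declared in both "
            f"{existing.document_uri} and {component.document_uri}"
        )
    # Same name and same origin: the same component reached twice, so
    # there is nothing to add.
    return schema
```

Under this sketch, reaching C1.doc twice through different include paths is
harmless, while identically named components from C1.doc and C2.doc raise
the error, however similar their content.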

The thing I find exasperating about all this is that behind all the formal
language in this spec, some basic concepts are very poorly defined.

Michael Kay

Received on Friday, 5 November 2004 17:39:57 UTC