Re: "RE: Including schemata with duplicate referents" from Kasimier Buchcik on 2004-11-08 (xmlschema-dev@w3.org from November 2004)

From: Kasimier Buchcik <kbuchcik@4commerce.de>
Date: Mon, 08 Nov 2004 15:57:03 +0100
To: Michael Kay <mike@saxonica.com>
CC: xmlschema-dev@w3.org
Message-ID: <418F893F.70906@4commerce.de>
Hi,

sorry for the delay... I hate the weekend being in-between messages.

Michael Kay wrote:
>>Assuming non-chameleon includes and no copies:
>>
>>- B1.doc, B2.doc, C1.doc, C2.doc are schema _documents_
>>- C1.doc and C2.doc define the same set of components
>>- B1.doc <includes> C1.doc
>>- B2.doc <includes> C2.doc
> 
> 
> Ah, I see. It boils down to what you mean by "the same set of components".
> I think you either have to establish that the components in C1 and C2 really
> are "the same", or you throw an error saying A can't have two different
> components with the same name.

Which would include component structure identity checks; I'm really
scared of this :-/ Additionally, if the identity checks for C1 and C2
are performed at the stage of inclusion of B1 and B2 into A, I would
need to remap references from C2 to C1. I don't know if things would
stay consistent if B2 still referenced C2 components.
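
To make the remapping concern concrete, here is a minimal sketch in
Python. It is not taken from any real schema processor: the Component
class, its "structure" field (a stand-in for whatever a real structural
comparison would inspect) and the merge function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    kind: str         # e.g. "element" or "complexType"
    namespace: str    # target namespace
    name: str         # local name
    structure: tuple  # hypothetical stand-in for the component's content
    source: str       # schema document the component was read from


def merge(existing, incoming):
    """Merge incoming components into existing; return (merged, remap).

    existing is a dict keyed by (kind, namespace, name). remap maps each
    dropped duplicate to the component that was kept, so references held
    by the including document can be redirected to the survivor.
    """
    merged = dict(existing)
    remap = {}
    for comp in incoming:
        key = (comp.kind, comp.namespace, comp.name)
        kept = merged.get(key)
        if kept is None:
            merged[key] = comp
        elif kept.structure == comp.structure:
            # Structurally identical: keep the first, remember the mapping.
            remap[comp] = kept
        else:
            raise ValueError(f"conflicting definitions for {key}")
    return merged, remap
```

Every reference that B2 holds to a C2 component would have to be looked
up in remap afterwards; skipping that step is exactly the inconsistency
I am worried about.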

> The spec is pretty fuzzy about the rules for deciding when two components
> are identical. It says this at 3.4.6:
> 
> <quote>
> The wording of clause 2.1 above appeals to a notion of component identity
> which is only incompletely defined by this version of this specification. In
> some cases, the wording of this specification does make clear the rules for
> component identity. These cases include:
> 
>     * When they are both top-level components with the same component type,
> namespace name, and local name;
>     * When they are necessarily the same type definition (for example, when
> the two types definitions in question are the type definitions associated
> with two attribute or element declarations, which are discovered to be the
> same declaration);
>     * When they are the same by construction (for example, when an element's
> type definition defaults to being the same type definition as that of its
> substitution-group head or when a complex type definition inherits an
> attribute declaration from its base type definition).
> 
> In other cases two conforming implementations may disagree as to whether
> components are identical.
> </quote>
> 
> The case in question is covered by the first bullet, which claims that the
> spec defines clear rules for component identity in the case of top-level
> components (a term whose meaning we can reasonably guess, though it is
> nowhere defined). It's a shame it doesn't reference these rules, because I
> can't find them.
> 
> It does have a definition of "equality" of components in 3.1.1, which might
> be what it is referring to. But it's a curious definition: it seems to say
> that two components are equal if they have the same name in the same symbol
> space, but since it has just said a couple of sentences earlier that you
> can't have two components [it actually says "copies of components", but I've
> no idea what "copies" means] with the same name in the same symbol space,
> this simply seems to be saying that distinct components are never equal,
> which still leaves the question as to what "distinct" means. 
> 
> I've been working on the basis that components are identical if and only if
> they come from the same place in the same schema document, and that if two
> non-identical components have the same name (as in your example) then it's
> an error. If someone can point me to something better, please do!

I would be very happy if this worked. I see the following problem:

The first bullet of the identity definition would be extended to a
triplet: component name, target namespace and document location.
This would depend on a global schema location registry; otherwise it
would fail.
Example:
   - A (final), B1 and B2 (incl. by A),
     X1 (incl. by B1) and X2 (incl. by B2)
     are schema documents
   - X1 and X2 are schema documents defining XML
   - A, B1 and X1 are on a local drive
   - B2 and X2 are on a Spanish server
   X1 and X2 are identical but would fail to be included, since
   they come from different schema document locations
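
The failure in this example can be sketched as follows, assuming
identity is decided by the triplet described above. The function name
and the tuple layout are hypothetical and only illustrate the rule.

```python
def check_by_location(components):
    """Treat (namespace, name, location) as the identity of a component.

    components: iterable of (namespace, name, location) tuples.
    Two entries sharing namespace and name but not location count as
    distinct components, so the second one is reported as a clash.
    """
    seen = {}
    for namespace, name, location in components:
        prior = seen.setdefault((namespace, name), location)
        if prior != location:
            raise ValueError(
                f"{{{namespace}}}{name} defined at {prior} and at {location}"
            )
    return seen
```

Feeding it the same component once from a local path (X1) and once from
a remote URL (X2) raises the error, although both documents have
identical content.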

With the current spec, it seems component structure identity checks are
needed in any case to achieve modularity :-( Otherwise the schema
author would need to somehow package the schemata in use to provide
document location distinction.

> The thing I find exasperating about all this is that behind all the formal
> language in this spec, some basic concepts are very poorly defined.

I would really, really like to see the name and target
namespace be the only distinguishing factor, nothing else.

If includes are meant to be comparable with cut & paste, I would tend to
disallow inclusion of schemata which in turn define or include
components with the same name and target namespace. Thus, if two
schemata have an <include> providing a schema location, and those
schemata - resolved by the schema location - contain more than one
component with the same name and target namespace, the schema would not
be valid. In such a scenario, subsequent schema inclusions would
not use any <include>, except for the top schema or an additional
top-include, which would locate the needed includes. This would allow
placing the document location bindings on the caller side, rather than
on the called side, which creates all the mess for me currently.
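
A rough sketch of that idea in Python, under the assumption that only
the top schema lists document locations and that a component is
identified by its (namespace, name) pair alone. The assemble function
and the load callback are hypothetical.

```python
def assemble(top_includes, load):
    """Build one component table from locations bound by the top schema.

    top_includes: list of document locations, given by the caller only.
    load: callable mapping a location to a list of (namespace, name)
          pairs defined by that document.
    A pair defined by more than one document makes the schema invalid.
    """
    components = {}
    for location in top_includes:
        for key in load(location):
            if key in components:
                raise ValueError(
                    f"{key} defined in both {components[key]} and {location}"
                )
            components[key] = location
    return components
```

The included documents themselves carry no locations here, so the
duplicate X1/X2 case can only arise if the top schema lists both.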

Greetings,

Kasimier
Received on Monday, 8 November 2004 14:58:31 UTC