Re: How to scope the note about D and override(E,D) from Henry S. Thompson on 2011-03-14 (www-xml-schema-comments@w3.org from January to March 2011)

From: Henry S. Thompson <ht@inf.ed.ac.uk>
Date: Mon, 14 Mar 2011 18:26:02 +0000
To: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
Cc: www-xml-schema-comments@w3.org
Message-ID: <f5blj0hy2x1.fsf@calexico.inf.ed.ac.uk>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

C. M. Sperberg-McQueen writes:

> Does this work in the general case?  I would expect that it would not
> suffice to know that type T is being overridden, that it would be
> necessary to know what the new definition is.

I guess I'm _proposing_ that we insist on source-identity as the only
basis for allowing 'multiple' definition/declaration, as we at
least allow 1.0 processors to do in the perfectly ordinary case of 

<element name="foo"/>
<element name="foo"/>

which 1.0 processors may (must?) treat as an error.

>> This is because a marker is just a tuple of strings, and so can be
>> compared for equality with another marker without invoking any theory
>> of component identity.
>
> I don't see any appeal to component identity in the current
> design.  At most there is an appeal to element equivalence.

I'm trying to avoid requiring the use of deep-equal as well, yes.

>> 
>> O exploits this in two ways:
>> 
>> 1) If an override needs to be processed whose target has already been
>>    processed with a superset of the required markers, the override can
>>    be ignored;
>
> I think there are two problems with this behavior.
>
> 1 If documents A and B each override C, with elements E1 and E2
> respectively, and E1 and E2 provide declarations for type 
> T (and nothing else), the rule you just stated suggests we can ignore
> the override of C by B, if we saw A first.  And we can ignore the
> override of C by A, if we saw B first.

Not so.  The markers will be
  <'T','complexType',[A]/base-uri(),[A]//E1/complexType[@name='T']/position()>
and 
  <'T','complexType',[B]/base-uri(),[B]//E2/complexType[@name='T']/position()>
respectively, which don't overlap at all, so no elimination is allowed.
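
A minimal sketch of that comparison in Python, assuming a marker is
modeled as a plain 4-tuple of strings.  The names Marker and
can_skip_override and the example URIs are hypothetical, not taken from
Algorithm O, and the subset test is only one reading of rule 1 above.

```python
from typing import FrozenSet, Tuple

# Assumption: a marker is (name, kind, base URI of the overriding
# document, position), with every field held as a string.
Marker = Tuple[str, str, str, str]


def can_skip_override(already_applied: FrozenSet[Marker],
                      required: FrozenSet[Marker]) -> bool:
    """True when every required marker was already applied to the target."""
    return required <= already_applied


# Illustrative values only: the URIs and positions are made up.
from_a = frozenset({("T", "complexType", "http://example.org/A.xsd", "1")})
from_b = frozenset({("T", "complexType", "http://example.org/B.xsd", "1")})

print(can_skip_override(from_a, from_b))  # False: the base URIs differ
print(can_skip_override(from_a, from_a))  # True: same source, same marker
```

Because the comparison is plain tuple equality, two overrides of T from
different documents never match, whatever their content.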

>
> 2 Whenever the declarations E1 and E2 provide for T are in
> conflict, the rule just stated resolves the conflict in favor of one
> or the other.  I think the correct answer (certainly, the answer we
> chose in our phase-1 discussions of bug 6021) is that an error
> should result.

Again, not so.  And indeed an error does result, either sooner (see
step 5 of Algorithm O) or later (at schema construction time, in the
usual way).

>> 
>> 2) If a given schema document D is involved via more than one path of
>>    overrides, actually constructing schema(D) can be done efficiently
>>    by taking the union of all the markers which apply to it.
>
> That suggests that if A and B each override C, with E1 providing a 
> new declaration of type T1 and E2 providing a new declaration of T2,
> then the result should be the same as if they agreed and both
> E1 and E2 overrode both T1 and T2, in the same way.  
>
> That would be a dramatic change from the status quo design, in 
> which A is taken to want C's original declaration of T2 and B is
> taken to want C's original declaration of T1.

Sure, in isolation, but the merger would only happen if some D had
included both A and B.  What I was describing is the construction of
schema(R), for R the starting point of the whole exercise.  The only
way A & B could both be involved is if they were both part of the
target set of R, in which case we _do_ want both E1's override of T1
and E2's override of T2 in the result.
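
The T1/T2 case can be sketched as follows, again assuming markers are
4-tuples of strings.  The helper markers_by_target and the file names
are hypothetical and only illustrate taking the union of the markers
that reach one document by different override paths.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Assumption: (name, kind, base URI of the overriding document, position).
Marker = Tuple[str, str, str, str]


def markers_by_target(
    paths: Iterable[Tuple[str, Set[Marker]]]
) -> Dict[str, Set[Marker]]:
    """Union the markers that reach each target over every override path."""
    merged: Dict[str, Set[Marker]] = defaultdict(set)
    for target, markers in paths:
        merged[target] |= markers
    return dict(merged)


# R's target set contains A and B, and each of them overrides C.
paths = [
    ("C.xsd", {("T1", "complexType", "A.xsd", "1")}),  # from E1 in A
    ("C.xsd", {("T2", "complexType", "B.xsd", "1")}),  # from E2 in B
]
merged = markers_by_target(paths)
print(len(merged["C.xsd"]))  # 2: both overrides apply when building schema(R)
```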

> I don't see the motivation for such a rule.
>
> It's possible that the paraphrases you've just given glide over some
> details of the algorithm and that the problems lie not in the algorithm
> but in the paraphrases.

I don't think so -- I guess from this:

> Does this work in the general case?  I would expect that it would not
> suffice to know that type T is being overridden, that it would be
> necessary to know what the new definition is.

perhaps you read my reference to SCDs differently from the way I had
intended -- at the very least I should have been clear that I meant
_absolute_ SCDs.  Maybe the SCD analogy was unhelpful, insofar as we
don't _have_ SCDs for the children of <override> elements, and perhaps
we might end up saying we _can't_ have them.  So, try reading again
with the actual definition of marker, namely one each of the form

 <d/@name,local-name(d),d/base-uri(),d::parent/position()>

for each d that is a child of an <override>.
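
A rough sketch of building such markers with Python's standard
xml.etree.ElementTree.  Two things here are assumptions rather than
part of the definition above: the base URI is passed in by the caller,
and the position is taken to be the 1-based index of d among the
element children of its <override>.  The function name markers_for is
hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"


def markers_for(doc_text: str, base_uri: str) -> set:
    """Collect one marker per named child of each <override> in a document."""
    root = ET.fromstring(doc_text)
    markers = set()
    for override in root.iter(f"{{{XS}}}override"):
        for pos, d in enumerate(override, start=1):
            name = d.get("name")
            if name is None:
                continue  # e.g. an annotation, which has no name
            local = d.tag.rsplit("}", 1)[-1]  # strip the "{namespace}" prefix
            # Assumption: position is d's index within its <override>.
            markers.add((name, local, base_uri, str(pos)))
    return markers


doc = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:override schemaLocation="C.xsd">
    <xs:complexType name="T"/>
    <xs:element name="foo"/>
  </xs:override>
</xs:schema>
"""
markers = markers_for(doc, "A.xsd")
print(sorted(markers))
```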

> I will try to study the algorithm later, but so far I'm stuck on the
> first sentence, which says that our goal is to construct a tree with
> no duplicate leaves.  The implicit suggestion that a tree might have
> duplicate leaves persuades me that I must be missing something here
> -- either 'tree' or 'duplicate' must mean something I don't
> understand.

Trees can certainly have leaves with duplicate labels.  XML data
models and natural language parse trees have them all the time.

That's all I meant.  Since my labels are also recipes for the creation
of modified schema documents == appeals to F.2 == override(E,D) for
some (possibly synthetic) E and D, I want to avoid spurious
duplication, and that means no duplicate leaves, or, if you prefer, no
leaves with duplicate labels.

ht
- -- 
       Henry S. Thompson, School of Informatics, University of Edinburgh
      10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
                Fax: (44) 131 651-1426, e-mail: ht@inf.ed.ac.uk
                       URL: http://www.ltg.ed.ac.uk/~ht/
 [mail from me _always_ has a .sig like this -- mail without it is forged spam]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)

iD8DBQFNfl26kjnJixAXWBoRArQYAJ0Yajv+I/hPoVYVhgEihSHbpad2bwCfdMH9
2dXhqY3s13sfemCyqKJsT+g=
=Aqw/
-----END PGP SIGNATURE-----
Received on Monday, 14 March 2011 18:26:37 UTC