- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Tue, 8 Mar 2011 08:42:12 -0700
- To: www-xml-schema-comments@w3.org
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
On Mar 7, 2011, at 4:55 PM, bugzilla@jessica.w3.org wrote:

> http://www.w3.org/Bugs/Public/show_bug.cgi?id=11076
>
> --- Comment #4 from Michael Kay <mike@saxonica.com> 2011-03-07 23:55:25 UTC ---
> In the proposal, the phrase "and T1.{type definition} and T2.{type definition}
> are valid for the same set of input element information items." seems curious -
> I don't think a type definition can be valid for some information items and
> invalid for others.
>
> But rather than fix this in the obvious way, I don't think it's helpful to
> define equivalence using a condition that isn't actually computable by a
> universal Turing machine.

I may be forgetting something simple and obvious, but I don't think I'm familiar with a result that says the equivalence of two XSD types is non-computable (as opposed to expensive); can you expound?

The difficulty of comparing two types or two type alternatives for equivalence does seem to me to be a good reason for not requiring processors to calculate it, even if it is computable. But is it a sufficient reason for not defining equivalence in the obvious way?

> Better to stick with the mechanistic definition, and
> allow processors to relax it if they are able to.

Under what circumstances should processors be allowed to recognize two types or two type alternatives as equivalent? I think the simplest answer is: when in fact they are equivalent. If we don't allow ourselves to define equivalence to denote the concept we need here, because in the general case it's expensive, how is the text to make clear the motivation for allowing processors to apply more intelligent analysis than the simplistic comparison spelled out as the minimum?

In some specs, one could simply say "types T1 and T2 are the same" and have it be clear and meaningful. We cannot do that, because a sufficiently vocal minority of the WG opposed any effort to define component identity clearly.
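As an aside, the contrast at issue can be made concrete with a small sketch (in Python; the class and its fields are invented for illustration and are not the spec's component model). A mechanistic check compares the components of two type alternatives textually: it is sound (identical components really are equivalent) but strictly narrower than semantic equivalence, since two {test} expressions written differently may still always evaluate to the same result.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TypeAlternative:
    """Illustrative stand-in for an XSD 1.1 Type Alternative component."""
    test: str        # the {test} XPath expression, as written
    type_name: str   # name of the {type definition} it selects

def mechanically_equivalent(t1: TypeAlternative, t2: TypeAlternative) -> bool:
    """Sound but incomplete: textually identical components imply
    equivalence, but semantically equal tests written differently
    are not recognized."""
    return t1.test == t2.test and t1.type_name == t2.type_name

a = TypeAlternative(test="@kind = 'a'", type_name="typeA")
b = TypeAlternative(test="@kind = 'a'", type_name="typeA")
c = TypeAlternative(test="'a' = @kind", type_name="typeA")  # same meaning, different text

print(mechanically_equivalent(a, b))  # True
print(mechanically_equivalent(a, c))  # False, though the tests always agree
```

The gap between the second result and the truth is exactly the room a "processors MAY relax this" clause leaves open; deciding equivalence of arbitrary XPath tests is what makes the full check expensive.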
As any reader of our mailing lists can tell, our failure even to attempt to achieve clarity on this fundamental concept of our spec continues to offend and irritate me (and I apologize for my repetitive complaints about it). More important, our failure continues to exact a price, because it makes it harder to resolve many of the issues we face. If we aren't to throw up our hands, say "This is a hopeless mess," and give up on 1.1, we need to find some way to address issues like this one without appealing to identity.

In cases where one cannot appeal, or does not wish to appeal, to identity, it is usual to appeal instead to isomorphism or equivalence of one form or another. I think we would do well to try to work out suitable notions of equivalence wherever they can help us solve issues, rather than leaving the issues unresolved or resolving them unsatisfactorily by insisting that we deal only in identity. Equivalence seemed to me particularly helpful here, in delineating just how to tell whether a particular rule in an implementation is safe or not. If we can't appeal to equivalence either, because our language is too complex for equivalence to be calculated simply, then we have really painted ourselves into an untenable corner.

> I would suggest wording along
> the lines:
>
> T1 and T2 are equivalent if all the following conditions are true:
>
> ...

If this is intended as a definition of equivalence, I think it's a waste of a perfectly good term: the property defined does guarantee equivalence of type alternatives, but it is much, much narrower, and it borders on misleading to call it by the name 'equivalence'.

> A processor MAY also treat T1 and T2 as equivalent in cases where not all the
> above conditions are true, provided it can determine that that T1.{test} and
> T2.{test} will always evaluate to the same result for any possible element
> information item.
This seems to be substantively the same as the current appeal to equivalence of the {test} property, with the main differences being that the concept of equivalence is inlined instead of being called out as important, and that instead of allowing processors to vary in how aggressively they can recognize two types as equivalent, we allow them to vary in what they regard as "the same type" and what they regard as "different types" -- which is *definitely* not computable in the general case, because in the general case our spec does not provide an answer.

Is there a reason not to simplify this passage by saying that a processor may treat T1 and T2 as equivalent if a random number generator picks an even number between one and ten three times in a row? Which rule is simpler to understand? Easier to calculate? More likely to provide interoperability? Easier to justify?

-- 
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com
* http://cmsmcq.com/mib
* http://balisage.net
****************************************************************
Received on Tuesday, 8 March 2011 15:42:43 UTC