RE: QName is ambiguous; aren't datatypes unambiguous? union types total? from noah_mendelsohn@us.ibm.com on 2002-08-09 (www-rdf-comments@w3.org from July to September 2002)

From: <noah_mendelsohn@us.ibm.com>
Date: Thu, 8 Aug 2002 23:35:04 -0400
To: "Ashok Malhotra" <ashokma@microsoft.com>
Cc: "Dan Connolly" <connolly@w3.org>, www-rdf-comments@w3.org, www-xml-schema-comments@w3.org
Message-ID: <OFA17ADF38.88D0A7E6-ON85256C0F.0053CA1F@lotus.com>
I see where you're coming from, Dan, but I suspect the horse has already 
left the barn on this one.  A few comments on what you've written:

Dan Connolly writes:

>>  But on careful review, I don't see 
>> that anywhere in the spec. I see stuff like
>> "Each value in the value space of a datatype 
>> is denoted by one or more
>> literals in its *lexical space*. "

I'm not sure we can change this retroactively, even if it were desirable. 
I'm curious: how would values of IDREF fit into this world? (We can 
probably cheat on that, since validation of IDREFs against IDs is actually 
done in Part 1, sort of.)

>> I suggest that the lexical form of QNames
>> should be considered to include the relevant
>> namespace name; that'll make it unambiguous,
>> though it won't correspond exactly to
>> the attribute value.

Section 2.3 says:

"[Definition:]  A lexical space is the set of valid literals for a 
datatype. "

Unfortunately, the term "literal" is not defined. Still, one of the few 
really clean aspects of the datatype design is that a literal clearly 
refers to an ordered list of Unicode characters. Furthermore, it's clear 
in Structures that those are exactly the characters that are validated as 
the contents of an attribute or element.  I believe your proposal on 
QNames would violate this fundamental invariant, and for that reason among 
others I am strongly opposed.

I think we have to admit that the lexical space for QName is context 
dependent, for better or worse.  Actually, I had always wanted to include 
the pertinent prefix in the value space of QName, making it a triple, not 
a pair.  I lost that one, but I'm not sure that would have dealt with your 
concern in any case.
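
To make the context dependence concrete, here is a minimal Python sketch 
using the prefix bindings from Dan's example document.  The function names 
(resolve_pair, resolve_triple) and the dictionary representation of the 
in-scope namespaces are hypothetical, chosen only for illustration; they 
are not taken from the Recommendation.  The sketch handles prefixed 
literals only and ignores the default namespace.

```python
from typing import Dict, Tuple


def resolve_pair(literal: str, in_scope: Dict[str, str]) -> Tuple[str, str]:
    """Map a prefixed QName literal to (namespace name, local part).

    Hypothetical helper: `in_scope` stands for the prefix bindings in
    effect at the element carrying the literal.
    """
    if ":" not in literal:
        raise ValueError("this sketch handles prefixed QNames only")
    prefix, _, local = literal.partition(":")
    return (in_scope[prefix], local)


def resolve_triple(literal: str, in_scope: Dict[str, str]) -> Tuple[str, str, str]:
    """Same mapping, but keep the prefix as a third component."""
    namespace, local = resolve_pair(literal, in_scope)
    prefix = literal.partition(":")[0]
    return (namespace, local, prefix)


# Prefix bindings in scope at eltA and eltB in the example document.
elt_a = {"x": "http://example/vocab1#"}
elt_b = {"x": "http://example/vocab2#"}

literal = "x:n"  # identical characters in both attributes

pair_a = resolve_pair(literal, elt_a)
pair_b = resolve_pair(literal, elt_b)
triple_a = resolve_triple(literal, elt_a)
triple_b = resolve_triple(literal, elt_b)

print(pair_a, pair_b)
print(triple_a, triple_b)
```

In both representations the same literal yields two different values, 
because the namespace name comes from the context and not from the 
characters of the literal.  Adding the prefix to the value does not remove 
that dependence.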

>> " constraint: the union
>> of (string, decimal) has the decimal 10
>> in its value space, but nothing in its
>> lexical space to denote it.

I'm surprised, and upon review I think you've discovered a contradiction 
in the recommendation.  In sections such as 2.3.1 it says things like: 

[Definition:]  A canonical lexical representation is a set of literals 
from among the valid set of literals for a datatype such that there is a 
one-to-one mapping between literals in the canonical lexical 
representation and values in the ·value space·. 

implying that there must be at least one lexical form for every value.  On 
the other hand, the definition of union is:

[Definition:]  Union datatypes are those whose ·value space·s and ·lexical 
space·s are the union of the ·value space·s and ·lexical space·s of one or 
more other datatypes. 

and section 2.5.1.3 says:

[Definition:]   The datatypes that participate in the definition of a 
·union· datatype are known as the memberTypes of that ·union· datatype. 

The order in which the ·memberTypes· are specified in the definition (that 
is, the order of the <simpleType> children of the <union> element, or the 
order of the QNames in the memberTypes attribute) is significant. During 
validation, an element or attribute's value is validated against the 
·memberTypes· in the order in which they appear in the definition until a 
match is found. The evaluation order can be overridden with the use of 
xsi:type. 

So, the rec. seems contradictory to me. 

My preferred resolution would be different from yours, I think.  I think 
we should work backwards from the validation rules, make clear that order 
matters, and that in your example the decimal 10 is NOT in the value space 
of the union.  So the value space of a union would be the values 
corresponding to lexical forms that validate per the order-sensitive rule. 
Thus, neither the value spaces nor the lexical spaces can be a union. 
Actually, I think it's clear that the lexical spaces can't be a union, 
since the form "10" would appear twice, which seems wrong to me.
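
A minimal Python sketch of the order-sensitive rule, under some loud 
assumptions: validate_union, parse_string and parse_decimal are 
hypothetical names, string is modeled as "any Python str", decimal is 
modeled with a regular expression approximating the xs:decimal lexical 
form, and the xsi:type override is not modeled at all.

```python
import re
from decimal import Decimal

# Approximation of the xs:decimal lexical form: optional sign, digits,
# optional fractional part (assumption made for this sketch).
DECIMAL_RE = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")


def parse_string(literal: str) -> str:
    """Every literal is accepted as a string in this sketch."""
    return literal


def parse_decimal(literal: str) -> Decimal:
    """Accept only literals matching the decimal pattern."""
    if not DECIMAL_RE.fullmatch(literal):
        raise ValueError(f"not a decimal literal: {literal!r}")
    return Decimal(literal)


def validate_union(literal, member_types):
    """Try the member types in definition order; the first match wins."""
    for name, parse in member_types:
        try:
            return name, parse(literal)
        except ValueError:
            continue
    raise ValueError(f"{literal!r} matches no member type")


STRING_FIRST = [("string", parse_string), ("decimal", parse_decimal)]
DECIMAL_FIRST = [("decimal", parse_decimal), ("string", parse_string)]

print(validate_union("10", STRING_FIRST))
print(validate_union("10", DECIMAL_FIRST))
print(validate_union("abc", DECIMAL_FIRST))
```

With string listed first, every literal is claimed by string before 
decimal is ever tried, so no literal maps to the decimal 10.  Reversing 
the order makes "10" denote the decimal and leaves the string "10" 
without a lexical form.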



------------------------------------------------------------------
Noah Mendelsohn                              Voice: 1-617-693-4036
IBM Corporation                                Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------







"Ashok Malhotra" <ashokma@microsoft.com>
Sent by: www-xml-schema-comments-request@w3.org
08/02/2002 02:22 PM

 
        To:     "Dan Connolly" <connolly@w3.org>, <www-xml-schema-comments@w3.org>
        cc:     <www-rdf-comments@w3.org>, (bcc: Noah Mendelsohn/Cambridge/IBM)
        Subject:        RE: QName is ambiguous; aren't datatypes unambiguous? union types total?



Dan:
The datatypes spec does not quite say "every legal lexical form for a
datatype denotes a single value in the value space of that datatype."
We should consider adding such wording when we rewrite for Schema 1.1.
We should then, carefully, address the exceptions that you point out. 

All the best, Ashok 
 



-----Original Message-----
From: Dan Connolly [mailto:connolly@w3.org] 
Sent: Thursday, August 01, 2002 10:32 PM
To: www-xml-schema-comments@w3.org
Cc: www-rdf-comments@w3.org
Subject: QName is ambiguous; aren't datatypes unambiguous? union types
total?


Consider:

<aDoc>
  <eltA xmlns:x="http://example/vocab1#"
    aQNameAttr="x:n"/>
  <eltB xmlns:x="http://example/vocab2#"
    aQNameAttr="x:n"/>
</aDoc>

Suppose we look at that document using
a schema that says aQNameAttr has
type QName (in both cases). According to

http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/#QName

there's a value from eltA; i.e. the pair
 (http://example/vocab1#, "n")
but the value from eltB is the pair
 (http://example/vocab2#, "n")

while their lexical forms are the same
in both cases: x:n.

I thought that a fundamental property of datatypes
was that they're unambiguous; i.e. for
any datatype, there's exactly one value that
corresponds to each item from the lexical space.
The designs that the RDF Core WG is considering
for using XML Schema datatypes in RDF depend
on this property.

But on careful review, I don't see that anywhere
in the spec. I see stuff like
"Each value in the value space of a datatype is denoted by one or more
literals in its *lexical space*. "

But I don't see "each literal in the lexical
space of a datatype denotes exactly one value."
That should be in there somewhere, no?

I suggest that the lexical form of QNames
should be considered to include the relevant
namespace name; that'll make it unambiguous,
though it won't correspond exactly to
the attribute value.

QName is certainly a special case w.r.t.
using XML Schema datatypes in RDF.

Hmm... but I guess union datatypes are too.
On the other hand, union datatypes don't even
obey the "Each value in the value space of a datatype
is denoted by one or more literals in
its *lexical space*. " constraint: the union
of (string, decimal) has the decimal 10
in its value space, but nothing in its
lexical space to denote it.


-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/
see you in Montreal in August at Extreme Markup 2002?
Received on Thursday, 8 August 2002 23:36:47 UTC