Re: Datatyping literals: question and test cases

[Patrick Stickler, Nokia/Finland, (+358 40) 801 9690, patrick.stickler@nokia.com]


----- Original Message ----- 
From: "ext Brian McBride" <bwm@hplb.hpl.hp.com>
To: "Patrick Stickler" <patrick.stickler@nokia.com>; "Graham Klyne" <GK@NineByNine.org>; "ext pat hayes" <phayes@ai.uwf.edu>
Cc: <w3c-rdfcore-wg@w3.org>
Sent: 31 October, 2002 11:33
Subject: Re: Datatyping literals: question and test cases


> At 09:29 31/10/2002 +0200, Patrick Stickler wrote:
> 
> 
> 
> >[Patrick Stickler, Nokia/Finland, (+358 40) 801 9690, 
> >patrick.stickler@nokia.com]
> >
> > > >But, what about this:
> > > >
> > > >   _:x ex:prop "http://example.org/" .
> > > >   ex:prop rdfs:range xsd:anyURI .
> > >
> > > That depends on how xsd:anyURI defines its value space. If its the
> > > set of strings conforming to the URI syntax, then OK.
> >
> >ARGH! Pat, no!
> >The WG has decided (with Nokia's dissent) that inlined literals
> >denote their string components (i.e. string-semantics)
> 
> No it has not.  Their has been some recent discussion on whether they 
> denote (string,lang) pairs or the union of strings and (string, lang) 
> pairs.  To the best of my recollection, denoting just strings is not a 
> current option.
> 
> >  and therefore
> >the above range assertion is a datatype clash! It is *never* OK.
> >Never ever ever. It is not possible for it to ever be OK.
> 
> Why not?  Surely that depends on whether the value spaces of literals and 
> anyURI's are disjoint.  If literals can denote strings (as you claim they 
> can), and anyURI's are strings, then this is not necessarily a datatype clash.
> [...]

I would assert that anyURI values are not strings, but URIs. I don't
believe that anyone expects that the URI value space is ordered, i.e.
that it is meaningful to ask if URI X is "less than" URI Y, only whether
they are equal. Furthermore, equality between values of xsd:anyURI is
not the same as string equality. I.e.

   "HTTP://EXAMPLE.COM/"^^xsd:anyURI == "http://example.com/"^^xsd:anyURI

This is no differen than the case of 

   "10"^^xsd:integer == "000000010.0000000"^^xsd:integer

Yet

   "HTTP://EXAMPLE.COM/"^^xsd:string != "http://example.com/"^^xsd:string

> >Furthermore, not that xsd:anyURI is *not* a subtype even of xsd:string!
> >I.e., even XML Schema says that a member of the value space of xsd:anyURI
> >is *not* a member of the value space of xsd:string -- a URI is not a string!
> 
> A reference would be helpful here.  

http://www.w3.org/TR/xmlschema-2/#built-in-primitive-datatypes

> Where does  XSD say that different 
> primitive types must have disjoint value spaces.

It doesn't. Derived types have a value space that is a proper subset
of the value space of all superordinate types. Some primitive types
have intersecting value spaces. Some have disjoint value spaces.

Not every datatype defined by XML Schema is a subtype of xsd:string.
Nor is every datatype used by RDF a subtype of rdfs:Literal or 
rdfs:StringLiteral.

I.e. xsd:anyURI is not a subtype of xsd:string. Their value spaces are 
thus disjunct.

Clearly there was a semantic reason why xsd:anyURI is not a subtype
of xsd:string, as the above examples regarding ordinal and equality
comparison is concerned show, since it would be a trivial declaration 
to relate them hierarchically and thus relate their value spaces.

> It is true that anyURI is not derived from xsd:string, but is that because 
> the value spaces are different, or because there is no facet that would 
> allow exact expression of the value space of anyURI?  The only facet that 
> might be used is PATTERN which requires a regular expression.  If the 
> grammar for anyURI cannot be expressed as regular expression, then anyURI 
> cannot be derived from xsd:string.

It is possible to define a regular expression that will exclude strings
that are not valid URIs, specifically covering the two cases of 
non-hierarchical and hierarchical URIs, requiring and restricting the
form of the URI scheme prefix. And in fact, the lexical grammar for
URIs *is* defined, by RFC 2396.

Then, defining particular URI schemes as subtypes of anyURI (as I do)
one can then constrain further the allowed lexical forms for the
specific scheme.

The fact that XML Schema does not specify the regular expressions
explicitly does not mean that such regular expressions are not
indicated by the XML Schema spec, as it explicitly references RFC 2396
and RFC 2732 regarding both syntax of lexical forms and semantics
of xsd:anyURI.

But the reason why xsd:anyURI is not derived from xsd:string is
clearly semantic, since the lexical space of xsd:anyURI is certainly
a subset of the lexical space of xsd:string. Yet rdfs:subClassOf
with regards to RDF datatyping does
not relate lexical spaces, it relates value spaces, and just because
one datatype's lexical space might be a subset of another's lexical
space, and just because both datatypes's L2V mapping may map a lexical 
form to a value that is string-equal to the lexical form, does not mean
that the value spaces intersect. 

Thus, as XML Schema clearly specifies

   "urn:foo/bar"^^xsd:anyURI != "urn:foo/bar"^^xsd:string

And I agree that this is correct. It makes no sense to compare a URI
as a datatype value with a string as a datatype value as they have
different semantics. Comparing the lexical form of a URI and the lexical
form of a string is possible, though I wouldn't think necessarily useful.

And in the same way that xsd:anyURI and xsd:string are disjunct, and
for the same reasons, rdfs:StringLiteral and xsd:string should be
disjunct, since it is no more meaningful to test whether one inlined
literal (which is simply a global constant, remember) is "less then"
or "greater than" some other literal. The semantics of the value space
do not IMO define such ordered comparisons. And thus the following
must be true and valid

   "foo"^^rdfs:StringLiteral !=  "foo"^^xsd:string


Patrick

Received on Thursday, 31 October 2002 05:50:20 UTC