RE: DATATYPES: mental dump. from Patrick.Stickler@nokia.com on 2001-11-14 (w3c-rdfcore-wg@w3.org from November 2001)

From: <Patrick.Stickler@nokia.com>
Date: Wed, 14 Nov 2001 09:36:34 +0200
To: melnik@db.stanford.edu, w3c-rdfcore-wg@w3.org
Message-ID: <2BF0AD29BC31FE46B78877321144043114C089@trebe003.NOE.Nokia.com>
> -----Original Message-----
> From: ext Sergey Melnik [mailto:melnik@db.stanford.edu]
> Sent: 14 November, 2001 00:18
> To: Stickler Patrick (NRC/Tampere)
> Cc: phayes@ai.uwf.edu; w3c-rdfcore-wg@w3.org
> Subject: Re: DATATYPES: mental dump.
> 
> 
> Patrick.Stickler@nokia.com wrote:
> > ...
> 
> Patrick,
> 
> for some reason your most recent X-scheme related posting 
> still did not
> propagate to my mailbox (I found it in the archive of rdf-core).

Don't know what the problem there might be, if there is one...

> Question: do you assume that namespaces are part of the model? For
> example, given a URV like xsd:integer:10, is it possible to 
> extract the
> namespace xsd:integer unambiguously? 

Namespaces are not part of URVs. The similarity between
the URV xsd:integer:10 and the qname xsd:integer (which
isn't even an authoritative identifier) is coicidental,
though understandable, given the mnemonic space.

Have a look at the 'xsd:' URI scheme specification to see
these relations explicitly:

http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Nov/0358.html


> (My guess is that the 
> URI encoding
> alone is not sufficient, i.e., by looking at
> http://www.w3.org/2001/XMLSchema#integer:10 it is not obvious whether
> the datatype would be http://www.w3.org/2001/XMLSchema#integer or
> http://www.w3.org/2001/XMLSchema#).

Again, it is explicit, per the URV definition. I had sent
those materials to you last Friday, not via the list (just to
the DATATYPE subgroup) so you may need to check our email
service. It appears to be loosing messages.

> Related question: how do you refer to a datatype? Would you use a
> resource URI like http://www.w3.org/2001/XMLSchema#integer to refer to
> integers? What is the nature of the relationship between a 
> resource like
> that and other URV-resources that have the above URI as a prefix?

The URI http://www.w3.org/2001/XMLSchema#integer denotes an XML Schema
simple data type for which is defined both a value space, a lexical
space, and a canonical lexical space (these three things are defined
for all XML Schema simple data types).

The URI xsd:integer denotes a URV scheme which defines a URI encoding
of a typed data literal. It has a lit:mapsTo relation to the XML Schema
data type http://www.w3.org/2001/XMLSchema#integer. Every URV scheme
based on the 'lit:' ontology defines a canonical lexical space (not just 
a basic lexical space) which lit:mapsTo one or more data types (both 
their lexical and value spaces) and/or lit:correspondsTo to one or more 
data types (only their value spaces) or is lit:approximateTo one or
more data types (only their value spaces, and with possible loss of
precision).

The URI xsd:integer:10 denotes a URV which defines a literal which
conforms to the lexical space for the URV scheme xsd:integer and,
based on the defined relations between xsd:integer and XML Schema
http://www.w3.org/2001/XMLSchema#integer (lit:mapsTo), the lexical
form "10" is a member of both the lexical and value space of
http://www.w3.org/2001/XMLSchema#integer.

Is that clearer now?

> Another thing: could you highlight some of the options of encoding the
> interpretation shown in 
> 
>
http://WWW-DB.Stanford.EDU/~melnik/rdf/datatyping/fig/motivating_example.gif
>
> in the X scheme (just the 'compact' version)? 

I had planned to. Will do it today. I know that there needs to be
some good and disperate data type examples. The examples I sent
out yesterday were mostly intended to illustrate the notation and
the fact that the same graph "interpretation" (whether or not it
displaces the present graph model) provides for solutions to both
the data typing issues as well as several other critical (IMO) issues
such as reification and qualification.

> Would it be something like
>
> _Jenny x:weight urv:kg:decimal:12
>
> where urv:kg:decimal:12 denotes m1, or
>
> _Jenny x:weight _w
> _w     x:kg     urv:decimal:12
>
> where urv:decimal:12 denotes n2?

;-)  Here's where it gets fun...

OK, I've not yet published this, I'm working on the I-D this week
(along with I-D's for several other URI schemes based on the URV
concept) and it likely will be based on the Unified Code for
Units of Measure (http://aurora.rg.iupui.edu/~schadow/units/UCUM/)
or similar (I expect IEEE offers a few options ;-) 

Here's how I would encode the information in your example:

1. There is a URV scheme 'unit:' (with accompanying data type
   RDF classes) which is a data typing scheme for encoding
   magnitudes with units of measure.

   The basic form is 'unit:' {unit of measure} ':' {value}

   E.g. unit:kg:12  unit:cm:28818  unit:oz:11  unit:

   Now, a unit of measure is just another data type. In fact,
   we can define the data type represented by the URV scheme
   unit:kg to be a subclass of xsd:integer. The semantics of 
   'unit of measure' is defined for the unit:kg type, but being 
   a subClassOf xsd:integer, we know that all unit:kg forms
   map to values in the xsd:integer value space.

   Having a separate URV scheme for units of measure is just
   a convenience.

   Thus,

   <rdf:Description rdf:about="unit:kg">
      <lit:mapsTo rdf:resource="urn:unit:kg"/>
   </rdf:Description>


2. Define the RDF data type classes for units of measure and
   relate them to any other data type classes which they
   are subordinate to (e.g. ground them in the XML Schema
   data type taxonomy).

   I.e.

   <rdf:Description rdf:about="urn:unit:kg">
      <rdfs:subClassOf
rdf:resource="http://www.w3.org/2001/XMLSchema#integer"/>
   </rdf:Description>

3. Define your knowledge with literals encoded as 'unit:' URVs

   E.g. (using level 1 compression, all UNodes with same uriref's merged)

   [1,U,#Jenny]
   [2,U,#Robby]
   [3,U,#weight]
   [4,U,#age]
   [5,S,1,3,unit:kg:12]
   [6,S,2,3,unit:kg:14]
   [7,S,1,4,xsd:duration:1Y]
   [8,S,1,4,xsd:duration:12M]
   [9,S,2,4,xsd:duration:1Y2M]
   [10,S,1,4,xsd:duration:14M]

Now, the actual age values for Jenny and Robby weren't
clear from your graphic, so I had to guess a little,
but I think the gist should be clear.

The fact that the duration values can be parsed into
individual subvalue components and that those
subvalues correspond to integers are details of the
xsd:duration data type, and any application that understands
that data type will be able to interpret those values
accordingly. It is not RDF's responsibility IMO to provide
an explicit modelling of all complex data types which are
expressed in single lexical forms.

Folks may wish to do that, but it can't be RDF's responsibility
because RDF should be neutral to all data types (IMO).

The proper interpretation of lexical forms for a given
data type should not be part of the MT. Only the relation
of lexical form to data type (i.e. relation of RDF literal
to RDF class).

If RDF were to trully model the mapping from lexical form
into value space, it would have to have a canonical 
representation for values in a given value space, and that
canonical representation may or may not correlate to the
internalized representation of any given system, so all
that is really being done is defining a mapping from lexical
space to canonical lexical space -- since the RDF canonical
representation space is not an actual layer of any specific
system. RDF cannot escape lexical forms because it cannot 
offer a universal set of value spaces which are shared by
all RDF applications. Individual applications are still going
to have to map from the RDF encoded lexical forms into their
own value space representations.

Thus, trying to model the mapping of the literal "10" into
"the set of real integers" in the MT is not going to buy us
anything, as that mapping is the domain of the application
and thus we still remain in the realm of lexical forms, even
if they are canonical.

We may define upper-level type taxonomies which define only
value spaces and ground our data types that define lexical
spaces in those abstract classes via rdfs:subClassOf, and
use that as the basis of a common interpretation of values,
but to that end, RDF already then provides the interpretation
in the semantics of class relations. E.g.

   xsd:integer rdfs:subClassOf ieee:integer .
   xsd:integer rdfs:subClassOf ieee:integer .

The best we can do IMO is to model the relation of the literal
"10" to one or more data type classes (via rdf:type and rdfs:range) 
and the mechanisms by which the lexical space and value space are 
defined for the class (e.g. the 'lit:' ontology or similar,
or left implicit such as for XML Schema classes), and 
mechanisms for defining relationships between classes
(rdfs:subClassOf, rdfs:subPropertyOf), but not mechanisms for
"interpreting" them (defining the mapping from lexical space to
value space) in RDF.

If the above was not clear, please see the glossary
definitions about lexical space and canonical lexical space
and look at how those terms are used in the XML Schema spec
for defining simple types.

Cheers,

Patrick
Received on Wednesday, 14 November 2001 02:36:48 UTC