RE: DATATYPING: second draft

From: <Patrick.Stickler@nokia.com>
Date: Fri, 14 Dec 2001 03:18:44 +0200
Message-ID: <A03E60B17132A84F9B4BB5EEDE57957B1B430F@trebe006.NOE.Nokia.com>
To: melnik@db.stanford.edu, w3c-rdfcore-wg@w3.org

> I restructured and extended the datatyping document to include the most
> recent contributions by Graham, DanC, PatrickS and Frank:
>   http://www-db.stanford.edu/~melnik/rdf/datatyping/
> New stuff:
> - notion of datatyping scheme
> - vocabulary for datatype components: xsd:int.map, xsd:int.val etc.
> - extended examples of relating datatype components using RDF Schema
> - completed example for xsd:base64Binary
> - introduced Idioms A and B
> Let me know what you think...
> -- Sergey

Patrick's Comments:

1. Numbered sections would greatly facilitate referencing for comments ;-)

2. Scope: is it true that we are also not tasked with creating or adopting
mechanisms for defining data types -- only with capturing data type knowledge
about literals?

3. Desiderata for RDF Datatyping, bullet item 2, we're only talking about
simple data types here, right? After all, what does XML Schema have to
say about complex RDF structures? The only intersection that I see between
XML Schema and RDF structural models is lexically defined literals/data
as property values. We perhaps should make it clear that we are not using
XML Schema to define/constrain/interpret RDF graph structures.

4. Type System, Datatype Mapping, para 4, @@@, I would argue that there need
not be a lexical representation for every value -- as this would preclude
being able to define "upper level" data types which serve as a means for
relating the value spaces of lexically grounded data types (for defining
intersections of value spaces only, where lexical spaces are incompatible).

5. Datatyping Schemes, we may wish to make clearer what we mean by
I assert that for values, we merely denote or uniquely identify the value,
but provide no actual representation for it in the graph -- otherwise, RDF
would have to have a built-in canonical representation for all values of
all data types that would be described in RDF. It is one thing to say
that ("10",xsd:integer) uniquely denotes the integer value ten, based on the
definition of datatype mapping, and quite another thing to have the actual
integer value ten in the graph itself.
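
A minimal Python sketch of the distinction in point 5. It is only an
illustration under assumptions: the names Pairing, DATATYPE_MAPPINGS and
denoted_value are hypothetical and do not come from the datatyping draft,
and Python's int() only approximates the xsd:integer lexical-to-value
mapping. The graph holds nothing but the pair; a value appears only when
an application chooses to apply the mapping.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical registry: datatype URI -> lexical-to-value mapping.
# int() is a stand-in; it is more permissive than xsd:integer.
DATATYPE_MAPPINGS: Dict[str, Callable[[str], Any]] = {
    XSD + "integer": int,
}


@dataclass(frozen=True)
class Pairing:
    """What the graph stores: a lexical form plus a datatype URI."""
    lexical_form: str
    datatype: str


def denoted_value(pairing: Pairing) -> Any:
    """Apply the datatype mapping outside the graph to obtain the value."""
    return DATATYPE_MAPPINGS[pairing.datatype](pairing.lexical_form)


ten = Pairing("10", XSD + "integer")
also_ten = Pairing("010", XSD + "integer")

# Two distinct pairings in the graph can denote the same value.
print(ten == also_ten)                                # False
print(denoted_value(ten) == denoted_value(also_ten))  # True
```

Note that the two pairings differ as graph content yet denote the same
integer, which is why no canonical value representation is needed in the
graph itself.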

6. Datatyping scheme "S", I've given my comments already with regards to
this scheme so I won't repeat them here, other than:

I view the S scheme as defining a pairing of lexical form (literal) with
data type (URI) and it is that pairing that "names" the mapping from the
member of the lexical space to the member of the value space of that
data type. This concept of pairing is IMO the common intersection between
all of the datatyping schemes -- and setting aside issues such as which
node denotes the actual value (if any) or whether literals are strings
or scalars, etc. they all appear to me as synonymous idioms which all
serve to define these pairings of lexical form and data type, from
which the single unique mapping can be inferred. And from that viewpoint,
it is simply a matter of which idiom(s) are the most intuitive,
efficient, and clear. While we should certainly "bless" specific idioms
to improve interoperability and portability, we may very well define
the data typing solution similarly to parseType and allow any idioms
which define these pairings, leaving users/vendors to choose which they
prefer. The key is simply getting to those pairings. Everything else
IMO falls out from that.

7. It would perhaps be useful to extract all of the discussion about the
specific schemes from this document until we are absolutely agreed upon
what it is we want/need to capture, and include the various schemes'
descriptions by reference. This is what I meant in my comments to Frank's
set of requirements that we should first focus on the abstraction of
data types before we look at the idioms used to capture data typing
in the graph. The data typing "model" adopted by RDF should be independent
of idiom. And we should be free to add idioms later as we need. This is
why I keep stressing the concept of a pairing as the basis of data typing
in RDF. It is free from needing to provide an explicit representation
of the value, which would be necessary to define the actual mapping, yet
it unambiguously identifies that mapping and thus captures all the knowledge
that an application needs. Idioms are secondary to that core concept of
the pairing itself, and we need at least two idioms to handle global and
local typing.
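
To make the "idioms reduce to pairings" point concrete, here is a rough
Python sketch. Everything in it is an assumption for illustration: the
two graph shapes, the ex:lexicalForm and ex:datatype properties, and the
function names are hypothetical, and they are not meant to reproduce
Idioms A and B or any scheme in the draft. It shows one locally typed
and one globally typed encoding yielding the same pairing.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]
Pairing = Tuple[str, str]  # (lexical form, datatype URI)

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def objects(graph: List[Triple], s: str, p: str) -> List[str]:
    """Return all objects of triples matching subject s and predicate p."""
    return [o for (s2, p2, o) in graph if s2 == s and p2 == p]


def pairing_local(graph: List[Triple], s: str, p: str) -> Optional[Pairing]:
    """Local typing: the property value is a node carrying form and type."""
    for node in objects(graph, s, p):
        lex = objects(graph, node, "ex:lexicalForm")
        dt = objects(graph, node, "ex:datatype")
        if lex and dt:
            return (lex[0], dt[0])
    return None


def pairing_global(graph: List[Triple], s: str, p: str) -> Optional[Pairing]:
    """Global typing: the datatype comes from the property's declared range."""
    lex = objects(graph, s, p)
    dt = objects(graph, p, "rdfs:range")
    if lex and dt:
        return (lex[0], dt[0])
    return None


local_graph: List[Triple] = [
    ("ex:item", "ex:size", "_:b1"),
    ("_:b1", "ex:lexicalForm", "10"),
    ("_:b1", "ex:datatype", XSD_INTEGER),
]

global_graph: List[Triple] = [
    ("ex:item", "ex:size", "10"),
    ("ex:size", "rdfs:range", XSD_INTEGER),
]

# Both encodings resolve to the same pairing.
print(pairing_local(local_graph, "ex:item", "ex:size"))
print(pairing_global(global_graph, "ex:item", "ex:size"))
```

Under this view, adding a further idiom later only means adding another
function that produces a pairing; nothing downstream of the pairing
needs to change.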


