Comment on RDF datatypes draft (fwd)

Rick Jelliffe asked me to forward this to the list.

He didn't want to post directly, given that this is a draft of a WD,
we are clearly still busy discussing it internally, and we aren't
really expecting external comments.

Dave

------- Forwarded Message
From: Rick Jelliffe <ricko@topologi.com>
Subject: Comment on RDF datatypes draft
Date: Fri, 19 Apr 2002 01:35:05 +1000

While I like the draft, I think it tends to perpetuate XML Schema's
fuzzy approach to datatyping: the distinction between lexical space
and value space is trumpeted, but not exploited in such a way as to
render it usable for many (most?) kinds of idiomatic data.

The problem can be exemplified like this:
  "How do I say 'This value is a US-format date'?"

In XML Schemas datatypes, we have a date value space (which I leave
experts to argue about.) But we only have a single lexical space which 
corresponds to ISO 8601 more or less: a format no non-geek uses.

This desire for a single lexical space (except in the case of boolean)
creates several problems:

 1) It hinders people who have data in some format already. For
example, people who want to make their DTDs RDF-compatible.

 2) It requires an extra layer of software to localize it: therefore
it is skewed against thin or simple clients and towards back-end
data interchange. Thus it is "internationalized" without allowing
"localization", which is ultimately always needed to become usable.

 3) It is conceptually weak, because it lumps all lexical values
together higgledy-piggledy, as if "true" is the opposite of
"0".

 4) It only works when referring to data in XML: you cannot
type outside data, let alone provide type information about
binary data (say, embedded in XML as Bin64).

How could the RDF Datatypes proposal be strengthened to cover
these cases?
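
As a small illustration of problem 3, here is a sketch in Python of what
the boolean lexical space looks like once it is split into two named
subspaces. The subspace names "word" and "digit" and the function
parse_boolean are hypothetical, made up for this example; XML Schema
does not name these subspaces.

```python
# Hypothetical subspace names; XML Schema itself does not name them.
BOOLEAN_NOTATIONS = {
    "word": {"true": True, "false": False},
    "digit": {"1": True, "0": False},
}


def parse_boolean(lexical, notation):
    """Map a lexical form to a boolean value within one named subspace."""
    try:
        return BOOLEAN_NOTATIONS[notation][lexical]
    except KeyError:
        raise ValueError(
            f"{lexical!r} is not a valid form in notation {notation!r}"
        )
```

With the subspaces named, "true" pairs with "false" and "1" pairs with
"0", and a mixed use such as parse_boolean("0", "word") is rejected.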

The lexical space needs to be compartmentalized into
nameable subspaces.  In the RDF Datatypes draft, there
is a notion of rdfd:lex  Lexical Form Idiom.  However,
this idea is already present in XML, XML Schemas and,
most importantly, ISO 8879 SGML:  it is called NOTATION.

In SGML, a NOTATION is a physical/lexical form (perhaps even a binary
format) which has an implied type (which may be a structured
type).  Because NOTATION is a property of some resource or
range in a resource, the idea of a value space without a lexical
space never really crops up. So a NOTATION is, to all intents
and purposes, the name of a type. This seems pretty much the
same as rdfd:lex, except that NOTATION strengthens it to include
non-text formats.

So XML is itself a notation. An XML document is a tree of notations,
just as much as it is a tree of elements and a tree of entities.  The
MIME Content-Types such as text/plain are notations. Compression
is a notation. Encoding in UTF-8 is, at another extreme, a notation.
So a particular physical document may not only contain multiple
nested notations, it may itself be transformed through various
notations to get to particular physical forms.  
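
The layering described above can be sketched in Python with the
standard library. The sample text and the particular stack (UTF-8, then
zlib compression, then Base64) are assumptions chosen for illustration,
and decode_layers is a hypothetical helper name.

```python
import base64
import zlib

# One value, passed through three nested notations on the way out.
text = "<date>2002-04-19</date>"                        # XML notation
utf8_bytes = text.encode("utf-8")                       # character encoding
compressed = zlib.compress(utf8_bytes)                  # compression
embedded = base64.b64encode(compressed).decode("ascii") # text-safe form


def decode_layers(payload):
    """Undo the notations in reverse order to recover the original text."""
    return zlib.decompress(base64.b64decode(payload)).decode("utf-8")
```

A reader has to know every layer, and their order, to get back from
the physical form to the text.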

"Lexical Space" is just a notation which uses Unicode. So
the lexical space can be compartmented into particular notations,
but there can be non-text notations too: other forms for the same
value space. For example, the rule that a binary file should be
interpreted as a list of integers serialized out to words in
little-endian order is its notation.
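
A minimal Python sketch of that binary example, using the standard
struct module. The function name read_le_words, the unsigned
interpretation, and the word sizes offered are assumptions made for
illustration.

```python
import struct


def read_le_words(data, word_size=2):
    """Interpret bytes as a list of unsigned little-endian integers."""
    fmt = {2: "H", 4: "I"}[word_size]   # 16-bit or 32-bit unsigned words
    count, remainder = divmod(len(data), word_size)
    if remainder:
        raise ValueError("data length is not a whole number of words")
    return list(struct.unpack(f"<{count}{fmt}", data))


raw = bytes([0x01, 0x00, 0x00, 0x01])
little = read_le_words(raw)               # [1, 256]
big = list(struct.unpack(">2H", raw))     # [256, 1]
```

The same four bytes give different integers under the big-endian
reading, so the bytes alone do not determine the value: the notation
does.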

So "lexical space" is a particular range of notations.

So I suggest revising the datatypes draft:

 1) Substitute rdfd:notation for rdfd:lex
 2) The "canonical lexical space" becomes the "canonical notation"
 3) The "canonical lexical mapping" becomes the "canonical notation mapping"
 4) A datatyped literal is a triple
   <value space, notation, string>
 5) Notations can be named by URIs (as in XML Schemas & DTDs)
 6) Redefine other things, such as the definition of range, to apply to
   particular notations, thus allowing the same lexical representation
   to map to different values in different notations for the same type.
   (Say we have a type which is an enumeration of abstract courses
   in a meal: "entree" in US locale means "main course" while
   in EU locale it means "initial course".  Or, 3/2/01 will mean
   a different date depending on the locale.)
 7) Work through the ramifications, to allow typing of non-XML and
   embedded binary data
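
Points 4 to 6 can be sketched in Python as follows. This is only an
illustration under stated assumptions: the example.org notation URIs,
the "date" value-space label, and the names DatatypedLiteral and
to_value are all hypothetical and do not come from the draft.

```python
from collections import namedtuple
from datetime import date, datetime

# A datatyped literal as the triple <value space, notation, string>.
DatatypedLiteral = namedtuple(
    "DatatypedLiteral", ["value_space", "notation", "string"]
)

# Hypothetical notation URIs, invented for this example.
US_DATE = "http://example.org/notation/date-us"
EU_DATE = "http://example.org/notation/date-eu"
ISO_DATE = "http://example.org/notation/date-iso8601"

# Each notation carries its own lexical-to-value mapping.
DATE_FORMATS = {
    US_DATE: "%m/%d/%y",
    EU_DATE: "%d/%m/%y",
    ISO_DATE: "%Y-%m-%d",
}


def to_value(literal):
    """Map a date literal to its value using the literal's notation."""
    if literal.value_space != "date":
        raise ValueError("only the 'date' value space is sketched here")
    fmt = DATE_FORMATS[literal.notation]
    return datetime.strptime(literal.string, fmt).date()


us = to_value(DatatypedLiteral("date", US_DATE, "3/2/01"))
eu = to_value(DatatypedLiteral("date", EU_DATE, "3/2/01"))
```

Here us is 2 March 2001 and eu is 3 February 2001: the same string and
the same value space, but different values because the notation
differs.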

Cheers
Rick Jelliffe
www.topologi.com


------- End of Forwarded Message

Received on Thursday, 18 April 2002 11:33:27 UTC