Re: RDF's curious literals from Jeremy Carroll on 2007-08-02 (semantic-web@w3.org from August 2007)

From: Jeremy Carroll <jjc@hpl.hp.com>
Date: Thu, 02 Aug 2007 20:36:04 +0100
To: Story Henry <henry.story@bblfish.net>
CC: Sandro Hawke <sandro@w3.org>, Lee Feigenbaum <lee@thefigtrees.net>, Richard Cyganiak <richard@cyganiak.de>, Garret Wilson <garret@globalmentor.com>, Tim Berners-Lee <timbl@w3.org>, Semantic Web <semantic-web@w3.org>
Message-ID: <46B23224.10401@hpl.hp.com>

At some level this thread is rather futile.
The RDF design includes a design for representing numbers, amongst other 
things.
This is now fairly well deployed with interoperable implementations.

Garret doesn't like this aspect of the design.

Well, that's life.

All aspects of agreements between numerous people involve aspects that 
some people dislike. It is particularly irksome when, for some reason, we
end up participating in an aspect of the world which other people agreed 
on, and we are too late to the party to argue against something we don't 
like.

I think it may be less futile to give some sort of design rationale.

There are two approaches:
- give an historical account of how we got to where we are
- give a more abstract account of the problem space, and see which 
aspects of the current design are essentially inevitable.

I'll try the latter - the former is available in the mail archives of 
the RDF Core WG.

=====

RDF is intended as a way of describing things.

Most of the things being described, and the means to describe them, are 
identified by URIs. However, URIs are non-rigid designators, i.e. it is 
not always clear what a URI is intended to represent. The RDF Semantics 
is written with the weakest possible assumption that each URI represents 
something, but we don't know what.

It is also helpful to have some aspects of the descriptions using rigid 
designators, where what they represent is known in advance. In RDF these 
things are called literals. Initially the only sort of literals were 
strings. This was fairly limiting, and there was a desire to include 
other datatypes, such as those defined by XML Schema.

Given that we wanted to have an open framework, which wasn't limited to
just the XML Schema datatypes, we decided that the author of an RDF
document could use whatever datatype they wanted. We did not, however,
define a means by which they could declare new datatypes; new datatypes
require private agreement. If there was a call to fix this, it could be
done.

To allow anyone to introduce their own datatypes we used the notion of a
datatype URI to identify the datatype being used. I think this is a highly
defensible design decision.

Since the point of having literals is to have things whose 
interpretation is known, the datatype acts as the means by which that 
interpretation is defined. Hence a datatype has a lexical-to-value mapping.

To provide a useful set of datatypes, we use the XML Schema datatypes, 
identified by the URIs given by the XML Schema WG.

As many people have pointed out, the abstract syntax is an abstract
syntax. It is not intended to limit the way that RDF is written down, 
nor is it intended as the meaning of an RDF document. Thus in the 
abstract syntax a typed literal is represented as a pair: the datatype 
URI and a string. In RDF Semantics this is then mapped to the specific 
value as given by the datatype. Having such predefined designators is a 
fundamental requirement for being able to use known values in 
descriptions of resources, which was one of the goals of the literal design.
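
As a rough illustration of the pair and the mapping described above,
here is a minimal Python sketch. The names TypedLiteral, value_of and
LEXICAL_TO_VALUE are made up for this example and do not come from any
RDF specification or library; only two XML Schema datatypes are covered.

```python
import re
from typing import Any, Callable, Dict, NamedTuple

XSD = "http://www.w3.org/2001/XMLSchema#"


def parse_integer(lexical: str) -> int:
    """Lexical-to-value mapping for xsd:integer (optional sign, digits)."""
    if not re.fullmatch(r"[+-]?[0-9]+", lexical):
        raise ValueError(f"not in the lexical space of xsd:integer: {lexical!r}")
    return int(lexical)


def parse_boolean(lexical: str) -> bool:
    """Lexical-to-value mapping for xsd:boolean."""
    table = {"true": True, "1": True, "false": False, "0": False}
    if lexical not in table:
        raise ValueError(f"not in the lexical space of xsd:boolean: {lexical!r}")
    return table[lexical]


# Each datatype URI identifies one lexical-to-value mapping.
# (Illustrative registry; a real one would cover many more datatypes.)
LEXICAL_TO_VALUE: Dict[str, Callable[[str], Any]] = {
    XSD + "integer": parse_integer,
    XSD + "boolean": parse_boolean,
}


class TypedLiteral(NamedTuple):
    """Abstract-syntax view of a typed literal: a datatype URI and a string."""
    datatype_uri: str
    lexical_form: str


def value_of(literal: TypedLiteral) -> Any:
    """Map the pair to the value it denotes, using the datatype's mapping."""
    mapping = LEXICAL_TO_VALUE.get(literal.datatype_uri)
    if mapping is None:
        # Unknown datatypes need private agreement about their mapping.
        raise LookupError(f"no mapping known for {literal.datatype_uri}")
    return mapping(literal.lexical_form)


a = TypedLiteral(XSD + "integer", "42")
b = TypedLiteral(XSD + "integer", "042")
# The pairs differ as syntax, but denote the same value.
print(a == b, value_of(a) == value_of(b))
```

The point of the sketch is only that the pair is syntax, and the value
is whatever the datatype's mapping says it is.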

Moreover, any design which allows arbitrary user-defined datatypes ends
up needing something like a URI to represent the datatype, and something 
like the lexical form to represent the string representation of that 
value: at least at the abstract syntax level. You are free to write that 
pair however you like, including omitting the datatype URI and the 
quotes around the string, as long as in the syntax you are using they 
are superfluous, and then they can be (logically) put back into the 
abstract syntax.
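
To make "putting them back" concrete, here is a small Python sketch of
a concrete syntax in which bare tokens such as 42, 4.2 and true stand
for typed literals, in the spirit of the numeric shorthand found in
Turtle-like syntaxes. The function name expand and the exact token
patterns are assumptions for illustration, not a full parser for any
real syntax.

```python
import re
from typing import Tuple

XSD = "http://www.w3.org/2001/XMLSchema#"

# Bare-token patterns and the datatype URI each one implies.
_SHORTHAND = [
    (re.compile(r"true|false"), XSD + "boolean"),
    (re.compile(r"[+-]?[0-9]+"), XSD + "integer"),
    (re.compile(r"[+-]?[0-9]*\.[0-9]+"), XSD + "decimal"),
]

# Fully written form: "lexical"^^<datatype URI>
_FULL = re.compile(r'"([^"]*)"\^\^<([^>]+)>')


def expand(token: str) -> Tuple[str, str]:
    """Return the abstract-syntax pair (datatype URI, lexical form)."""
    full = _FULL.fullmatch(token)
    if full:
        return (full.group(2), full.group(1))
    for pattern, datatype_uri in _SHORTHAND:
        if pattern.fullmatch(token):
            # The omitted datatype URI and quotes are restored here.
            return (datatype_uri, token)
    raise ValueError(f"cannot recover a typed literal from {token!r}")


short = expand("42")
full = expand('"42"^^<http://www.w3.org/2001/XMLSchema#integer>')
print(short == full)
```

Both spellings recover the same pair, which is all the abstract syntax
asks for.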

====

There were other design options we considered, but they all included the
notion of a datatype URI, the notion of a lexical form, the notion of
a lexical-to-value mapping, and the value space.

Garret's proposed design also seems to include these - except that the 
datatype URI is used as a URI prefix, and the lexical form is used as a
suffix. This seems to require analysis of the internals of a URI in 
order to identify what it means, and I prefer the designs where these 
components are separated.
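
A rough Python sketch of what that analysis might look like follows.
The combined-URI layout used here (datatype URI, then "#", then the
percent-encoded lexical form) and the example.org datatype URI are
purely hypothetical, chosen for illustration; they are not taken from
Garret's proposal.

```python
from typing import Tuple
from urllib.parse import quote, unquote

# Hypothetical datatype URI, chosen to have no fragment of its own.
EX_INTEGER = "http://example.org/datatypes/integer"


def combine(datatype_uri: str, lexical_form: str) -> str:
    """Hypothetical single-URI encoding: datatype prefix + escaped suffix."""
    return datatype_uri + "#" + quote(lexical_form, safe="")


def split_combined(uri: str) -> Tuple[str, str]:
    """Recover (datatype URI, lexical form) by looking inside the URI."""
    prefix, separator, suffix = uri.rpartition("#")
    if not separator:
        raise ValueError(f"no lexical suffix found in {uri!r}")
    return (prefix, unquote(suffix))


# With separate components, nothing needs to be parsed or unescaped.
pair = (EX_INTEGER, "42")

combined = combine(*pair)
print(combined)
print(split_combined(combined) == pair)
```

Even in this toy version the reader of the URI has to know the
delimiter and the escaping rules before the two components can be
recovered; with the pair they are simply given.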

Jeremy

-- 
Hewlett-Packard Limited
Registered Office: Cain Road, Bracknell, Berks RG12 1HN
Registered No: 690597 England

Received on Thursday, 2 August 2007 19:36:28 UTC