- From: Jeremy Carroll <jjc@hpl.hp.com>
- Date: Thu, 02 Aug 2007 20:36:04 +0100
- To: Story Henry <henry.story@bblfish.net>
- CC: Sandro Hawke <sandro@w3.org>, Lee Feigenbaum <lee@thefigtrees.net>, Richard Cyganiak <richard@cyganiak.de>, Garret Wilson <garret@globalmentor.com>, Tim Berners-Lee <timbl@w3.org>, Semantic Web <semantic-web@w3.org>
At some level this thread is rather futile. The RDF design includes a design for representing numbers, amongst other things. This is now fairly well deployed with interoperable implementations. Garret doesn't like this aspect of the design. Well, that's life. All agreements between numerous people involve aspects that some people dislike. It is particularly irksome when, for some reason, we end up participating in an aspect of the world which other people agreed on, and we are too late to the party to argue against something we don't like.

I think it may be less futile to give some sort of design rationale. There are two approaches:
- give an historical account of how we got to where we are
- give a more abstract account of the problem space, and see which aspects of the current design are essentially inevitable.

I'll try the latter - the former is available in the mail archives of the RDF Core WG.

=====

RDF is intended as a way of describing things. Most of the things being described, and the means to describe them, are identified by URIs. However, URIs are non-rigid designators, i.e. it is not always clear what a URI is intended to represent. The RDF Semantics is written with the weakest possible assumption: that each URI represents something, but we don't know what.

It is also helpful to have some aspects of the descriptions using rigid designators, where what they represent is known in advance. In RDF these things are called literals. Initially the only sort of literals were strings. This was fairly limiting, and there was a desire to include other datatypes, such as those defined by XML Schema.

Given that we wanted to have an open framework, which wasn't limited to just the XML Schema datatypes, we decided that the author of an RDF document could use whatever datatype they wanted; although we did not define a means by which they could declare new datatypes, and instead require private agreement for them.
If there was a call to fix this, it could be done.

To allow anyone to introduce their own datatypes we used the notion of a datatype URI to identify the datatype being used. I think this is a highly defensible design decision. Since the point of having literals is to have things whose interpretation is known, the datatype acts as the means by which that interpretation is defined. Hence a datatype has a lexical-to-value mapping. To provide a useful set of datatypes, we use the XML Schema datatypes, identified by the URIs given by the XML Schema WG.

As many people have pointed out, the abstract syntax is an abstract syntax. It is not intended to limit the way that RDF is written down, nor is it intended as the meaning of an RDF document. Thus in the abstract syntax a typed literal is represented as a pair: the datatype URI and a string. In RDF Semantics this is then mapped to the specific value as given by the datatype.

Having such predefined designators is a fundamental requirement for being able to use known values in descriptions of resources, which was one of the goals of the literal design. Moreover, any design which allows arbitrary user-defined datatypes ends up needing something like a URI to represent the datatype, and something like the lexical form to represent the string representation of that value: at least at the abstract syntax level. You are free to write that pair however you like, including omitting the datatype URI and the quotes around the string, as long as in the syntax you are using they are superfluous, and then they can be (logically) put back into the abstract syntax.

====

There were other design options we considered, but they all included the notion of a datatype URI, the notion of a lexical form, the notion of a lexical-to-value mapping, and the value space. Garret's proposed design also seems to include these - except that the datatype URI is used as a URI prefix, and the lexical form is used as a suffix.
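To make those four components concrete, here is a minimal sketch in Python. It is only an illustration, not part of any RDF specification or library: the names TypedLiteral, LEXICAL_TO_VALUE and value_of are invented for this example, and the parsers only approximate the XML Schema lexical rules.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable, Dict

XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class TypedLiteral:
    """Abstract-syntax view of a typed literal: a (datatype URI, string) pair."""
    datatype_uri: str
    lexical_form: str


def parse_boolean(lexical: str) -> bool:
    # xsd:boolean has four lexical forms mapping onto two values.
    if lexical in ("true", "1"):
        return True
    if lexical in ("false", "0"):
        return False
    raise ValueError(f"not a boolean lexical form: {lexical!r}")


# One lexical-to-value mapping per datatype URI. Python's int() and
# Decimal() are looser than the XSD lexical spaces, so this is an
# approximation chosen for brevity.
LEXICAL_TO_VALUE: Dict[str, Callable[[str], object]] = {
    XSD + "integer": int,
    XSD + "decimal": Decimal,
    XSD + "boolean": parse_boolean,
    XSD + "string": str,
}


def value_of(literal: TypedLiteral) -> object:
    """Map a typed literal to the value its datatype assigns to it."""
    try:
        mapping = LEXICAL_TO_VALUE[literal.datatype_uri]
    except KeyError:
        # An unknown datatype needs private agreement on its mapping.
        raise LookupError(f"no mapping known for {literal.datatype_uri}")
    return mapping(literal.lexical_form)


# Two different lexical forms, one value in the value space.
a = TypedLiteral(XSD + "integer", "42")
b = TypedLiteral(XSD + "integer", "042")
print(a == b, value_of(a) == value_of(b))
```

The two literals differ as pairs but denote the same integer, which is the distinction between abstract syntax and semantics drawn above. In the prefix/suffix design, by contrast, the two fields of the pair would be carried in one URI string.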
This seems to require analysis of the internals of a URI in order to identify what it means, and I prefer the designs where these components are separated.

Jeremy

--
Hewlett-Packard Limited
registered Office: Cain Road, Bracknell, Berks RG12 1HN
Registered No: 690597 England
Received on Thursday, 2 August 2007 19:36:28 UTC