Re: RDF's curious literals from Tim Berners-Lee on 2007-08-01 (semantic-web@w3.org from August 2007)

From: Tim Berners-Lee <timbl@w3.org>
Date: Wed, 1 Aug 2007 11:56:47 -0400
To: Garret Wilson <garret@globalmentor.com>
Cc: Sandro Hawke <sandro@w3.org>, Story Henry <henry.story@bblfish.net>, Semantic Web <semantic-web@w3.org>
Message-Id: <1465DCE8-C5D6-4723-9984-4475E14D6B63@w3.org>
On 2007-07-31, at 22:01, Garret Wilson wrote:
>
> There seems to be a notion that things like the number 123 and the  
> boolean value true are some sort of different kind of resource,  
> merely because we have become accustomed to identifying these two  
> particular resources with strings rather than URIs. I find that  
> distinction to be completely arbitrary and unwarranted.

	Things and Terms

I think there is a confusion between the types of Things in the  
universe, and the types of terms in the RDF language. This is one I  
learned not to make not that long ago, but it is important.  A lot of  
my earlier coding got muddled.  Now I try to get it right.

First, Things in the universe.  (aka rdf:Resource)  Thing is the  
'top' class, the one everything is  a member of.   An infinite  
subclass of those things are Numbers and so on.  Numbers are abstract  
things we find very useful. They are very commonly referred to  in a  
lot of the data out there.

Now, Terms in the language. I'll use N3, but other RDF-based  
languages go similarly.
Things are identified in the language by symbols. RDF languages use  
URIs as symbols.
Yes, we could have used specific URIs to identify the numbers, and we  
could indeed have had
a common shared space like  http://numers.info/int/123 as you suggest.
But, given that (a) it would have meant a lot of consensus-building  
to pick the URI suffixes
and that (b) very many computer languages used a specific syntax to  
refer to those Things which
are Numbers, including perl and python and SQL and XQ and so on, we  
went with the flow and
put in a syntax for numbers - ints, decimals, floats, and strings.  
( Weren't you the one earlier saying how important it was to use  
syntax people were used to?).

Now, in the SYNTAX, the object of a statement can be a symbol using  
a URI  <http://...$foo>, or a literal  123, or one of a number
of shortcuts for URIs, such as prefixed names  cc:license  and so on.
When you look at the SYNTAX productions, literals and symbols will be  
quite distinct, as they are distinct productions, just as URIs and  
prefixed names are.   That's syntax.   The classes we could call  
N3URISymbol, N3PrefixedSymbol, N3Literal  do not overlap. For example  
their members begin with different characters.
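
To make the "different first characters" point concrete, here is a rough
Python sketch. It is only an illustration under simplifying assumptions:
the function name classify_term is made up for this note, and the real N3
grammar has more productions (blank nodes, variables, keywords) than the
three class names used above.

```python
def classify_term(term: str) -> str:
    """Guess the syntactic class of a term from its first character only."""
    first = term[0]
    if first == "<":
        # Full URI written between angle brackets.
        return "N3URISymbol"
    if first == '"' or first.isdigit() or first in "+-":
        # Quoted string, or something that starts like a number.
        return "N3Literal"
    # Assumption: anything else is treated as a prefixed name like cc:license.
    return "N3PrefixedSymbol"


for term in ["<http://example.org/n>", "cc:license", "123", '"hello"']:
    print(term, "->", classify_term(term))
```

Note that the function never asks what the term identifies. It looks only
at the spelling, which is all that syntax can see.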

This does not, though, affect the RDF model.    Different terms in  
the language can actually identify the same thing in the universe.   
This includes numbers.

	  ex:n owl:sameAs 123.
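
One way to picture this, as a toy Python sketch: keep terms (pieces of
syntax) on one side and the things they denote on the other. The DENOTES
table and the same_thing helper are hypothetical names invented for this
illustration, and the table simply assumes that ex:n names the number 123.

```python
# Hypothetical interpretation: each term (syntax) maps to the Thing it denotes.
DENOTES = {
    "ex:n": 123,  # a symbol which, by assumption, names the number 123
    "123": 123,   # the numeric literal for the same number
    "124": 124,   # a literal for a different number
}


def same_thing(term_a: str, term_b: str) -> bool:
    """True when both terms denote the same Thing under DENOTES."""
    return DENOTES[term_a] == DENOTES[term_b]


print(same_thing("ex:n", "123"))  # True: two terms, one Thing
print("ex:n" == "123")            # False: the terms themselves differ
```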

So are Numbers (say, or DateTimes, or Strings) a "different kind of  
Resource"?  Well, they do have certain properties which are  
particularly convenient.  Most of the information out there involves  
lots of them.   They end up occurring in different topologies in real  
data.    It is often more interesting to ask about all statements  
about  a person ex:Joe than all statements about the number 2, as 2  
gets reused so often.  This affects how a store might index them.   
But they are Things.   (In OWL DL there is a need, for the sake of DL  
reasoners, to separate datatype properties and object properties, but  
that is an artifact of those reasoners and a limitation of OWL DL).
Numbers have the interesting property, for example, that, when you  
use the conventional notation terms for them, you can tell that two  
terms (like '123' and '124') identify different things by just  
operating on those terms.   People often read and write numbers  
directly, but in a good UI they shouldn't have to read and write URIs,  
doing drag and drop of symbol-icons instead. There are lots of ways  
in which these Numbers, DateTimes, etc. are special in practice,  
compared with arbitrary other Things in the universe.
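
The point about telling '123' and '124' apart from the terms alone can be
sketched in Python. This is an illustration under assumptions: the name
known_different is invented here, and every numeric literal is treated as
a decimal, which glosses over the differences between the XSD datatypes.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def known_different(term_a: str, term_b: str) -> Optional[bool]:
    """Decide from the terms alone whether they denote different things.

    Returns True or False when the spelling settles it, None when it cannot.
    """
    try:
        # Numeric literals: compare the values the notation stands for.
        return Decimal(term_a) != Decimal(term_b)
    except InvalidOperation:
        # At least one term is not a number. Identical terms denote the
        # same thing; two different symbols might or might not.
        return False if term_a == term_b else None


print(known_different("123", "124"))           # True
print(known_different("ex:Joe", "ex:Joseph"))  # None
```

For the two symbols the honest answer is "can't tell": nothing in the
spelling rules out a statement elsewhere saying they are the same thing.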

	URIs and Strings

I think there was a similar confusion when you complained that  "a  
URI is just a string".
Well, a URI is a string in the universe.   But a symbol is not a  
literal in the language.
A symbol is the use of a URI string to stand for that which it  
identifies. In N3 this is denoted by <>.
A literal is the use of a string in the language to stand for a member  
of the class of Strings.

		<http://dbpedia.org/resource/Tim_Berners-Lee>  rdf:type  foaf:Person.
		<http://dbpedia.org/resource/Tim_Berners-Lee>  ex:length  "1.75 m".

		"http://dbpedia.org/resource/Tim_Berners-Lee"  http:responseCode  "303".
		"http://dbpedia.org/resource/Tim_Berners-Lee"  ex:length  "43 chars".

		<http://dbpedia.org/resource/Tim_Berners-Lee>  link:uri  "http://dbpedia.org/resource/Tim_Berners-Lee".
		<http://dbpedia.org/resource/Tim_Berners-Lee>  link:uri  "http://www.w3.org/People/Berners-Lee/card#i".

The first uses the string 'http://dbpedia.org/resource/Tim_Berners-Lee'  
to identify me. The third says that if you take that string and  
do a GET on it you will get back a response code of 303.  Note it  
applies to that string.  It does not apply to me.  I have other URIs.
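
The same distinction, as a small Python sketch. The classes Symbol and
Literal are invented for illustration and are not taken from any RDF
library; the only data used is the URI string from the statements above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    """A URI used as a term: it stands for whatever the URI identifies."""
    uri: str


@dataclass(frozen=True)
class Literal:
    """A string used as a term: it stands for the string itself."""
    value: str


URI = "http://dbpedia.org/resource/Tim_Berners-Lee"

tim = Symbol(URI)       # written <...> in N3: stands for a person
tim_uri = Literal(URI)  # written "..." in N3: stands for a string

# Statements as (subject, predicate, object) tuples.
statements = {
    (tim, "rdf:type", "foaf:Person"),
    (tim_uri, "ex:length", Literal("43 chars")),
}

print(tim == tim_uri)      # False: same characters, different kind of term
print(len(tim_uri.value))  # 43: a property of the string, not of the person
```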

A lot of the confusion in the recent threads on the  
semantic-web@w3.org list seems to have been connected
to the confusion between terms in the language and things in the  
universe.  I know we do tend to use the word 'literal' to refer to  
both the classes of numbers, etc etc, and also the production in the  
syntax.  We should probably find another term for one of them.

Hope this helps (and I didn't get it muddled!)

Tim
Received on Wednesday, 1 August 2007 15:56:59 UTC