Re: RDF's curious literals from Richard Cyganiak on 2007-08-01 (semantic-web@w3.org from August 2007)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Wed, 1 Aug 2007 10:32:07 +0200
To: Garret Wilson <garret@globalmentor.com>
Cc: Story Henry <henry.story@bblfish.net>, Tim Berners-Lee <timbl@w3.org>, Semantic Web <semantic-web@w3.org>
Message-Id: <F878B0B9-409E-483E-BECE-48C4B5F983DE@cyganiak.de>
On 1 Aug 2007, at 01:49, Garret Wilson wrote:
> So let me state this another way: to say that non-literal resources  
> use URIs as identifiers and literal resources use strings as  
> identifiers is a false dichotomy. RDF uses strings for all its  
> identifiers. It's just that for non-literals, these strings conform  
> to a format called URI

That's simply not true.

     <http://dbpedia.org/wiki/George_W._Bush>

and

     "http://dbpedia.org/wiki/George_W._Bush"

do not identify the same resource. The first identifies a person, the  
43rd president of the U.S. The second identifies a string of Unicode  
characters that happens to conform to the URI syntax.
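
To make the distinction concrete, here is a minimal sketch in plain Python. It does not use any real RDF library: the class names URIRef and Literal are assumptions chosen for illustration, and a real RDF literal may also carry a datatype or language tag. It only shows that two terms can share the same characters and still be different terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URIRef:
    """Hypothetical term type: a URI reference naming some resource."""
    value: str


@dataclass(frozen=True)
class Literal:
    """Hypothetical term type: a plain literal, i.e. the string itself."""
    lexical_form: str


uri = URIRef("http://dbpedia.org/wiki/George_W._Bush")
lit = Literal("http://dbpedia.org/wiki/George_W._Bush")

# The characters are identical...
same_characters = uri.value == lit.lexical_form

# ...but the terms are not: dataclass equality also requires the same class.
same_term = uri == lit

print(same_characters, same_term)
```

Under these assumptions the two terms compare unequal, so a graph keyed on terms would hold them as two separate nodes.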

Best,
Richard


> so as to reduce clashes. There's no reason why we can't have the  
> strings identifying literals conform to this same format as well by  
> pre/postpending the appropriate information--- perhaps a  
> "rdfliteral" prefix and a ";datatype" postfix. Then we use URI- 
> conforming strings for everything.
>
>> Still it is convenient to have literals, you have to admit.  
>> Because when you see one, you know how to deal with it  
>> immediately. And we are engineers, so we do like to have some  
>> conveniences.
>
> I have no idea what this statement means. I think it is convenient  
> to have integers. I think it is convenient to have strings. I think  
> it is convenient to have boolean values. When I see any of them, I  
> know how to deal with them automatically. But what does this  
> have to do with literals?
>
> You may be saying, "when I see the strings '123', '\"123\"', and  
> 'true', I know instantly that these are an integer, a string, and a  
> boolean value." (Now you're back to JSON.) And that's fine---have  
> your processor automatically turn "123" into an integer, "\"123\""  
> into a string, and "true" into a boolean. But that doesn't mean  
> that I should get a special thing called a literal in my data  
> model. I should get three resources in my data model, perhaps  
> identified by URIs <rdfliteral:123;xsd:integer>,  
> <rdfliteral:123;xsd:string>, and <rdfliteral:true;xsd:boolean>.
>
> If you want to have a serialization format and/or a library that  
> gives you special shortcuts for working with integers, strings,  
> booleans, or even US presidents, that's great. But none of those  
> shortcuts should affect the model.
>
>>
>>> But wait---if I decide that it's easier to represent this  
>>> resource using a string, I could create the resource "George W.  
>>> Bush"^^foaf:Person.
>>
>> Please have a closer look at N3, or else we will keep repeating  
>> the same points.
>
> Point well taken regarding the meaning of ^^ in N3...
>
> ...but please note that my point here is that there is no need for  
> rdf:datatype or some odd ^^ indirection (or some special  
> xsd:integer property that some anonymous subject has) if literals  
> are just resources. In the example above, I'm saying that I want an  
> rdf:type of foaf:Person for George W. Bush, and that identifying  
> him by a string shouldn't change how he appears in the graph.
>
>> Also to refer to people via a String is not helpful, because they  
>> can have different names. Since the
>> following is necessarily false
>>
>> "George W. Bush" = "George Bush" .
>
> But that's just one particular domain (which I could rectify by  
> using an rdf:datatype with a controlled lexical vocabulary for US  
> presidents), and it's missing my point. Let's talk about planets in  
> our solar system. I can identify two planets, <eg:Planet  
> rdf:about="http://example.com/planets/mars"/> and <eg:Planet  
> rdf:about="http://example.com/planets/uranus"/>, and  
> "Mars"^^eg:planet and "Uranus"^^eg:planet. These both identify the  
> same two planets, but why is Uranus (hee hee) a normal resource in  
> one and a literal in the other? Why does the first form have an  
> rdf:type and the second form have some sort of odd indirect- 
> reflexive rdf:datatype? This shouldn't be the case.
>
>> That is ok. A person can have two name relations with different  
>> strings.  URIs are Universal Names.
>
> Every ordered pair (lexical form, rdf:datatype URI) is a Universal  
> Name, too, just as much as a URI is.
>
> Let me state the whole case differently: URIs don't clash because  
> they have some sort of domain specifier (hey, it's even called  
> domain---fancy that!) prepended to the string (e.g. "http:// 
> example.org/" prepended to "string"). RDF typed literals don't  
> clash because they have a *separate* string (an rdf:datatype URI)  
> that is a domain specifier for the string. So there is no  
> difference between URI domain+string and rdf:datatype+lexical form.  
> So why not combine the lexical form and the datatype, resulting in  
> a URI, and bring literals back into the resource fold?
>
> Garret
>
>
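
The quoted proposal, folding a (lexical form, datatype) pair into a single URI-like string, can be sketched as follows. This is only an illustration under stated assumptions: the "rdfliteral:" prefix and ";datatype" suffix come from the quoted message, but the percent-encoding of the lexical form and the function names literal_to_uri and uri_to_literal are hypothetical choices made here, not part of any RDF specification.

```python
from urllib.parse import quote, unquote

# Prefix suggested in the quoted message; not a registered URI scheme.
PREFIX = "rdfliteral:"


def literal_to_uri(lexical_form: str, datatype: str) -> str:
    """Combine a lexical form and a datatype name into one identifier.

    The lexical form is percent-encoded (assumption) so that any ';'
    inside it cannot be confused with the separator.
    """
    return f"{PREFIX}{quote(lexical_form, safe='')};{datatype}"


def uri_to_literal(uri: str) -> tuple[str, str]:
    """Recover the (lexical form, datatype) pair from such an identifier."""
    if not uri.startswith(PREFIX):
        raise ValueError(f"not an {PREFIX} identifier: {uri!r}")
    lexical, sep, datatype = uri[len(PREFIX):].partition(";")
    if not sep:
        raise ValueError(f"missing ';datatype' suffix: {uri!r}")
    return unquote(lexical), datatype


as_integer = literal_to_uri("123", "xsd:integer")
as_string = literal_to_uri("123", "xsd:string")
round_trip = uri_to_literal(literal_to_uri("a;b c", "xsd:string"))

print(as_integer)
print(as_string)
print(round_trip)
```

The same lexical form paired with different datatypes yields different identifiers, which is the non-clashing property the quoted message argues for. Whether such an identifier should denote the value or the string is exactly the point in dispute above.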
Received on Wednesday, 1 August 2007 08:32:55 UTC