Re: RDF's curious literals from Garret Wilson on 2007-07-31 (semantic-web@w3.org from July 2007)

From: Garret Wilson <garret@globalmentor.com>
Date: Tue, 31 Jul 2007 16:49:05 -0700
To: Story Henry <henry.story@bblfish.net>
CC: Tim Berners-Lee <timbl@w3.org>, Semantic Web <semantic-web@w3.org>
Message-ID: <46AFCA71.6020008@globalmentor.com>

Story Henry wrote:
>
> I think the point of Tim's post was that really he thinks that the 
> only things that are literals are strings.
>
> [] xsd:integer "123" .
>
> can be written in shorthand as
>
> "123"^xsd:integer .
>
> (see the N3 tutorial)
>
> Because everybody in the xml world was going crazy about xsd datatypes 
> they wanted this more complex notion of literals. To satisfy them the 
> ^^ notation was added.
>
> 123 = "123"^^xsd:integer,
>       "123"^xsd:integer
>       123 .
>
> is a nice shortcut in N3.

Ah, I see---and sorry for just getting up to speed on N3. So that's why 
you kept using xsd:integer the way you did, and I kept objecting to it. 
I wanted to use xsd:integer as the type of the resource (and I still 
do). You wanted to say that if we consider xsd:integer to mean 
something like Integer.toString() in Java, then we can refer to the 
resource 123 by saying, "the resource for which Integer.toString() 
yields '123'."
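
A minimal Python sketch of that reading, where the function names 
xsd_integer and resource_for are hypothetical stand-ins (they are not part 
of N3 or any RDF library): the property maps a value to its lexical form, 
and the resource is found by looking for the value that yields the given 
string.

```python
def xsd_integer(value: int) -> str:
    """Stand-in for Integer.toString(): map an integer to its lexical form."""
    return str(value)


def resource_for(lexical_form: str, candidates) -> int:
    """Return the single candidate whose xsd_integer() equals lexical_form."""
    matches = [c for c in candidates if xsd_integer(c) == lexical_form]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one match for {lexical_form!r}")
    return matches[0]


# "The resource for which xsd_integer yields '123'", searched in a small range.
found = resource_for("123", range(1000))
print(found)
```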

But regardless of this notation, I still stick to all my original points.

>
> You may have a good point there. There is a difference between a 
> string and a URI though that is worth keeping in mind.

Related to this discussion, there is only one difference between a URI 
string and a non-URI string: a URI string has an internal structure that 
lets sections of it be apportioned to and managed by a third party (in 
this case, ICANN and the IETF) so that you don't have name clashes. Note 
that you can still have name clashes when people decide to run parallel 
DNSs---it just doesn't happen that often. And you still have semantic 
clashes when you have non-normalized forms of UTF-8 encoded Unicode 
being used.
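
As an illustration of the normalization problem, here is a small Python 
sketch using the standard unicodedata module. The example.org URIs are 
made up for the illustration; the point is only that two strings that look 
identical can differ as identifiers until both are normalized.

```python
import unicodedata

# Two spellings of the same visible text: a precomposed "e with acute"
# versus a plain "e" followed by a combining acute accent.
composed = "http://example.org/caf\u00e9"
decomposed = "http://example.org/cafe\u0301"

# Compared as raw strings (or as UTF-8 bytes) the two identifiers differ.
same_raw = composed == decomposed
same_bytes = composed.encode("utf-8") == decomposed.encode("utf-8")

# After normalizing both to NFC they compare equal.
same_nfc = (unicodedata.normalize("NFC", composed)
            == unicodedata.normalize("NFC", decomposed))

print(same_raw, same_bytes, same_nfc)
```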

But overall, URIs help prevent string clashes. And that's the only 
difference. If we decided that only people whose last name starts with 
the letter B could manage strings starting with the letter B, we would 
have an analogous situation (although there would be more clashes).

The string "George W. Bush" when identifying the US president can clash 
with the same string if someone decided to name their pet pig "George W. 
Bush". But if I prepend the string with 
"http://example.org/us/presidents/", I reduce clashes and you're happy. 
But I've simply prepended the string with something to reduce the 
environments in which "George W. Bush" can refer to different things. 
But it's still a string---it just conforms to the URI syntax and has 
extra preceding characters.

So "313" is a string, and I can use it to identify the value 313. But I 
can also use it to identify the characters on the license plate of 
Donald Duck's car. So to prevent the confusion, I could prepend 
characters to make longer strings: "(this is a string dammit)313" and 
"(this is a number dammit)313". And we don't have clashes anymore. But 
that's certainly a nonstandard syntax. I could use a standard format of 
a string called URI (hey, RDF already uses URIs for all other resources, 
so that comes in handy!) and use perhaps <rdfliteral:313;xsd:string> and 
<rdfliteral:313;xsd:integer>. But I'm still using strings---just strings 
with structure.

So let me state this another way: to say that non-literal resources use 
URIs as identifiers and literal resources use strings as identifiers is 
a false dichotomy. RDF uses strings for all its identifiers. It's just 
that for non-literals, these strings conform to a format called URI so 
as to reduce clashes. There's no reason why we can't have the strings 
identifying literals conform to this same format as well by 
pre/postpending the appropriate information--- perhaps a "rdfliteral" 
prefix and a ";datatype" postfix. Then we use URI-conforming strings for 
everything.
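
A rough Python sketch of such a scheme follows. Only the "rdfliteral" 
prefix and the ";datatype" postfix come from the text above; the function 
names and the choice to percent-encode the lexical form (so that a ";" 
inside it cannot be mistaken for the separator) are assumptions made for 
the illustration.

```python
from typing import Tuple
from urllib.parse import quote, unquote

SCHEME = "rdfliteral"  # hypothetical scheme name, not a registered one


def literal_to_uri(lexical_form: str, datatype: str) -> str:
    """Fold a (lexical form, datatype) pair into one URI-shaped string."""
    # safe="" percent-encodes every reserved character, including ";" and ":".
    return f"{SCHEME}:{quote(lexical_form, safe='')};{datatype}"


def uri_to_literal(uri: str) -> Tuple[str, str]:
    """Recover the (lexical form, datatype) pair from such a string."""
    scheme, _, rest = uri.partition(":")
    if scheme != SCHEME:
        raise ValueError(f"not an {SCHEME} URI: {uri!r}")
    encoded, sep, datatype = rest.partition(";")
    if not sep:
        raise ValueError(f"missing ';datatype' part: {uri!r}")
    return unquote(encoded), datatype


# The same lexical form with different datatypes gives different identifiers.
as_number = literal_to_uri("313", "xsd:integer")
as_string = literal_to_uri("313", "xsd:string")
print(as_number, as_string, as_number != as_string)
```

Because the encoding is reversible, nothing is lost by folding the pair 
into one string: uri_to_literal(literal_to_uri(form, dt)) gives back 
(form, dt).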

> Still it is convenient to have literals, you have to admit. Because 
> when you see one, you know how to deal with it immediately. And we are 
> engineers, so we do like to have some conveniences.

I have no idea what this statement means. I think it is convenient to 
have integers. I think it is convenient to have strings. I think it is 
convenient to have boolean values. When I see any of them, I know how to 
deal with them automatically. But what does this have to do with 
literals?

You may be saying, "when I see the strings '123', '\"123\"', and 'true', 
I know instantly that these are an integer, a string, and a boolean 
value." (Now you're back to JSON.) And that's fine---have your processor 
automatically turn "123" into an integer, "\"123\"" into a string, and 
"true" into a boolean. But that doesn't mean that I should get a special 
thing called a literal in my data model. I should get three resources in 
my data model, perhaps identified by URIs <rdfliteral:123;xsd:integer>, 
<rdfliteral:123;xsd:string>, and <rdfliteral:true;xsd:boolean>.
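
A sketch of such a processor in Python, assuming JSON-like tokens and the 
hypothetical rdfliteral scheme. The function name token_to_resource is 
made up, and escaping of the lexical form is left out to keep the example 
short.

```python
import re


def token_to_resource(token: str) -> str:
    """Map a JSON-like token to a resource identifier; no literal node is created."""
    if token in ("true", "false"):
        return f"rdfliteral:{token};xsd:boolean"
    if re.fullmatch(r"[+-]?[0-9]+", token):
        return f"rdfliteral:{token};xsd:integer"
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        # Strip the surrounding quotes; what remains is the lexical form.
        return f"rdfliteral:{token[1:-1]};xsd:string"
    raise ValueError(f"unrecognized token: {token!r}")


for tok in ['123', '"123"', 'true']:
    print(tok, "->", token_to_resource(tok))
```

The shortcut lives entirely in the parser: the model only ever sees three 
ordinary resource identifiers.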

If you want to have a serialization format and/or a library that gives 
you special shortcuts for working with integers, strings, booleans, or 
even US presidents, that's great. But none of those shortcuts should 
affect the model.

>
>> But wait---if I decide that it's easier to represent this resource 
>> using a string, I could create the resource "George W. 
>> Bush"^^foaf:Person.
>
> Please have a closer look at N3, or else we will keep repeating the 
> same points.

Point well taken regarding the meaning of ^^ in N3...

...but please note that my point here is that there is no need for 
rdf:datatype or some odd ^^ indirection (or some special xsd:integer 
property that some anonymous subject has) if literals are just 
resources. In the example above, I'm saying that I want an rdf:type of 
foaf:Person for George W. Bush, and that identifying him by a string 
shouldn't change how he appears in the graph.

> Also to refer to people via a String is not helpful, because they can 
> have different names. Since the
> following is necessarily false
>
> "George W. Bush" = "George Bush" .

But that's just one particular domain (which I could rectify by using an 
rdf:datatype with a controlled lexical vocabulary for US presidents), 
and it's missing my point. Let's talk about planets in our solar system. 
I can identify two planets as <eg:Planet 
rdf:about="http://example.com/planets/mars"/> and <eg:Planet 
rdf:about="http://example.com/planets/uranus"/>, or as "Mars"^^eg:planet 
and "Uranus"^^eg:planet. Both forms identify the same two planets, but 
why is Uranus (hee hee) a normal resource in one and a literal in the 
other? Why does the first form have an rdf:type and the second form have 
some sort of odd indirect-reflexive rdf:datatype? This shouldn't be the 
case.

> That is ok. A person can have two name relations to different strings.  
> URIs are Universal Names.

Every ordered pair (lexical form, rdf:datatype URI) is a Universal 
Name, too, just as much as a URI is.

Let me state the whole case differently: URIs don't clash because they 
have some sort of domain specifier (hey, it's even called domain---fancy 
that!) prepended to the string (e.g. "http://example.org/" prepended to 
"string"). RDF typed literals don't clash because they have a *separate* 
string (an rdf:datatype URI) that is a domain specifier for the string. 
So there is no difference between URI domain+string and 
rdf:datatype+lexical form. So why not combine the lexical form and the 
datatype, resulting in a URI, and bring literals back into the resource 
fold?

Garret
Received on Tuesday, 31 July 2007 23:49:18 UTC