RDF's curious literals (was: Re: RDFON: a new RDF serialization) from Garret Wilson on 2007-07-31 (semantic-web@w3.org from July 2007)

From: Garret Wilson <garret@globalmentor.com>
Date: Tue, 31 Jul 2007 14:44:06 -0700
To: Tim Berners-Lee <timbl@w3.org>
CC: Semantic Web <semantic-web@w3.org>
Message-ID: <46AFAD26.8060906@globalmentor.com>
Tim,

Thanks for the reply on RDFON. I accept that the RDFON proposal was 
ignorant of the latest N3 syntax, and although I still prefer something 
RDFON-like, your points were valid and there's no merit in my trying to 
advance RDFON as "better" at this stage (or perhaps at any stage). And 
just as I have only recently been made aware that RDF (despite RDF/XML) 
allows literals in lists, your note that RDF allows literals to have 
properties (also in spite of RDF/XML limitations) also came as a 
surprise to me. RDF/XML is surely one of the worst things to happen to RDF.

The other is literals. Before replying to your comments below, let me 
just step back and make a few observations and ask a few questions 
concerning literals in RDF. (By the way, when I say "you", please 
understand that I'm speaking to a hypothetical responder or the general 
RDF user, not necessarily you (Tim), or anyone else.)

1. How is a literal any different than a resource? The RDFS definition 
at http://www.w3.org/TR/rdf-schema/#ch_literal is at first circular 
("the class of literal values") and then nonexistent for some literals 
("This specification does not define the class of plain literals."). The 
RDF Concepts explanation at 
http://www.w3.org/TR/rdf-concepts/#section-Literals is more helpful: A 
literal is a resource identified by a lexical representation, which 
representation may be "more convenient or intuitive" to use instead of a 
URI.

So at the end of the day, a literal is simply a resource that is easy to 
refer to using a string of characters. That's all well and good, but why 
should that affect my model? Why is a resource an instance of another 
class (rdfs:Literal) just because I like to identify it by a lexical 
representation?

Take for example the resource identified by the URI 
<http://example.org/presidents/GeorgeWBush>. This resource may have an 
rdf:type of foaf:Person. (I could assert all sorts of other RDF 
statements about this resource, but will decline to do so at this time.) 
Is this resource an instance of the class rdfs:Literal? No? Why not?

But wait---if I decide that it's easier to represent this resource using 
a string, I could create the resource "George W. Bush"^^foaf:Person. 
Suddenly George W. Bush (the person, without the quotes, just as 123 is 
the resource represented by "123"^^xsd:integer) is an instance of 
rdfs:Literal. Why? Why did my model change? How did the world I was 
modeling change just because I decided to represent George W. Bush using 
a string?

So let me go back to my original question: How is a literal different 
from a resource? My answer is that there should be *no* difference. The 
only difference is a syntactical matter of identification---but that 
should *not* give rise to a new class of resources. There should be no 
such thing as an rdfs:Literal. Everything should be a resource, however 
we decide to identify them. (Think of how absurd it would be to have an 
rdfs:Anonymous class, for all resources that are identified neither by a 
URI nor a lexical representation!)

2. How is rdf:type different from rdf:datatype? This is where RDF's odd 
treatment of literals starts to get stranger. If I describe the resource 
identified by URI <http://example.org/presidents/GeorgeWBush>, I can 
give this resource an rdf:type of foaf:Person. But if I describe this 
same resource using a lexical representation, I give it an rdf:datatype 
of foaf:Person (yielding "George W. Bush"^^foaf:Person). Why? It's the 
same resource---I just found it "more convenient" to identify it with a 
string.

The same thing goes for the resource 123, identified by the string "123" 
with data type xsd:integer. This resource should have rdf:type 
xsd:Integer. Why does it have a separate rdf:datatype? One answer could 
be that "rdf:datatype is to specify the transformation between the 
lexical representation and the actual resource." Fine, but that has two 
problems: the rdf:datatype sticks around in the actual model, when its 
use is merely syntactic; and I still don't get an rdf:type, which the 
number 123 surely has (just like George W. Bush surely has an rdf:type 
of foaf:Person, even if I refer to him using a lexical representation).

There should be no rdf:datatype. Its usage is partly syntactic; the 
other part is made redundant by rdf:type.
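
To make the point concrete, here is a minimal Python sketch, not anything RDF itself defines. The Resource class and the two helper functions are hypothetical names invented for this illustration, and the quoted-string identifier syntax is likewise an assumption. It shows a model in which the way a resource is identified has no effect on the rdf:type statement made about it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """A node in the model; how it is identified is just a field."""
    identifier: str


def describe_by_uri(uri, rdf_type):
    # Identify the resource by URI and state its type.
    return (Resource(f"<{uri}>"), "rdf:type", rdf_type)


def describe_by_lexical_form(lexical, rdf_type):
    # Identify the resource by a quoted string; same class, same predicate.
    return (Resource(f'"{lexical}"'), "rdf:type", rdf_type)


by_uri = describe_by_uri("http://example.org/presidents/GeorgeWBush", "foaf:Person")
by_string = describe_by_lexical_form("George W. Bush", "foaf:Person")
```

Both triples carry the same predicate and object; only the subject's identifier differs, and no second class of node is needed.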

3. If you want to refer to a resource using a lexical representation, 
RDF should create a URI scheme for lexical representations---then we 
could simply refer to all literals by URIs and be done with it. One 
method would be to use the form <rdfliteral:literal;datatype>, such as 
<rdfliteral:123;xsd:integer>. I frankly don't care what the format of 
this URI is, but the mapping is straightforward. A URI is a glorified 
string---there's no reason to use *other* strings to identify resources.

"But I want to simply use a string, not a URI, in my serialization of 
choice," you say. Fine, but that's a serialization issue. If you want to 
use "123"^^xsd:integer in N3 and have your parser automatically generate 
a node with URI <rdfliteral:123;xsd:integer>, then so be it. But there's 
no reason to have a different type of resource created just because you 
like to use string shortcuts, and there's no need to query these beasts 
differently just because you like writing "123"^^xsd:integer instead of 
<rdfliteral:123;xsd:integer>.

I'm all for syntactical shortcuts. In fact, I would make it even easier 
for you: I think if you write "abc", the processor should automatically 
change this to <rdfliteral:abc;xsd:string> for you. But that shouldn't 
change the model or make some sort of odd literal class. They are all 
resources, goshdarnit! All!
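
The parser behaviour described above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the function names literal_to_uri and uri_to_literal are hypothetical, and percent-encoding the lexical form (so that a ';' inside it cannot be mistaken for the separator) is my own addition, not part of any specification. The rdfliteral: scheme and the xsd:string default are the ones proposed in this message.

```python
from urllib.parse import quote, unquote

DEFAULT_DATATYPE = "xsd:string"  # plain "abc" is treated as a string


def literal_to_uri(lexical, datatype=DEFAULT_DATATYPE):
    """Build an rdfliteral: URI from a lexical form and a datatype name."""
    # Assumption: percent-encode every reserved character in the lexical
    # form so that ';' only ever appears as the separator.
    return "rdfliteral:" + quote(lexical, safe="") + ";" + datatype


def uri_to_literal(uri):
    """Recover (lexical form, datatype) from an rdfliteral: URI."""
    scheme, _, rest = uri.partition(":")
    if scheme != "rdfliteral":
        raise ValueError("not an rdfliteral URI: " + uri)
    encoded, _, datatype = rest.partition(";")
    return unquote(encoded), datatype
```

With this, "123"^^xsd:integer in a serialization and <rdfliteral:123;xsd:integer> name the same node, and the mapping can be reversed whenever the lexical form is needed again.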

So let me add a few quick responses below to clear up a last few things:

Tim Berners-Lee wrote:
> I agree that thinking of an integer as a Resource is fine, in that 123 
> is a Thing, like everyThing else.

It's more than just "thinking of an integer as a Resource". An integer 
is a resource, no? How is George W. Bush more of a resource than the 
number 3?

>
> That does not mean we should symbols and literal values in the language.

I think you left out a "not" or something, but let me restate this: 
"That does not mean we should not use symbols and literal values in your 
serialization language of choice. But it shouldn't change the RDF model."


> I think it is fine to have 123 (note no quotes) as literal in n3, 
> which it is.

I think it is fine to have 123 as a resource. It shouldn't be a literal. 
So I can represent it as "123" in English, or "١٢٣" in Urdu. Big deal. I 
can represent George W. Bush as "George W. Bush" in English. Nothing 
about these true statements changes the type of resources we're dealing 
with.


> I think it fine to say that that sequence of characters in the 
> language identifies the number 123, which is a member of the class 
> of Integers, much as a URI identifies another resource.

Right. That's a statement of syntactic transformation. Let's keep it 
down in the serialization, not in the model.


> I think in fact also it's fine to make URIs and say they also represent 
> the number 123, e.g.

I agree with that statement on its face.

> I don't, however, think it works to have rdf:about as a single 
> property (or even XML attribute) relating
> 123 to the string "123".

Here's where I was misunderstood. I made all those eg:IntRepresentation 
examples in another message to illustrate that the lexical 
representation is distinct from the resource itself. I'm violently 
agreeing here: I don't want to relate 123 with "123" at all, except for 
using "123" in your serialization of choice to somehow get to the 
resource 123 if you like.


> For example, suppose we want to model octal numbers and decimal numbers.
> I much prefer to concentrate on the number 123 as an Integer, and have 
> separate properties decimal and octal
> relating it to different strings, than to imagine separate classes of 
> Decimal Integer and Octal Integer.

Completely, completely agreed.
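
A small Python sketch of that idea, with one integer and two string-valued properties. The function name lexical_properties and the dictionary keys are hypothetical, chosen only to mirror the "decimal" and "octal" properties mentioned in the quoted text.

```python
def lexical_properties(number):
    """Relate one integer to its decimal and octal string forms."""
    return {
        "decimal": format(number, "d"),  # base-10 digits
        "octal": format(number, "o"),    # base-8 digits, no prefix
    }


props = lexical_properties(123)
# Both strings lead back to the same integer resource.
same_number = int(props["decimal"], 10) == int(props["octal"], 8) == 123
```

There is one number with two representations hanging off it, rather than a Decimal Integer class and an Octal Integer class.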

>
>> And (finally) going back to
>> RDFON, we see that eg:datatype("value") is really just
>> instantiating an eg:datatype class with a lexical identifier
>> instead of a URI identifier.
>
> If you look at that as an object initialization function, then that 
> maps to a binary predicate which is my model above. I prefer very much 
> to have a datatype-specific one such as dt:decimal.

I don't quite understand what you're saying here. In RDFON, I would have 
xsd:Integer("123") map to:

<rdf:literal:123;xsd:Integer> rdf:type xsd:Integer

>
> One more note on datatypes. In practice the term in the RDF abstract 
> language which N3 writes as 123 and NTriples writes as 
> 123^^xsd:integer I model as [ xsd:integer 2] or 2^xsd:decimal, in 
> practice is stored in RDF stores typically as some object like
>
> {termType: 'literal', value: "123", dt_URI: "http://...integer", lang: 
> null }
>
> This is a term in the language. It isn't the resource 123.

But 123 was the resource I was trying to identify. Why doesn't it map to 
the triple I show above ( <rdf:literal:123;xsd:Integer> rdf:type 
xsd:Integer )? Isn't that simpler? Doesn't that reflect what I 
indicated? Doesn't it identify a resource that I can give properties to 
and put into lists---even in RDF/XML?
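
As a rough sketch of the mapping I have in mind, here it is in Python. The function term_to_triple is a hypothetical name, and the "datatype" key holding a prefixed name stands in for the dt_URI field of the stored term quoted above; both are assumptions made for illustration only.

```python
def term_to_triple(term):
    """Turn a stored literal term into a triple about a URI-named resource."""
    if term["termType"] != "literal":
        raise ValueError("expected a literal term")
    # Build the subject in the same form as the triple shown earlier.
    subject = f"<rdf:literal:{term['value']};{term['datatype']}>"
    return (subject, "rdf:type", term["datatype"])


# Hypothetical stored term, modelled on the object quoted above.
term = {"termType": "literal", "value": "123",
        "datatype": "xsd:Integer", "lang": None}
triple = term_to_triple(term)
```

The result is an ordinary subject with an ordinary rdf:type, so nothing stops it from taking further properties or appearing in a list.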

Death to literals, rdfs:Literal, and rdf:datatype. Long live resources.

Garret
Received on Tuesday, 31 July 2007 21:44:12 UTC