Re: RDF's curious literals from Jeremy Carroll on 2007-08-01 (semantic-web@w3.org from August 2007)

From: Jeremy Carroll <jjc@hpl.hp.com>
Date: Wed, 01 Aug 2007 10:25:43 +0100
To: Garret Wilson <garret@globalmentor.com>
CC: Tim Berners-Lee <timbl@w3.org>, Semantic Web <semantic-web@w3.org>
Message-ID: <46B05197.9020701@hpl.hp.com>

Your position is very curious.

Yes, of course, literals are just another sort of resource.
And yes the design decision to give them a privileged position (i.e. not 
requiring the use of the URI namespace), can be questioned.

But the basic rationale for having the class rdfs:Literal and some of 
the key subclasses such as strings, lang-strings, xsd:integer etc. is 
based on engineering utility. Almost all SW applications need to use 
strings, and natural-language strings, and integers etc. Thus it is 
convenient to standardize on a representation for these. And, as with any 
standards work, integrating with previous standards work is non-trivial 
and the results are always a bit of a compromise (design by committee, 
etc.). Nevertheless, having a standard for things that all such 
applications want to do is useful.
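
As a rough illustration of that shared representation, here is a minimal
Python sketch of a literal term. It is modelled loosely on the stored
object Tim describes in the quoted message below (value, datatype URI,
language). The class name, field names and the to_python helper are
assumptions made for this example, not part of any RDF specification or
library.

```python
from dataclasses import dataclass
from typing import Optional

# Namespace URI for XML Schema datatypes.
XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class Literal:
    """Hypothetical in-memory form of an RDF literal term."""
    value: str                      # the lexical form, e.g. "123"
    datatype: Optional[str] = None  # datatype URI for typed literals
    lang: Optional[str] = None      # language tag for lang-strings

    def __post_init__(self):
        # In the 2004 RDF model a typed literal carries no language tag.
        if self.datatype is not None and self.lang is not None:
            raise ValueError("a literal cannot have both datatype and lang")

    def to_python(self):
        """Map the lexical form to a value; only xsd:integer is handled here."""
        if self.datatype == XSD + "integer":
            return int(self.value)
        return self.value


plain = Literal("abc")                            # plain string
tagged = Literal("chat", lang="fr")               # natural-language string
typed = Literal("123", datatype=XSD + "integer")  # typed literal
```

The point of the sketch is only that the three common cases fit one small
structure that every application can share.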

Jeremy


Garret Wilson wrote:
> 
> Tim,
> 
> Thanks for the reply on RDFON. I accept that the RDFON proposal was 
> ignorant of the latest N3 syntax, and although I still prefer something 
> RDFON-like, your points were valid and there's no merit in my trying to 
> advance RDFON as "better" at this stage (or perhaps at any stage). And 
> just as I have only recently been made aware that RDF (despite RDF/XML) 
> allows literals in lists, your note that RDF allows literals to have 
> properties (also in spite of RDF/XML limitations) also came as a 
> surprise to me. RDF/XML is surely one of the worst things to happen to RDF.
> 
> The other is literals. Before replying to your comments below, let me 
> just step back and make a few observations and ask a few questions 
> concerning literals in RDF. (By the way, when I say "you", please 
> understand that I'm speaking to a hypothetical responder or the general 
> RDF user, not necessarily you (Tim), or anyone else.)
> 
> 1. How is a literal any different than a resource? The RDFS definition 
> at http://www.w3.org/TR/rdf-schema/#ch_literal is at first circular 
> ("the class of literal values") and then nonexistent for some literals 
> ("This specification does not define the class of plain literals."). The 
> RDF Concepts explanation at 
> http://www.w3.org/TR/rdf-concepts/#section-Literals is more helpful: A 
> literal is a resource identified by a lexical representation, which 
> representation may be "more convenient or intuitive" to use instead of a 
> URI.
> 
> So at the end of the day, a literal is simply a resource that is easy to 
> refer to using a string of characters. That's all well and good, but why 
> should that affect my model? Why is a resource an instance of another 
> class (rdfs:Literal) just because I like to identify it by a lexical 
> representation?
> 
> Take for example the resource identified by the URI 
> <http://example.org/presidents/GeorgeWBush>. This resource may have an 
> rdf:type of foaf:Person. (I could assert all sorts of other RDF 
> statements about this resource, but will decline to do so at this time.) 
> Is this resource an instance of the class rdfs:Literal? No? Why not?
> 
> But wait---if I decide that it's easier to represent this resource using 
> a string, I could create the resource "George W. Bush"^^foaf:Person. 
> Suddenly George W. Bush (the person, without the quotes, just as 123 is 
> the resource represented by "123"^^xsd:integer) is an instance of 
> rdfs:Literal. Why? Why did my model change? How did the world I was 
> modeling change just because I decided to represent George W. Bush using 
> a string?
> 
> So let me go back to my original question: How is a literal different 
> from a resource? My answer is that there should be *no* difference. The 
> only difference is a syntactical matter of identification---but that 
> should *not* give rise to a new class of resources. There should be no 
> such thing as an rdfs:Literal. Everything should be a resource, however 
> we decide to identify them. (Think of how absurd it would be to have an 
> rdfs:Anonymous class, for all resources that are identified neither by a 
> URI nor a lexical representation!)
> 
> 2. How is rdf:type different from rdf:datatype? This is where RDF's odd 
> treatment of literals starts to get stranger. If I describe the resource 
> identified by URI <http://example.org/presidents/GeorgeWBush>, I can 
> give this resource an rdf:type of foaf:Person. But if I describe this 
> same resource using a lexical representation, I give it an rdf:datatype 
> of foaf:Person (yielding "George W. Bush"^^foaf:Person). Why? It's the 
> same resource---I just found it "more convenient" to identify it with a 
> string.
> 
> The same thing goes for the resource 123, identified by the string "123" 
> with data type xsd:integer. This resource should have rdf:type 
> xsd:Integer. Why does it have a separate rdf:datatype? One answer could 
> be that "rdf:datatype is to specify the transformation between the 
> lexical representation and the actual resource." Fine, but that has two 
> problems: the rdf:datatype sticks around in the actual model, when its 
> use is merely syntactic; and I still don't get an rdf:type, which the 
> number 123 surely has (just like George W. Bush surely has an rdf:type 
> of foaf:Person, even if I refer to him using a lexical representation).
> 
> There should be no rdf:datatype. Its usage is partly syntactic; the 
> other part is made redundant by rdf:type.
> 
> 3. If you want to refer to a resource using a lexical representation, 
> RDF should create a URI scheme for lexical representations---then we 
> could simply refer to all literals by URIs and be done with it. One 
> method would be to use the form <rdfliteral:literal;datatype>, such as 
> <rdfliteral:123;xsd:Integer>. I frankly don't care what the format of 
> this URI is, but the mapping is straightforward. A URI is a glorified 
> string---there's no reason to use *other* strings to identify resources.
> 
> "But I want to simply use a string, not a URI, in my serialization of 
> choice," you say. Fine, but that's a serialization issue. If you want to 
> use "123"^^xsd:integer in N3 and have your parser automatically generate 
> a node with URI <rdfliteral:123;xsd:integer>, then so be it. But there's 
> no reason to have a different type of resource created just because you 
> like to use string shortcuts, and there's no need to query these beasts 
> differently just because you like writing "123"^^xsd:integer instead of 
> <rdfliteral:123;xsd:integer>.
> 
> I'm all for syntactical shortcuts. In fact, I would make it even easier 
> for you: I think if you write "abc", the processor should automatically 
> change this to <rdfliteral:abc;xsd:string> for you. But that shouldn't 
> change the model or make some sort of odd literal class. They are all 
> resources goshdarnit! All!
> 
> So let me add a few quick responses below to clear up a last few things:
> 
> Tim Berners-Lee wrote:
>> I agree that thinking of an integer as a Resource is fine, in that 123 
>> is a Thing, like everyThing else.
> 
> It's more than just "thinking of an integer as a Resource". An integer 
> is a resource, no? How is George W. Bush more of a resource than the 
> number 3?
> 
>>
>> That does not mean we should symbols and literal values in the language.
> 
> I think you left out a "not" or something, but let me restate this: 
> "That does not mean we should not use symbols and literal values in your 
> serialization language of choice. But it shouldn't change the RDF model."
> 
> 
>> I think it is fine to have 123 (note no quotes) as literal in n3, 
>> which it is.
> 
> I think it is fine to have 123 as a resource. It shouldn't be a literal. 
> So I can represent it as "123" in English, or "١٢٣" in Urdu. Big deal. I 
> can represent George W. Bush as "George W. Bush" in English. Nothing 
> about these true statements changes the type of resources we're dealing 
> with.
> 
> 
>> I think it fine to say that that sequence of characters in the 
>> language identifies the number 123, which is a member of the class 
>> of Integers, much as a URI identifies another resource.
> 
> Right. That's a statement of syntactic transformation. Let's keep it 
> down in the serialization, not in the model.
> 
> 
>> I think in fact also its fine to make URIs and say they also represent 
>> the number 123, e.g.
> 
> I agree with that statement on its face.
> 
>> I don't, however, think it works to have rdf:about as a single 
>> property (or even XML attribute) relating
>> 123 to the string "123".
> 
> Here's where I was misunderstood. I made all those eg:IntRepresentation 
> examples in another message to illustrate that the lexical 
> representation is distinct from the resource itself. I'm violently 
> agreeing here: I don't want to relate 123 with "123" at all, except for 
> using "123" in your serialization of choice to somehow get to the 
> resource 123 if you like.
> 
> 
>> For example, suppose we want to model octal numbers and decimal numbers.
>> I much prefer to concentrate on the number 123 as an Integer, and have 
>> separate properties decimal and octal
>> relating it to different strings, than to imagine separate classes of 
>> Decimal Integer and Octal Integer.
> 
> Completely, completely agreed.
> 
>>
>>> And (finally) going back to
>>> RDFON, we see that eg:datatype("value") is really just
>>> instantiating an eg:datatype class with a lexical identifier
>>> instead of a URI identifier.
>>
>> If you look at that as an object initialization function, then that 
>> maps to a binary predicate which is my model above. I prefer very much 
>> to have a datatype-specific one such as dt:decimal.
> 
> I don't quite understand what you're saying here. In RDFON, I would have 
> xsd:Integer("123") map to:
> 
> <rdf:literal:123;xsd:Integer> rdf:type xsd:Integer
> 
>>
>> One more note on datatypes. In practice the term in the RDF abstract 
>> language which N3 writes as 123 and NTriples writes as 
>> 123^^xsd:integer I model as [ xsd:integer 2] or 2^xsd:decimal, in 
>> practice is stored in RDF stores typically as some object like
>>
>> {termType: 'literal', value: "123", dt_URI: "http://...integer", lang: 
>> null }
>>
>> This is a term in the language. It isn't the resource 123.
> 
> But 123 was the resource I was trying to identify. Why doesn't it map to 
> the triple I show above ( <rdf:literal:123;xsd:Integer> rdf:type 
> xsd:Integer )? Isn't that simpler? Doesn't that reflect what I 
> indicated? Doesn't it identify a resource that I can give properties to 
> and put into lists---even in RDF/XML?
> 
> Death to literals, rdfs:Literal, and rdf:datatype. Long live resources.
> 
> Garret
> 
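
For concreteness, the serialization-level mapping proposed in point 3 of
the quoted message could be sketched roughly as follows. The rdfliteral:
scheme is Garret's hypothetical, not a registered URI scheme. The
function name, the simplified term syntax it accepts, and the
percent-encoding of the lexical form are assumptions added for this
example.

```python
import re
from urllib.parse import quote

# Simplified N3-style literal syntax: "lexical"^^prefix:name or "lexical".
_TYPED = re.compile(r'^"(?P<lex>.*)"\^\^(?P<dt>\S+)$', re.DOTALL)
_PLAIN = re.compile(r'^"(?P<lex>.*)"$', re.DOTALL)


def literal_to_uri(term: str) -> str:
    """Rewrite a literal as a URI in the hypothetical rdfliteral: scheme."""
    match = _TYPED.match(term)
    if match:
        lex, datatype = match.group("lex"), match.group("dt")
    else:
        match = _PLAIN.match(term)
        if not match:
            raise ValueError(f"not a literal: {term!r}")
        # Untyped strings default to xsd:string, as the message suggests.
        lex, datatype = match.group("lex"), "xsd:string"
    # Assumption: percent-encode the lexical form so that characters such
    # as ';' or spaces cannot be confused with the URI's own delimiters.
    return f"<rdfliteral:{quote(lex, safe='')};{datatype}>"


print(literal_to_uri('"123"^^xsd:integer'))
print(literal_to_uri('"abc"'))
```

A parser doing this would hand the rest of the toolchain ordinary URI
nodes. The cost is that every consumer must parse the URI again to recover
the value, which is the engineering trade-off discussed at the top of this
message.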

-- 
Hewlett-Packard Limited
Registered Office: Cain Road, Bracknell, Berks RG12 1HN
Registered No: 690597 England
Received on Wednesday, 1 August 2007 09:26:18 UTC