
Re: RDF's curious literals

From: Garret Wilson <garret@globalmentor.com>
Date: Tue, 31 Jul 2007 19:01:44 -0700
Message-ID: <46AFE988.1040303@globalmentor.com>
To: Sandro Hawke <sandro@w3.org>
CC: Story Henry <henry.story@bblfish.net>, Tim Berners-Lee <timbl@w3.org>, Semantic Web <semantic-web@w3.org>

Sandro Hawke wrote:
> I'm not quite sure where you're going.  I think you're arguing about
> ways the RDF model could be different, and changes which would make it
> better.  That's okay, and sometime interesting.  At the same time, you
> seem to be arguing about what URIs and strings *are*, in the abstract
> (not related to RDF), and that's kind of confusing.  That's philosophy,
> not engineering, I think.

I'm sorry to have confused you. I assure you that the philosophical 
discussion was squarely in the "improving the RDF model" camp.

There seems to be a notion that things like the number 123 and the 
boolean value true are some sort of different kind of resource, merely 
because we have become accustomed to identifying these two particular 
resources with strings rather than URIs. I find that distinction to be 
completely arbitrary and unwarranted.

My discussion of URIs and strings was to point out that if URIs had been 
invented earlier in the history of humans, we might all be accustomed to 
identifying 123 as the sequence of characters 
"http://example.org/numbers/123" instead of just "123". And I can 
guarantee you that, had this been the case, RDF would not have evolved a 
separate concept of "literals". But just because the sequence of 
characters with which we identify numbers is different doesn't mean that 
the concept of the value 123 is any different.

123 is a resource, just like anything else. If you want to settle on a 
common identifying URI, fine. But the concept of RDF literals as a 
special type of resource must go.

> Your example here reminds me of some discussions (in July 2000) about
> how to define RDF literals.  At that point I was looking at using the
> "data" URI schema.
> This URI:
>      data:text/plain;charset=iso-8859-7,Hello%20World
> can perhaps be understood to denote the string "Hello World".

Fine. Let's agree on that URI for the string "Hello World" for the sake 
of argument. But let's not make the resource it identifies an instance 
of rdfs:Literal.
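
To make concrete what agreeing on that URI would mean, here is a minimal 
Python sketch that recovers the string a data: URI denotes. The function 
name string_denoted_by is made up for illustration, and the sketch assumes 
a percent-encoded (not base64) payload; it is not a full RFC 2397 parser.

```python
from urllib.parse import unquote_to_bytes


def string_denoted_by(data_uri: str) -> str:
    """Decode a percent-encoded data: URI into the text it carries.

    Hypothetical helper for illustration; base64 payloads are not handled.
    """
    scheme, _, rest = data_uri.partition(":")
    if scheme.lower() != "data":
        raise ValueError("not a data: URI")
    header, _, payload = rest.partition(",")

    # Look for a charset parameter after the media type.
    # US-ASCII is the default when none is given.
    charset = "us-ascii"
    for param in header.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.lower() == "charset":
            charset = value

    # Undo the percent-encoding, then interpret the bytes in that charset.
    return unquote_to_bytes(payload).decode(charset)


uri = "data:text/plain;charset=iso-8859-7,Hello%20World"
text = string_denoted_by(uri)
print(repr(text), len(text))
```

The URI and the string identify the same thing here; only the spelling of 
the identifier differs.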

> The string literal we use to name those 11 characters is much shorter
> than the URI.  But I agree we could use either one.  As it turns out,
> the W3C Recommendation for RDF doesn't bother to use URIs to name
> strings; it just defines a literal syntax instead.

Fine. As I've said numerous times, if RDF/XML, N3, RDFON, or whatever 
other serialization format uses lexical forms to identify these 
resources, that's great. But after reading the data model, when I walk 
the resulting graph I don't want to have nodes be either literals or 
non-literals. I just want to check the rdf:type and see if it's an 
eg:USPresident or xsd:integer. I can see which US president or integer 
it is by looking at the URI.
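
As a rough sketch of the kind of graph walk I mean, consider the Python 
below. Everything under http://example.org/ is a hypothetical URI invented 
for this example (including the numbers/ naming convention and the 
termNumber property), and a set of plain tuples stands in for a real RDF 
API. The point is only that every node is a URI and the walker branches on 
rdf:type, never on "literal or not".

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
EG = "http://example.org/"  # hypothetical namespace, for illustration only

LINCOLN = EG + "presidents/AbrahamLincoln"
SIXTEEN = EG + "numbers/16"  # assumed convention: the number 16 named by a URI

# Every subject, predicate, and object is a URI string; no literal nodes.
triples = {
    (LINCOLN, RDF_TYPE, EG + "USPresident"),
    (LINCOLN, EG + "termNumber", SIXTEEN),
    (SIXTEEN, RDF_TYPE, XSD_INTEGER),
}


def types_of(node, graph):
    """Return the set of rdf:type values asserted for a node."""
    return {o for (s, p, o) in graph if s == node and p == RDF_TYPE}


def integer_value(uri):
    """Read the integer off a numbers/ URI (assumed naming convention)."""
    return int(uri.rsplit("/", 1)[1])


def describe(node, graph):
    """Describe a node by checking its type, then looking at its URI."""
    kinds = types_of(node, graph)
    if XSD_INTEGER in kinds:
        return "integer %d" % integer_value(node)
    if EG + "USPresident" in kinds:
        return "US president " + node.rsplit("/", 1)[1]
    return "resource " + node


for node in sorted({LINCOLN, SIXTEEN}):
    print(describe(node, triples))
```

A serialization could still write the number as a bare lexical form; the 
model the walker sees would stay uniform.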

> Maybe that's less
> elegant.

Creating a new type of beast for something just because you're used to 
referring to it by a string rather than a URI makes a terribly inelegant 
model. What if I want to identify animals by the sounds they make? 
Should we introduce a special type rdfs:Audials, where the waveform 
"oink oink" refers to pigs? Something like (oinkoink.mp3)^^eg:animal 
(which would say, in N3, "the resource that makes the eg:animal noise 
(play 'oink oink' here)")? I don't mind properties that specify the sound 
that an eg:Pig makes. I just don't want to use that as its identifier and 
make that resource an instance of an rdfs:Audial in the RDF model.

> Maybe it's more practical.  Maybe it's less practical.
> **shrug**    It's what we've got, and it's probably not worth changing.

Ah, and that's what this comes down to. My particular approach to life 
is this: just because someone else screwed it up in the past doesn't 
mean we shouldn't fix it now. (The environment, racial segregation, 
Darfur refugees, etc. come to mind.) I extend that attitude to the 
semantic web. If the entire Internet is going to run on this thing, 
shouldn't we fix it now?


P.S. When I mention that "someone else screwed up [literals] in the 
past," I do not mean to imply that creating something called "literals" 
was not the obvious or reasonable choice at the time based upon the 
then-current understanding of semantic modeling languages. Maybe it was; 
maybe it wasn't. It remains that, in hindsight, RDF currently has a 
screwed-up notion of literals that is an inconvenient anomaly. Based 
upon our current knowledge, we need to fix it.
Received on Wednesday, 1 August 2007 02:01:57 UTC
