Re: RDF's curious literals from Sandro Hawke on 2007-08-01 (semantic-web@w3.org from August 2007)

From: Sandro Hawke <sandro@w3.org>
Date: Wed, 01 Aug 2007 00:23:43 -0400
To: Garret Wilson <garret@globalmentor.com>
Cc: Sandro Hawke <sandro@w3.org>, Story Henry <henry.story@bblfish.net>, Tim Berners-Lee <timbl@w3.org>, Semantic Web <semantic-web@w3.org>
Message-ID: <5413.1185942223@ubuhebe>

Garret Wilson <garret@globalmentor.com> writes:
> Sandro Hawke wrote:
> > I'm not quite sure where you're going.  I think you're arguing about
> > ways the RDF model could be different, and changes which would make it
> > better.  That's okay, and sometimes interesting.  At the same time, you
> > seem to be arguing about what URIs and strings *are*, in the abstract
> > (not related to RDF), and that's kind of confusing.  That's philosophy,
> > not engineering, I think.
> >   
> 
> I'm sorry to have confused you. I assure you that the philosophical 
> discussion was squarely in the "improving the RDF model" camp.
> 
> There seems to be a notion that things like the number 123 and the 
> boolean value true are some sort of different kind of resource, merely 
> because we have become accustomed to identifying these two particular 
> resources with strings rather than URIs. I find that distinction to be 
> completely arbitrary and unwarranted.

Absolutely.  It's like the distinction between electronic toys I can
afford and those I can't: it's quite important to me, today, but it says
nothing about the world, and it could change tomorrow.  It could have
changed already without me knowing it. 

Here's an example.  Let's define a new datatype, "eg:usmuni".  The value
space of this datatype is municipalities (cities and towns) in the US.
The lexical space is their names, in a form acceptable to the US Postal
Service.  So the RDF literal "San Francisco, CA"^^eg:usmuni denotes the
City of San Francisco.  By defining this datatype, I have made San
Francisco be an instance of class rdfs:Literal.
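
As a rough sketch of what defining such a datatype amounts to: a datatype
is essentially a lexical space, a value space, and a mapping from one to
the other.  The Python names below (Municipality, usmuni_lexical_to_value)
are made up for illustration; they are not part of RDF or of any library,
and the "Name, ST" check is only a crude stand-in for "acceptable to the
US Postal Service".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Municipality:
    """Stand-in for a member of the value space: an actual US city or town."""
    name: str
    state: str


def usmuni_lexical_to_value(lexical: str) -> Municipality:
    """Hypothetical lexical-to-value mapping for the eg:usmuni datatype.

    Assumes the lexical form is "<name>, <two-letter state code>".
    """
    name, sep, state = lexical.rpartition(", ")
    if not sep or not name or len(state) != 2 or not state.isupper():
        raise ValueError(f"not in the lexical space of eg:usmuni: {lexical!r}")
    return Municipality(name, state)


# The literal "San Francisco, CA"^^eg:usmuni denotes this value, not the string.
sf = usmuni_lexical_to_value("San Francisco, CA")
```

The point of the sketch is only that the thing on the right-hand side of
the mapping is a city, not a string, and yet it is what the literal
denotes.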

So, yes, the class rdfs:Literal is pretty darn silly -- unless maybe
it's with respect to some carefully chosen set of datatypes, as perhaps
it is in OWL....    

> My discussion of URIs and strings was to point out that if URIs were 
> invented earlier in the history of humans, we might all be accustomed to 
> identifying 123 as the sequence of characters 
> "http://example.org/numbers/123" instead of just "123". And I can 
> guarantee you that, had this been the case, RDF would not have evolved a 
> separate concept of "literals". But just because the sequence of 
> characters with which we identify numbers is different doesn't mean that 
> the concept of the value 123 is any different.
> 
> 123 is a resource, just like anything else. If you want to settle on a 
> common identifying URI, fine. But the concept of RDF literals as a 
> special type of resource must go.
> 
> > Your example here reminds me of some discussions (in July 2000) about
> > how to define RDF literals.  At that point I was looking at using the
> > "data" URI schema.
> >
> > This URI:
> >      data:text/plain;charset=iso-8859-7,Hello%20World
> >
> > can perhaps be understood to denote the string "Hello World".
> 
> Fine. Let's agree on that URI for the string "Hello World" for the sake 
> of argument. But let's not make the resource it identifies an instance 
> of rdfs:Literal.
> 
> > The string literal we use to name those 11 characters is much shorter
> > than the URI.  But I agree we could use either one.  As it turns out,
> > the W3C Recommendation for RDF doesn't bother to use URIs to name
> > strings; it just defines a literal syntax instead.
> 
> Fine. As I've said numerous times, if RDF/XML, N3, RDFON, or whatever 
> other serialization format uses lexical forms to identify these 
> resources, that's great. But after reading the data model, when I walk 
> the resulting graph I don't want to have nodes be either literals or 
> non-literals. I just want to check the rdf:type and see if it's an 
> eg:USPresident or xsd:Integer. I can see which US president or integer 
> it is by looking at the URI.

The RDF Graph is just the abstract syntax; it's not the underlying
knowledge base.  At the graph level, "01"^^xsd:int and "1"^^xsd:int and
"1"^^xsd:integer are all different things, even though they all denote
the number one.
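
To make the distinction concrete, here is a minimal sketch that models a
typed literal as a (lexical form, datatype URI) pair.  The Literal class
and the value() helper are illustrative assumptions, not any RDF
library's API, and value() ignores the range limits that xsd:int
actually imposes.

```python
from typing import NamedTuple

XSD = "http://www.w3.org/2001/XMLSchema#"


class Literal(NamedTuple):
    """A typed literal as it appears in the abstract syntax."""
    lexical: str
    datatype: str


def value(lit: Literal) -> int:
    """Map a literal to what it denotes (only two integer datatypes handled)."""
    if lit.datatype in (XSD + "int", XSD + "integer"):
        return int(lit.lexical)
    raise ValueError(f"unsupported datatype: {lit.datatype}")


a = Literal("01", XSD + "int")
b = Literal("1", XSD + "int")
c = Literal("1", XSD + "integer")

# Syntactically these are three distinct terms...
distinct_terms = len({a, b, c})

# ...but all three denote the same number.
same_value = value(a) == value(b) == value(c) == 1
```

Comparing the pairs corresponds to working at the graph level; comparing
the results of value() corresponds to working with what the literals
denote.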

I don't find the RDF Graph a useful notion.  I think of N-Triples as the
abstract model of RDF.  It's obviously syntax; the graph is rather
mystifying.

> > Maybe that's less
> > elegant.
> 
> Creating a new type of beast for something just because you're used to 
> referring to it by a string rather than a URI makes a terribly inelegant 
> model. What if I want to identify animals by the sounds they make? 
> Should we introduce a special type rdfs:Audials, where the waveform 
> "oink oink" refers to pigs? Something like (oinkoink.mp3)^^eg:animal 
> (which says, in N3, "the resource which makes an eg:animal noise of 
> (play "oink oink" here)")? I don't mind properties that specify the sound 
> that an eg:Pig makes. I just don't want to use that as its identifier and 
> make that resource an instance of an rdfs:Audial in the RDF model.
>
> > Maybe it's more practical.  Maybe it's less practical.
> > **shrug**    It's what we've got, and it's probably not worth changing.
> >   
> 
> Ah, and that's what this comes down to. My particular approach to life 
> is this: just because someone else screwed it up in the past doesn't 
> mean we shouldn't fix it now. (The environment, racial segregation, 
> Darfur refugees, etc. come to mind.) I extend that attitude to the 
> semantic web. If the entire Internet is going to run on this thing, 
> shouldn't we fix it now?

By all means.  But, as with the environment, racial issues, refugee
problems, etc ... it's very hard work, and it's not always clear how to
solve the problem.

In the case of datatypes, there was a *long* fight that ended up with
the current design.  Those of us who remember the fight are not eager to
have it again (and I wasn't involved -- I just knew people who were).
We are also not particularly optimistic that a better solution will be
reached next time.

So when I say it's not worth changing, it's just a statement of my
priorities.  I think there are other areas of the Semantic Web that need
work more urgently.

> Garret
> 
> P.S. When I mention that "someone else screwed up [literals] in the 
> past," I do not mean to imply that creating something called "literals" 
> was not the obvious or reasonable choice at the time based upon the 
> then-current understanding of semantic modeling languages. Maybe it was; 
> maybe it wasn't. It remains that, in hindsight, RDF currently has a 
> screwed-up notion of literals that is an inconvenient anomaly. Based 
> upon our current knowledge, we need to fix it.

Understood.

   -- Sandro
Received on Wednesday, 1 August 2007 04:25:20 UTC