Re: RDF's curious literals from Garret Wilson on 2007-08-01 (semantic-web@w3.org from August 2007)

From: Garret Wilson <garret@globalmentor.com>
Date: Wed, 01 Aug 2007 12:32:54 -0700
To: Story Henry <henry.story@bblfish.net>
CC: Semantic Web <semantic-web@w3.org>
Message-ID: <46B0DFE6.9070301@globalmentor.com>
Henry,

I promise I'm going to try to take a few hours' break from this
discussion and get some work done. Too bad we can't sit down, have a
beer, and throw French fries at each other while we talk about this. :)
Just a couple more comments until later this evening or tomorrow:

Story Henry wrote:
>>
>> You still didn't give me an example of what could *not* be a literal, 
>> even though you stated that "there are in fact limitations on what 
>> can be a Literal."
>
> George Bush cannot be a literal. I think that is clear. Even if he
> thinks literally, that is, without looking at the world.

I will not comment on George Bush's thinking ;), but I know that I can
(A) create a controlled set of lexical representations for US
presidents, (B) identify that set of lexical representations using the
datatype eg:uspresidents, and (C) identify George Bush using "George W.
Bush"^^eg:uspresidents. I have identified George Bush using RDF, and in
the RDF model he has become an instance of rdfs:Literal, no more and no
less than the number 123 is an instance of rdfs:Literal in my model when
I use "123"^^xsd:integer.
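In RDF a datatype is, roughly, a set of lexical forms plus a mapping from those forms to values. A minimal Java (16+) sketch of the hypothetical eg:uspresidents datatype described above, modeled that way; the class name, the record, and the chosen lexical forms are illustrative assumptions, not part of any real vocabulary or library:

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical datatype eg:uspresidents: a controlled lexical space mapped to values. */
final class UsPresidentsDatatype {
    /** The value space: the presidents themselves, modeled here as simple records. */
    record President(int ordinal, String name) {}

    private static final President BUSH_43 = new President(43, "George W. Bush");
    private static final President LINCOLN = new President(16, "Abraham Lincoln");

    /** Lexical-to-value mapping; two lexical forms may denote the same value. */
    private static final Map<String, President> LEXICAL_TO_VALUE = Map.of(
            "George W. Bush", BUSH_43,
            "George Walker Bush", BUSH_43,
            "Abraham Lincoln", LINCOLN);

    /** The value denoted by a lexical form, or empty if the form is outside the lexical space. */
    static Optional<President> valueOf(String lexicalForm) {
        return Optional.ofNullable(LEXICAL_TO_VALUE.get(lexicalForm));
    }

    public static void main(String[] args) {
        // Corresponds to the literal "George W. Bush"^^eg:uspresidents
        System.out.println(valueOf("George W. Bush"));
    }
}
```

Structurally this is the same arrangement xsd:integer uses for "123": only the contents of the lexical space and the value space differ.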


>
> In which model does it have to be presented that way? You mean in the 
> spec, right? But that is just a way to make sure we all agree on
> something. A bit like a Java reference implementation for, say, the
> Servlet API. Everyone can create more efficient ones later.

I think that this discussion is having problems because you don't 
recognize that there is something in between your database and your N3 
notation called the "RDF model". It specifies the canonical way in which
resources and their properties are understood with respect to a
particular framework (RDF), independent of how they are stored in the
database, independent of how you write them in a text file using N3, and
independent of the syntax you use to query them. The RDF data model is
analogous to the W3C Document Object Model (DOM) for XML.
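To make that middle layer concrete, here is a rough Java (17+) sketch of the abstract syntax described in the 2004 RDF Concepts document: a graph is a set of triples; the subject is a URI reference or a blank node, the predicate is a URI reference, and only the object may also be a literal (plain, with an optional language tag, or typed, with a datatype URI). The type names are illustrative, not any real library's API:

```java
import java.util.Set;

/** Illustrative sketch of the RDF abstract syntax (RDF Concepts, 2004). */
final class RdfModelSketch {
    sealed interface Node permits SubjectNode, Literal {}

    /** Nodes allowed in subject position: URI references and blank nodes only. */
    sealed interface SubjectNode extends Node permits UriRef, BlankNode {}

    sealed interface Literal extends Node permits PlainLiteral, TypedLiteral {}

    record UriRef(String uri) implements SubjectNode {}

    record BlankNode(String label) implements SubjectNode {}

    /** Plain literal: lexical form plus an optional language tag (null when absent). */
    record PlainLiteral(String lexicalForm, String language) implements Literal {}

    /** Typed literal: lexical form plus a datatype URI. */
    record TypedLiteral(String lexicalForm, UriRef datatype) implements Literal {}

    /** The component types encode the rule that only the object may be a literal. */
    record Triple(SubjectNode subject, UriRef predicate, Node object) {}

    public static void main(String[] args) {
        UriRef book = new UriRef("http://example.com/books/1");
        UriRef title = new UriRef("http://purl.org/dc/elements/1.1/title");
        UriRef pageCount = new UriRef("http://example.com/ns#pageCount");
        UriRef xsdInteger = new UriRef("http://www.w3.org/2001/XMLSchema#integer");
        Set<Triple> graph = Set.of(
                new Triple(book, title, new PlainLiteral("An Example Title", "en")),
                new Triple(book, pageCount, new TypedLiteral("100", xsdInteger)));
        graph.forEach(System.out::println);
        // A triple with a literal subject would not compile: a Literal is not a SubjectNode.
    }
}
```

Note that nothing here depends on N3 syntax or on a database layout; the same graph could be parsed from any serialization or loaded from any store.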

Most of your discussion has been about how I specify numbers in N3, or 
how I store them in the database. I'm talking about how resources are 
represented in the RDF model. Yes, the RDF model is *described* by the 
RDF specification, but the specification is not the model. See 
http://www.w3.org/TR/rdf-primer/#rdfmodel for an introduction. Then see
http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-data-model.
These specifications define how RDF views the things that exist in the
world, independent of the serialization syntax or database
representation. (The model is different from the Java servlet API
because the Java servlet specification only specifies the interface and
doesn't specify what a servlet engine does internally. The RDF
specification, on the other hand, is very interested in what your N3
representation *means*---what model of the world your serialization
forms.)

The conflation of serialization with the RDF model, and/or the 
conflation of the database representation with the RDF model, would 
certainly lead you to say that I can just "fix the problem" by getting 
enough people to use URIs for integers. That won't solve the problem. 
The RDF data model treats things identified by strings as a different
kind of thing from things identified by URIs. This is a fundamental
problem with the RDF data model, and the only solution is to change the
model---the way that RDF characterizes the world.


>
> Did you go and read http://www.w3.org/TR/swbp-xsch-datatypes/
> or any of the other pointers people have been giving you?
>
> I know you have not because you are responding within minutes of the 
> reply reaching you.

Now, let's not make personal attacks or rash assumptions. You gave me 
the link above. I went there. I skipped the section "Status of this 
Document". Then I skipped "Table of Contents". I skipped "Introduction" 
and "Reading this Document" (although had I fully read "Reading this
Document", I would have seen that it says, "many readers will benefit
from skipping sections."). Then I skipped "Namespaces Used
in this Document."

I went straight to the section which is relevant to our discussion: "3. 
Comparison of Values", which states the problem you raised: "What is the 
relationship between the value spaces of the various XML Schema built-in 
simple types when used within RDF and OWL? Or in other words, when do 
two literals, which are written down differently, refer to the same 
value. For example, "10"^^xsd:integer and "010"^^xsd:integer both denote 
the integer ten." That's the relevant portion. You asserted that if
two xsd:integers have different lexical forms, they represent distinct
resources, unlike URIs, which may identify the same resource. I read
through that section, which gives concrete examples that directly
contradict your assertion.
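The comparison that section describes can be mimicked with Java's own numeric types. A rough Java (16+) sketch, assuming only the three datatypes mentioned in this thread and mapping them all onto BigDecimal, since xsd:integer and xsd:byte both derive from xsd:decimal; record equality plays the role of comparing literals as written, and sameValue compares what they denote:

```java
import java.math.BigDecimal;

/** Sketch: comparing typed literals by lexical form versus by value. */
final class LiteralValues {
    /** A typed literal as written: a lexical form plus a datatype (prefixed name, for brevity). */
    record TypedLiteral(String lexicalForm, String datatype) {}

    /**
     * Maps a lexical form to its value. All three supported datatypes derive from
     * xsd:decimal, so one numeric value space is enough here. This sketch does not
     * enforce XML Schema's lexical rules or range limits (e.g. xsd:byte's -128..127).
     */
    static BigDecimal value(TypedLiteral literal) {
        return switch (literal.datatype()) {
            case "xsd:integer", "xsd:byte", "xsd:decimal" -> new BigDecimal(literal.lexicalForm());
            default -> throw new IllegalArgumentException("unsupported datatype: " + literal.datatype());
        };
    }

    /** Same value, however it was written; compareTo ignores scale, so 15 equals 15.0. */
    static boolean sameValue(TypedLiteral a, TypedLiteral b) {
        return value(a).compareTo(value(b)) == 0;
    }

    public static void main(String[] args) {
        TypedLiteral ten = new TypedLiteral("10", "xsd:integer");
        TypedLiteral tenPadded = new TypedLiteral("010", "xsd:integer");
        System.out.println(ten.equals(tenPadded));     // false: different literals as written
        System.out.println(sameValue(ten, tenPadded)); // true: both denote ten

        TypedLiteral byteFifteen = new TypedLiteral("15", "xsd:byte");
        TypedLiteral decimalFifteen = new TypedLiteral("15.0", "xsd:decimal");
        System.out.println(sameValue(byteFifteen, decimalFifteen)); // true: both denote fifteen
    }
}
```

So a processor still needs value-aware comparison to know that two differently written literals denote the same thing, exactly as it would for two different URIs.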

So no, I didn't read the section about the namespaces being used, or the 
20% of the document that the "References" take up. But I did read the 
relevant sections of the document, and I cited examples from that 
document that dispute what you claimed to be a benefit of rdfs:Literal 
over normal resources identified by URIs.


>> But I point out that you have exactly the same situation with 
>> literals---just because the lexical form is different doesn't mean 
>> that they refer to different resources. Let me quote from the same 
>> document http://www.w3.org/TR/swbp-xsch-datatypes/ you point to above:
>>
>> "For example, "10"^^xsd:integer and "010"^^xsd:integer  both denote 
>> the integer ten."
>>
>> But it gets even worse for your case. Quoting again:
>>
>> ""15"^^xsd:byte and "15.0"^^xsd:decimal both denote the same
>> value, fifteen. This follows because xsd:byte has primitive base
>> datatype xsd:decimal."
>
> This is not a problem:
>   - How each of them maps to the others is well known.
>   - How you store it is up to you, so the fact that you can name them
> differently is not a problem.
>   - You know all you need to know about them when you have their name.

Yes, I know all this---but you were saying that somehow rdfs:Literal 
brings some benefits over normal resources with URIs. You have not 
demonstrated that, and the one example you gave, about how you don't 
need to have OWL statements to say that two literals are distinct, could 
be said about URIs that are specified to be integers as well.
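For instance, if a URI convention for integers is specified (such as the hypothetical http://example.com/integers/100 pattern used further on), an application can decide whether two such URIs denote distinct integers just by parsing them, with no owl:differentFrom statements. A minimal Java sketch, assuming that hypothetical naming convention; the class and method names are illustrative:

```java
import java.math.BigInteger;
import java.net.URI;
import java.util.Optional;

/** Sketch: a hypothetical URI convention in which each URI names exactly one integer. */
final class IntegerUris {
    // Hypothetical base URI for the convention; not a real registry.
    private static final String BASE = "http://example.com/integers/";

    /** The integer a URI names under the convention, or empty if it is not such a URI. */
    static Optional<BigInteger> integerOf(URI uri) {
        String s = uri.toString();
        if (!s.startsWith(BASE)) {
            return Optional.empty();
        }
        try {
            return Optional.of(new BigInteger(s.substring(BASE.length())));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    /** True if both URIs follow the convention and name different integers. */
    static boolean knownDistinct(URI a, URI b) {
        Optional<BigInteger> x = integerOf(a);
        Optional<BigInteger> y = integerOf(b);
        return x.isPresent() && y.isPresent() && !x.get().equals(y.get());
    }

    public static void main(String[] args) {
        URI hundred = URI.create(BASE + "100");
        System.out.println(knownDistinct(hundred, URI.create(BASE + "101")));  // true
        System.out.println(knownDistinct(hundred, URI.create(BASE + "0100"))); // false: both name one hundred
    }
}
```

The knowledge lives in the specification of the identifier scheme, which is the same place it lives for a datatype's lexical space.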


> The question can really be turned around: what are you gaining by just 
> having URIs?

By having URIs I have a consistent data model, without a strange kind
of resource that needs special querying or special-case decisions in my
program. I can treat everything the same.

>
> Can you let us know how this is causing you any trouble?

Let's say I use my RDF library in Java to read in some N3:

RDFProcessor rdfProcessor=new RDFProcessor();
RDFDataModel rdf=rdfProcessor.process(new FileInputStream("my.n3"));

Then I find all the books in the data model:

Resource[] books=rdf.getResourcesByType("eg:Book");

Which is maybe a convenience method for this:

Resource[] books=rdf.getResourcesByPropertyValue("rdf:type", "eg:Book");

So if I call books[0].getProperty("rdf:type"), it gives me "eg:Book". No 
problem.
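The RDFProcessor/RDFDataModel API above is hypothetical. As a rough illustration of how getResourcesByType reduces to a property-value lookup, here is a self-contained Java (16+) sketch over an in-memory list of triples; the Triple record and the use of prefixed-name strings for every node are simplifying assumptions:

```java
import java.util.List;

/** Sketch over an in-memory triple list; method names mirror the hypothetical API in the message. */
final class TypeQuery {
    record Triple(String subject, String predicate, String object) {}

    private final List<Triple> triples;

    TypeQuery(List<Triple> triples) {
        this.triples = List.copyOf(triples);
    }

    /** Subjects that have the given value for the given property. */
    List<String> getResourcesByPropertyValue(String property, String value) {
        return triples.stream()
                .filter(t -> t.predicate().equals(property) && t.object().equals(value))
                .map(Triple::subject)
                .distinct()
                .toList();
    }

    /** Convenience method: a type lookup is just a property-value lookup on rdf:type. */
    List<String> getResourcesByType(String type) {
        return getResourcesByPropertyValue("rdf:type", type);
    }

    public static void main(String[] args) {
        TypeQuery rdf = new TypeQuery(List.of(
                new Triple("eg:book1", "rdf:type", "eg:Book"),
                new Triple("eg:book1", "dc:title", "An Example Title"),
                new Triple("eg:person1", "rdf:type", "foaf:Person")));
        System.out.println(rdf.getResourcesByType("eg:Book")); // [eg:book1]
    }
}
```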

So what if I call
books[0].getProperty("dc:title").getProperty("rdf:type")? It doesn't
give me anything, because the object of the first book's dc:title
property is not a resource---well, RDF says (hands waving) that it's a
resource, but it treats it differently from other resources, making it
some strange rdfs:Literal thingy. I want to call
books[0].getProperty("dc:title").getProperty("rdf:type") and get back a
type such as "xsd:string", just as I would for any other resource.

So let's take that further. Let's get the number of pages of the book.

Resource bookPageCount=books[0].getProperty("eg:pageCount");

So now I have the book's page count. The page count is a resource---or
so RDF would tell you. But really, I have to do something like this:

if(bookPageCount instanceof Literal)
{
  //do something if the resource is a literal
}
else
{
  //do something if the resource is a non-literal resource
}

This is just crazy. And unnecessary.
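The branching above can be reproduced in a few lines of self-contained Java (17+). The sketch models the two kinds of node as records under a sealed interface; the names are illustrative, not taken from any real RDF library. Any code that reads a property value ends up with this two-way split:

```java
/** Sketch of the two-way split forced on code that reads property values. */
final class LiteralBranching {
    sealed interface Node permits Resource, Literal {}

    /** A node identified by a URI; it can carry properties of its own. */
    record Resource(String uri) implements Node {}

    /** A literal: a lexical form plus a datatype; it has no properties of its own. */
    record Literal(String lexicalForm, String datatype) implements Node {}

    static String describe(Node pageCount) {
        if (pageCount instanceof Literal literal) {
            // Literal branch: interpret the lexical form according to its datatype.
            return "literal " + literal.lexicalForm() + " of type " + literal.datatype();
        } else if (pageCount instanceof Resource resource) {
            // Resource branch: further properties would be looked up by URI.
            return "resource " + resource.uri();
        }
        throw new IllegalStateException("unknown node kind");
    }

    public static void main(String[] args) {
        System.out.println(describe(new Literal("100", "xsd:integer")));
        System.out.println(describe(new Resource("http://example.com/integers/100")));
    }
}
```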

And it just gets worse from here. Let's say that I want to get all the 
books in the data model with an author of 
<http://example.com/us/presidents/GeorgeWBush>. That's easy:

Resource[] books=rdf.getResourcesByPropertyValue("dc:creator", 
URI.create("http://example.com/us/presidents/GeorgeWBush"));

But how do I get all the books with exactly 100 pages? Well, if RDF were 
consistent, I could do the same thing:

Resource[] books=rdf.getResourcesByPropertyValue("eg:pageCount", 
URI.create("http://example.com/integers/100"));

But I can't do that. Why? Because "100" is a literal. So? Isn't it a 
resource as well? Well, yes---sort of.
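In code, the asymmetry shows up in the query key itself. The Java (17+) sketch below uses hypothetical types and matches objects by term equality, as a simple triple store might; the prefixed names are shorthand. A URI key cannot find the page count, a literal key must be used instead, and even then a differently written lexical form such as "0100" misses, although it denotes the same integer:

```java
import java.util.List;

/** Sketch: object lookup by term equality, as a simple triple store might do it. */
final class ValueQuery {
    sealed interface Node permits Uri, Literal {}

    record Uri(String value) implements Node {}

    record Literal(String lexicalForm, String datatype) implements Node {}

    record Triple(Uri subject, Uri predicate, Node object) {}

    /** Subjects whose value for the property is term-equal to the given node. */
    static List<Uri> byPropertyValue(List<Triple> graph, Uri property, Node value) {
        return graph.stream()
                .filter(t -> t.predicate().equals(property) && t.object().equals(value))
                .map(Triple::subject)
                .toList();
    }

    public static void main(String[] args) {
        Uri book = new Uri("http://example.com/books/1");
        Uri pageCount = new Uri("eg:pageCount");
        List<Triple> graph = List.of(new Triple(book, pageCount, new Literal("100", "xsd:integer")));

        // A URI key never matches, because the stored object is a literal.
        System.out.println(byPropertyValue(graph, pageCount, new Uri("http://example.com/integers/100")).size()); // 0
        // The key has to be the literal itself...
        System.out.println(byPropertyValue(graph, pageCount, new Literal("100", "xsd:integer")).size()); // 1
        // ...written exactly as stored: "0100" denotes one hundred too, but is a different term.
        System.out.println(byPropertyValue(graph, pageCount, new Literal("0100", "xsd:integer")).size()); // 0
    }
}
```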

So let me make it easier. Let me just get all the books authored by a
resource of type foaf:Person:

Resource[] books=rdf.getResourcesBySubPropertyValue("dc:creator",  
"rdf:type", "foaf:Person");

No problem. Whee! This is easy. So now let's find all the books with
page counts of type xsd:integer:

Resource[] books=rdf.getResourcesBySubPropertyValue("eg:pageCount",  
"rdf:type", "xsd:integer");

What? I can't do that? But isn't 100 a resource, too? Shouldn't it have 
an rdf:type, just like any other resource? Why do I have to treat it 
differently?
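What is being asked for here can be sketched as a uniform lookup in which every node, literal or not, answers the same type question. The Java (17+) sketch below is hypothetical and reflects the model argued for in this message, not RDF as specified (in RDF a literal cannot be the subject of a triple, so its type is not stated this way); here a typed literal simply reports its datatype as its type, and the sub-property query needs no literal special case:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical uniform model: every node, literal or not, answers the same type query. */
final class UniformTypes {
    sealed interface Node permits Resource, Literal {
        Set<String> types();
    }

    /** A resource whose types would come from rdf:type statements. */
    record Resource(String uri, Set<String> types) implements Node {}

    /** In this sketch a typed literal reports its datatype as its type. */
    record Literal(String lexicalForm, String datatype) implements Node {
        @Override
        public Set<String> types() {
            return Set.of(datatype);
        }
    }

    /** Subjects whose value for the property has the given type; no literal special case. */
    static List<String> bySubPropertyValue(Map<String, Map<String, Node>> graph, String property, String type) {
        return graph.entrySet().stream()
                .filter(e -> {
                    Node value = e.getValue().get(property);
                    return value != null && value.types().contains(type);
                })
                .map(Map.Entry::getKey)
                .sorted()
                .toList();
    }

    public static void main(String[] args) {
        Node author = new Resource("http://example.com/us/presidents/GeorgeWBush", Set.of("foaf:Person"));
        Map<String, Node> book1 = Map.of("dc:creator", author, "eg:pageCount", new Literal("100", "xsd:integer"));
        Map<String, Node> book2 = Map.of("eg:pageCount", new Literal("250", "xsd:integer"));
        Map<String, Map<String, Node>> graph = Map.of("eg:book1", book1, "eg:book2", book2);

        System.out.println(bySubPropertyValue(graph, "dc:creator", "foaf:Person"));   // [eg:book1]
        System.out.println(bySubPropertyValue(graph, "eg:pageCount", "xsd:integer")); // [eg:book1, eg:book2]
    }
}
```

The design choice is simply that types() lives on the common Node interface, so the query is written once for both kinds of node.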

The treatment of literals in RDF is a pain, it is strange, it is 
inconsistent, and it is completely unnecessary. This is a problem with 
the RDF model---the way that RDF characterizes the world---and only 
changing the RDF model will fix the problem.

Garret
Received on Wednesday, 1 August 2007 19:33:06 UTC