Re: RDF's curious literals from Sandro Hawke on 2007-08-01 (semantic-web@w3.org from August 2007)

From: Sandro Hawke <sandro@w3.org>
Date: Tue, 31 Jul 2007 21:07:11 -0400
To: Garret Wilson <garret@globalmentor.com>
Cc: Story Henry <henry.story@bblfish.net>, Tim Berners-Lee <timbl@w3.org>, Semantic Web <semantic-web@w3.org>
Message-ID: <19814.1185930431@ubuhebe>

I'm not quite sure where you're going.  I think you're arguing about
ways the RDF model could be different, and changes which would make it
better.  That's okay, and sometimes interesting.  At the same time, you
seem to be arguing about what URIs and strings *are*, in the abstract
(not related to RDF), and that's kind of confusing.  That's philosophy,
not engineering, I think.

> So let me state this another way: to say that non-literal resources use 
> URIs as identifiers and literal resources use strings as identifiers is 
> a false dichotomy. RDF uses strings for all its identifiers. It's just 
> that for non-literals, these strings conform to a format called URI so 
> as to reduce clashes. There's no reason why we can't have the strings 
> identifying literals conform to this same format as well by 
> pre/postpending the appropriate information--- perhaps a "rdfliteral" 
> prefix and a ";datatype" postfix. Then we use URI-conforming strings for 
> everything.

Conceptually, outside of every URI or String is a single bit flag which
says whether this is to be treated as a string or as a URI.  In N3, it's
the delimiters: URIs are written like <...> and strings like "...".
This bit flag is important.
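To make the flag concrete, here is a minimal Python sketch that stores a character sequence together with that one bit and lets the bit alone decide which N3 delimiters get written. The `Term` class and its `to_n3` method are hypothetical illustrations, not any RDF library's API, and string escaping is deliberately reduced to backslashes and double quotes (an assumption that ignores the rest of N3's escape rules).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """A character sequence plus the one-bit flag saying how to treat it."""
    is_uri: bool  # True: treat chars as a URI; False: treat chars as a string
    chars: str

    def to_n3(self) -> str:
        if self.is_uri:
            # URIs are written between angle brackets.
            return f"<{self.chars}>"
        # Strings are written between double quotes; escape the characters
        # that would otherwise end the literal early.
        escaped = self.chars.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'


# Same characters, different flag, different term.
as_uri = Term(True, "http://example.org/hello")
as_string = Term(False, "http://example.org/hello")
print(as_uri.to_n3())       # <http://example.org/hello>
print(as_string.to_n3())    # "http://example.org/hello"
print(as_uri == as_string)  # False
```

The equality check is the point: the characters are identical, but because the flag differs the two terms are not the same thing.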

In object oriented terms, you can think of it as two classes, URI and
String, each of which has one data field whose value is a sequence of
characters.  So they are very similar structurally, but the operations
defined for them are different, and you'll get lots of type errors if
you try to use one where the other belongs.  (Of course you can convert
between them, copying the character sequence from one to the other.)
With strings you concatenate them, take substrings, find substrings,
etc.  With URIs, you can look at the path components but you can also do
web operations like GET and POST.  Some APIs don't make this kind of
distinction, but in RDF it's important.    
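The two-class picture can be sketched in Python roughly as follows. The class names `URI` and `String` and all their methods are hypothetical, chosen only to mirror the description; `get` uses the standard library's `urllib.request.urlopen` and is not called below, since it needs network access.

```python
from urllib.parse import urlsplit
from urllib.request import urlopen


class String:
    """A character sequence with string operations."""

    def __init__(self, chars: str):
        self.chars = chars

    def concat(self, other: "String") -> "String":
        # Using a URI where a String belongs is a type error.
        if not isinstance(other, String):
            raise TypeError("can only concatenate a String to a String")
        return String(self.chars + other.chars)

    def substring(self, start: int, end: int) -> "String":
        return String(self.chars[start:end])

    def find(self, needle: "String") -> int:
        return self.chars.find(needle.chars)

    def to_uri(self) -> "URI":
        # Explicit conversion: copy the characters into a URI.
        return URI(self.chars)


class URI:
    """A character sequence with URI and web operations."""

    def __init__(self, chars: str):
        self.chars = chars

    def path_components(self) -> list[str]:
        return [p for p in urlsplit(self.chars).path.split("/") if p]

    def get(self) -> bytes:
        # A web operation: dereference the URI with an HTTP GET.
        with urlopen(self.chars) as response:
            return response.read()

    def to_string(self) -> String:
        # Explicit conversion: copy the characters into a String.
        return String(self.chars)


u = URI("http://example.org/a/b/c")
print(u.path_components())                             # ['a', 'b', 'c']
print(String("Hello ").concat(String("World")).chars)  # Hello World
try:
    String("Hello ").concat(u)  # a URI where a String belongs
except TypeError as err:
    print("type error:", err)
print(String("Hello ").concat(u.to_string()).chars)    # Hello http://example.org/a/b/c
```

The structure of the two classes is identical (one `chars` field); only the operations differ, and crossing from one to the other requires an explicit copy.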

Your example here reminds me of some discussions (in July 2000) about
how to define RDF literals.  At that point I was looking at using the
"data" URI scheme.

This URI:
     data:text/plain;charset=iso-8859-7,Hello%20World

can perhaps be understood to denote the string "Hello World".  Try it in
your browser, btw.

[ Arguably, it could be said to denote an information resource which has
a text/plain representation which is the string "Hello World".  Is that
different?   **shrug**    Let's ignore this idea. ]

But see how the string 
    "Hello World"
corresponds to the URI 
     data:text/plain;charset=iso-8859-7,Hello%20World
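Read in the URI-to-string direction, that correspondence can be sketched in Python: split the data URI at its first comma, take the `charset` parameter from the header, and percent-decode the rest with `urllib.parse.unquote`. The function name `data_uri_to_string` is hypothetical, and the sketch only covers the simple case: it ignores the `;base64` form and falls back to US-ASCII when no charset is given, which is the default RFC 2397 specifies.

```python
from urllib.parse import unquote


def data_uri_to_string(uri: str) -> str:
    """Return the string a simple (non-base64) data: URI denotes."""
    if not uri.startswith("data:"):
        raise ValueError("not a data: URI")
    header, sep, payload = uri[len("data:"):].partition(",")
    if not sep:
        raise ValueError("data: URI has no comma")
    params = header.split(";")[1:]  # skip the media type itself
    if "base64" in params:
        raise NotImplementedError("base64 data: URIs are not handled here")
    charset = "US-ASCII"  # RFC 2397 default
    for param in params:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            charset = value
    # Percent-decode the payload, interpreting the bytes in that charset.
    return unquote(payload, encoding=charset)


s = data_uri_to_string("data:text/plain;charset=iso-8859-7,Hello%20World")
print(repr(s), len(s))  # 'Hello World' 11
```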

The string literal we use to name those 11 characters is much shorter
than the URI.  But I agree we could use either one.  As it turns out,
the W3C Recommendation for RDF doesn't bother to use URIs to name
strings; it just defines a literal syntax instead.  Maybe that's less
elegant.  Maybe it's more practical.  Maybe it's less practical.
**shrug**    It's what we've got, and it's probably not worth changing.
Maybe there's a backward-compatible change, like defining a URI scheme,
but I doubt it's worth the trouble.

For amusement value, note that the string
     "data:text/plain;charset=iso-8859-7,Hello%20World"
corresponds to the URI:
     data:text/plain;charset=iso-8859-7,data%3Atext/plain%3Bcharset%3Diso-8859-7%2CHello%2520World

etc.
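The "etc." can be reproduced mechanically. A minimal Python sketch, assuming the same `text/plain;charset=iso-8859-7` header as the examples above and relying on `urllib.parse.quote`, which with its default `safe="/"` escapes everything except `/` and the unreserved characters. The function name `string_to_data_uri` is hypothetical. Applying it once to "Hello World" gives the first URI in this message; applying it to that result gives the second, where %20 becomes %2520 because the % itself is escaped.

```python
from urllib.parse import quote

HEADER = "text/plain;charset=iso-8859-7"  # assumed, matching the examples above


def string_to_data_uri(s: str) -> str:
    """Name the string s with a data: URI by percent-encoding its characters."""
    # quote() escapes spaces, ':', ';', '=', ',' and '%'; with its default
    # safe="/", it leaves '/' and the unreserved characters (letters,
    # digits, '-', '.', '_', '~') as they are.
    return f"data:{HEADER},{quote(s, encoding='iso-8859-7')}"


uri = "Hello World"
for _ in range(3):  # each level names the previous level's characters
    uri = string_to_data_uri(uri)
    print(uri)
```

The first two lines printed match the two data URIs quoted in this message; the third is simply the next step of the nesting.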

    -- Sandro

Received on Wednesday, 1 August 2007 01:09:03 UTC