Re: my laziness with literals from Dan Brickley on 2007-10-12 (semantic-web@w3.org from October 2007)

From: Dan Brickley <danbri@danbri.org>
Date: Fri, 12 Oct 2007 15:41:49 +0200
To: Garret Wilson <garret@globalmentor.com>
CC: Semantic Web <semantic-web@w3.org>
Message-ID: <470F799D.4030901@danbri.org>
Garret Wilson wrote:
> 
> Just a comment and a bit of general advice:
> 
> I've noted that, when storing data, I'm either very lazy or averse to 
> verboseness (the latter not on this list, of course ;) ). I seem to want 
> to stick everything into a plain literal. I was converting some of my 
> old data to a new format today and it wasn't working. Then I realized 
> that my integers were stored as plain literals when I could have used 
> xsd:integer. My booleans were stored as plain literals when I could have 
> used xsd:boolean. My code was balking at a bunch of strings when my API 
> wanted numbers and booleans.
> 
> And I'm not the only one. The way RDF has evolved from plain literals to 
> typed literals, along with the verbose RDF/XML syntax for typed 
> literals, has helped bring out the laziness in all of us. Want a 
> language? Stick it in a plain literal such as "en-US". Want a URI? Stick it in
> a plain literal. Want a date? Stick it in a plain literal. Want an 
> Internet media type? Stick it in a plain literal.
> 
> But if we're going to produce semantically rich data that can be
> machine-processed, we need to store things as they are, with appropriate 
> indication of type.
> 
> So my plea to all data-architects:

I'm not convinced of this. RDF/XML's syntax for datatyping is pretty 
heavyweight, and there are many RDF vocabularies that pre-date RDFCore 
(i.e. created between 1997 and 2003).

It would be good to have a notation in RDFS/OWL (maybe OWL1.1 could do 
it) to indicate that some plain-literal-valued property takes string 
values that can be cast to some specified datatype.
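To illustrate, here is how a consumer might honour such a declaration when reading plain literals. This is a minimal Python sketch and rests on several assumptions. A hypothetical property-to-datatype table stands in for the RDFS/OWL notation, which does not exist yet. The example.org property URIs are made up. Only xsd:integer and xsd:boolean are handled, following their XML Schema lexical forms.

```python
import re
from typing import Union

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical stand-in for an RDFS/OWL declaration that these
# plain-literal-valued properties hold values castable to a datatype.
# The property URIs are illustrative only.
CASTABLE = {
    "http://example.org/vocab#age": XSD + "integer",
    "http://example.org/vocab#active": XSD + "boolean",
}

# XML Schema lexical form of xsd:integer: optional sign, then ASCII digits.
_INTEGER = re.compile(r"[+-]?[0-9]+")


def cast_plain_literal(prop: str, lexical: str) -> Union[int, bool, str]:
    """Cast a plain literal according to the datatype declared for prop, if any."""
    datatype = CASTABLE.get(prop)
    # Strip surrounding whitespace, approximating XML Schema's
    # whiteSpace="collapse" for these single-token types.
    value = lexical.strip()
    if datatype == XSD + "integer":
        # Validate with the regex first: Python's int() would also accept
        # forms such as "1_000" that are not valid xsd:integer.
        if not _INTEGER.fullmatch(value):
            raise ValueError(f"not a valid xsd:integer: {lexical!r}")
        return int(value)
    if datatype == XSD + "boolean":
        # xsd:boolean's lexical space is exactly {true, false, 1, 0}.
        if value in ("true", "1"):
            return True
        if value in ("false", "0"):
            return False
        raise ValueError(f"not a valid xsd:boolean: {lexical!r}")
    # No declaration for this property: keep the plain literal as a string.
    return lexical
```

Because the cast happens on the reading side, data written before typed literals existed stays valid. Only consumers that know about the declaration get numbers and booleans back.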

> 
> * If you're going to store a number, use a typed literal with 
> xsd:integer or similar.
> * If you're going to store a boolean, use a typed literal with 
> xsd:boolean or similar.
> * If you're going to store a URI, use a typed literal with xsd:anyURI.

RDF has special handling for URIs. Almost always people are interested 
in the thing the URI is identifying, not in the URI string itself.
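The difference is easy to see in N-Triples. A URI can be the object of a triple in its own right, which says something about the thing it identifies. Alternatively, it can be wrapped as a string value typed xsd:anyURI. The following is a minimal Python sketch. The helper names and the example.org subject are hypothetical. foaf:homepage is used only as a familiar property whose value is the page itself, not a string.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def uriref(u: str) -> str:
    # URI reference as a node: the triple is about the resource it identifies.
    return f"<{u}>"


def typed_literal(lexical: str, datatype: str) -> str:
    # Typed literal: the object is a data value (here, a string of URI characters).
    # Escape backslash first, then quotes and line breaks, as N-Triples requires.
    escaped = (lexical.replace("\\", "\\\\")
                      .replace('"', '\\"')
                      .replace("\n", "\\n")
                      .replace("\r", "\\r"))
    return f'"{escaped}"^^<{datatype}>'


def triple(s: str, p: str, o: str) -> str:
    # One N-Triples statement: subject, predicate, object, terminating dot.
    return f"{s} {p} {o} ."


homepage = "http://xmlns.com/foaf/0.1/homepage"
print(triple(uriref("http://example.org/doc"), uriref(homepage),
             uriref("http://danbri.org/")))
print(triple(uriref("http://example.org/doc"), uriref(homepage),
             typed_literal("http://danbri.org/", XSD + "anyURI")))
```

Only the first form lets other triples about http://danbri.org/ join up with this one. The second form is just a string that happens to look like a URI.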

> * If you're going to store a language, use something like info:lang/en/US.
> * If you're going to store a Java class, use something like 
> info:lang/com/example/package#Class.

There is a java: URI scheme. It is used, for example, in ARQ for dynamic
code loading. I don't see a case for using info: instead.

> * If you're going to store an Internet media type, use something like 
> info:media/text/plain.

Or dc:format? It's good to agree on ways of doing these things, but your 
choices seem a little arbitrary, and not yet widely used.

> I know it's easier just to stick these things in plain literals, but 
> when someone else tries to machine-process your data, it has to take 
> what's there. I'm going to suppress my laziness and stop producing 
> specifications and data that rely on plain literals as a crutch. I
> encourage everyone to do the same.

Can we take "Be liberal in what you accept, and conservative in what you 
send." (see http://www.postel.org/postel.html ) as a shared goal here?

Of course defining "conservative" here is the slippery part :)
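One possible reading of the principle for booleans is sketched below. Input is accepted whether or not it carries the xsd:boolean datatype, but output always uses the canonical, explicitly typed form. This is a minimal Python sketch under that assumption. The function names are hypothetical. Here "liberal" stops at xsd:boolean's own lexical space. Whether to go further, for example by accepting "yes" or "TRUE", is exactly the slippery part.

```python
from typing import Optional

XSD_BOOLEAN = "http://www.w3.org/2001/XMLSchema#boolean"


def accept_boolean(lexical: str, datatype: Optional[str]) -> bool:
    """Liberal: accept a plain literal (datatype None) or an xsd:boolean literal."""
    if datatype not in (None, XSD_BOOLEAN):
        raise ValueError(f"unexpected datatype: {datatype}")
    value = lexical.strip()
    # Accept any member of xsd:boolean's lexical space: true, false, 1, 0.
    if value in ("true", "1"):
        return True
    if value in ("false", "0"):
        return False
    raise ValueError(f"not a boolean: {lexical!r}")


def send_boolean(value: bool) -> str:
    """Conservative: always emit the canonical, explicitly typed N-Triples form."""
    return f'"{"true" if value else "false"}"^^<{XSD_BOOLEAN}>'


# A plain literal "1" read from old data is re-published as a typed literal.
print(send_boolean(accept_boolean("1", None)))
```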

cheers,

Dan


> Best,
> 
> Garret
>
Received on Friday, 12 October 2007 13:42:27 UTC