Re: Some TAG review of "Cool URIs for the Semantic Web"

Hi folks

short version: I didn't (and don't) consider classes and properties 
"non-information" resources. I don't believe there's a useful and 
consensual definition of "information resource" that classes and 
properties won't squeeze into. "/" vs "#" was a choice between two bad 
options; each had scary issues. Other problems were more important, so 
I decided and moved on.

Pat Hayes wrote:
> 
>> On Thu, 2007-09-27 at 15:13 -0500, Pat Hayes wrote:
>> [...]
>>>  >Can a city be an HTTP endpoint?
>>>  >How about a physical book?
>>>  >a robot?
>>>  >an integer?
>>>  >a set of integers?
>>>  >an RDF Class?
>>>  >an RDF property?
>>>  >an XML namespace?
>>>  >a language (such as XML Schema)?
>>>  >
>>>  >Those are the practical questions that I see the community
>>>  >working on.
>>>
>>>  Surely most of these answers are blindingly obvious.
>>
>> The long history of the httpRange-14 issue suggests the
>> answers are anything _but_ obvious.

For me, the interesting question is: are these important distinctions to 
be trying to make? What breaks if we sweep this debate under the rug?

>>>   Integer, set (of
>>>  anything), class, property, namespace, language: all obviously,
>>>  necessarily, not, as these aren't physical entities.
>>
>> That wasn't obvious to the dublin core nor FOAF designers;
>> they chose hash-less http URIs for dc:title and foaf:name and such,
>> and they used to give 200 responses to GET requests there for years,
>> despite TimBL's protests.
> 
> Well, but wait. Did they do this because they thought these WERE 
> information resources or HTTP endpoints, or because (contra or pre- 
> http-range-14) they thought it was fine to give a 200 code back from a 
> URI denoting a non-information resource? I suspect the latter. 
> I felt this way myself until quite recently (cf 
> http://www.ihmc.us/users/phayes/PatHayes.html) and I still get the 
> occasional referential quiver.

Glad you asked. Here's what happened.

In early 2000 I had to pick a URI to put in those 
xmlns:foaf="http://blahblah" places. And it had to end in some character 
or other, obviously. This was pre-TAG. It was in the era of drift when 
W3C had blessed the evocative but vague RDF'99 Model and Syntax spec as 
a standard, then left things resting for a while as the dot-com world 
went crazy for XML. RDFCore didn't exist. Many things were uncertain. 
And RDF was more or less un-used outside of Netscape/Mozilla, Dublin 
Core and the RSS 1.0 drafts.

At the time I was bored and worried by long rambling threads on the RDF 
lists about reification, anonymous nodes, debates about whether people 
were resources, how many angels would fit on the head of an http server, 
what the "real" URI of an anonymous node should be, what 
"reification" really meant, etc.

(Some things change, some stay the same :)

So I thought, okay, let's see how this stuff bears up outside the lab. 
Can we put descriptions of real existing people in the public Web, link 
them together, and crawl them back into a usable database? Can we figure 
out which people they're describing, despite the lack of globally 
adopted well-known identifier mechanisms for humans (no urn:person:blah 
blah)? Can we use PGP etc. to get some more assurance of which people 
were behind which RDF statements? Can we make a linked Web of 
machine-readable pages just like we have one for humans?

And it turned out we could pretty much do that. And by doing so, certain 
weaknesses in the RDF toolset and specs became clear: RDF databases at 
the time tended to simply store triples, and threw away information 
about where those triples had come from. People like Edd Dumbill who 
were crawling RDF/FOAF data at the time had to hack provenance/source 
mechanisms themselves - e.g. see his writeup at 
http://www.ibm.com/developerworks/xml/library/x-rdfprov.html ... 
changes which eventually got reflected into the Redland core, and which 
provide use cases for things like SPARQL's 'GRAPH' construct. These 
strike me as useful areas to explore.

In that context, choosing the final character of the URIs that named our 
classes and properties was the least of our worries. It was my choice 
and I chose what at the time seemed the lesser of two evils. Or rather, 
of two uncertainties. The leading contenders for last-character-in-uri 
were "/" and "#". My reading of the relevant URI specs at the time made 
me worried about using "#" because its meaning/interpretation was 
relative to a media type, and the Web architecture encourages content 
negotiation of document formats. I wanted to make both human and machine 
documentation available at the namespace URI, so that seemed a rather 
unfortunate interaction (particularly because RDFa didn't exist yet).

So I went for "/"-based names for classes and properties, i.e. as names 
for things like foaf:Person and foaf:mbox_sha1sum.

To me, such classes and properties are not abstract mathematical sets 
(although they are closely related to the maths). Someone else might 
define another Person class with exactly the same instances. But that 
would be someone else's work, i.e. a separate thing. It might even have 
a different rdfs:label and rdfs:comment. To me, classes and properties in 
RDF are more or less "works", analogous to a book, poem or song. Some 
people join orchestras or theatres; others write computer code and 
ontologies. And the Web architecture - it seemed to me then as now - 
allows works (such as my homepage, another of my works) to be named or 
identified with URIs that begin "http://" and end in "/". Before 2000, 
the thing called http://xmlns.com/foaf/Person hadn't been created. Just 
the same with homepages etc. So at the time it didn't seem particularly 
contentious to treat classes and properties as just more kinds of 
resource that might have Web accessible representations (like Hamlet and 
the Bible). So I used "/" and life went on.

I find it incredibly embarrassing that we are still discussing, all 
these years later, whether (in effect) there is an important distinction 
between the Bible and dc:description such that one work can have URIs 
beginning "http://" and ending "/" while the other can't. There are so 
many different ways of carving up the world into categories, each with 
merits and flaws. Why on earth should we hard code one such arbitrary 
distinction right into the core of the Web architecture?

So I welcomed the resolution of httpRange-14 as a way of putting this 
behind us, and adding a few lines of sysadmin voodoo into an Apache HTTP 
config file was a small price to pay. But I really don't think it should 
be the business of W3C to try to divide the world into two crisply 
defined categories and police laws for the spelling of their respective 
URIs. There are a great many important practical problems out there to 
address, problems which no organisation but W3C could solve. And this 
ain't one of them...
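For what it's worth, the "sysadmin voodoo" amounts to something like the following: under the httpRange-14 resolution, a GET on a hash-less URI naming a non-information resource answers with a 303 See Other redirect to a document describing it. A hypothetical Apache sketch; the paths and target URL are illustrative, not the actual xmlns.com configuration:

```apache
# Illustrative only: redirect requests for a "thing" URI with
# 303 See Other to a document that describes the thing.
RewriteEngine On
RewriteRule ^/foaf/0.1/Person$ http://example.org/foaf/spec/ [R=303,L]
```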

cheers,

Dan

--
http://danbri.org/


ps. re DOLCE, we weren't oblivious to such richer models, but the 
underlying languages of the Semantic Web from W3C simply didn't provide 
the primitives I wanted for treating time and change properly. Hence the 
brief prose attempt in http://xmlns.com/foaf/spec/#term_mbox to define 
"static inverse functional property" as one in which "there is (across 
time and change) at most one individual that ever has any particular 
value for foaf:mbox". The weakness of DAML and later OWL in this regard 
means that the semantics of OWL alone don't really justify all the 
inferences we need when merging information using 
reference-by-description techniques. Trying to model time and change 
very formally on top of vanilla RDFS/OWL isn't something I'm yet 
convinced is useful or practical. Would be happy to be proved wrong :)
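The merge step that the "static inverse functional property" reading is meant to license - "smushing" records that share a value for a property like foaf:mbox - can be sketched in Python. This is a toy illustration under that assumption; the data, addresses and function names are made up:

```python
# Toy "smushing" by inverse functional property (IFP): if two records
# share a value for an IFP such as foaf:mbox, treat them as describing
# the same individual and merge them. The "static" (across time and
# change) reading is what justifies merging records collected from
# different documents at different times. Illustrative only.

def smush(records, ifp="foaf:mbox"):
    merged = {}  # IFP value -> merged record
    for rec in records:
        key = rec.get(ifp)
        if key is None:
            continue  # no IFP value: nothing licenses a merge
        merged.setdefault(key, {}).update(rec)
    return list(merged.values())

records = [
    {"foaf:mbox": "mailto:danbri@example.org", "foaf:name": "Dan Brickley"},
    {"foaf:mbox": "mailto:danbri@example.org",
     "foaf:homepage": "http://danbri.org/"},
    {"foaf:mbox": "mailto:phayes@example.org", "foaf:name": "Pat Hayes"},
]

people = smush(records)  # two individuals, the first with merged fields
```

OWL's owl:InverseFunctionalProperty sanctions a merge of this shape at a single point in time; the "static" qualifier above is the extra, prose-only assumption that makes it safe across crawls.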

Received on Friday, 28 September 2007 23:04:25 UTC