Re: URI lifecycle (Was: Owning URIs) from Dan Brickley on 2009-05-22 (semantic-web@w3.org from May 2009)

From: Dan Brickley <danbri@danbri.org>
Date: Fri, 22 May 2009 20:49:51 +0200
To: Pat Hayes <phayes@ihmc.us>
CC: David Booth <david@dbooth.org>, Hugh Glaser <hg@ecs.soton.ac.uk>, semantic-web <semantic-web@w3.org>, Linked Data community <public-lod@w3.org>
Message-ID: <4A16F3CF.1030708@danbri.org>

On 22/5/09 19:47, Pat Hayes wrote:

>> Yes, that's a great topic for discussion. It is clear that semantic
>> drift is a natural part of natural language: a word that meant one thing
>> years ago may mean something quite different now.
>
> And the same is happening with URIs. My favorite example is dc:author,
> which when coined was intended to refer to the relation of authorship
> between people and things like books, things that would be found in a
> library catalog.

May I spoil your example? dc:author doesn't exist. Never has. Well, very 
early in Dublin Core history we had "dc:author". Since 1996 or so it has 
been "creator". This was because of the early workshop 
http://dublincore.org/workshops/dc3/ Workshop on Metadata for Networked 
Images (Sept 1996), where it was realised that DC was useful for images 
with only modest changes.

	"The CNI/OCLC Image Metadata Workshop focused on the use of the Dublin 
Core (DC) to describe images. Consensus formed around the assertion that, 
with some modifications of element names and definition, the Dublin Core 
would serve quite adequately for description of a large class of image 
resources, particularly those that share characteristics with the 
document-like objects that were the original focus of DC."
http://www.dlib.org/dlib/june97/metadata/06weibel.html

Cultural heritage, museums etc., applications of DC showed up around the 
same time. It has been clear for a very long time that dc:creator 
applies to anything that can be created. DC isn't very discriminating.

> 	 But by now, thanks to FOAF, the overwhelmingly largest
> usage of dc:author is to state the relationship between a person and
> their FOAF home page.

Even reading "dc:creator" there, I'm doubtful. Well, it depends on your 
measure. Perhaps every user on livejournal.com has this markup. Which 
makes for millions of documents and triples, but ... this could also be 
changed with a single line of Perl code being updated. A couple of 
online library catalogues could probably balance all this, or chuck in a 
museum or two. It's hard to know what kinds of thing to count in these 
comparisons: triples, documents, consuming apps, producing apps, 
projects, etc.
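
To make the measurement problem concrete, here is a minimal Python sketch. Everything in the sample data is invented for illustration (the example.org document URLs, the "producer" labels and the numbers are assumptions, not real statistics). It only shows that the same handful of dc:creator statements gives a different picture depending on whether you count triples, documents or producing apps.

```python
from collections import Counter

DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"

# Hypothetical sample: (document URL, producing app, predicate used).
# None of these documents or counts are real.
STATEMENTS = [
    ("http://example.org/blog/user1/foaf", "blog-site", DC_CREATOR),
    ("http://example.org/blog/user2/foaf", "blog-site", DC_CREATOR),
    ("http://example.org/blog/user3/foaf", "blog-site", DC_CREATOR),
    ("http://example.org/library/catalogue", "library", DC_CREATOR),
    ("http://example.org/library/catalogue", "library", DC_CREATOR),
    ("http://example.org/museum/collection", "museum", DC_CREATOR),
]


def usage_by_measure(statements, predicate):
    """Summarise usage of one predicate under three different measures."""
    rows = [s for s in statements if s[2] == predicate]
    # Measure 1: raw triple count per producing app.
    triples = Counter(producer for _, producer, _ in rows)
    # Measure 2: distinct documents per producing app.
    distinct_docs = {(doc, producer) for doc, producer, _ in rows}
    documents = Counter(producer for _, producer in distinct_docs)
    # Measure 3: how many independent producers use the predicate at all.
    return {
        "triples": dict(triples),
        "documents": dict(documents),
        "producers": len(triples),
    }


summary = usage_by_measure(STATEMENTS, DC_CREATOR)
print(summary)
```

In this toy sample the blog site leads on documents, the library looks heavier per document, and by producer count all three weigh the same, which is the ambiguity described above.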

But anyway, having spoiled your example, may I offer a new one in its 
place?

foaf:schoolHomepage. This is a property originally created by Brits for 
whom School is where you go until you're at most 18. After which it's 
off to University, College, Tescos, or whatever YTS schemes are called 
these days. *However* ... shortly after deploying foaf:schoolHomepage, 
it became clear that it meant something quite different to USAmericans 
and presumably others. We started seeing instance data where people were 
asserting foaf:schoolHomepage between themselves and the homepage of 
their University. This was unexpected, but not really surprising.

Being a pragmatist, I updated 
http://xmlns.com/foaf/spec/#term_schoolHomepage  ...

It now mentions this drift explicitly: "The original application area 
for foaf:schoolHomepage was for 'schools' in the British-English sense; 
however American-English usage has dominated, and it is now perfectly 
reasonable to describe Universities, Colleges and post-graduate study 
using foaf:schoolHomepage."

> 		This is a real social meaning shift, and it
> happened without anyone really noticing and without anything breaking or
> failing to work.

For the DC case, I think the FOAF usage is within the broad and 
naturally scruffy meaning of dc:creator. Some specific issues and 
problems were very much noticed, mostly to do with the confusing range 
of dc:creator (string or thing or Seq, etc), but this wasn't FOAF specific.

For the schoolHomepage case, yes the shift was natural and normal, 
although it was noticed and the documentation eventually caught up with 
the world. Just as it works with dictionaries. Relatedly, the abstract 
for the FOAF spec calls out the dictionary analogy:

"This specification describes the FOAF language, defined as a dictionary 
of named properties and classes using W3C's RDF technology."

> If the original DC specs had posted a detailed
> 'authoritative' ontology, the change would still have happened and it
> would still have worked, but there would have been interminable debates
> about whether a home page was really a "work" (or whatever the term that
> was used), suggestions that FOAF use a different URI, etc., etc., all
> to absolutely no purpose.

Yeah, same with schoolHomepage. We could have had a School, University 
or Educational Institution  class in there from the start, but defining 
exactly what counts as a University is somewhat fiddly. Do Polytechnics 
count as Universities? What about the schools and organizations run by 
Scientology, etc? (They don't call it the pedantic Web for nothing...)

> Just look at the interminable and utterly
> pointless debate now raging about exactly what an 'information resource'
> *really is*, none of which has any bearing whatsoever on how the actual
> Web works, even though the latter is actually constructed almost
> entirely out of the former.
>
>> As humans we can
>> usually deal with this semantic drift by knowing the context in which a
>> word is used, though it can cause real life misunderstandings sometimes.
>>
>> However, I think our use of URIs in RDF is different from our use of
>> words in natural language, in two important ways:
>>
>> - RDF is designed for machine processing -- not just human
>> communication -- and machines are not so good at understanding context
>> and resolving ambiguity; and
>>
>> - with URI declarations there is a simple, feasible, low-cost mechanism
>> available that can be used to anchor the semantics of a URI.
>
> But that begs the question of whether you want them to be anchored. I
> suggest that we often don't: that letting them 'drift' in meaning to fit
> their usage is exactly what we want to be happening.
>
>>
>> In short, although semantic web architecture could be designed to permit
>> unrestricted semantic drift, I think it is a better design -- better
>> serving the semantic web community as a whole -- to adopt an
>> architecture that permits the semantics of each URI to be anchored, by
>> use of a URI declaration.
>
> And I disagree.

Seconded. But perhaps for different reasons. We need to leave some 
flexibility in the system so that the most useful uses of classes and 
properties can emerge from experimentation and deployment.

> I think this whole idea is based on the insistence of
> various authoritative sources upon the naive idea that URIs have to
> "identify" things. This has never been the case, in fact, even in the
> pre-Semantic web, and it's even less the case now. It's a chimera: forget
> about it, rather than try to enforce it. What URIs do is fetch chunks of
> information. Hardly anyone using the normal Web in the normal way gives
> a damn what "thing" their URIs "identify": they only care about what
> they are looking at, which is whatever that "thing" sent back to them in
> the body of the 200 response, and what that means or what it can do. The
> very design of html is all about *hiding* the URIs from users, not about
> telling them what it is that URIs identify.

The "URIs are identifiers" story is a convenient enough fiction but one 
for engineers not end users. Trying to nail down what exactly it means 
for some symbol to name some thing (or identify that thing) is equally 
doomed online and off. I'm not going to hold my breath waiting for a 
realist theory of reference to succeed in the cognitive sciences (by 
which I mean an account for how words or mental gubbins truly come to 
"refer"). And if I'm not going to wait there, I'm not going to wait 
w.r.t. Web specs either. People get by assuming that there is some fact 
of the matter about how it all works, even if there isn't. But all that 
said, the rough idea that URIs are names for things is useful enough and 
is all we need. We just shouldn't poke into the details too much 'cos 
the whole story will unravel...

cheers,

Dan
Received on Friday, 22 May 2009 18:50:38 UTC