Re: URI aliases and owl:sameAs was: Terminology Question concerning Web Architecture and Linked Data from Tim Berners-Lee on 2007-07-27 (semantic-web@w3.org from July 2007)

From: Tim Berners-Lee <timbl@w3.org>
Date: Fri, 27 Jul 2007 14:06:29 -0400
To: Alan Ruttenberg <alanruttenberg@gmail.com>
Cc: Chris Bizer <chris@bizer.de>, "SW-forum Web" <semantic-web@w3.org>, "Linking Open Data" <linking-open-data@simile.mit.edu>, "Jonathan A Rees" <jar@mumble.net>
Message-Id: <9DD8BAF0-7E5B-4290-86CB-94923DB9EB31@w3.org>

On 2007-07-27, at 01:18, Alan Ruttenberg wrote:
> [..]

> Right. I say truth in advertising. If your URI represents a  
> "description of" then say that that's what the URI denotes, rather  
> than saying it denotes the thing itself.

That is clear in the architecture.

I have tried to point out the value of the # in the architecture. It
makes it very clear that

	<http://www.w3.org/People/Berners-Lee/card>

is a document and

	<http://www.w3.org/People/Berners-Lee/card#i>

is an identifier for me.  While it is true that
<http://dbpedia.org/resource/Tim_Berners-Lee> is also an identifier for
me, there are good social reasons why someone might want to use one or
the other.
I might want to introduce myself to someone (or log in at a system)  
making sure that the other party will have access to certain  
information.    I might make a link from my data to one or other on  
the basis that I am more convinced that one or other will be well- 
maintained.
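
As a rough sketch of the document/thing distinction drawn above with
card and card#i, assuming Python and only its standard library (the
helper name document_for is made up for illustration): dropping the
fragment from the identifier for the person gives the document that
would actually be fetched.

```python
from urllib.parse import urldefrag

# URI from the message: the "#i" fragment names the person, not the file.
PERSON_URI = "http://www.w3.org/People/Berners-Lee/card#i"


def document_for(uri: str) -> str:
    """Return the URI of the document a client would fetch for `uri`."""
    # urldefrag splits a URI into (URI without fragment, fragment).
    document_uri, _fragment = urldefrag(uri)
    return document_uri


print(document_for(PERSON_URI))
```

A URI that has no fragment comes back unchanged, so the same helper
can be applied to either kind of identifier.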

I'm not going to persuade people to use one or the other.  I do link  
the one I control to the dbpedia one with owl:sameAs.
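
One way such a link could be written down is as a single statement in
N-Triples. This is only a sketch: the helper name same_as_triple is
invented here, and it does plain string formatting rather than using
an RDF library. The two URIs are the ones mentioned above.

```python
# Full URI of the owl:sameAs property.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triple(subject: str, obj: str) -> str:
    """Format one owl:sameAs statement as an N-Triples line."""
    return f"<{subject}> <{OWL_SAME_AS}> <{obj}> ."


line = same_as_triple(
    "http://www.w3.org/People/Berners-Lee/card#i",
    "http://dbpedia.org/resource/Tim_Berners-Lee",
)
print(line)
```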

You say then,

> I guess I am arguing that it is always a bad idea to mint your own  
> URI if you believe that some other URI names exactly the thing that  
> you are about to name with yours. So if there is a URI that you are  
> sure identifies a specific person, then use that instead of  
> inventing a new one. On the other hand, if you want to mint a URI  
> that is a resource *about* that person, according to you, then it's  
> fine to mint one for that - no one else can claim to have exactly  
> the same resource about that person.

I disagree.   I think that in general, there should be a small number  
of URIs. In general, yes, it is good to use a well-recognized one.   
But there are cases when it makes sense to make an identifier.

I gave a talk a while ago at crossref.org, which maintains the doi:
set of Digital Object Identifiers for books.  ("TIP: You can turn a
DOI string into a URL by appending the DOI string to
http://dx.doi.org/")   They said they had a big problem:  their databases
contain information connecting books and authors. They use DOIs for
the books, but what can they use for people?  There is no central
registry for people. They have no right to invent identifiers for
people.  They had run into this problem because they were thinking
centralized, not weblike.    They had the model that there should be
one central name for a book (and they should run it).  This breaks
because other people have their IDs for everything too -- no one can
practically socially be the one central truth, and that would be a
fragile system (socially and technically) if they were.  But the
good news is that they still provide a very valuable function. They
provide a source of stable URIs for books (alas no RDF).
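
The tip quoted above amounts to simple string concatenation. A minimal
sketch, with some assumptions: the function name doi_to_url, the
tolerance for a leading "doi:" prefix, and the example DOI are all
made up for illustration.

```python
DOI_RESOLVER = "http://dx.doi.org/"  # prefix quoted in the tip above


def doi_to_url(doi: str) -> str:
    """Turn a DOI string into a URL by appending it to the resolver prefix."""
    doi = doi.strip()
    # Assumption: accept an optional "doi:" scheme prefix on the input.
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    return DOI_RESOLVER + doi


# "10.1234/example" is a made-up DOI used only for illustration.
print(doi_to_url("doi:10.1234/example"))
```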

The other good news is that crossref CAN make URIs for people.  They  
can perform the incredibly valuable function of disambiguating,  
within that community, the various people with similar names. They  
can make RDF IDs for them.  If they are very on the ball, they will
even allow an author to store another RDF ID, like their FOAF ID, in the
crossref database, just like allowing an author to link to their own  
homepage.

So, how does this relate to the Science commons?  I think the life  
sciences folks should not hold their breath until there is a unique  
identifier for each protein, and a unique concept for what a "protein"
is exactly. They should serve up the actual records about these  
things as documents, with known provenance and features and  
failings.   So a protein may get IDs in UniProt and in the Gene
Ontology, where the mapping isn't 100% crystal clear.  And then
mapping files can be provided where the mappings exist. This allows
each data source to change if necessary, as new understandings
arise.  The system must not be so rigidly connected that nothing can  
grow.  The service of the data should be maintained by the  
organization which maintains the data, after an initial period when  
people externally show them how it is done (like biordf and bio2rdf).
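
To make the idea of a separate mapping file concrete, here is a small
sketch of one possible shape for it. Everything in it is an
assumption for illustration: the tab-separated layout, the column
names, the "note" values, and the identifiers, none of which are real
UniProt or Gene Ontology IDs.

```python
import csv
import io

# Hypothetical mapping file kept apart from both data sources, so either
# source can change its own records without touching the other.
MAPPING_TSV = """\
source_a_id\tsource_b_id\tnote
A:0001\tB:9001\texact
A:0002\tB:9002\tapproximate
A:0002\tB:9003\tapproximate
"""


def load_mappings(text: str) -> dict[str, list[tuple[str, str]]]:
    """Index mapping rows by the first source's identifier."""
    index: dict[str, list[tuple[str, str]]] = {}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        index.setdefault(row["source_a_id"], []).append(
            (row["source_b_id"], row["note"])
        )
    return index


mappings = load_mappings(MAPPING_TSV)
# One ID may map to several candidates when the mapping is not clear-cut.
print(mappings.get("A:0002", []))
```

An identifier with no row in the file simply yields an empty list,
which matches the point that mappings are provided only where they
exist.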

Once these identifiers cease to be the one and only central ID, then  
they can be minted pragmatically.     I'm not going to jump up and
down about the # vs /, but I think when the data is presented as a
set of records, which I seem to remember it is, the # approach might
seem less weird to you.

Tim

Received on Friday, 27 July 2007 18:06:35 UTC