- From: Tim Berners-Lee <timbl@w3.org>
- Date: Fri, 27 Jul 2007 14:06:29 -0400
- To: Alan Ruttenberg <alanruttenberg@gmail.com>
- Cc: Chris Bizer <chris@bizer.de>, "SW-forum Web" <semantic-web@w3.org>, "Linking Open Data" <linking-open-data@simile.mit.edu>, "Jonathan A Rees" <jar@mumble.net>
On 2007-07-27, at 01:18, Alan Ruttenberg wrote:
> [..]
> Right. I say truth in advertising. If your URI represents a
> "description of" then say that that's what the URI denotes, rather
> than saying it denotes the thing itself.
That is clear in the architecture.
I have tried to point out the value of the # in the architecture. It
makes it very clear that
<http://www.w3.org/People/Berners-Lee/card>
is a document and
<http://www.w3.org/People/Berners-Lee/card#i>
is an identifier for me. While it is true that
<http://dbpedia.org/resource/Tim_Berners-Lee> is also an identifier
for me, there are good social reasons why someone might want to use
one or the other. I might want to introduce myself to someone (or log
in at a system) making sure that the other party will have access to
certain information. I might make a link from my data to one or the
other on the basis that I am more convinced that one or the other
will be well-maintained.
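To illustrate the role of the #, here is a minimal Python sketch that uses only the standard library's urllib.parse.urldefrag. The helper name document_for is an assumption made for illustration. Stripping the fragment from the identifier for the person yields the URI of the document that describes them.

```python
from urllib.parse import urldefrag


def document_for(thing_uri: str) -> str:
    """Return the URI of the document in which a hash URI is defined.

    Hypothetical helper: it only drops the fragment part, nothing is fetched.
    """
    doc, _fragment = urldefrag(thing_uri)
    return doc


person = "http://www.w3.org/People/Berners-Lee/card#i"
document = document_for(person)

# The person and the document have different URIs, so statements about
# one are never confused with statements about the other.
print(person)
print(document)
```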
I'm not going to persuade people to use one or the other. I do link
the one I control to the dbpedia one with owl:sameAs.
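Such a link is a single triple. The sketch below writes it out in N-Triples syntax from Python. The function name same_as_triple is an assumption made for illustration, and the real statement in the card document may be serialized differently.

```python
# The OWL namespace URI for the sameAs property.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triple(uri_a: str, uri_b: str) -> str:
    """Format one owl:sameAs statement as an N-Triples line (illustrative helper)."""
    return f"<{uri_a}> <{OWL_SAME_AS}> <{uri_b}> ."


triple = same_as_triple(
    "http://www.w3.org/People/Berners-Lee/card#i",
    "http://dbpedia.org/resource/Tim_Berners-Lee",
)
print(triple)
```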
You say then,
> I guess I am arguing that it is always a bad idea to mint your own
> URI if you believe that some other URI names exactly the thing that
> you are about to name with yours. So if there is a URI that you are
> sure identifies a specific person, then use that instead of
> inventing a new one. On the other hand, if you want to mint a URI
> that is a resource *about* that person, according to you, then it's
> fine to mint one for that - no one else can claim to have exactly
> the same resource about that person.
I disagree. I think that in general, there should be a small number
of URIs. In general, yes, it is good to use a well-recognized one.
But there are cases when it makes sense to make an identifier.
I gave a talk a while ago at crossref.org, which maintains the doi:
set of Digital Object Identifiers for books. ("TIP: You can turn a
DOI string into a URL by appending the DOI string to
http://dx.doi.org/") They said they had a big problem: their databases
contain information connecting books and authors. They use DOIs for
the books, but what can they use for people? There is no central
registry for people. They have no right to invent identifiers for
people. They had run into this problem because they were thinking
centralized, not weblike. They had the model that there should be
one central name for a book (and they should run it). This breaks
because other people have their IDs for everything too -- no one can
practically, socially, be the one central truth, and it would be a
fragile system (socially and technically) if they were. But the
good news is that they still provide a very valuable function. They
provide a source of stable URIs for books (alas no RDF).
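The quoted tip amounts to string concatenation. A minimal sketch follows, assuming the resolver prefix given in the tip. The function name doi_to_url and the DOI 10.1234/example are hypothetical and used only for illustration.

```python
DOI_RESOLVER = "http://dx.doi.org/"  # resolver prefix taken from the quoted tip


def doi_to_url(doi: str) -> str:
    """Turn a DOI string into a URL by appending it to the resolver prefix."""
    doi = doi.strip()
    # Accept the "doi:" scheme form as well as the bare DOI string.
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    return DOI_RESOLVER + doi


# Hypothetical DOI, for illustration only.
url = doi_to_url("doi:10.1234/example")
print(url)
```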
The other good news is that crossref CAN make URIs for people. They
can perform the incredibly valuable function of disambiguating,
within that community, the various people with similar names. They
can make RDF IDs for them. If they are very on the ball, they will
even allow an author to store another RDF ID, like their FOAF ID, in
the crossref database, just like allowing an author to link to their own
homepage.
So, how does this relate to the Science Commons? I think the life
sciences folks should not hold their breath until there is a unique
identifier for each protein, and a unique concept for what a "protein"
is exactly. They should serve up the actual records about these
things as documents, with known provenance and features and
failings. So a protein may get IDs in uniprot and in the Gene
Ontology, where the mapping isn't 100% crystal clear. And then
mapping files can be provided where the mappings exist. This allows
each data source to change if necessary, as new understandings
arise. The system must not be so rigidly connected that nothing can
grow. The service of the data should be maintained by the
organization which maintains the data, after an initial period when
people externally show them how it is done (like biordf and bio2rdf).
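As a rough sketch of what a mapping file could carry, the Python below keeps the mappings apart from either data source and records how exact each one is. All URIs, the "exact"/"approximate" labels, and the names MAPPINGS and build_index are assumptions made for illustration, not the format of any real database.

```python
from collections import defaultdict

# Hypothetical mapping records: (ID in source A, ID in source B, how exact).
MAPPINGS = [
    ("http://example.org/sourceA/protein/P1",
     "http://example.org/sourceB/term/T9", "exact"),
    ("http://example.org/sourceA/protein/P2",
     "http://example.org/sourceB/term/T9", "approximate"),
]


def build_index(mappings):
    """Index the mapping records in both directions.

    The mappings live outside both sources, so either source can change
    its own records without the other having to follow at once.
    """
    index = defaultdict(list)
    for a, b, relation in mappings:
        index[a].append((b, relation))
        index[b].append((a, relation))
    return index


index = build_index(MAPPINGS)
for other, relation in index["http://example.org/sourceB/term/T9"]:
    print(other, relation)
```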
Once these identifiers cease to be the one and only central ID, then
they can be minted pragmatically. I'm not going to jump up and
down about the # vs / but I think when the data is presented as a
set of records, which I seem to remember it is, the # approach might
seem less weird to you.
Tim
Received on Friday, 27 July 2007 18:06:35 UTC