- From: Tim Berners-Lee <timbl@w3.org>
- Date: Fri, 27 Jul 2007 14:06:29 -0400
- To: Alan Ruttenberg <alanruttenberg@gmail.com>
- Cc: Chris Bizer <chris@bizer.de>, "SW-forum Web" <semantic-web@w3.org>, "Linking Open Data" <linking-open-data@simile.mit.edu>, "Jonathan A Rees" <jar@mumble.net>
On 2007-07-27, at 01:18, Alan Ruttenberg wrote:

> [..]
> Right. I say truth in advertising. If your URI represents a
> "description of" then say that that's what the URI denotes, rather
> than saying it denotes the thing itself.

That is clear in the architecture. I have tried to point out the value of the # in the architecture. It makes it very clear that <http://www.w3.org/People/Berners-Lee/card> is a document and <http://www.w3.org/People/Berners-Lee/card#i> is an identifier for me.

While it is true that <http://dbpedia.org/resource/Tim_Berners-Lee> is also an identifier for me, there are good social reasons why someone might want to use one or the other. I might want to introduce myself to someone (or log in at a system) making sure that the other party will have access to certain information. I might make a link from my data to one or other on the basis that I am more convinced that one or other will be well-maintained. I'm not going to persuade people to use one or the other. I do link the one I control to the dbpedia one with owl:sameAs.

You say then,

> I guess I am arguing that it is always a bad idea to mint your own
> URI if you believe that some other URI names exactly the thing that
> you are about to name with yours. So if there is a URI that you are
> sure identifies a specific person, then use that instead of
> inventing a new one. On the other hand, if you want to mint a URI
> that is a resource *about* that person, according to you, then it's
> fine to mint one for that - no one else can claim to have exactly
> the same resource about that person.

I disagree. I think that in general, there should be a small number of URIs. In general, yes, it is good to use a well-recognized one. But there are cases when it makes sense to make an identifier.

I gave a talk a while ago at crossref.org, which maintains the doi: set of Digital Object Identifiers for books.
("TIP: You can turn a DOI string into a URL by appending the DOI string to http://dx.doi.org/")

They said they had a big problem: their databases contain information connecting books and authors. They use DOIs for the books, but what can they use for people? There is no central registry for people. They have no right to invent identifiers for people.

They had run into this problem because they were thinking centralized, not weblike. They had the model that there should be one central name for a book (and they should run it). This breaks because other people have their IDs for everything too -- no one can practically, socially, be the one central truth, and that would be a fragile system (socially and technically) if they were.

But the good news is that they still provide a very valuable function. They provide a source of stable URIs for books (alas no RDF).

The other good news is that crossref CAN make URIs for people. They can perform the incredibly valuable function of disambiguating, within that community, the various people with similar names. They can make RDF IDs for them. If they are very on the ball, they will even allow an author to store another RDF ID, like their FOAF ID, in the crossref database, just like allowing an author to link to their own homepage.

So, how does this relate to the Science Commons?

I think the life sciences folks should not hold their breath until there is a unique identifier for each protein, and a unique concept for what a "protein" is exactly. They should serve up the actual records about these things as documents, with known provenance and features and failings. So a protein may get IDs in uniprot and in the Gene Ontology, where the mapping isn't 100% crystal clear. And then mapping files can be provided where the mappings exist. This allows each data source to change if necessary, as new understandings arise. The system must not be so rigidly connected that nothing can grow.
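
A minimal sketch of the mapping-file idea, assuming a small in-memory list of owl:sameAs-style pairs. Every URI here is a hypothetical example.org placeholder, and the function names build_index and equivalents are illustrative only; nothing below comes from uniprot, the Gene Ontology, or crossref. Each mapping carries the document that asserted it, so the provenance stays visible and each data source can keep its own identifiers.

```python
from collections import defaultdict

# Hypothetical mapping file contents, for illustration only.
# Each entry: (one URI, another URI, document that asserts the mapping).
MAPPINGS = [
    ("http://example.org/uniprot/P00001",
     "http://example.org/go/protein-x",
     "http://example.org/mappings/v1"),
    ("http://example.org/go/protein-x",
     "http://example.org/lab/prot42#it",
     "http://example.org/mappings/v2"),
]


def build_index(mappings):
    """Build a symmetric lookup: URI -> set of (mapped URI, provenance)."""
    index = defaultdict(set)
    for left, right, source in mappings:
        index[left].add((right, source))
        index[right].add((left, source))
    return index


def equivalents(uri, index):
    """Return every URI reachable from `uri` by following mapping links."""
    seen = {uri}
    stack = [uri]
    while stack:
        current = stack.pop()
        # .get avoids creating empty entries for unknown URIs.
        for other, _source in index.get(current, ()):
            if other not in seen:
                seen.add(other)
                stack.append(other)
    seen.discard(uri)  # report only the *other* identifiers
    return seen


index = build_index(MAPPINGS)
print(sorted(equivalents("http://example.org/uniprot/P00001", index)))
```

Treating the links as transitive is itself an assumption: where a mapping "isn't 100% crystal clear", a consumer might prefer to follow only direct links, or only those asserted by a source it trusts, which is why the provenance is kept alongside each pair.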

The service of the data should be maintained by the organization which maintains the data, after an initial period when people externally show them how it is done (like biordf and bio2rdf).

Once these identifiers cease to be the one and only central ID, then they can be minted pragmatically.

I'm not going to jump up and down about the # vs / but I think when the data is presented as a set of records, which I seem to remember it is, the # approach might seem less weird to you.

Tim

Received on Friday, 27 July 2007 18:06:35 UTC