Re: identifier to use from Eric Jain on 2007-08-22 (public-semweb-lifesci@w3.org from August 2007)

From: Eric Jain <Eric.Jain@isb-sib.ch>
Date: Wed, 22 Aug 2007 15:57:55 +0200
To: Hilmar Lapp <hlapp@duke.edu>
CC: public-semweb-lifesci hcls <public-semweb-lifesci@w3.org>
Message-ID: <46CC40E3.6030802@isb-sib.ch>

Hilmar Lapp wrote:
> Right. That was one of the problems that was faced when the I3C 
> consortium started (namely multiple identifier systems with 
> idiosyncratic translation rules to convert to a resolvable URL), and 
> which it tries to address by unifying the identifier and resolution 
> schemes.

Great, but from an outside point of view, didn't you just end up adding yet 
another idiosyncratic system?


> My point was that domain-specific identifier and resolution schemes are 
> a matter of fact, and some evidence shows that the fact that they are 
> domain specific doesn't diminish their ability to succeed and become 
> de-facto standards.

I guess that could happen... Do you have some examples of domain-specific 
standards that became de-facto standards, supported by generic tools etc?


> As for being limited to a domain or not, would the LSID mechanism be 
> more appealing if it read urn:guid:foo.org:Foo:12345? There's nothing in 
> the LSID spec that makes it LS-specific, or due to which it makes no 
> sense outside of the LS.

You're right, from a technical point of view, it's not domain-specific. But 
if no one else is using it, doesn't that make it de-facto domain-specific?


> Do you mean you would prefer if each journal set up URIs based on its 
> self-chosen domain-name and we reference articles through that instead 
> of DOIs? Or did you want to say something else?

If instead of doi:10.1038/nrg2158 an official URI looked something like
http://dx.doi.org/10.1038/nrg2158, would this make the system less popular?

In fact, I suspect that the lack of such a transformation mechanism turned 
away many people from the LSID system (that, and the ugly syntax :-)
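
To make the kind of transformation I have in mind concrete, here is a 
minimal sketch. It assumes the resolver prefix from the example above 
(http://dx.doi.org/); the function names are made up for illustration, and 
it ignores the percent-encoding that some DOIs with unusual characters 
would need.

```python
DOI_SCHEME = "doi:"
RESOLVER = "http://dx.doi.org/"  # assumption: prefix taken from the example above


def doi_to_http_uri(identifier: str) -> str:
    """Map e.g. 'doi:10.1038/nrg2158' to a resolvable HTTP URI."""
    if not identifier.lower().startswith(DOI_SCHEME):
        raise ValueError(f"not a doi: identifier: {identifier!r}")
    # Keep the part after 'doi:' unchanged and prepend the resolver prefix.
    return RESOLVER + identifier[len(DOI_SCHEME):]


def http_uri_to_doi(uri: str) -> str:
    """Inverse mapping, for tools that want the compact form back."""
    if not uri.startswith(RESOLVER):
        raise ValueError(f"not a {RESOLVER} URI: {uri!r}")
    return DOI_SCHEME + uri[len(RESOLVER):]


if __name__ == "__main__":
    print(doi_to_http_uri("doi:10.1038/nrg2158"))
```

The point is only that the mapping is a fixed prefix swap in both 
directions, so a generic tool would not need any scheme-specific resolution 
logic.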

I'd also be fine with using e.g. http://www.nature.com/nrg/journal/nrg2158; 
if Nature went out of business, would the DOI be any more useful?

Note: While most publishers seem to have adopted the DOI system, I don't 
see many people using it (e.g. in queries) on our site. But if someone who 
works for a publisher is lurking, they might have better usage stats!


> I'm not sure you are trying to advocate future standards based on the 
> abilities or lack thereof of the current generation of semantic web tools?

Are we talking about future standards, or current best practices?

As things are, if I am asked for advice, I can't tell anyone that they 
should use approach x instead of y just because tool providers need to be 
encouraged to support x, when y is simpler and more widely supported.


> Just as they will have to support DOIs to be practical, I don't see why 
> they would shy away from supporting LSIDs, if they are widely used.
> 
> To make them widely used is upon the data providers, though, not the 
> tool makers.

Chicken-and-egg alert! :-)


> Well, yeah, but the big challenge is still a big challenge and a real 
> one, and advocating stable HTTP URIs as a solution surely will not 
> contribute to solving the big challenge?

Forces that work against stable, resolvable HTTP URIs:

1. People reorganize their web servers, change technologies etc.
2. Data is removed or replaced.
3. Data providers disappear.

The first issue is something that might be improved with W3C guidelines -- 
and third-party PURLs for those who refuse to listen :-)

The second and third issues are trickier -- and I'm not sure how non-HTTP 
URIs help here? The problem is that even if you want to version your data 
and allow retrieval of obsolete data, the infrastructure for this isn't 
trivial. For example, we've invested some effort to support this for some 
of our data [e.g. try http://beta.uniprot.org/uniprot/P05067?version=42], 
but that's just part of our data, and we don't support all formats, either.
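
From the client side, asking for an obsolete version could look roughly 
like the sketch below. It only assumes the "version" query parameter shown 
in the example URL above; the helper name is hypothetical, and this is not 
a description of how the server side works.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def versioned_uri(entry_uri: str, version: int) -> str:
    """Return entry_uri with its 'version' query parameter set to version."""
    parts = urlsplit(entry_uri)
    # Drop any existing 'version' parameter, keep everything else as is.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "version"]
    query.append(("version", str(version)))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    print(versioned_uri("http://beta.uniprot.org/uniprot/P05067", 42))
```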

The best solution to disappearing data I can see is that you have some 
Google-scale, Internet Archive-like projects that go and collect all data.


> Right. Does this advocate for or against an opaque identifier system? 
> BTW there are standards to deal with that, such as OpenURL (however 
> imperfect that may be).

I don't see any strong reason to advocate either approach. Opaque 
identifiers such as http://purl.uniprot.org/uniprot/Q15848 have the 
advantage that they don't need to be replaced as often as identifiers such 
as http://en.wikipedia.org/wiki/Adiponectin, but that may not be a problem, 
and if you're doing a dictionary, such identifiers can make sense, too.


> And what if the internet archive chose not to archive that HTTP URI?

Then you're out of luck, but I don't see how any non-HTTP scheme would 
have even given us a chance to recover the "data that is no more"?


>> Don't know how this is best handled in the context of the Semantic Web...
>>
> Would you mind elaborating?

I would, if I had the perfect solution :-) It's probably a good idea to 
keep track of the source (URL!) and the time at which you obtained any 
statements, in the hope that in future you may be able to retrieve the 
exact data you were referencing at the time from some archive.
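
As a rough sketch of what "keeping track" could mean in practice: store 
each statement together with the URL it came from and a UTC timestamp. The 
class and field names below are made up for illustration and don't follow 
any particular vocabulary or tool.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class SourcedStatement:
    """A subject/predicate/object statement plus where and when it was obtained."""
    subject: str
    predicate: str
    obj: str
    source_url: str         # the URL the statement was retrieved from
    retrieved_at: datetime  # retrieval time, kept in UTC


def record(subject: str, predicate: str, obj: str, source_url: str,
           retrieved_at: Optional[datetime] = None) -> SourcedStatement:
    """Attach source and time to a statement; the time defaults to now (UTC)."""
    if retrieved_at is None:
        retrieved_at = datetime.now(timezone.utc)
    return SourcedStatement(subject, predicate, obj, source_url, retrieved_at)


if __name__ == "__main__":
    # Hypothetical statement; the predicate URI is a placeholder.
    s = record("http://purl.uniprot.org/uniprot/Q15848",
               "http://example.org/describedBy",
               "http://en.wikipedia.org/wiki/Adiponectin",
               source_url="http://purl.uniprot.org/uniprot/Q15848")
    print(s.source_url, s.retrieved_at.isoformat())
```

With the source URL and timestamp at hand, you at least know what to ask 
an archive for later on.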
Received on Wednesday, 22 August 2007 13:58:13 UTC