Re: Unique ID options from samwald@gmx.at on 2007-01-29 (public-semweb-lifesci@w3.org from January 2007)

From: <samwald@gmx.at>
Date: Mon, 29 Jan 2007 14:38:36 +0100
To: "Forsberg, Kerstin L" <Kerstin.L.Forsberg@astrazeneca.com>, public-semweb-lifesci@w3.org
Message-ID: <20070129133836.56760@gmx.net>

> How to uniquely identify such information resources, i.e. the
> recordings of clinical acts of observations?
> Spontaneously we assigned concatenated identifiers. e.g.
> http://clinic.com/study/T2271/subject/S83221/observation/O6561
> Is this current best practice for unique identification schemas in
> the HCLS community?

It is a common practice, and certainly not a bad one, but I don't think it could be called a 'best practice' either. Some purists discourage it, but it does have many practical advantages. Although URIs in RDF graphs are not intended to be read by humans, they are still often visible to users, and a readable URI can be helpful on some occasions. Readable URIs also make it much easier to develop and debug Semantic Web applications and datasets.

> I have seen some references on how UUIDs
> can be used in a URN. But can they be URIs as well?

URNs are a subclass of URIs. They can be used anywhere URIs can be used, e.g. in any RDF graph. Is there a reason why you would like to see a non-URN URI scheme for UUIDs? RFC 4122 defines the "urn:uuid:" namespace for UUIDs, so it is probably a good choice to stick with that:
http://ietf.org/rfc/rfc4122.txt
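As a minimal illustration (a sketch using Python's standard `uuid` module, not something from the original thread), minting a random UUID in the RFC 4122 URN form takes a single call; the helper name `new_uuid_urn` is hypothetical:

```python
import uuid
from urllib.parse import urlparse


def new_uuid_urn() -> str:
    """Return a fresh random (version 4) UUID in RFC 4122 URN form."""
    # uuid4() is random-based; the .urn property prefixes the value with "urn:uuid:"
    return uuid.uuid4().urn


if __name__ == "__main__":
    urn = new_uuid_urn()
    print(urn)                   # prints a new urn:uuid: value on every run
    print(urlparse(urn).scheme)  # -> urn (it parses as an ordinary URI)
```

Because the result is an ordinary URI, it can be used directly as the subject or object of RDF statements.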

> What are the experiences of assigning this kind of unique
> identifiers to information resources, as well as to real world
> instances, in the HCLS community?

Personally, so far I have mostly tried using an algorithm to generate new URIs from information in the data source, as in the example you gave. However, depending on the nature of the original data, this can easily lead to conflicts and non-unique URIs, and with some datasets the problem is hard to avoid.
When you are converting a data source that already has identifiers (e.g. PMIDs in the case of PubMed abstracts), reusing those identifiers is of course the better choice.
Personally, I do not think that URIs for non-information resources need to be resolvable through HTTP or a similar mechanism, so I avoid many of the problems associated with making them resolvable.
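The algorithmic approach, and the conflicts it can produce, can be sketched roughly as follows. This is an illustration under assumptions, not the author's code: the base URI is taken from the example in the question, and the `UriMinter` class and its methods are hypothetical. Each path segment is percent-encoded so that a source value containing "/" cannot silently merge segments, and the minter remembers which source record each URI was assigned to, so that a second record producing the same URI is reported instead of being silently merged.

```python
from urllib.parse import quote

# Base URI taken from the example in the question (hypothetical namespace).
BASE = "http://clinic.com"


class UriMinter:
    """Hypothetical helper that builds URIs from source fields and detects conflicts."""

    def __init__(self, base: str = BASE) -> None:
        self.base = base.rstrip("/")
        self._owner: dict[str, str] = {}  # minted URI -> key of the source record

    def build(self, *segments: str) -> str:
        # Percent-encode every segment, including "/", so each value stays one segment.
        return self.base + "/" + "/".join(quote(s, safe="") for s in segments)

    def mint(self, record_key: str, *segments: str) -> str:
        uri = self.build(*segments)
        owner = self._owner.setdefault(uri, record_key)
        if owner != record_key:
            # Two distinct source records map to the same URI: it is not unique.
            raise ValueError(f"URI conflict: {uri} already assigned to {owner!r}")
        return uri


if __name__ == "__main__":
    minter = UriMinter()
    print(minter.mint("row-1", "study", "T2271", "subject", "S83221",
                      "observation", "O6561"))
```

Calling `mint` again with the same record key returns the same URI; calling it with a different record key that yields the same segments raises `ValueError`, which is the non-uniqueness problem mentioned above.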

Another approach, which I have never tried but think is worth considering, is to simply rely on large random numbers. Of course, there are use cases where even a minuscule possibility of non-uniqueness is unacceptable, such as clinical patient records, but in many use cases it is acceptable.
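How small that possibility is can be estimated with the standard birthday-problem approximation: if n identifiers are drawn uniformly from 2^b possible values, the probability of at least one duplicate is about 1 - exp(-n(n-1)/2^(b+1)). A small sketch (the function name is hypothetical, and the formula is the usual approximation rather than an exact count):

```python
import math


def collision_probability(n: int, bits: int) -> float:
    """Approximate chance that n random `bits`-bit identifiers contain a duplicate.

    Uses the birthday approximation p ~= 1 - exp(-n(n-1) / 2^(bits+1)).
    """
    exponent = n * (n - 1) / 2 ** (bits + 1)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny
    return -math.expm1(-exponent)


if __name__ == "__main__":
    # A version 4 UUID carries 122 random bits.
    for n in (10**6, 10**9, 10**12):
        print(f"{n:>15,d} ids, 122 bits: {collision_probability(n, 122):.3e}")
```

For example, a billion random 122-bit identifiers give a collision probability on the order of 10^-19.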

For this, I would prefer to use the tag URI scheme, including an identifier for yourself or your institution, the current date, and a 128-byte random number:
http://www.taguri.org/
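A tag URI along those lines could be built as in the sketch below. It assumes the `tag:authority,date:specific` layout defined in RFC 4151; the authority `example.org` and the helper name `new_tag_uri` are placeholders. `nbytes` defaults to the 128 bytes mentioned above, though 16 bytes (128 bits, the size of a UUID) already makes collisions extremely unlikely.

```python
import secrets
from datetime import date
from typing import Optional


def new_tag_uri(authority: str, nbytes: int = 128, day: Optional[date] = None) -> str:
    """Build a tag URI of the form tag:<authority>,<YYYY-MM-DD>:<random hex>."""
    day = day or date.today()
    # secrets.token_hex draws from the OS CSPRNG; each byte becomes two hex digits
    return f"tag:{authority},{day.isoformat()}:{secrets.token_hex(nbytes)}"


if __name__ == "__main__":
    # "example.org" stands in for a domain name or e-mail address you control.
    print(new_tag_uri("example.org", nbytes=16))
```

RFC 4151 requires the authority to be a domain name or e-mail address held by the minter on the date given, which is what keeps tags from different institutions apart; the random part only has to be unique within that authority and date.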

cheers,
Matthias Samwald

Received on Monday, 29 January 2007 13:38:42 UTC