URIs

There was a discussion a few weeks ago about URIs that touched on
various issues. This message is an attempt to untangle them, something
I said I would write up as an action item in one of the HCLS
conference calls. We'll be discussing URIs at the Monday BioRDF
conference call.

As I read the discussion I partitioned it into three distinct issues:

1) The relationship between the use of a URI in a representation and  
what it dereferences to, if anything. The possibilities seem to be:

   a) The identifier is not intended to be dereferenceable. In that
case the info: scheme was suggested for the form of the URI, as info:
URIs are explicitly not dereferenceable.

   b) The URI is used primarily as a name. Insofar as we want to use
names, it is important that there be some stable URIs. Of course it
doesn't hurt if the URI becomes dereferenceable at some point, and it
would even be nice, so let's leave open that possibility (but see the
caveats in the discussion below).

   c) Any URL we use needs to be able to be dereferenced to something.

   d) Any URL we use needs to be able to be dereferenced to the thing
it is (and should not be dereferenceable if that isn't possible). Its
only meaning is what it dereferences to.

2) What a URI refers to. Part of this conversation took the form of a
discussion about what reasonable arguments to owl:sameAs are: for
example, whether one should say that
http://www.expasy.org/uniprot/P04637 and
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=NP_000537
are owl:sameAs each other.

Another part of the conversation talked in terms of whether the URI  
http://www.expasy.org/uniprot/P04637 should, for our purposes, refer  
to a database record or to a thing in the world: human P53 proteins.

Of course these are two sides of the same coin: you would only say
that the two URIs above are the same if they referred to things in the
world. As database entries, they are obviously different. They have
different fields, they are maintained by different people, etc.

3) Something I will call the social aspect of URIs, for lack of a
better term. By this I mean those aspects of the process we go through
to come to a shared use of a URI. This category includes ontology
building and the strategies for connecting pieces of information
generated by different groups. There was a bit in the conversations
where people were arguing about whether using sameAs for mapping was
pollution or a necessity, for instance. An important part of this in
our context is how to define the use of URLs for things to which no
rigorous ontological engineering was applied to create careful
definitions, things like terminologies and entries in gene databases.
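To make the second issue concrete, here is a minimal Python sketch,
assuming a naive reading of owl:sameAs in which every statement about
one URI is rewritten onto the other (roughly what a reasoner licenses
once two URIs denote the same thing). The property name maintainedBy
and its values are hypothetical stand-ins for the "maintained by
different people" point; they are not real fields of either database.

```python
# Hypothetical, simplified facts about the two database records; the
# property name "maintainedBy" and its values are illustrative only.
UNIPROT = "http://www.expasy.org/uniprot/P04637"
REFSEQ = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=NP_000537"

triples = {
    (UNIPROT, "maintainedBy", "UniProt curators"),
    (REFSEQ, "maintainedBy", "NCBI staff"),
}


def merge_same_as(triples, a, b):
    """Naively apply 'a owl:sameAs b' by rewriting every occurrence of b to a.

    Once two URIs denote the same thing, every statement about one is a
    statement about the other, so the two sets of facts collapse together.
    """
    return {tuple(a if term == b else term for term in t) for t in triples}


merged = merge_same_as(triples, UNIPROT, REFSEQ)
maintainers = sorted(o for s, p, o in merged if s == UNIPROT and p == "maintainedBy")
print(maintainers)  # ['NCBI staff', 'UniProt curators']
```

Read as names for the protein, the merge is harmless. Read as names
for database records, the merged graph claims that a single record is
maintained by both groups, which is false.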

---

I'll offer some of my own opinions on these issues now.

On the matter of what a URI dereferences to, I think it is more
important to get the names in place quickly. I don't agree with the
point of view that we should explicitly make them not
dereferenceable, even though I'm not yet sure what should come back
when we ask for what they point to. Nor do I see support for the idea
that anything that looks like a URL must have a server that returns
something specific. Here's a quote from RFC 3986:

> Although many URI schemes are named after protocols, this does not  
> imply that use of these URIs will result in access to the resource  
> via the named protocol.  URIs are often used simply for the sake of  
> identification.

It will be part of our social process to come to some understanding
and agreement about what, if anything, it would be useful for us to
have come back. Is it an RDF graph? A bunch of OWL definitions of
things related to the gene? A representation of the ASN.1 record? A
page of HTML? All of the above?
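If the answer turns out to be "all of the above", HTTP content
negotiation is one standard way to serve several representations from
a single URI: the client sends an Accept header and the server picks a
format. The sketch below is a simplified, hypothetical server-side
chooser; the list of offered media types is an assumption, and it
ignores the HTTP rule that more specific media ranges take precedence
over wildcards.

```python
# Hypothetical representations a server might offer for one entrez
# gene URI; which ones to offer is exactly the open question.
AVAILABLE = ["application/rdf+xml", "text/html", "text/plain"]


def parse_accept(header):
    """Return (media_range, q) pairs from an HTTP Accept header."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range, q = fields[0], 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        if media_range:
            prefs.append((media_range, q))
    return prefs


def choose_representation(accept_header, available=AVAILABLE):
    """Pick the offered media type with the highest q-value, or None.

    Simplification: ties go to whichever match is seen first, and
    specific ranges get no precedence over wildcards. A q-value of 0
    means "not acceptable", so it never wins.
    """
    best, best_q = None, 0.0
    for media_range, q in parse_accept(accept_header):
        for candidate in available:
            major = candidate.split("/", 1)[0]
            if media_range in (candidate, major + "/*", "*/*") and q > best_q:
                best, best_q = candidate, q
    return best


print(choose_representation("text/html,application/rdf+xml;q=0.9"))  # text/html
```

A browser would typically get HTML while an RDF-aware agent asking for
application/rdf+xml would get a graph, and the URI itself stays the
same name in both cases.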

On the question of what kind of concept an entrez gene URI refers to,  
I think that concept needs to be "databaseRecord". There are too many
different concepts that it could mean if we want it to refer to  
something in the world - does it refer to the sequence of the gene?  
The typical gene? All mutations of it that are found in populations?  
The possible gene products?

Rather, we can use the URI of the database entry as a starting point
for building concepts, by defining properties and using them in OWL
class definitions in a variety of ways. SKOS, for instance, has a
property isPrimarySubjectOf (FOAF has the analogous isPrimaryTopicOf).
The kind of equivalence we can have between
http://www.expasy.org/uniprot/P04637 and
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=NP_000537
is something like: the same something isPrimarySubjectOf both
http://www.expasy.org/uniprot/P04637 and
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=NP_000537
where "something" is a blank node in RDF. Or, in OWL:

Class(P53Gene complete
     restriction(isPrimarySubjectOf
                 value(<http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=NP_000537>)))

Class(P53Transcript partial
     intersectionOf(mRNA
                    restriction(derivesFrom someValuesFrom(P53Gene))))

This says, for example, that a necessary and sufficient condition for
x to be a P53Gene is that someone has stated, or it has been inferred,
that

Individual(x value(isPrimarySubjectOf <http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=NP_000537>))

and that a P53 transcript, among other things, is an mRNA that
derivesFrom some P53Gene.
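The blank-node pattern and the complete P53Gene definition can be
mimicked in a few lines of Python over plain triples. This is a sketch
under stated assumptions: the triple representation and the helper
names (bnode, records_about_same_subject, is_p53_gene) are
hypothetical, and it only checks asserted triples rather than running
an OWL reasoner.

```python
from itertools import count

UNIPROT = "http://www.expasy.org/uniprot/P04637"
REFSEQ = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=NP_000537"
IS_PRIMARY_SUBJECT_OF = "isPrimarySubjectOf"

_bnode_ids = count(1)


def bnode():
    """Mint a fresh blank-node label in the N-Triples style '_:b1'."""
    return f"_:b{next(_bnode_ids)}"


# "The same something isPrimarySubjectOf both records": one blank node
# stands for that unnamed something; the records themselves stay distinct.
something = bnode()
graph = {
    (something, IS_PRIMARY_SUBJECT_OF, UNIPROT),
    (something, IS_PRIMARY_SUBJECT_OF, REFSEQ),
}


def records_about_same_subject(graph, record):
    """Other records whose primary subject is also the primary subject of `record`."""
    subjects = {s for s, p, o in graph if p == IS_PRIMARY_SUBJECT_OF and o == record}
    return {o for s, p, o in graph
            if p == IS_PRIMARY_SUBJECT_OF and s in subjects and o != record}


def is_p53_gene(graph, x):
    """Mirror of the complete (necessary and sufficient) P53Gene definition:
    x is a P53Gene exactly when x isPrimarySubjectOf the RefSeq record."""
    return (x, IS_PRIMARY_SUBJECT_OF, REFSEQ) in graph


related = records_about_same_subject(graph, UNIPROT)  # a set containing only REFSEQ
print(is_p53_gene(graph, something))  # True
```

P53Transcript is declared partial, so its definition gives necessary
conditions only: nothing (neither a reasoner nor a check like
is_p53_gene) can conclude that an individual is a P53Transcript just
because it is an mRNA that derivesFrom a P53Gene.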

(there will be more complicated definitions too :)

[sameAs, equivalentClass, equivalentProperty will be a necessity, I  
think, BTW]

As for the social process, I look forward to the discussion on Monday :)

Regards,
Alan


URIs, URLs, and URNs: Clarifications and Recommendations 1.0 - http://www.w3.org/TR/uri-clarification/
Uniform Resource Identifier (URI): Generic Syntax - http://tools.ietf.org/html/3986
Relations in biomedical ontologies - http://genomebiology.com/2005/6/5/R46
Uniform Resource Identifier (Wikipedia) - http://en.wikipedia.org/wiki/Uniform_Resource_Identifier
URL (Wikipedia) - http://en.wikipedia.org/wiki/URL

Received on Friday, 16 June 2006 06:52:04 UTC